Swap Count Artifact Analysis

Project: Pharyvac FESS Computer Vision
Date: 2026-03-28
Data: 16 bilateral full primary FESS cases (Nov 2025, Mar 2026)

The Problem: Are 434 Swaps Per Case Real?

The data report a mean of 434 instrument swaps per case (range 253–638). At first glance this seems high. Working through the math confirms the suspicion: at 434 swaps across ~121 minutes of labelled instrument time, the average instrument “bout” is only 16.8 seconds. That would mean the surgeon picks up a tool, uses it for under 17 seconds, and switches. 434 times. This doesn’t match the reality of FESS, where sustained work with a single instrument (especially forceps and microdebrider) typically lasts 30-90 seconds or more.

The true physical swap count is likely 100-200 per case, meaning the YOLO pipeline inflates the count by roughly 2-4x.

Assumed True Swaps	Implied Avg Bout (m:ss)	Inflation Factor
50	2:26	8.7x
100	1:13	4.3x
150	0:49	2.9x
200	0:36	2.2x

A true swap count around 150-200 gives average bout durations of 36-49 seconds, which is much more realistic for FESS instrument use.
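The table above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the labelled instrument time is exactly 434 × 16.8 seconds (about 121.5 minutes); the constant and function names are illustrative, not part of the pipeline.

```python
# Reproduce the implied-bout table from the figures quoted in this section.
RAW_SWAPS = 434                           # mean raw swap count per case
RAW_BOUT_SEC = 16.8                       # average raw bout duration quoted above
LABELLED_SEC = RAW_SWAPS * RAW_BOUT_SEC   # assumed labelled instrument time (~121.5 min)

def implied_bout(true_swaps):
    """Return (average bout as m:ss, inflation factor) for an assumed true swap count."""
    seconds = round(LABELLED_SEC / true_swaps)
    inflation = RAW_SWAPS / true_swaps
    return f"{seconds // 60}:{seconds % 60:02d}", round(inflation, 1)

for swaps in (50, 100, 150, 200):
    bout, inflation = implied_bout(swaps)
    print(f"{swaps:>3} swaps -> avg bout {bout}, inflation {inflation}x")
```

Any other assumed swap count can be passed to `implied_bout` to see where the average bout duration stops looking plausible.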

Evidence That the Count Is Inflated

1. Suspiciously Low Variance in Swap Rate

The swap rate has a coefficient of variation of only 10.7% (mean 2.70/min, SD 0.29). A real behavioral metric, how often a surgeon physically changes instruments, should vary significantly with case complexity, surgical phase, and individual surgeon habits. The extremely consistent rate suggests a detection artifact with a relatively fixed frequency, not a real behavioral signal.

Case	Swaps	Swap Rate/min
1	316	2.65
2	538	1.95
3	631	2.75
4	525	2.47
5	491	2.56
6	285	2.79
7	517	2.62
8	360	2.65
9	303	2.80
10	617	3.03
11	321	3.03
12	381	2.54
13	638	3.29
14	253	2.70
15	451	2.70
16	322	2.70

Case 2 (1.95/min) is the only notable low outlier, and it is also the longest case at 4:35 (h:mm). The other 15 cases range from 2.47 to 3.29/min, with most of them falling between 2.5 and 3.0/min.
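The summary statistics can be checked directly against the per-case table. The sketch below assumes the quoted SD is a sample standard deviation (n - 1 denominator), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Swap rate per minute for cases 1-16, copied from the table above.
swap_rates = [2.65, 1.95, 2.75, 2.47, 2.56, 2.79, 2.62, 2.65,
              2.80, 3.03, 3.03, 2.54, 3.29, 2.70, 2.70, 2.70]

rate_mean = mean(swap_rates)
rate_sd = stdev(swap_rates)   # sample SD (n - 1 denominator), an assumption
cv = rate_sd / rate_mean      # coefficient of variation

print(f"mean {rate_mean:.2f}/min, SD {rate_sd:.2f}, CV {cv:.1%}")
```

Re-running the same calculation on smoothed swap rates would show whether the variability increases once the artifact is removed.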

2. Nav Suction Correlation Is a Smoking Gun

Instrument	Correlation with Swap Count	What This Tells Us
Nav suction	r = 0.83	Highest, model likely flickers between suction classes
Forceps	r = -0.43	Large, distinctive shape → stable, confident detection
Nav probe	r = -0.57	Cases with more nav probe use have fewer total swaps
Microdebrider	r = -0.21	Moderately distinctive
Suction bovie	r = 0.19	Weak positive, possibly confused with suction

Forceps are large and visually distinctive. YOLO can hold onto a confident classification frame after frame. Nav suction is thin, looks similar to non-nav suction and nav probe, and likely sits right at the boundary where the model flickers between classes. Cases where the surgeon spends more time with suction, the hardest instruments to distinguish, generate more “swaps” because the model keeps bouncing between suction-like classes.
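One way to test this explanation is to tally which class pairs the counted transitions occur between. If most of them involve nav suction and a look-alike class, that would support the flicker hypothesis. A minimal sketch, assuming per-frame labels are available as a list of strings with `None` for no detection; the class names and the example sequence are hypothetical and are not project data.

```python
from collections import Counter

def transition_pairs(labels):
    """Count (from_label, to_label) pairs wherever consecutive frames differ."""
    return Counter((a, b) for a, b in zip(labels, labels[1:]) if a != b)

# Hypothetical per-frame labels, for illustration only.
labels = ["nav_suction", "nav_suction", "suction", "nav_suction",
          "nav_probe", "nav_suction", "forceps", "forceps"]

for (src, dst), n in transition_pairs(labels).most_common():
    print(f"{src} -> {dst}: {n}")
```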

3. Constant Swap Rate Across Complexity Levels

Simple cases: 2.7 swaps/min. Complex cases: 2.7 swaps/min. Identical. If swaps were real, complex cases (which involve additional procedures like septoplasty and turbinate work) should have a different instrument cycling pattern. The fact that the rate is invariant to case type points to a detection-level artifact, not a surgical behavior pattern.

What’s Causing the Inflation

Detection Flickering (Primary Cause)

When an instrument is near the confidence threshold (e.g., 0.45-0.55 when the cutoff is 0.5), the model oscillates frame-to-frame:

Frame 100: forceps (conf 0.72)
Frame 101: forceps (conf 0.51)  ← barely above threshold
Frame 102: [nothing] (conf 0.48) ← barely below threshold → counted as swap OUT
Frame 103: nav_suction (conf 0.52) ← different class briefly wins → counted as swap IN
Frame 104: forceps (conf 0.68)    ← back to forceps → counted as swap again

This sequence generates 3 phantom swaps (forceps → nothing → nav_suction → forceps) even though the surgeon held the same forceps the entire time. At 30fps, even a few flickers per minute compound to hundreds per case.
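The five frames above can be run through a naive counter to see how the phantom events arise. This sketch assumes the raw pipeline counts every change of label between consecutive frames as one swap event, and that the sub-threshold detection in frame 102 was still forceps; neither detail is confirmed by the source.

```python
CONF_THRESHOLD = 0.5  # cutoff used in the example above

# (frame, top class, confidence) for the five frames listed above.
# Assumption: frame 102's sub-threshold detection was still forceps.
detections = [
    (100, "forceps", 0.72),
    (101, "forceps", 0.51),
    (102, "forceps", 0.48),
    (103, "nav_suction", 0.52),
    (104, "forceps", 0.68),
]

# Detections below the threshold become None ("nothing detected").
labels = [cls if conf >= CONF_THRESHOLD else None for _, cls, conf in detections]

# Every change of label between consecutive frames counts as one swap event.
raw_swaps = sum(a != b for a, b in zip(labels, labels[1:]))
print(labels, raw_swaps)
```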

Brief Occlusions During Use

The surgeon’s hand, gauze, or a head movement momentarily blocks the instrument from the GoPro’s view. YOLO loses detection for 2-3 frames, then re-detects. That registers as two phantom swaps (instrument A → nothing → instrument A) even though nothing physically changed.

Multi-Instrument Frames

If two instruments are briefly visible simultaneously (e.g., during a handoff between surgeon and surgical tech), YOLO may alternate which one it reports as the primary detection, creating rapid class transitions that register as swaps.

The Fix: Minimum Bout Duration Filter

Apply a smoothing filter before counting swaps. The idea: if the model says the instrument changed for fewer than N frames and then reverted, don’t count it.

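# Temporal smoothing: absorb bouts shorter than MIN_BOUT_FRAMES into a neighbor
from itertools import groupby
MIN_BOUT_FRAMES = 30  # 1 second at 30fps

def smooth_labels(labels, min_frames=MIN_BOUT_FRAMES):
    runs = [[lab, len(list(grp))] for lab, grp in groupby(labels)]  # run-length encode
    while len(runs) > 1 and min(n for _, n in runs) < min_frames:
        i = min(range(len(runs)), key=lambda k: runs[k][1])  # shortest segment first
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(runs)]
        j = max(nbrs, key=lambda k: runs[k][1])  # inherit the longer neighbor's label
        runs[j][1] += runs[i][1]
        del runs[i]
        # fuse adjacent segments that now share a label
        runs = [[lab, sum(n for _, n in grp)] for lab, grp in groupby(runs, key=lambda r: r[0])]
    return [lab for lab, n in runs for _ in range(n)]

def count_transitions(labels):
    return sum(a != b for a, b in zip(labels, labels[1:]))

frame_labels = ["forceps"] * 90 + [None] * 3 + ["forceps"] * 90  # illustrative input
smoothed_labels = smooth_labels(frame_labels)
real_swaps = count_transitions(smoothed_labels)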

Tuning the Threshold

Min Bout Duration	Expected Effect	Trade-off
0.5 sec (15 frames)	Removes most flicker, conservative	May miss some real rapid transitions
1.0 sec (30 frames)	Removes nearly all artifacts	Safe, physical swaps rarely take <1 sec
2.0 sec (60 frames)	Aggressive smoothing	May merge genuinely quick instrument checks

Recommended starting point: 1.0 second (30 frames). Calibrate by manually counting swaps on a 10-minute clip and adjusting until the automated count matches.
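The calibration step can be sketched as a sweep over candidate thresholds, keeping the one whose automated count is closest to the manual count. This is an illustration under stated assumptions: the `smooth`, `count_swaps`, and `calibrate` helpers are hypothetical names, the merge rule (shortest run first, absorbed by its longer neighbor) is one possible reading of the filter, and the clip is synthetic rather than project data.

```python
from itertools import groupby

FPS = 30  # frame rate used throughout this document

def smooth(labels, min_frames):
    """Absorb runs shorter than min_frames into their longer neighbor."""
    runs = [[lab, len(list(grp))] for lab, grp in groupby(labels)]
    while len(runs) > 1 and min(n for _, n in runs) < min_frames:
        i = min(range(len(runs)), key=lambda k: runs[k][1])
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(runs)]
        j = max(nbrs, key=lambda k: runs[k][1])
        runs[j][1] += runs[i][1]
        del runs[i]
        # Fuse adjacent runs that now share a label.
        runs = [[lab, sum(n for _, n in grp)]
                for lab, grp in groupby(runs, key=lambda r: r[0])]
    return [lab for lab, n in runs for _ in range(n)]

def count_swaps(labels):
    """Count label changes between consecutive frames."""
    return sum(a != b for a, b in zip(labels, labels[1:]))

def calibrate(labels, manual_count, candidates_sec=(0.5, 1.0, 2.0)):
    """Return (best threshold in seconds, {threshold: automated count})."""
    counts = {s: count_swaps(smooth(labels, int(s * FPS))) for s in candidates_sec}
    best = min(counts, key=lambda s: abs(counts[s] - manual_count))
    return best, counts

# Synthetic clip: a 20-frame dropout inside a forceps bout, then a genuine
# 40-frame microdebrider bout. A human reviewer would count 2 swaps.
clip = (["forceps"] * 300 + [None] * 20 + ["forceps"] * 300
        + ["microdebrider"] * 40 + ["forceps"] * 300)

best, counts = calibrate(clip, manual_count=2)
print(best, counts)
```

On a real 10-minute clip the candidate list could be made finer (for example every 0.25 seconds) once the coarse sweep shows which range brackets the manual count.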

Additional Benefits

This same smoothing pass simultaneously:

Fixes the swap count: reduces it from ~434 to an estimated 150-200 true physical swaps
Reduces unlabelled frames: short unlabelled gaps between identical instrument labels get absorbed into the surrounding instrument bout
Enables the transition class: gaps between different instruments get labelled as “transition” rather than remaining unlabelled
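The gap handling described in the last two items can be sketched as a single pass over the unlabelled runs. The function name `label_gaps` and the `max_absorb_frames` cutoff for what counts as a "short" gap are assumptions made for illustration, not part of the existing pipeline.

```python
from itertools import groupby

def label_gaps(labels, max_absorb_frames=30):
    """Relabel unlabelled (None) runs using the instruments on either side.

    Different instruments on each side -> the gap becomes "transition".
    Same instrument and a short gap    -> the gap is absorbed into that bout.
    Anything else, including gaps at the clip edges, stays unlabelled.
    """
    runs = [(lab, len(list(grp))) for lab, grp in groupby(labels)]
    out = []
    for i, (lab, n) in enumerate(runs):
        if lab is None and 0 < i < len(runs) - 1:
            before, after = runs[i - 1][0], runs[i + 1][0]
            if before != after:
                lab = "transition"
            elif n <= max_absorb_frames:
                lab = before
        out.extend([lab] * n)
    return out

# Hypothetical frames: a 2-frame dropout, then a 4-frame gap before a new tool.
frames = (["forceps"] * 3 + [None] * 2 + ["forceps"] * 3
          + [None] * 4 + ["microdebrider"] * 3)
print(label_gaps(frames))
```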

Impact on Other Analyses

OR Cost Analysis

The swap-based cost estimates in OR Efficiency and Cost Analysis need reframing:

Raw count (434): Reflects detection events, not physical actions. The per-swap cost calculation ($1,300-$3,600/case) is based on inflated counts.
Smoothed count (~150-200): Represents actual physical transitions. At 3-5 seconds per true swap, the real transition cost is roughly $450-$1,650/case, which is still meaningful but more defensible.
The total “transition time” estimate is more reliable than the per-swap estimate, because total unaccounted time is measured directly from the video rather than derived from swap counts.
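The cost ranges above follow from swaps × seconds per swap × cost per OR minute. The sketch below assumes an OR cost of $60-$100 per minute. That rate is not stated in this section; it is back-calculated because it approximately reproduces the quoted figures, so treat it as a placeholder for whatever rate the cost analysis actually uses.

```python
# Assumption: OR time valued at $60-$100 per minute (back-calculated, not sourced).
COST_PER_MIN = (60, 100)   # (low, high) USD per OR minute
SEC_PER_SWAP = (3, 5)      # (low, high) seconds per swap, from the text

def transition_cost(swaps_low, swaps_high):
    """Return the (low, high) cost per case in USD for a swap-count range."""
    low = swaps_low * SEC_PER_SWAP[0] / 60 * COST_PER_MIN[0]
    high = swaps_high * SEC_PER_SWAP[1] / 60 * COST_PER_MIN[1]
    return round(low), round(high)

print("raw:     ", transition_cost(434, 434))
print("smoothed:", transition_cost(150, 200))
```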

Instrument Time Proportions

The proportional time for each instrument (forceps 26.8%, nav suction 13.2%, etc.) is largely unaffected by flickering. A detection that flickers to the wrong class for 2-3 frames out of thousands barely moves the percentage. The time proportions remain trustworthy.

The r = 0.89 Correlation (Swaps vs Case Duration)

This correlation is real but its interpretation changes. It’s not that “more swaps cause longer cases.” Rather, longer cases naturally have more frames, more detection events, and therefore more opportunities for flickering, inflating the swap count proportionally. The correlation reflects the artifact’s proportional nature, not a causal surgical relationship.

Recommended New Metrics

After implementing smoothing, track both raw and smoothed values:

Metric	Current (Raw)	Expected (Smoothed)
Swaps per case	434	150-200
Avg bout duration	16.8 sec	36-49 sec
Swap rate per min	2.7	~0.9-1.2
CV of swap rate	10.7%	Should increase (more variability = more real)

A higher coefficient of variation after smoothing would actually be a good sign: it would mean the metric now reflects genuine differences in surgical behavior rather than a fixed-rate detection artifact.
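Tracking raw and smoothed values side by side only needs one helper applied to both label sequences. A minimal sketch, assuming per-frame labels with `None` for no detection and following this document's definition of average bout (labelled instrument time divided by swap count); the `swap_metrics` name and the two short example sequences are hypothetical.

```python
from itertools import groupby

FPS = 30  # frame rate used throughout this document

def swap_metrics(labels, case_minutes):
    """Summarise one case from per-frame labels (None = nothing detected)."""
    runs = [(lab, len(list(grp))) for lab, grp in groupby(labels)]
    swaps = len(runs) - 1                                     # label changes
    labelled_frames = sum(n for lab, n in runs if lab is not None)
    return {
        "swaps": swaps,
        # Labelled instrument time divided by swap count, as in this document.
        "avg_bout_sec": labelled_frames / FPS / swaps if swaps else None,
        "swap_rate_per_min": swaps / case_minutes,
    }

# Hypothetical sequences: the same clip before and after smoothing.
raw = ["forceps"] * 60 + [None] * 2 + ["forceps"] * 60 + ["nav_suction"] * 90
smoothed = ["forceps"] * 122 + ["nav_suction"] * 90
minutes = len(raw) / FPS / 60

for name, seq in (("raw", raw), ("smoothed", smoothed)):
    print(name, swap_metrics(seq, minutes))
```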
