Computer Vision — Intraoperative Detection & Labeling
New to the project? Start with Start-here (plain-English orientation, ~10 min) and keep the Glossary open for any unfamiliar term. No computer-vision or surgery background needed.
Action-camera (GoPro, fixed tower mount) + YOLO pipeline that turns FESS surgical video into an objective, per-case efficiency report: “quality data built for the surgeon, not extracted from them.” Form factor note: data is captured on a fixed tower-mounted camera, not a head-worn one. See camera-data-source-assessment for why a fixed mount wins for instrument tracking, and for the flag to reconcile the grant’s “wearable” language. Part of the Pharyvac Surgical Technologies research pipeline and the engine behind ARS DWK grant Aim 1.
Status
- Last updated: 2026-06-12
- Stage: Working prototype on an n=16 single-surgeon corpus; now focused on validation.
- Detection performance: 92.4% precision / 85.3% recall / ~88.7% F1 across 7 instrument classes; ~25.4% of frames unlabelled.
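The F1 figure is the harmonic mean of the reported precision and recall, so it can be checked directly. A minimal sketch (the function name `f1_score` is illustrative, not taken from the project code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Status section.
f1 = f1_score(0.924, 0.853)
print(f"F1 = {f1:.1%}")  # F1 = 88.7%
```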
The Three Goals
- Intraoperative detection & labeling system — the YOLO pipeline that detects instruments and labels the surgical timeline. (working; improving recall + transition class)
- Semantic network / ontology — translate raw detections into meaningful surgical events (swaps, scope cleaning, instrument changes, pauses). → detection-ontology
- Validation ← immediate priority — prove the metrics mean what they claim, across four pillars. → validation-plan
Start Here
- validation-plan — how we validate this (the immediate goal): annotation quality → ground-truth accuracy → outcome correlation → external/prospective.
- detection-ontology — the semantic layer / shared vocabulary everything depends on.
Contents
specs/ — definitions & data contracts
- validation-plan — four-pillar validation roadmap
- detection-ontology — entities → events → outcomes semantic network (definitions LOCKED)
- annotation-protocol — the labeling standard κ is measured against
- p2-evaluation-plan — ground-truth accuracy eval (per-class, confusion, mAP, suction rollup)
- Analysis-Ready Data Format — clean schema for analysis
- FESS Cases Clean Dataset — the n=16 corpus summary
research/ — analyses
- YOLO Model Improvement Analysis — recall, precision, transition class
- Swap Count Artifact Analysis — why a precise detector can produce a meaningless metric
- OR Efficiency and Cost Analysis — efficiency & cost modeling
- camera-data-source-assessment — action-cam as data source: does it hurt us, and capture-side fixes
drafts/ — written outputs
- ARS DWK Grant - Unified Project Narrative (+ original grant PDF)
notes/ — working notes & runbooks
- cvat-self-host-runbook — how to stand up CVAT for the inter-rater study (hardware, install, privacy)
- calibration-qualification-session — step-by-step session to train + qualify annotators
- camera-side-by-side-test-protocol — head-to-head test before switching cameras (MISSION 1 PRO vs current GoPro)
scripts/ — analysis code
- evaluate_detections.py — P2 metrics engine (per-class P/R/F1, confusion, mAP, suction rollup);
--selftest verified
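To make "per-class P/R/F1" and "suction rollup" concrete, here is a minimal sketch of the idea. It is not the code in evaluate_detections.py: it assumes one label per frame, and the class names and the `ROLLUP` mapping are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical mapping from fine-grained labels to a parent class.
ROLLUP = {"suction_straight": "suction", "suction_curved": "suction"}


def per_class_prf(truth, pred, rollup=None):
    """Return {class: (precision, recall, f1)} for paired per-frame labels."""
    rollup = rollup or {}
    truth = [rollup.get(t, t) for t in truth]
    pred = [rollup.get(p, p) for p in pred]
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(truth, pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p where it was not present
            fn[t] += 1  # missed the true class t
    out = {}
    for c in sorted(set(truth) | set(pred)):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out


# Toy labels, not project data.
truth = ["suction_straight", "suction_curved", "microdebrider", "microdebrider"]
pred = ["suction_curved", "suction_curved", "microdebrider", "suction_straight"]

fine = per_class_prf(truth, pred)            # subtypes scored separately
rolled = per_class_prf(truth, pred, ROLLUP)  # subtypes merged into "suction"
```

In the toy data, confusing one suction subtype for another counts as an error at the fine level but disappears once both roll up to `suction`, which is why the two views are worth reporting side by side.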
CSV detections/ — raw per-case detection CSVs
Computer Vision MOC.md — legacy folder-index (auto-generated)
Next Steps
- ✅ DONE (2026-06-13) — operational definitions frozen (T_min 1 s, P_min 5 s, scope cleaning annotator-labeled, suction hierarchical rollup). Critical-path gate cleared → P1 unblocked.
- ← NOW: Stand up self-hosted CVAT, build the annotation protocol from the locked definitions, train + qualify annotators (calibration-qualification-session), double-annotate 3–5 cases → P1 inter-rater κ.
- Leave-one-case-out on the 16 cases → honest per-class metrics + confusion matrix (P2). Runnable now without new annotation, provided validation labels already exist.
- Implement temporal smoothing (1 s min bout) before any swap metric enters P3.
- Pre-register P3 outcome correlations ahead of multi-surgeon expansion (P4).
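For the P1 inter-rater κ, one common choice with two annotators is Cohen's kappa. The sketch below assumes that statistic and assumes per-frame categorical labels. The annotation protocol may specify a different unit or a different κ variant, and the labels shown are toy values.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy per-frame labels from two hypothetical annotators.
rater_a = ["suction", "suction", "forceps", "none", "forceps", "suction"]
rater_b = ["suction", "forceps", "forceps", "none", "forceps", "suction"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")  # kappa = 0.739
```

Here the raters agree on 5 of 6 frames (83%), but κ is lower because some of that agreement is expected by chance given how often each rater uses each label.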
Open Threads / Next Up
Specced or staged, not yet executed — pick any up next:
| Thread | Status | Why it matters | Where |
|---|---|---|---|
| Stand up CVAT + run P1 | Staged (docs done; hands-on install + clips + session remain) | Produces the inter-rater κ — the foundation gate | cvat-self-host-runbook, annotation-protocol, calibration-qualification-session |
| P2 leave-one-case-out | Specced; script self-tested; needs data wired in | Honest per-class accuracy + confusion for the grant. Runnable now if val labels exist | p2-evaluation-plan, evaluate_detections.py |
| Temporal smoothing | Defined (1 s min bout), not implemented | Fixes the 2–4× swap inflation before any swap metric enters P3. Runnable now | Swap Count Artifact Analysis, detection-ontology |
| Efficiency Index formula | Not yet defined | Must exist before P3 can validate it; it’s the headline composite metric | validation-plan (P3), detection-ontology (Tier 3) |
| Camera side-by-side test | Protocol written; needs a MISSION 1 PRO + session | Decides camera before P4 multi-surgeon collection | camera-side-by-side-test-protocol, camera-data-source-assessment |
| Deploy the team wiki | Runbook + sync script ready | Share the project with the team (private) | quartz-wiki-deployment |
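The temporal-smoothing thread can be illustrated with a minimal sketch. It assumes per-frame instrument labels and applies the 1 s minimum bout by absorbing shorter bouts into the preceding one. The function names, the toy frame rate, and the absorb-into-previous rule are illustrative assumptions, not the project's implementation.

```python
from itertools import groupby


def smooth_bouts(labels, fps, min_bout_s=1.0):
    """Run-length encode labels, absorbing bouts shorter than min_bout_s.

    A short bout is merged into the bout before it. A short first bout has
    no predecessor and is kept as-is (a simplification of this sketch).
    Returns a list of [label, n_frames].
    """
    min_frames = int(round(min_bout_s * fps))
    runs = [[lab, len(list(group))] for lab, group in groupby(labels)]
    smoothed = []
    for lab, n in runs:
        if smoothed and (n < min_frames or smoothed[-1][0] == lab):
            smoothed[-1][1] += n  # absorb short bout or rejoin same label
        else:
            smoothed.append([lab, n])
    return smoothed


def count_swaps(bouts):
    """Number of label changes between consecutive bouts."""
    return max(len(bouts) - 1, 0)


# Toy sequence at 4 fps: a single-frame "forceps" flicker inside a suction bout.
fps = 4
labels = ["suction"] * 8 + ["forceps"] * 1 + ["suction"] * 6 + ["forceps"] * 5

raw_swaps = count_swaps([[lab, len(list(g))] for lab, g in groupby(labels)])
smoothed = smooth_bouts(labels, fps)
smoothed_swaps = count_swaps(smoothed)
print(raw_swaps, smoothed_swaps)  # 3 1
```

A single misclassified frame adds two spurious swaps to the raw count, which is the kind of inflation the minimum-bout rule is meant to remove before swap metrics are used.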
Smaller open decisions (flagged inline in their docs): confirm P2 ground-truth/compute availability; annotate FESS phases now vs. defer; pick/onboard annotators; handle the 3 external wikilinks before wiki deploy; reconcile any remaining “wearable” language if the grant is revised.
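For the P2 leave-one-case-out thread, the split is made at the case level so that frames from the held-out case never appear in training. A minimal sketch, with hypothetical case IDs standing in for the n=16 corpus:

```python
def leave_one_case_out(case_ids):
    """Yield (train_cases, held_out_case) once per unique case."""
    cases = sorted(set(case_ids))
    for held_out in cases:
        yield [c for c in cases if c != held_out], held_out


# Hypothetical IDs; the real corpus naming may differ.
case_ids = [f"case_{i:02d}" for i in range(1, 17)]
folds = list(leave_one_case_out(case_ids))

for train_cases, held_out in folds[:2]:
    print(held_out, len(train_cases))
```

Each of the 16 folds trains on 15 cases and evaluates on the remaining one; per-class metrics and the confusion matrix are then aggregated across folds.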
Goal Cascade
OKR (Pharyvac quality-data platform) → Pharyvac Surgical Technologies → this project → ARS DWK Aim 1 (multi-surgeon validity + IFAR submission).