Computer Vision — Intraoperative Detection & Labeling

New to the project? Start with Start-here (plain-English orientation, ~10 min) and keep the Glossary open for any unfamiliar term. No computer-vision or surgery background needed.

An action-camera (GoPro, fixed tower mount) + YOLO pipeline that turns FESS surgical video into an objective, per-case efficiency report: “quality data built for the surgeon, not extracted from them.” Form factor note: data is captured on a fixed tower-mounted camera, not a head-worn one. See camera-data-source-assessment for why a fixed mount wins for instrument tracking, and for a flag to reconcile the grant’s “wearable” language. Part of the Pharyvac Surgical Technologies research pipeline and the engine behind ARS DWK grant Aim 1.

Status

Last updated: 2026-06-12
Stage: Working prototype, n=16 single-surgeon corpus. Now focused on validation.
Detection performance: 92.4% precision / 85.3% recall / ~88.7% F1, 7 instrument classes, ~25.4% frames unlabelled.
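
As a quick arithmetic check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below assumes the reported precision and recall are aggregate values over all 7 classes (the page does not say whether they are micro- or macro-averaged). The helper name `f1_score` is illustrative and is not taken from the project's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures from the Status line: 92.4% precision, 85.3% recall.
f1 = f1_score(0.924, 0.853)
print(f"{f1:.3f}")  # prints 0.887
```

The result agrees with the ~88.7% F1 quoted in the status line.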

The Three Goals

Intraoperative detection & labeling system — the YOLO pipeline that detects instruments and labels the surgical timeline. (working; improving recall + transition class)
Semantic network / ontology — translate raw detections into meaningful surgical events (swaps, scope cleaning, instrument changes, pauses). → detection-ontology
Validation ← immediate priority — prove the metrics mean what they claim, across four pillars. → validation-plan

Start Here

validation-plan — how we validate this (the immediate goal): annotation quality → ground-truth accuracy → outcome correlation → external/prospective.
detection-ontology — the semantic layer / shared vocabulary everything depends on.

specs/ — definitions & data contracts

validation-plan — four-pillar validation roadmap
detection-ontology — entities → events → outcomes semantic network (definitions LOCKED)
annotation-protocol — the labeling standard κ is measured against
p2-evaluation-plan — ground-truth accuracy eval (per-class, confusion, mAP, suction rollup)
Analysis-Ready Data Format — clean schema for analysis
FESS Cases Clean Dataset — the n=16 corpus summary
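
For readers new to the κ mentioned under annotation-protocol, here is a minimal sketch of Cohen's kappa for two annotators who labeled the same frames. It assumes frame-level, single-label annotations. The actual protocol may measure agreement at a different unit (for example events) or with a different statistic, and the class names used here are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of frames.")
    n = len(labels_a)
    # Observed agreement: fraction of frames where the two labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own class rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical class throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six frames (class names are illustrative only).
annotator_1 = ["suction", "suction", "scope", "none", "scope", "suction"]
annotator_2 = ["suction", "scope", "scope", "none", "scope", "suction"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"{kappa:.3f}")  # prints 0.739
```

Raw agreement in this toy example is 5/6, but κ is lower (17/23) because some of that agreement would be expected by chance alone.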

research/ — analyses

YOLO Model Improvement Analysis — recall, precision, transition class
Swap Count Artifact Analysis — why a precise detector can produce a meaningless metric
OR Efficiency and Cost Analysis — efficiency & cost modeling
camera-data-source-assessment — action-cam as data source: does it hurt us, and capture-side fixes

drafts/ — written outputs

ARS DWK Grant - Unified Project Narrative (+ original grant PDF)

notes/ — working notes & runbooks

cvat-self-host-runbook — how to stand up CVAT for the inter-rater study (hardware, install, privacy)
calibration-qualification-session — step-by-step session to train + qualify annotators
camera-side-by-side-test-protocol — head-to-head test before switching cameras (MISSION 1 PRO vs current GoPro)

scripts/ — analysis code

evaluate_detections.py — P2 metrics engine (per-class P/R/F1, confusion, mAP, suction rollup); --selftest verified

CSV detections/ — raw per-case detection CSVs
Computer Vision MOC.md — legacy folder-index (auto-generated)

Next Steps

✅ DONE (2026-06-13) — operational definitions frozen (T_min 1 s, P_min 5 s, scope cleaning annotator-labeled, suction hierarchical rollup). Critical-path gate cleared → P1 unblocked.
← NOW: Stand up self-hosted CVAT, build the annotation protocol from the locked definitions, train + qualify annotators (calibration-qualification-session), double-annotate 3–5 cases → P1 inter-rater κ.
Leave-one-case-out on the 16 → honest per-class metrics + confusion matrix (P2). Runnable now, no annotation needed.
Implement temporal smoothing (1 s min bout) before any swap metric enters P3.
Pre-register P3 outcome correlations ahead of multi-surgeon expansion (P4).
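
The temporal-smoothing step above (1 s minimum bout) could be sketched as follows. This is one possible approach under stated assumptions, not the project's implementation: it assumes one label per frame, absorbs any bout shorter than the minimum into the preceding bout, and counts every remaining label change as a swap. The function names, the toy frame rate, and the `forceps` class are hypothetical.

```python
from itertools import groupby


def to_bouts(frame_labels):
    """Run-length encode per-frame labels into [label, n_frames] bouts."""
    return [[label, sum(1 for _ in run)] for label, run in groupby(frame_labels)]


def smooth_bouts(frame_labels, fps, t_min_s=1.0):
    """Absorb bouts shorter than t_min_s into the preceding bout."""
    min_frames = int(round(t_min_s * fps))
    smoothed = []
    for label, length in to_bouts(frame_labels):
        if length < min_frames and smoothed:
            smoothed[-1][1] += length      # too short: treat as flicker
        elif smoothed and smoothed[-1][0] == label:
            smoothed[-1][1] += length      # same instrument resumes after flicker
        else:
            smoothed.append([label, length])  # a short first bout is kept as-is
    return smoothed


def count_swaps(bouts):
    """Adjacent bouts always differ in label, so swaps = bouts - 1."""
    return max(len(bouts) - 1, 0)


# Toy sequence at 4 fps: a single-frame 'forceps' flicker inside a suction bout.
frames = ["suction"] * 8 + ["forceps"] * 1 + ["suction"] * 6 + ["forceps"] * 5
raw_swaps = count_swaps(to_bouts(frames))
smoothed_swaps = count_swaps(smooth_bouts(frames, fps=4))
print(raw_swaps, smoothed_swaps)  # prints 3 1
```

In this toy sequence a single misclassified frame turns one real swap into three, which is the kind of inflation a minimum-bout rule is meant to remove. How unlabelled frames should be treated is left open here.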

Open Threads / Next Up

Specced or staged, not yet executed — pick any up next:

Thread	Status	Why it matters	Where
Stand up CVAT + run P1	Staged (docs done; hands-on install + clips + session remain)	Produces the inter-rater κ — the foundation gate	cvat-self-host-runbook, annotation-protocol, calibration-qualification-session
P2 leave-one-case-out	Specced; script self-tested; needs data wired in	Honest per-class accuracy + confusion for the grant. Runnable now if val labels exist	p2-evaluation-plan, evaluate_detections.py
Temporal smoothing	Defined (1 s min bout), not implemented	Fixes the 2–4× swap inflation before any swap metric enters P3. Runnable now	Swap Count Artifact Analysis, detection-ontology
Efficiency Index formula	Not yet defined	Must exist before P3 can validate it; it’s the headline composite metric	validation-plan (P3), detection-ontology (Tier 3)
Camera side-by-side test	Protocol written; needs a MISSION 1 PRO + session	Decides camera before P4 multi-surgeon collection	camera-side-by-side-test-protocol, camera-data-source-assessment
Deploy the team wiki	Runbook + sync script ready	Share the project with the team (private)	quartz-wiki-deployment
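
The leave-one-case-out evaluation listed in the table can be sketched as a simple fold generator. This is a minimal illustration that assumes each frame or detection row carries a case identifier. The case ID format and the function name are hypothetical, and training and scoring are left out.

```python
def leave_one_case_out(case_ids):
    """Yield (train_cases, held_out_case) pairs, one fold per surgical case."""
    unique_cases = sorted(set(case_ids))
    for held_out in unique_cases:
        train_cases = [c for c in unique_cases if c != held_out]
        yield train_cases, held_out


# Hypothetical IDs for the n=16 corpus.
case_ids = [f"case_{i:02d}" for i in range(1, 17)]
folds = list(leave_one_case_out(case_ids))
print(len(folds), len(folds[0][0]))  # prints 16 15
```

Splitting by whole case rather than by frame keeps near-identical neighbouring frames from the same video out of both the training and the test side of a fold, which would otherwise overstate accuracy.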

Smaller open decisions (flagged inline in their docs): confirm P2 ground-truth/compute availability; annotate FESS phases now vs. defer; pick/onboard annotators; handle the 3 external wikilinks before wiki deploy; reconcile any remaining “wearable” language if the grant is revised.

Goal Cascade

OKR (Pharyvac quality-data platform) → Pharyvac Surgical Technologies → this project → ARS DWK Aim 1 (multi-surgeon validity + IFAR submission).
