Start Here — New to the Project?
Welcome. This page gets you oriented in ~10 minutes, even if you’ve never touched computer vision or set foot in an OR. Keep the Glossary open in another tab for any unfamiliar term.
What we’re building (plain English)
We put a camera in the operating room during sinus surgery and record the procedure. A computer-vision program watches that video and labels, moment by moment, which surgical instrument the surgeon is using. From that stream of labels we compute efficiency metrics for the case: time spent per instrument, how often the surgeon switched tools, and how much “dead time” there was.
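To make “a stream of labels” concrete, here is a minimal Python sketch of those three per-case numbers. It assumes, hypothetically, that each video frame carries exactly one label: an instrument name, or None when no instrument is in use (counted as dead time). The function name summarize_case and its switch rule (any change between two different instruments, skipping idle frames) are illustrative only; the real definitions live in the detection-ontology.

```python
from collections import Counter
from typing import Optional, Sequence


def summarize_case(labels: Sequence[Optional[str]], fps: float) -> dict:
    """Turn a per-frame label stream into toy per-case metrics.

    Hypothetical assumption: one label per frame, either an instrument
    name or None when no instrument is in use ("dead time").
    """
    frame_counts = Counter(label for label in labels if label is not None)
    seconds_per_instrument = {name: n / fps for name, n in frame_counts.items()}
    dead_time_s = sum(1 for label in labels if label is None) / fps

    # Illustrative switch rule: a change from one instrument to a different
    # one, ignoring idle frames in between (not the ontology's definition).
    switches = 0
    previous = None
    for label in labels:
        if label is None:
            continue
        if previous is not None and label != previous:
            switches += 1
        previous = label

    return {
        "seconds_per_instrument": seconds_per_instrument,
        "switches": switches,
        "dead_time_s": dead_time_s,
    }


case = summarize_case(["suction"] * 3 + [None] * 2 + ["forceps"] * 4 + ["suction"], fps=1.0)
print(case["switches"], case["dead_time_s"])  # 2 2.0
```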
Think of it as a fitness tracker for surgeons: just like a watch turns your movement into a recovery score, we turn surgical video into an objective per-case efficiency report — owned by the surgeon, not the hospital.
It already runs on 16 real surgeries. The job now is to prove its numbers are correct so people can trust them. That proof process is called validation, and it’s most of what’s in this wiki.
Why “validation” is the whole game
A program can look impressive and still be wrong. Our own cautionary tale: the model first reported ~434 instrument switches per case. That turned out to be 2–4× too high, because it was counting tiny flickers in its labels (the predicted instrument changing for a frame or two, then changing back) as real switches. The true number is ~150–200. (See Swap Count Artifact Analysis.)
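The artifact is easy to reproduce on toy data. The sketch below uses two hypothetical helpers, count_switches and drop_short_runs, plus an assumed 3-frame minimum run length. It shows how two one-frame blips turn one real switch into five, and how a simple minimum-duration filter removes them. It illustrates the failure mode only; it is not the fix documented in Swap Count Artifact Analysis.

```python
from itertools import groupby
from typing import List, Optional, Sequence

Label = Optional[str]  # instrument name, or None when idle


def count_switches(labels: Sequence[Label]) -> int:
    """Count changes between consecutive non-idle labels (hypothetical rule)."""
    instruments = [label for label in labels if label is not None]
    return sum(1 for a, b in zip(instruments, instruments[1:]) if a != b)


def drop_short_runs(labels: Sequence[Label], min_frames: int) -> List[Label]:
    """Overwrite any run shorter than min_frames with the preceding label.

    A short run at the very start has nothing before it, so it is kept.
    """
    out: List[Label] = []
    for label, run in groupby(labels):
        n = len(list(run))
        if n < min_frames and out:
            out.extend([out[-1]] * n)
        else:
            out.extend([label] * n)
    return out


# One real switch (suction -> forceps) with a one-frame blip on each side.
raw = ["suction"] * 4 + ["forceps"] + ["suction"] * 5 + ["forceps"] * 4 + ["suction"] + ["forceps"] * 5
print(count_switches(raw), count_switches(drop_short_runs(raw, min_frames=3)))  # 5 1
```

Note that this filter would also erase genuinely short pauses or instrument uses, which is exactly why the real threshold belongs in the locked ontology rather than in ad hoc code.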
The lesson: an accurate-looking detector can still produce a meaningless number. So before we publish any metric, we prove it four ways:
- P1 — Do humans agree? Two trained people label the same video independently. If they don’t agree, there’s no reliable “right answer” to grade the computer against. (This comes first for a reason — see validation-plan.)
- P2 — Does the model match the humans? Grade the program’s labels against the human ground truth (accuracy per instrument, etc.). See p2-evaluation-plan.
- P3 — Do the numbers mean something? Show the metrics actually track real surgical efficiency, not just case length.
- P4 — Does it hold up on new surgeons? Test on surgeons and cases the model never saw.
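P1 and P2 both come down to comparing two frame-by-frame label sequences: annotator against annotator, or model against human ground truth. A minimal sketch, assuming the sequences are already aligned frame for frame: Cohen’s kappa is used here as one standard chance-corrected agreement statistic (an assumption, since this page does not say which statistic the validation-plan specifies), and per_class_recall reads “accuracy per instrument” as the fraction of each instrument’s true frames labeled correctly. Both function names are hypothetical and are not taken from evaluate_detections.py.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two aligned label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected if both raters labeled at random with their own label rates.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def per_class_recall(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """For each true label, the fraction of its frames the prediction matched."""
    totals = Counter(truth)
    hits = Counter(t for t, p in zip(truth, pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


annotator_a = ["suction", "suction", "forceps", "forceps", None, None]
annotator_b = ["suction", "forceps", "forceps", "forceps", None, None]
print(round(cohens_kappa(annotator_a, annotator_b), 2))  # 0.75
print(per_class_recall(annotator_a, annotator_b))  # {'suction': 0.5, 'forceps': 1.0, None: 1.0}
```

For P2, pass the human ground truth as truth and the model’s labels as pred; for P1, kappa is symmetric, so the order of the two annotators does not matter.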
How the pieces connect
camera video ─► YOLO model labels each frame ─► we clean it up (smoothing) ─►
we turn labels into events (swaps, pauses) using the ONTOLOGY's definitions ─►
we VALIDATE those numbers (P1→P2→P3→P4) ─► efficiency report + research paper
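The “clean it up” and “turn labels into events” steps can be sketched in a few lines. This sketch assumes a centered sliding-window majority (mode) filter for smoothing and two toy event rules: a pause is a run of idle (None) frames lasting at least min_pause_s, and a swap is a change to a different instrument, even across a pause. The window size, the thresholds, and the names mode_filter, to_events and Event are hypothetical; the real rules are the ones locked in the detection-ontology.

```python
from collections import Counter
from itertools import groupby
from typing import List, NamedTuple, Optional, Sequence

Label = Optional[str]  # instrument name, or None when idle


class Event(NamedTuple):
    kind: str       # "swap" or "pause"
    start_s: float
    end_s: float    # equal to start_s for instantaneous swaps


def mode_filter(labels: Sequence[Label], window: int) -> List[Label]:
    """Smooth by replacing each frame with the most common label in a centered window."""
    half = window // 2
    return [
        Counter(labels[max(0, i - half): i + half + 1]).most_common(1)[0][0]
        for i in range(len(labels))
    ]


def to_events(labels: Sequence[Label], fps: float, min_pause_s: float) -> List[Event]:
    """Walk runs of identical labels and emit toy swap and pause events."""
    events: List[Event] = []
    t = 0.0
    previous_instrument: Label = None
    for label, run in groupby(labels):
        duration = len(list(run)) / fps
        if label is None:
            if duration >= min_pause_s:
                events.append(Event("pause", t, t + duration))
        else:
            if previous_instrument is not None and label != previous_instrument:
                events.append(Event("swap", t, t))
            previous_instrument = label
        t += duration
    return events


raw = ["suction"] * 3 + ["forceps"] + ["suction"] * 2 + [None] * 3 + ["forceps"] * 3
smoothed = mode_filter(raw, window=3)
print(to_events(smoothed, fps=1.0, min_pause_s=2.0))
# [Event(kind='pause', start_s=6.0, end_s=9.0), Event(kind='swap', start_s=9.0, end_s=9.0)]
```

Running to_events on raw directly instead of smoothed yields three swaps instead of one: the one-frame blip alone adds two, which is the same failure mode behind the original swap-count inflation.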
- The detection-ontology is our shared dictionary — the exact definitions of every label and event (what counts as a “swap,” a “pause,” etc.). Everything depends on these definitions, so they’re locked.
- The annotation-protocol tells human labelers exactly how to label, so two people produce the same thing.
- The validation-plan is the master plan tying P1–P4 together.
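One way to make “locked” enforceable in code, sketched under assumptions: hold the definitions in a frozen dataclass so they cannot be changed at runtime, and store a fingerprint of them next to every computed metric so results produced under older definitions are easy to spot. The class name, its fields, and the example values are hypothetical placeholders, not the actual contents of the detection-ontology.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class OntologyDefinitions:
    """Hypothetical stand-in for the locked detection-ontology values."""
    version: str
    instruments: Tuple[str, ...]
    min_pause_s: float
    min_run_frames: int

    def fingerprint(self) -> str:
        """Short, stable hash of every definition; record it with each result."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


ONTOLOGY = OntologyDefinitions("v1", ("suction", "forceps"), 2.0, 3)  # placeholder values

try:
    ONTOLOGY.min_pause_s = 1.0  # a casual edit is rejected at runtime
except FrozenInstanceError:
    print("locked: propose a new version instead")

# A proposed change is a new object with a new fingerprint, so any metric
# stored with the old fingerprint is visibly stale and must be recomputed.
proposal = replace(ONTOLOGY, version="v2", min_pause_s=1.0)
print(ONTOLOGY.fingerprint() != proposal.fingerprint())  # True
```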
Recommended reading order
- Glossary — skim it; come back as needed.
- This page — you’re here.
- validation-plan — the master plan (read the “In plain terms” box at the top, then the four pillars).
- detection-ontology — what our labels and events actually mean.
- annotation-protocol — how labeling is done (if you’ll touch annotation).
- p2-evaluation-plan + evaluate_detections.py — the accuracy analysis (if you’re on the eval/code side).
- camera-data-source-assessment — why we use the camera setup we do.
Where things live
- specs/: the definitions and plans (the “what” and “why”). Most important: validation-plan, detection-ontology.
- research/: analyses and findings (e.g. YOLO Model Improvement Analysis, Swap Count Artifact Analysis).
- notes/: how-to runbooks (setting up tools, running sessions).
- scripts/: the code (e.g. evaluate_detections.py).
- drafts/: the grant proposal narrative.
If you’re an engineer about to contribute
- The eval code is scripts/evaluate_detections.py. Before touching real data, watch the metrics engine work on synthetic data by running: python evaluate_detections.py --selftest
- The annotation tool is CVAT, self-hosted; see cvat-self-host-runbook.
- Don’t edit the locked definitions in detection-ontology casually — changing them means re-labeling and re-computing. Propose changes, don’t just make them.
Links
- Glossary — every term, plain English
- README — the project index (status + next steps)
- validation-plan — the master validation plan