Start Here — New to the Project?

Welcome. This page gets you oriented in ~10 minutes, even if you’ve never touched computer vision or set foot in an OR. Keep the Glossary open in another tab for any unfamiliar term.

What we’re building (plain English)

We put a camera in the operating room during sinus surgery and record the procedure. A computer-vision program watches that video and labels, moment by moment, which surgical instrument the surgeon is using. From that stream of labels we compute per-case efficiency metrics: time spent per instrument, how often the surgeon switched tools, and how much “dead time” there was.
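To make that concrete, here is a minimal sketch of turning a per-frame label stream into those three numbers. It assumes a fixed frame rate and that frames with no instrument are labeled None. The function name, the fps parameter, and the rule that every empty frame counts as dead time are illustrative assumptions, not the project’s definitions (those live in the detection-ontology).

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def case_metrics(labels: Sequence[Optional[str]], fps: float) -> Dict[str, object]:
    """Summarize one case from per-frame instrument labels (None = no instrument)."""
    # Seconds each instrument was in use: frame count divided by frame rate.
    frame_counts = Counter(label for label in labels if label is not None)
    time_per_instrument = {name: n / fps for name, n in frame_counts.items()}

    # Count a switch whenever the instrument differs from the last one seen.
    # Empty frames are skipped, so putting a tool down and picking the same
    # one back up is not a switch.
    switches = 0
    last_seen: Optional[str] = None
    for label in labels:
        if label is None:
            continue
        if last_seen is not None and label != last_seen:
            switches += 1
        last_seen = label

    # Hypothetical dead-time rule: every frame with no instrument counts.
    dead_time = sum(1 for label in labels if label is None) / fps

    return {
        "time_per_instrument_s": time_per_instrument,
        "switches": switches,
        "dead_time_s": dead_time,
    }


labels = ["suction"] * 4 + [None] * 2 + ["debrider"] * 3 + ["suction"]
print(case_metrics(labels, fps=2.0))
# {'time_per_instrument_s': {'suction': 2.5, 'debrider': 1.5}, 'switches': 2, 'dead_time_s': 1.0}
```

The instrument names are placeholders; the real label set is whatever the detection-ontology defines.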

Think of it as a fitness tracker for surgeons: just as a watch turns your movement into a recovery score, we turn surgical video into an objective per-case efficiency report, owned by the surgeon rather than the hospital.

The pipeline already runs on 16 real surgeries. The job now is to prove its numbers are correct so people can trust them. That proof process is called validation, and it’s most of what this wiki covers.

Why “validation” is the whole game

A program can look impressive and still be wrong. Our own cautionary tale: the model first reported ~434 instrument switches per case. That turned out to be roughly 2–3× too high, because brief flickers in the frame labels were being counted as real switches. The corrected figure is ~150–200 per case. (See Swap Count Artifact Analysis.)
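The mechanism is easy to reproduce on toy data. In the sketch below, a sequence with one real switch is interrupted by two single-frame misdetections, and each flicker adds two switches (out and back). Dropping runs shorter than a minimum hold length removes them. The 3-frame threshold and the helper names are arbitrary choices for illustration; this is not the project’s actual fix, which is documented in Swap Count Artifact Analysis.

```python
from itertools import groupby
from typing import List, Sequence


def count_switches(labels: Sequence[str]) -> int:
    """Number of frame-to-frame label changes."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def drop_short_runs(labels: Sequence[str], min_frames: int) -> List[str]:
    """Overwrite runs shorter than min_frames with the label that preceded them.

    A short run at the very start has nothing before it and is kept as-is.
    """
    cleaned: List[str] = []
    for label, group in groupby(labels):
        length = len(list(group))
        if length < min_frames and cleaned:
            cleaned.extend([cleaned[-1]] * length)  # treat the brief run as a flicker
        else:
            cleaned.extend([label] * length)
    return cleaned


# One real switch (suction -> debrider) plus two single-frame flickers.
raw = ["suction"] * 10 + ["debrider"] + ["suction"] * 10 + ["forceps"] + ["suction"] * 10 + ["debrider"] * 8
print(count_switches(raw))                      # 5
print(count_switches(drop_short_runs(raw, 3)))  # 1
```

Because every flicker costs two extra switches, this kind of artifact inflates counts by multiples rather than by a few percent.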

The lesson: an accurate-looking detector can still produce a meaningless number. So before we publish any metric, we prove it four ways:

P1 — Do humans agree? Two trained people label the same video independently. If they don’t agree, there’s no reliable “right answer” to grade the computer against. (This comes first for a reason — see validation-plan.)
P2 — Does the model match the humans? Grade the program’s labels against the human ground truth (accuracy per instrument, etc.). See p2-evaluation-plan.
P3 — Do the numbers mean something? Show the metrics actually track real surgical efficiency, not just case length.
P4 — Does it hold up on new surgeons? Test on surgeons and cases the model never saw.
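As a rough picture of what P1 and P2 compute, the sketch below compares two frame-aligned label sequences. For P1 it uses Cohen’s kappa, a common chance-corrected agreement statistic; the validation-plan defines which statistic the project actually uses, so treat that choice as an assumption. For P2 it reports per-instrument precision and recall against ground truth. Function names and instrument names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters who labeled the same frames."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be frame-aligned and non-empty")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected by chance, given each rater's own label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every frame
    return (observed - expected) / (1.0 - expected)


def per_instrument_scores(truth: Sequence[str], pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Per-instrument precision and recall of `pred` against ground-truth `truth`."""
    scores: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(truth) | set(pred)):
        hits = sum(t == label and p == label for t, p in zip(truth, pred))
        predicted = sum(p == label for p in pred)
        actual = sum(t == label for t in truth)
        scores[label] = {
            "precision": hits / predicted if predicted else 0.0,
            "recall": hits / actual if actual else 0.0,
        }
    return scores


rater_1 = ["suction", "suction", "debrider", "debrider"]
rater_2 = ["suction", "suction", "debrider", "suction"]
print(cohens_kappa(rater_1, rater_2))  # 0.5
```

Raw agreement here is 75%, but kappa is only 0.5 because some of that agreement would happen by chance. For P2, the same per_instrument_scores call works with the consensus human labels as truth and the model output as pred.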

How the pieces connect

 camera video ─► YOLO model labels each frame ─► we clean it up (smoothing) ─►
 we turn labels into events (swaps, pauses) using the ONTOLOGY's definitions ─►
 we VALIDATE those numbers (P1→P2→P3→P4) ─► efficiency report + research paper
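This page doesn’t say which smoothing the “clean it up” step uses. One common option is a centered sliding-window majority vote, sketched below; the window size, the edge handling, and the tie-breaking rule are all assumptions for illustration. Swap and pause events would then be derived from the smoothed sequence using the ontology’s definitions.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_smooth(labels: Sequence[Optional[str]], window: int = 5) -> List[Optional[str]]:
    """Replace each frame's label with the most common label in a centered window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of frames")
    half = window // 2
    smoothed: List[Optional[str]] = []
    for i in range(len(labels)):
        # The window is truncated at the start and end of the sequence.
        neighborhood = labels[max(0, i - half): i + half + 1]
        counts = Counter(neighborhood)
        best = max(counts.values())
        # On a tie, keep the frame's own label if it is among the leaders;
        # otherwise take the leader that appears first in the window.
        if counts[labels[i]] == best:
            smoothed.append(labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


print(majority_smooth(["a"] * 5 + ["b"] + ["a"] * 5) == ["a"] * 11)  # True: flicker removed
print(majority_smooth(["a"] * 6 + ["b"] * 6) == ["a"] * 6 + ["b"] * 6)  # True: real switch kept
```

A wider window suppresses more flicker but also blurs the exact frame where a real switch happens, which matters for any timing-based metric.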

The detection-ontology is our shared dictionary — the exact definitions of every label and event (what counts as a “swap,” a “pause,” etc.). Everything depends on these definitions, so they’re locked.
The annotation-protocol tells human labelers exactly how to label, so two people produce the same thing.
The validation-plan is the master plan tying P1–P4 together.
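One way to make “locked” enforceable in code is to keep the definitions in an immutable, versioned object that every stage reads from, so a change has to be a deliberate new version rather than an in-place edit. The sketch below is hypothetical: the class name, field names, label set, and threshold values are placeholders, not the detection-ontology’s actual contents.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Ontology:
    """Hypothetical frozen container for label and event definitions."""

    version: str
    labels: Tuple[str, ...]
    min_hold_frames: int   # placeholder: shortest run that counts as real instrument use
    min_pause_frames: int  # placeholder: shortest idle gap that counts as a pause


ONTOLOGY_V1 = Ontology(
    version="v1-example",
    labels=("suction", "debrider", "forceps"),
    min_hold_frames=3,
    min_pause_frames=30,
)

try:
    ONTOLOGY_V1.min_hold_frames = 1  # an in-place edit is rejected
except FrozenInstanceError:
    print("definitions are frozen")

# A change becomes an explicit new version; V1 stays intact for existing results.
ONTOLOGY_V2 = replace(ONTOLOGY_V1, version="v2-example", min_hold_frames=4)
```

Recording the version string alongside every computed metric makes it obvious which results need re-computing after a definition change.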

Where things live

specs/ — the definitions and plans (the “what” and “why”). Most important: validation-plan, detection-ontology.
research/ — analyses and findings (e.g. YOLO Model Improvement Analysis, Swap Count Artifact Analysis).
notes/ — how-to runbooks (setting up tools, running sessions).
scripts/ — the code (e.g. evaluate_detections.py).
drafts/ — the grant proposal narrative.

If you’re an engineer about to contribute

The eval code is scripts/evaluate_detections.py. Before touching real data, run python scripts/evaluate_detections.py --selftest from the repo root to see the metrics engine work on synthetic data.
The annotation tool is CVAT, self-hosted — see cvat-self-host-runbook.
Don’t edit the locked definitions in detection-ontology casually — changing them means re-labeling and re-computing. Propose changes, don’t just make them.
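If you haven’t written a synthetic-data self-test before, the idea is to build inputs whose correct answer is known by construction and check that the metric recovers it. The sketch below shows the pattern only; it is not the code behind --selftest, and the function names, label set, and 10% corruption rate are made up for illustration.

```python
import random
from typing import List, Sequence


def frame_accuracy(truth: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of frames where the prediction matches the ground truth."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def corrupt(labels: Sequence[str], n_errors: int, vocab: Sequence[str], seed: int = 0) -> List[str]:
    """Change exactly n_errors frames, at distinct random positions, to a different label."""
    rng = random.Random(seed)
    out = list(labels)
    for i in rng.sample(range(len(out)), n_errors):
        out[i] = rng.choice([v for v in vocab if v != out[i]])
    return out


def selftest() -> None:
    vocab = ["suction", "debrider", "forceps"]
    rng = random.Random(42)
    truth = [rng.choice(vocab) for _ in range(1000)]
    pred = corrupt(truth, n_errors=100, vocab=vocab)  # 10% of frames wrong by construction
    assert frame_accuracy(truth, truth) == 1.0
    assert frame_accuracy(truth, pred) == 0.9
    print("selftest passed")


if __name__ == "__main__":
    selftest()
```

The seeded random generators keep the test deterministic, so a failure always points at the metric code rather than at unlucky data.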
