Architecture
BEL & eInfochips · Defence · 2016–2019 · Public
The 10-stage pipeline
One pipeline runs end to end: sample → change-gate → perceive (vision) → filter (floor + NMS) → track → test-zones → confirm (dwell) → decide (rules) → govern (cooldown/dedup) → dispatch. Every stage except perception is deterministic and free to run; perception is the single metered vision call.
The change gate is the primary cost lever, because it decides which samples reach the metered vision call. The dwell window is the primary false-positive guard. Together they are a faithful downscale of constant-video processing into ‘sample, don’t stream’.
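A minimal sketch of the two guards named above, under stated assumptions: frames are flat lists of grayscale values, the change gate compares mean absolute pixel difference against a threshold, and the dwell window counts consecutive in-zone samples rather than wall-clock time. The names frame_changed and DwellConfirmer and the default values are illustrative, not taken from the real system.

```python
from dataclasses import dataclass, field


def frame_changed(prev, curr, threshold=8.0):
    """Change gate: pass a frame on only if it differs enough from the last one.

    Assumes both frames are equal-length flat lists of grayscale values.
    """
    if prev is None:
        return True  # nothing to compare against, so the first frame passes
    mean_diff = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
    return mean_diff > threshold


@dataclass
class DwellConfirmer:
    """Dwell guard: confirm a track only after N consecutive in-zone samples."""

    required: int = 3  # hypothetical default
    counts: dict = field(default_factory=dict)

    def update(self, track_id, in_zone):
        if not in_zone:
            # Leaving the zone resets the streak for this track.
            self.counts.pop(track_id, None)
            return False
        self.counts[track_id] = self.counts.get(track_id, 0) + 1
        return self.counts[track_id] >= self.required
```

In this sketch a sample that fails frame_changed never reaches the vision call, and a track that is seen in a zone only once or twice never reaches the rules stage.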
Dual approach — cloud vs on-prem
Perception runs one of two ways. Cloud uses a hosted multimodal model (claude-haiku-4-5), cost-capped and fail-closed. OSS uses a self-hosted vision model on local hardware — which, on a secured site, means every frame stays on the internal network and nothing is sent to a third party.
For a defence deployment, that data-sovereignty property is often decisive, so the OSS-local path is not a fallback but a first-class mode. The deterministic stages run identically either way, and the recorded divergences show where the two readers (cloud and local) disagree.
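One way to picture the two perception modes behind a single interface is sketched below. It is an assumption-laden illustration: the backend is injected as a plain callable (no real model API is called), "cost-capped" is modelled as a fixed per-call cost against a budget, and "fail-closed" is taken to mean that an over-budget or failed call returns no detections instead of guessing. CappedPerceiver, divergences and the detection dictionary shape are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical detection shape: {"label": str, ...}
Detection = dict


@dataclass
class CappedPerceiver:
    """Wraps any perception backend (cloud or local) with a spend cap."""

    call_model: Callable[[bytes], List[Detection]]
    cost_per_call: float  # assumed flat cost; a local backend could use 0.0
    budget: float
    spent: float = 0.0

    def perceive(self, frame: bytes) -> List[Detection]:
        if self.spent + self.cost_per_call > self.budget:
            return []  # fail closed: refuse the call once the cap is reached
        self.spent += self.cost_per_call
        try:
            return self.call_model(frame)
        except Exception:
            return []  # fail closed: a backend error yields no detections


def divergences(cloud: List[Detection], local: List[Detection]) -> List[str]:
    """Labels reported by exactly one of the two backends for the same frame."""
    cloud_labels = {d["label"] for d in cloud}
    local_labels = {d["label"] for d in local}
    return sorted(cloud_labels ^ local_labels)
```

Because the wrapper only depends on the callable's signature, the downstream deterministic stages see the same list of detections whichever backend produced it.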
Out of scope (the real system)
The real YOLO-style object detector, the multi-object tracker, the dedicated state classifier and constant-video processing are out of scope — downscaled here to one vision-LLM call + deterministic NMS/tracking + a change gate. Real cameras, video, alerting and many-camera concurrency are simulated and labelled.
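For reference, the deterministic filter stage (floor + NMS) can be sketched as a confidence floor followed by standard greedy non-maximum suppression. This assumes "floor" means a minimum confidence score, that boxes are (x1, y1, x2, y2) tuples, and that suppression is class-agnostic; the thresholds and the function names are illustrative, not values from the system.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, floor=0.5, iou_threshold=0.5):
    """Drop low-confidence detections, then greedily suppress overlaps.

    Each detection is assumed to be a dict with "score" and "box" keys.
    """
    candidates = sorted(
        (d for d in dets if d["score"] >= floor),
        key=lambda d: d["score"],
        reverse=True,
    )
    kept = []
    for d in candidates:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(d["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(d)
    return kept
```

Sorting by score before suppression keeps the result deterministic for a given input, which matches the role this stage plays between the single vision call and the tracker.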