Architecture

BEL & eInfochips · Defence · 2016–2019 · Public

Representative · synthetic data
Live diagram — sample → change-gate → perceive → filter → track → test-zones → confirm → decide → govern → dispatch. Only the vision perceive stage is metered; everything else is deterministic / $0.

The 10-stage pipeline

One pipeline runs end to end: sample → change-gate → perceive (vision) → filter (floor + NMS) → track → test-zones → confirm (dwell) → decide (rules) → govern (cooldown/dedup) → dispatch. Every stage is deterministic and free except perception, the single metered vision call.
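As an illustration of what a deterministic stage looks like, the filter step (confidence floor, then non-maximum suppression) could be sketched as below. This is a minimal sketch, not the project's code: the `Detection` type, the function names and the 0.5 thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


@dataclass(frozen=True)
class Detection:
    label: str
    score: float
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    dets: List[Detection], floor: float = 0.5, iou_threshold: float = 0.5
) -> List[Detection]:
    """Drop detections below the floor, then apply greedy per-label NMS."""
    candidates = sorted(
        (d for d in dets if d.score >= floor),
        key=lambda d: d.score,
        reverse=True,
    )
    kept: List[Detection] = []
    for d in candidates:
        # Keep d unless it heavily overlaps an already kept box of the same label.
        if all(d.label != k.label or iou(d.box, k.box) <= iou_threshold for k in kept):
            kept.append(d)
    return kept
```

Because the function has no randomness and no external calls, the same perception output always yields the same filtered set, which is the property the deterministic stages rely on.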

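The change gate is the primary cost lever: it sits ahead of the only metered stage, so every frame it drops is a vision call that is never made. The dwell window is the primary false-positive guard: a detection has to persist before it is confirmed. Together they are the faithful downscale of constant-video processing to ‘sample, don’t stream’.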
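The two mechanisms can be sketched in a few lines. The version below is a hedged illustration under stated assumptions: frames are flat sequences of grayscale pixel values, the gate uses mean absolute difference against the last frame it passed, and dwell is counted in consecutive samples rather than seconds. The class names and thresholds are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Set


class ChangeGate:
    """Passes a frame to perception only if it differs enough from the last passed frame."""

    def __init__(self, threshold: float = 8.0) -> None:
        self.threshold = threshold
        self._reference: Optional[List[int]] = None

    def should_perceive(self, frame: Sequence[int]) -> bool:
        if not frame:
            return False
        if self._reference is None or len(frame) != len(self._reference):
            # First frame (or a resolution change) always goes through.
            self._reference = list(frame)
            return True
        mean_abs_diff = sum(abs(a - b) for a, b in zip(frame, self._reference)) / len(frame)
        if mean_abs_diff > self.threshold:
            # The reference only moves when a frame passes, so slow drift still accumulates.
            self._reference = list(frame)
            return True
        return False


class DwellConfirmer:
    """Confirms a track once it has been in the zone for N consecutive samples."""

    def __init__(self, dwell_samples: int = 3) -> None:
        self.dwell_samples = dwell_samples
        self._streak: Dict[str, int] = {}

    def update(self, track_ids_in_zone: Iterable[str]) -> Set[str]:
        present = set(track_ids_in_zone)
        # Tracks absent from this sample lose their streak entirely.
        self._streak = {t: self._streak.get(t, 0) + 1 for t in present}
        return {t for t, n in self._streak.items() if n >= self.dwell_samples}
```

A one-sample flicker never reaches `dwell_samples`, so it is dropped before the decide stage, and a static scene never passes the gate, so it costs nothing.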

Dual approach — cloud vs on-prem

Perception runs one of two ways. Cloud uses a hosted multimodal model (claude-haiku-4-5), cost-capped and fail-closed. OSS uses a self-hosted vision model on local hardware — which, on a secured site, means every frame stays on the internal network and nothing is sent to a third party.
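One way to picture "cost-capped and fail-closed" is a thin wrapper around whichever perception backend is in use. The sketch below is an assumption-laden illustration: `CappedPerceiver`, `BudgetExceeded` and the flat per-call cost are hypothetical, the backend is any callable that takes a frame and returns labels, and "fail-closed" is read here as "refuse the call rather than exceed the cap". Costs are held as integer micro-dollars to avoid floating-point drift.

```python
from typing import Any, Callable, List


class BudgetExceeded(RuntimeError):
    """Raised when the next metered call would push spend past the cap."""


class CappedPerceiver:
    def __init__(
        self,
        backend: Callable[[Any], List[str]],
        cost_per_call_micro_usd: int,
        cap_micro_usd: int,
    ) -> None:
        self.backend = backend
        self.cost_per_call_micro_usd = cost_per_call_micro_usd
        self.cap_micro_usd = cap_micro_usd
        self.spent_micro_usd = 0

    def perceive(self, frame: Any) -> List[str]:
        if self.spent_micro_usd + self.cost_per_call_micro_usd > self.cap_micro_usd:
            # Fail closed: no call is made and no silent fallback is attempted.
            raise BudgetExceeded("perception cap reached")
        # Charge before calling, so a call that errors still counts against the cap.
        self.spent_micro_usd += self.cost_per_call_micro_usd
        return self.backend(frame)
```

Under this sketch a self-hosted backend can be wrapped the same way with a per-call cost of zero, so the rest of the pipeline does not need to know which mode is active.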

For a defence deployment that data-sovereignty property is often decisive, so the OSS-local path is not a fallback but a first-class mode. The deterministic stages run identically either way, and the recorded divergences show where the cloud and local models disagree.
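A divergence record could be as simple as a per-frame diff of the labels each model returned. The sketch below assumes each backend yields a list of labels per frame; the `divergence` function and the record layout are hypothetical, not the project's actual format.

```python
from collections import Counter
from typing import Dict, List, Optional


def divergence(
    frame_id: str, cloud_labels: List[str], local_labels: List[str]
) -> Optional[Dict[str, object]]:
    """Return a record of label-count differences, or None if both models agree."""
    cloud, local = Counter(cloud_labels), Counter(local_labels)
    only_cloud = cloud - local  # Counter subtraction keeps positive counts only
    only_local = local - cloud
    if not only_cloud and not only_local:
        return None
    return {
        "frame_id": frame_id,
        "only_cloud": dict(only_cloud),
        "only_local": dict(only_local),
    }
```

Comparing counts rather than sets means a frame where one model sees two people and the other sees one is still recorded as a disagreement.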

Out of scope (the real system)

The real YOLO-style object detector, the multi-object tracker, the dedicated state classifier and constant-video processing are out of scope — downscaled here to one vision-LLM call + deterministic NMS/tracking + a change gate. Real cameras, video, alerting and many-camera concurrency are simulated and labelled.

Architecture · Video Surveillance Analytics · Abhishek Saxena