Architecture

ACSI · Hospitality · 2024–2025 · Public

Public · Representative · synthetic data
Live diagram — sample → change-gate → perceive → track → aggregate → confirm → decide → govern → dispatch. Only the vision perceive stage is metered; everything else is deterministic / $0.

One pipeline, nine stages

The pipeline runs sample → change-gate → perceive → track → aggregate → confirm → decide → govern → dispatch. Each stage is a pure, typed function that emits the same trace row (stage · provenance · model · tokens · cost · confidence). The stage signatures are shaped so that the real engine can be swapped in later.
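A minimal TypeScript sketch of what "pure typed function emitting the uniform trace row" could look like. Only the six trace-row fields and the nine stage names come from the text above; the type names (`TraceRow`, `Stage`, `Run`), the helper functions, the provenance values, and the choice of confidence 1 for deterministic stages are illustrative assumptions, not the demo's actual code.

```typescript
// Stage names as listed in the pipeline above.
type StageName =
  | "sample" | "change-gate" | "perceive" | "track" | "aggregate"
  | "confirm" | "decide" | "govern" | "dispatch";

// Assumed provenance values; the source does not enumerate them.
type Provenance = "deterministic" | "cloud" | "oss";

// The uniform trace row: stage · provenance · model · tokens · cost · confidence.
interface TraceRow {
  stage: StageName;
  provenance: Provenance;
  model: string | null; // null when no model is involved
  tokens: number;
  costUsd: number;
  confidence: number;   // assumed range 0..1
}

// A stage is a pure function from input to output plus one trace row.
type Stage<In, Out> = (input: In) => { value: Out; trace: TraceRow };

// Wraps plain logic as a deterministic stage: no model, no tokens, $0.
function deterministic<In, Out>(stage: StageName, fn: (input: In) => Out): Stage<In, Out> {
  return (input) => ({
    value: fn(input),
    trace: { stage, provenance: "deterministic", model: null, tokens: 0, costUsd: 0, confidence: 1 },
  });
}

// A run carries the current value and the trace rows collected so far.
interface Run<T> { value: T; trace: TraceRow[] }

function start<T>(value: T): Run<T> {
  return { value, trace: [] };
}

// Applies one stage and appends its trace row without mutating the input run.
function step<In, Out>(run: Run<In>, stage: Stage<In, Out>): Run<Out> {
  const result = stage(run.value);
  return { value: result.value, trace: [...run.trace, result.trace] };
}

// Toy usage: keep every second frame id, then count what is left.
const sample = deterministic("sample", (frames: number[]) => frames.filter((_, i) => i % 2 === 0));
const changeGate = deterministic("change-gate", (frames: number[]) => frames.length);
const run = step(step(start([1, 2, 3, 4]), sample), changeGate);
const totalCostUsd = run.trace.reduce((sum, row) => sum + row.costUsd, 0);
```

Because every stage returns the same row shape, a metered implementation could replace a deterministic one without changing how the trace is collected or displayed.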

Deterministic vs metered

Exactly one stage is metered: perceive, implemented as perceiveFrame, the vision call. Every other stage is deterministic and costs $0 in both Cloud and OSS modes. The change gate and the confirmation window keep the metered surface small: vision fires once per stable situation, not once per frame.
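The "once per stable situation" behaviour can be sketched as a small pure gate. This is one plausible reading, not the demo's implementation: the frame `signature`, the `GateState` shape, and the `minStableFrames` parameter are hypothetical, and the stability requirement here only loosely stands in for the confirmation window, whose exact role the text does not specify.

```typescript
// Hypothetical: a cheap, deterministic summary of a frame (for example a hash).
type GateFrame = { signature: string };

interface GateState {
  lastPerceived: string | null; // signature last sent to the vision call
  candidate: string | null;     // signature seen on the previous frame
  streak: number;               // consecutive frames showing the candidate
}

const initialGate: GateState = { lastPerceived: null, candidate: null, streak: 0 };

// Pure: returns the next state and whether the metered call should fire.
// Fires only when the signature differs from the last perceived one AND
// has persisted for at least minStableFrames consecutive frames.
function gateFrame(
  state: GateState,
  frame: GateFrame,
  minStableFrames: number,
): { state: GateState; fire: boolean } {
  const streak = frame.signature === state.candidate ? state.streak + 1 : 1;
  const isNew = frame.signature !== state.lastPerceived;
  const fire = isNew && streak >= minStableFrames;
  return {
    state: {
      lastPerceived: fire ? frame.signature : state.lastPerceived,
      candidate: frame.signature,
      streak,
    },
    fire,
  };
}

// Counts how many times the metered call would fire over a frame sequence.
function countVisionCalls(signatures: string[], minStableFrames: number): number {
  let state = initialGate;
  let calls = 0;
  for (const signature of signatures) {
    const next = gateFrame(state, { signature }, minStableFrames);
    if (next.fire) calls += 1; // perceiveFrame would be invoked here
    state = next.state;
  }
  return calls;
}

// Seven frames, two stable situations, one trailing single-frame change.
const gatedCalls = countVisionCalls(["A", "A", "A", "B", "B", "B", "A"], 2);
// A one-frame flicker to "B" does not trigger a second call.
const flickerCalls = countVisionCalls(["A", "A", "B", "A", "A"], 2);
```

With a two-frame stability requirement, the seven-frame sequence triggers two vision calls rather than seven, and the flicker sequence triggers one.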

Dual approach (Cloud / OSS)

Cloud runs claude-haiku-4-5 (vision), cost-capped and fail-closed: at the budget cap it falls back to the $0 OSS path. OSS is a self-hosted vision model run on local M4 hardware, with its outputs recorded for the GPU-less host. The small vision model reads the iconographic synthetic frames imperfectly, which is an honest finding about the downscaled model; the cloud path is the capable read. Genuine Cloud-vs-OSS divergences are shown as they are in the inspector, including gaps where nothing was recorded, which are labelled "not captured".
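The cost cap with fallback can be illustrated with a small routing function. This is a sketch under assumptions: the `Budget` shape, the use of a pre-call cost estimate, and integer micro-dollar accounting (chosen to avoid floating-point drift) are all hypothetical, and the demo's real cap logic may differ.

```typescript
type VisionPath = "cloud" | "oss";

// Hypothetical budget record, in integer micro-dollars (1 USD = 1,000,000).
interface Budget {
  capMicroUsd: number;
  spentMicroUsd: number;
}

// Fail closed: if the next call could push spend past the cap,
// route to the $0 OSS path instead of making the metered call.
function choosePath(budget: Budget, estimateMicroUsd: number): VisionPath {
  return budget.spentMicroUsd + estimateMicroUsd <= budget.capMicroUsd ? "cloud" : "oss";
}

// Pure update: only cloud calls add to spend; the OSS path costs nothing.
function recordSpend(budget: Budget, path: VisionPath, costMicroUsd: number): Budget {
  return path === "cloud"
    ? { ...budget, spentMicroUsd: budget.spentMicroUsd + costMicroUsd }
    : budget;
}

// Four vision calls at an assumed 4,000 micro-dollars each against a 10,000 cap.
let demoBudget: Budget = { capMicroUsd: 10000, spentMicroUsd: 0 };
const chosenPaths: VisionPath[] = [];
for (let i = 0; i < 4; i++) {
  const path = choosePath(demoBudget, 4000);
  chosenPaths.push(path);
  demoBudget = recordSpend(demoBudget, path, 4000);
}
```

In this run the first two calls go to the cloud path and the remaining two fall back to OSS, so spend stops at 8,000 micro-dollars and never exceeds the cap.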

Out of scope (the real system)

Real overhead cameras, a YOLO-style object detector and a multi-object tracker, real staff-notification integrations, and real venue data are all out of scope for this demo. They are represented by synthetic top-down frames, one structured vision call (cloud claude-haiku-4-5 / a self-hosted M4 vision model), and simulated, labelled dispatch.

Architecture · Smart Table Service Intelligence · Abhishek Saxena