Concierge & Knowledge Assistant · Documentation

Architecture

This page covers the Concierge & Knowledge Assistant's pipeline, the data it owns, the events it emits and consumes, and what is out of scope.

Diagram: retrieve (term overlap) → ground (top ≤2) → answer (metered) → guard (ground / escalate). Retrieve, ground and guard are deterministic at $0; only draft-answer is metered (OSS qwen3:8b recorded at $0 in the prototype).

Diagram: cost lever. Retrieve ($0) → ground ($0) → answer (metered) → guard ($0). Grounded evidence gates the answer, so spend accrues only on a grounded answer; an ungrounded question escalates without a model call.

Cloud (claude-haiku-4-5) vs recorded OSS (qwen3:8b). Retrieval, grounding and the confidence calculation are identical in both modes.

Aspect               Cloud (capable read)   OSS (recorded)
Answer phrasing      fluent prose           terser, correct
Source citations     [1][2]                 [1][2]
Confidence %         from match             from match
No grounding found   escalates              not captured; routed to a human

The RAG pipeline

`draftAnswer` runs retrieve → ground → answer → guard. Retrieval scores articles by term overlap with the question; grounding keeps at most the two top-scoring articles; the answer is composed from the grounded article bodies (optionally led by the pinned property); and the guard chooses grounded or escalate based on whether any evidence was found. `askConcierge` wraps `draftAnswer` and appends the guest/assistant turn pair to the persisted conversation.
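The pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the app's implementation: the `Article`, `Draft` and `Turn` shapes, the tokenizer, the helper names `retrieve`/`ground`/`answer`, and the in-memory conversation array are all hypothetical. The `answer` step here is a deterministic stand-in for the metered model call. The evidence check runs before `answer`, which matches the stated behaviour that an ungrounded question escalates without a model call.

```typescript
// Hypothetical shapes for illustration; the app's real types are not documented here.
interface Article {
  id: string;
  title: string;
  body: string;
}

interface Scored {
  article: Article;
  score: number;
}

type Draft =
  | { kind: "grounded"; text: string; citations: string[] }
  | { kind: "escalate"; reason: string };

interface Turn {
  role: "guest" | "assistant";
  text: string;
}

// Lowercase and split on non-alphanumerics into a set of distinct terms.
function terms(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// retrieve: score every article by the number of distinct question terms it shares.
function retrieve(question: string, corpus: Article[]): Scored[] {
  const q = Array.from(terms(question));
  return corpus
    .map((article) => {
      const a = terms(`${article.title} ${article.body}`);
      return { article, score: q.filter((t) => a.has(t)).length };
    })
    .sort((x, y) => y.score - x.score);
}

// ground: keep at most the two top-scoring articles with any overlap.
function ground(scored: Scored[]): Article[] {
  return scored.filter((s) => s.score > 0).slice(0, 2).map((s) => s.article);
}

// answer: stand-in for the metered draft-answer model call; it just stitches
// the grounded bodies together with numbered citations.
function answer(evidence: Article[], pinnedProperty?: string): string {
  const lead = pinnedProperty ? `${pinnedProperty}: ` : "";
  return lead + evidence.map((a, i) => `${a.body} [${i + 1}]`).join(" ");
}

// draftAnswer: the guard check runs before answer() so an ungrounded
// question escalates without spending on a model call.
function draftAnswer(question: string, corpus: Article[], pinnedProperty?: string): Draft {
  const evidence = ground(retrieve(question, corpus));
  if (evidence.length === 0) {
    return { kind: "escalate", reason: "no grounding found" };
  }
  return {
    kind: "grounded",
    text: answer(evidence, pinnedProperty),
    citations: evidence.map((a) => a.id),
  };
}

// askConcierge: wrap draftAnswer and append the guest/assistant turn pair
// (an in-memory array stands in for the persisted conversation).
function askConcierge(conversation: Turn[], question: string, corpus: Article[]): Draft {
  const draft = draftAnswer(question, corpus);
  const reply = draft.kind === "grounded" ? draft.text : "Escalated to a human.";
  conversation.push({ role: "guest", text: question }, { role: "assistant", text: reply });
  return draft;
}
```

Returning a discriminated union keeps the grounded-vs-escalate decision explicit for callers, and because retrieval and grounding are pure functions of the question and corpus, they stay deterministic and free.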

Owned data + the coupling spine

The app owns `demo_eco_concierge_conversation`, `demo_eco_concierge_message`, and `demo_eco_concierge_ticket`, all written through a single typed C1 adapter. It reads the shared-core `property` and `booking` data and the knowledge corpus (a synthetic mock read until the C3·25 Knowledge Base CMS ships the real source). It emits `concierge.answered` and `ticket.created`; `ticket.created` is the live connection consumed by the C2 support inbox. One data invariant holds throughout: owner writes are canonical (`credential_id` is NULL), while a viewer's turns are credential-scoped, ephemeral, and cleared on reset, and a setWhere guard ensures they can never clobber a canonical row.
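The write invariant can be illustrated with a small in-memory store. This is a sketch under assumptions: `MessageStore`, `Writer`, `updateWhere` and `resetViewer` are hypothetical names, and the conditional update only mirrors the idea of the setWhere guard, whose real signature is not documented here. Owner rows carry a null `credential_id`; a write applies only when the stored row's scope matches the writer's.

```typescript
// Hypothetical row shape; credential_id is null for canonical owner rows.
interface MessageRow {
  id: string;
  conversation_id: string;
  credential_id: string | null;
  text: string;
}

type Writer = { kind: "owner" } | { kind: "viewer"; credentialId: string };

// Owners write canonically (null scope); viewers write under their credential.
function scopeOf(writer: Writer): string | null {
  return writer.kind === "owner" ? null : writer.credentialId;
}

class MessageStore {
  private rows = new Map<string, MessageRow>();

  // Insert a new row stamped with the writer's scope; never overwrites an existing id.
  insert(writer: Writer, row: Omit<MessageRow, "credential_id">): boolean {
    if (this.rows.has(row.id)) return false;
    this.rows.set(row.id, { ...row, credential_id: scopeOf(writer) });
    return true;
  }

  // Guarded update: applies only when the stored row's scope matches the
  // writer's, so a viewer can never overwrite a canonical (null-scoped) row.
  updateWhere(writer: Writer, id: string, text: string): boolean {
    const row = this.rows.get(id);
    if (!row || row.credential_id !== scopeOf(writer)) return false;
    row.text = text;
    return true;
  }

  // Reset removes one viewer's ephemeral rows and leaves canonical rows intact.
  resetViewer(credentialId: string): number {
    let removed = 0;
    for (const [id, row] of Array.from(this.rows.entries())) {
      if (row.credential_id === credentialId) {
        this.rows.delete(id);
        removed++;
      }
    }
    return removed;
  }

  get(id: string): MessageRow | undefined {
    return this.rows.get(id);
  }
}
```

Putting the scope check inside the write itself, rather than in the caller, means no code path can bypass it; in a real database the same idea is a conditional `WHERE credential_id = …` on the update.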

Metered AI, honesty inspector-only

The only metered stage is `draft-answer`. It runs in two modes: Cloud uses `claude-haiku-4-5`, cost-capped and fail-closed; OSS is recorded as `qwen3:8b (recorded · M4)` at $0 and served only from the recorded-replay set (an un-recorded OSS row is never fabricated). Each metered run writes a `cost_ledger` row plus a `demo_cache` replay marker keyed on (slug · stage · prompt version · mode). Retrieval, grounding and the guard are deterministic ($0). All cost, mode and trace information appears only in the dark inspector.
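A sketch of the dual-mode metered stage, assuming an in-memory ledger and replay map in place of the real `cost_ledger` and `demo_cache` tables. `DraftAnswerStage`, `CloudCall`, `replayKey`, the per-call cost estimate and the cap check are hypothetical; the Cloud call is injected as a plain function rather than a real client. "Fail-closed" is modelled as refusing the call (returning `null`, so the caller escalates) when the estimate would exceed the cap, and OSS mode serves only recorded outputs.

```typescript
type Mode = "cloud" | "oss";

// Hypothetical stand-in for the real Cloud client: returns text plus its cost.
type CloudCall = (prompt: string) => { text: string; costUsd: number };

interface LedgerRow {
  slug: string;
  stage: string;
  mode: Mode;
  model: string;
  costUsd: number;
}

// One replay marker per (slug · stage · prompt version · mode).
function replayKey(slug: string, stage: string, promptVersion: string, mode: Mode): string {
  return [slug, stage, promptVersion, mode].join(" · ");
}

class DraftAnswerStage {
  readonly ledger: LedgerRow[] = []; // stands in for cost_ledger
  readonly replay = new Map<string, string>(); // stands in for demo_cache
  private spentUsd = 0;
  private readonly capUsd: number;
  private readonly callCloud: CloudCall;
  private readonly recordedOss: Map<string, string>;

  constructor(capUsd: number, callCloud: CloudCall, recordedOss: Map<string, string>) {
    this.capUsd = capUsd;
    this.callCloud = callCloud;
    this.recordedOss = recordedOss;
  }

  // Returns the drafted text, or null when the caller should escalate instead.
  run(slug: string, promptVersion: string, mode: Mode, prompt: string, estimateUsd: number): string | null {
    const stage = "draft-answer";
    const key = replayKey(slug, stage, promptVersion, mode);
    if (mode === "oss") {
      // Never fabricate: only a recorded OSS output can be served, at $0.
      const recorded = this.recordedOss.get(key);
      if (recorded === undefined) return null;
      this.ledger.push({ slug, stage, mode, model: "qwen3:8b (recorded · M4)", costUsd: 0 });
      this.replay.set(key, recorded);
      return recorded;
    }
    // Fail closed: refuse the Cloud call if the estimate would breach the cap.
    if (this.spentUsd + estimateUsd > this.capUsd) return null;
    const { text, costUsd } = this.callCloud(prompt);
    this.spentUsd += costUsd;
    this.ledger.push({ slug, stage, mode, model: "claude-haiku-4-5", costUsd });
    this.replay.set(key, text);
    return text;
  }
}
```

Checking the cap before the call, rather than after, is what makes the stage fail closed: an over-budget request never reaches the model, so it costs nothing and falls back to escalation.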

Out of scope (simulated + labelled)

The escalation confirmation is a simulated message in the in-app sent viewer (no real SMTP). The knowledge base is a small synthetic corpus with deterministic retrieval (no production vector store, no proprietary content). No real imagery, no PII, no production prompts appear anywhere.

Architecture · Concierge & Knowledge Assistant · Abhishek Saxena