Data Quality & Lineage Monitor · Documentation
Architecture
Data Quality & Lineage Monitor's pipeline, its owned data, the events it emits/consumes, and what is out of scope.
Pipeline
Four stages: run rules over entity metadata → score and classify each result (pass/warn/fail against a threshold) → roll up (health counts plus a 14-day trend) → explain a failing rule (metered AI assist). Detection and rollups are pure functions; the lineage graph is a static derivation of the coupling spine.
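The score, classify, and roll-up stages can be sketched as pure functions. This is a minimal illustration, not the monitor's actual code: the `RuleResult` shape, the `warn_margin` banding, the assumption that scores lie in [0, 1], and the choice of a per-day fail count as the trend are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RuleResult:
    """One rule evaluated against one entity on one day (hypothetical shape)."""
    entity: str
    rule: str
    score: float  # assumed to lie in [0, 1], higher is healthier
    day: date


def classify(score: float, threshold: float, warn_margin: float = 0.1) -> str:
    """Assumed banding: fail below the threshold, warn just above it, else pass."""
    if score < threshold:
        return "fail"
    if score < threshold + warn_margin:
        return "warn"
    return "pass"


def rollup(results, threshold: float, today: date, window_days: int = 14):
    """Return (health counts for today, per-day fail counts over the window)."""
    counts = Counter(
        classify(r.score, threshold) for r in results if r.day == today
    )
    health = {status: counts[status] for status in ("pass", "warn", "fail")}

    start = today - timedelta(days=window_days - 1)
    trend = []
    for offset in range(window_days):
        day = start + timedelta(days=offset)
        fails = sum(
            1
            for r in results
            if r.day == day and classify(r.score, threshold) == "fail"
        )
        trend.append((day, fails))
    return health, trend
```

Because neither function touches I/O or shared state, the same inputs always give the same counts and trend, which is what makes these stages deterministic and free to run.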
Reads + the data invariant
The monitor reads metadata and scores across the shared data layer (the entities other clusters own) and has no row-level access. It holds no mutable config of its own beyond the shared C8 surfaces and performs no canonical writes; thresholds are owner-governed and the whole surface is read-only to viewers.
Events + metering, dual-mode
Each Explain call emits a `cost.logged` event into the shared ledger that #27 reads. The explain stage is dual-mode (Cloud: `claude-haiku-4-5`, cost-capped and fail-closed · OSS: recorded, $0); every other stage is deterministic and costs $0. The lineage view documents the same event spine the rest of C8 rides.
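A minimal sketch of how a dual-mode, fail-closed explain step might look. Everything except the event name `cost.logged` and the model name is an assumption made for illustration: the `Ledger` class, the event fields, the `call_model` callable, and the reading of "OSS recorded $0" as a replayed narrative logged at zero cost.

```python
from dataclasses import dataclass, field
from typing import Callable, Tuple

MODEL = "claude-haiku-4-5"  # model name as given in the text above


class BudgetExceeded(RuntimeError):
    """Raised instead of calling the model when the cap would be breached."""


@dataclass
class Ledger:
    """In-memory stand-in for the shared cost ledger (hypothetical shape)."""
    events: list = field(default_factory=list)

    def emit(self, rule_id: str, mode: str, cost_usd: float) -> None:
        self.events.append(
            {"type": "cost.logged", "rule_id": rule_id,
             "mode": mode, "cost_usd": cost_usd}
        )

    def spent(self) -> float:
        return sum(e["cost_usd"] for e in self.events)


def explain(
    rule_id: str,
    mode: str,
    ledger: Ledger,
    *,
    cap_usd: float = 0.0,
    estimate_usd: float = 0.0,
    call_model: Callable[[str, str], Tuple[str, float]] = None,
    recorded: str = "",
) -> str:
    """Explain a failing rule; every successful call logs exactly one event."""
    if mode == "oss":
        # Assumed OSS behaviour: replay a recorded narrative at zero cost.
        ledger.emit(rule_id, mode, 0.0)
        return recorded
    if mode != "cloud":
        raise ValueError(f"unknown mode: {mode!r}")
    if call_model is None:
        raise ValueError("cloud mode needs a call_model callable")
    # Fail closed: refuse before spending if the cap would be exceeded.
    if ledger.spent() + estimate_usd > cap_usd:
        raise BudgetExceeded(f"cap of ${cap_usd} would be exceeded")
    text, cost_usd = call_model(MODEL, rule_id)
    ledger.emit(rule_id, mode, cost_usd)
    return text
```

The cap check runs before the model call, so a breach produces an error and no spend rather than a degraded or unmetered answer.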
Out of scope (simulated + labelled)
Checks are computed over the synthetic dataset rather than executed against a live warehouse, and the root-cause step is a simulated metered narrative. No real DQ engine, no alerting integration, no PII. In Stage-2 the same method surface maps to real check jobs and a real lineage store with no UI change.
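One way to picture "same method surface, no UI change" is an interface that the view depends on, with a synthetic implementation behind it today. The sketch below is illustrative only: `MonitorBackend`, `SyntheticBackend`, `run_checks`, `lineage_edges`, and `render_health` are hypothetical names, not the product's actual surface.

```python
from typing import Protocol


class MonitorBackend(Protocol):
    """Hypothetical method surface the UI would code against."""

    def run_checks(self) -> list[dict]: ...

    def lineage_edges(self) -> list[tuple[str, str]]: ...


class SyntheticBackend:
    """Stand-in that answers from fixed in-memory data, with no warehouse."""

    def __init__(self, checks, edges):
        self._checks = list(checks)
        self._edges = list(edges)

    def run_checks(self) -> list[dict]:
        return list(self._checks)

    def lineage_edges(self) -> list[tuple[str, str]]:
        return list(self._edges)


def render_health(backend: MonitorBackend) -> str:
    """View code sees only the surface, never the backend's concrete type."""
    results = backend.run_checks()
    failing = sum(1 for r in results if r["status"] == "fail")
    edges = len(backend.lineage_edges())
    return f"{failing} failing of {len(results)} checks, {edges} lineage edges"
```

Under this assumption, a later backend that runs real check jobs and queries a real lineage store would implement the same two methods, and `render_health` would not change.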