Databox

Dataset-agnostic single-operator data platform. Ingests bird, weather, and streamflow sources with dlt, transforms with SQLMesh, validates with Soda contracts, and orchestrates with Dagster on DuckDB / MotherDuck.

What's here

  • Data dictionary — every model, its columns and types, the Soda contract in effect, and direct lineage. Auto-generated from SQLMesh and Soda metadata.
  • Lineage — full model dependency graph rendered with Mermaid. Each node links to its dictionary page.
  • Metrics — semantic metrics layer over the flagship mart (analytics.fct_species_environment_daily).
  • Analytics examples — representative queries the flagship mart supports.
  • Contracts — Soda quality contract conventions.
  • Incremental loading — notes on dlt incremental loading and SQLMesh incremental-by-time models.

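The Lineage pages render a model dependency graph as Mermaid, with each node linking to its dictionary page. The sketch below shows one way to do that under stated assumptions. It assumes the lineage is available as a plain dict that maps each model to its upstream models, and that dictionary pages live at dictionary/<model>.md. The function name lineage_to_mermaid and the staging model names in the usage are hypothetical. None of it is taken from scripts/generate_docs.py.

```python
from graphlib import TopologicalSorter


def lineage_to_mermaid(deps: dict[str, set[str]], base_url: str = "dictionary/") -> str:
    """Render a model dependency graph as Mermaid flowchart text.

    Hypothetical helper: `deps` maps each model to the set of upstream
    models it reads from; `base_url` is an assumed dictionary page prefix.
    """
    # Upstream models come before their consumers in this order.
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    order = list(TopologicalSorter(deps).static_order())

    # Dots in schema-qualified names are swapped for underscores so each
    # node id is a plain identifier; the full name is kept as the label.
    ids = {name: name.replace(".", "_") for name in order}

    lines = ["graph LR"]
    for name in order:
        lines.append(f'    {ids[name]}["{name}"]')
    for child in order:
        for parent in sorted(deps.get(child, ())):
            lines.append(f"    {ids[parent]} --> {ids[child]}")
    # Mermaid `click` directives turn each node into a link.
    for name in order:
        lines.append(f'    click {ids[name]} "{base_url}{name}.md"')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical staging models feeding the flagship mart.
    example = {
        "analytics.fct_species_environment_daily": {
            "staging.stg_birds",
            "staging.stg_weather",
        },
    }
    print(lineage_to_mermaid(example))
```

Topological ordering is not required by Mermaid itself. Using graphlib here mainly means a cyclic dependency map fails loudly instead of producing a misleading diagram.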
Architecture decisions

Six backfilled ADRs (Nygard format) explain the load-bearing choices:

The root README frames the platform as a case study with system and data-flow diagrams in Mermaid.

Regenerate

Everything under dictionary/ is generated from the repo, so do not hand-edit it. Rebuild with:

uv run python scripts/generate_docs.py

Target runtime: under 30 seconds; observed runtime: ~1–2 seconds for the current model set.