ADR-0006: MotherDuck as the cloud path¶
Status: Accepted · 2026-04
Context¶
The local stack (ADR-0001) is good enough for daily operation but breaks down for two use cases: sharing the data with another laptop, and handing a recruiter a live dashboard they can click without cloning the repo.
The candidate cloud paths considered:
- Managed Postgres / Snowflake / BigQuery — wrong workload fit (OLTP / expensive / egress costs), requires rewriting models off DuckDB dialect.
- Parquet-on-S3 + Athena — works but forces the dashboard layer to speak a different dialect than the local stack, doubling the test surface.
- MotherDuck — cloud DuckDB. Accepts
md:URIs as drop-in replacements for local paths. Same SQL dialect, same extensions, same features as local DuckDB. - Self-hosted DuckDB on a VM — defeats the "no always-on infra" constraint (ADR-0001).
Decision¶
Use MotherDuck as the cloud path. Keep the local path as default.
Architecture is gateway-based: DATABOX_BACKEND=local or =motherduck
switches settings.database_path and settings.raw_*_path between
file paths and md: URIs. SQLMesh gateways (local, motherduck)
mirror the same split. No SQL changes between modes.
Consequences¶
Positive:
- Identical SQL. The same transforms and the same Soda contracts run
locally and in MotherDuck. No dialect split.
- Free hobbyist tier covers the portfolio use case without any upfront
cost.
- Shared dashboards resolve md: URIs directly — no ETL-to-dashboard
copy step.
- Backup is a COPY FROM LOCAL TO md: on demand. Switching backends is
an environment-variable flip, reversible.
Negative: - MotherDuck is proprietary SaaS. The data and SQL are portable; the service is not. If MotherDuck's pricing or service changes, the fallback is "stay on local DuckDB and publish static Parquet snapshots". - Some DuckDB extensions load in a different order on MotherDuck than locally (the H3 community extension was the first example). The metrics layer hit this during implementation (see the semantic-metrics research record) — working-around requires invoking extension loading explicitly. - Network round-trips are now a thing. Local DuckDB queries returned in milliseconds; MotherDuck queries are typically 10–100× slower for the same shape due to network, though still comfortably under a second for the dashboards in this repo.
Neutral: - This violates the project's default preference for open-source over proprietary (noted in the root CLAUDE.md). The trade-off is explicit: proprietary service acceptable only because the data and SQL layer above it are fully portable — the same stack keeps running on local DuckDB if MotherDuck goes away.