Skip to content

Analytics Examples

Example queries against the flagship cross-domain mart, analytics.fct_species_environment_daily.

The Mart

analytics.fct_species_environment_daily joins three sources at species × H3 cell (resolution 6, ~36 km² hex) × day grain:

Source Values
eBird n_observations, n_checklists, total_birds_counted, n_notable_observations
NOAA tmax_c, tmin_c, temp_range_c, prcp_mm, snow_mm, wind_ms, is_rainy_day, nearest_station_id, nearest_station_distance_miles
USGS mean_discharge_cfs, mean_gage_height_ft, mean_water_temp_c, nearest_gauge_id, nearest_gauge_distance_miles

The grain is unique on (species_code, h3_cell, obs_date). Weather and streamflow are LEFT-joined — when the nearest station/gauge has no observation for a given date, the environmental columns are NULL. (NOAA GHCND data has a multi-day publish lag, so recent dates frequently have NULL weather.)

Example 1 — Species on Cold, Rainy Days

Which species show up in cold (tmax < 15 °C), rainy cells?

SELECT
    common_name,
    SUM(n_observations) AS obs,
    ROUND(AVG(prcp_mm)::NUMERIC, 2) AS avg_prcp_mm,
    ROUND(AVG(tmax_c)::NUMERIC, 1) AS avg_tmax_c
FROM databox.analytics.fct_species_environment_daily
WHERE is_rainy_day
  AND tmax_c < 15
GROUP BY common_name
ORDER BY obs DESC
LIMIT 20;

Example 2 — Species Diversity by H3 Cell

Which cells have the highest species diversity, and what does the weather look like there?

SELECT
    h3_cell,
    ROUND(AVG(cell_center_lat)::NUMERIC, 3) AS lat,
    ROUND(AVG(cell_center_lng)::NUMERIC, 3) AS lng,
    COUNT(DISTINCT species_code) AS species_count,
    SUM(n_observations) AS obs,
    ROUND(AVG(tmax_c)::NUMERIC, 1) AS avg_tmax_c,
    ROUND(AVG(prcp_mm)::NUMERIC, 2) AS avg_prcp_mm
FROM databox.analytics.fct_species_environment_daily
GROUP BY h3_cell
ORDER BY species_count DESC
LIMIT 10;

Example 3 — Bird Activity vs. Streamflow Regime

Do bird counts cluster at low-flow, medium-flow, or high-flow gauges?

SELECT
    CASE
        WHEN mean_discharge_cfs < 100 THEN 'low'
        WHEN mean_discharge_cfs < 1000 THEN 'medium'
        ELSE 'high'
    END AS flow_bucket,
    COUNT(DISTINCT species_code) AS species,
    SUM(n_observations) AS obs,
    ROUND(AVG(mean_discharge_cfs)::NUMERIC, 1) AS avg_cfs
FROM databox.analytics.fct_species_environment_daily
WHERE mean_discharge_cfs IS NOT NULL
GROUP BY 1
ORDER BY 1;

Example 4 — Cold-Snap-After-Rainfall Species

Species seen the day after a rainy day where tmax dropped ≥ 5 °C compared to the previous day:

WITH daily AS (
    SELECT
        species_code,
        common_name,
        h3_cell,
        obs_date,
        tmax_c,
        prcp_mm,
        n_observations,
        LAG(tmax_c) OVER (PARTITION BY h3_cell ORDER BY obs_date) AS prev_tmax_c,
        LAG(prcp_mm) OVER (PARTITION BY h3_cell ORDER BY obs_date) AS prev_prcp_mm
    FROM databox.analytics.fct_species_environment_daily
)
SELECT
    common_name,
    COUNT(DISTINCT obs_date) AS days_seen,
    SUM(n_observations) AS obs
FROM daily
WHERE prev_prcp_mm > 5.0
  AND prev_tmax_c - tmax_c >= 5.0
GROUP BY common_name
ORDER BY obs DESC
LIMIT 10;

Running the Queries

MotherDuck

uv run python -c "
import os, duckdb
con = duckdb.connect(f'md:databox?motherduck_token={os.environ[\"MOTHERDUCK_TOKEN\"]}')
print(con.execute('SELECT COUNT(*) FROM analytics.fct_species_environment_daily').fetchone())
"

Local DuckDB

uv run python -c "
import duckdb
con = duckdb.connect('data/databox.duckdb')
print(con.execute('SELECT COUNT(*) FROM analytics.fct_species_environment_daily').fetchone())
"

Or point the Streamlit explorer at the same database:

uv run streamlit run app/main.py