
Ingest memory-bound test fixture — runbook

Status: Open
Owner: backbone maintainers
Tracking issue: #297
Parent epic: #293, Phase 6 §6.6

This runbook documents how to provision the 5 GiB river-flood fixture that backs the memory-bound gating test (tests/worker/ingest/test_memory_bound.py). The §6.6 specification calls this the gating bar: a 5 GiB-class hazard source must ingest end-to-end without any worker process exceeding 1024 MiB RSS at any 250 ms sample point.
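The gate amounts to polling each worker's resident set size on a fixed cadence and keeping the maximum. Below is a minimal, Linux-only sketch. It assumes the worker runs as a child process and that RSS is read from /proc/<pid>/status. The real test may use a library such as psutil instead, and read_rss_mib and sample_peak_rss_mib are hypothetical names. It samples a single process, and like any fixed-interval sampler it can miss a spike shorter than the interval.

```python
import subprocess
import time
from pathlib import Path
from typing import Optional

RSS_LIMIT_MIB = 1024      # §6.6 per-process ceiling
SAMPLE_INTERVAL_S = 0.25  # §6.6 sampling cadence (250 ms)


def read_rss_mib(pid: int) -> Optional[float]:
    """Current resident set size of `pid` in MiB, read from /proc (Linux only).

    Returns None if the process is gone or is a zombie (no VmRSS line).
    """
    try:
        status = Path(f"/proc/{pid}/status").read_text()
    except OSError:  # FileNotFoundError / ProcessLookupError once the process exits
        return None
    for line in status.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:\t  123456 kB"
            return int(line.split()[1]) / 1024
    return None


def sample_peak_rss_mib(proc: subprocess.Popen, interval_s: float = SAMPLE_INTERVAL_S) -> int:
    """Poll `proc` every `interval_s` seconds until it exits; return peak RSS in whole MiB."""
    peak = 0.0
    while proc.poll() is None:
        rss = read_rss_mib(proc.pid)
        if rss is not None:
            peak = max(peak, rss)
        time.sleep(interval_s)
    return int(peak)
```

In a gating test, the returned peak would be printed as INGEST_MEMORY_PEAK_MIB=<peak> and compared against RSS_LIMIT_MIB.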

The test is marked @pytest.mark.slow and gated on the INGEST_MEMORY_TEST_FIXTURE_KEY environment variable. When that variable is unset, the test is skipped rather than failed, so the standard fast-path CI never runs it, as the §6.6 spec explicitly requires. The slow job in ci.yml provisions the fixture and exports the variable before invoking pytest -m slow.


Source dataset

| Field | Value |
|---|---|
| Layer | River flood depth, single-band, multi-RP (return periods 10 / 50 / 100 / 250 / 500 yr) |
| Format | netCDF4 (CF-1.8), float32, EPSG:4326 |
| Footprint | Global, 30 arc-second resolution |
| Approx. size | 5.0–5.2 GiB on disk |
| Suggested origin | JRC Global River Flood Hazard Maps (Aqueduct-compatible), pre-merged into one netCDF |
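A quick size estimate shows why the source cannot simply be loaded into a worker. The sketch below assumes the full ±90° latitude by ±180° longitude extent and no compression. The real layer may cover less latitude, and the 5.0–5.2 GiB on-disk figure presumably reflects netCDF4 compression.

```python
from typing import Tuple

ARCSEC_PER_DEG = 3600
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2
GIB = 1024 ** 3


def grid_shape(resolution_arcsec: int = 30, lat_span_deg: int = 180, lon_span_deg: int = 360) -> Tuple[int, int]:
    """(rows, cols) of a regular lat/lon grid; defaults assume full global extent."""
    cells_per_deg = ARCSEC_PER_DEG // resolution_arcsec  # 120 cells per degree at 30"
    return lat_span_deg * cells_per_deg, lon_span_deg * cells_per_deg


def uncompressed_band_bytes(resolution_arcsec: int = 30, lat_span_deg: int = 180, lon_span_deg: int = 360) -> int:
    """Bytes needed to hold one float32 band (one return period) fully in memory."""
    rows, cols = grid_shape(resolution_arcsec, lat_span_deg, lon_span_deg)
    return rows * cols * BYTES_PER_FLOAT32


band = uncompressed_band_bytes()
print(f"one band: {band / MIB:.0f} MiB; five bands: {5 * band / GIB:.1f} GiB")
# prints: one band: 3560 MiB; five bands: 17.4 GiB
```

Even a single return period is roughly 3.5× the 1024 MiB ceiling, so any path that materialises a whole band fails the gate.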

The canonical fixture is a single .nc file. Its SHA-256 hash and exact byte size are recorded once it has been provisioned (see Provisioning below) and pinned in the slow CI job, so a silently mutated fixture cannot pass the gate.

Why netCDF and not GeoTIFF? §6.6 specifies netCDF because streaming one large netCDF object from MinIO is the worst case for the worker's memory accounting. The chunked-read path inside write_chunk exercises the same rasterio window APIs regardless of container format.
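With windowed reads, the peak buffer size depends on the window shape, not on the file size. Below is a pure-Python sketch of row-major window tiling. Window mirrors the (col_off, row_off, width, height) field order of rasterio.windows.Window. The 1024 × 1024 block size is an illustrative assumption, not write_chunk's actual setting. With rasterio, each window would be passed to DatasetReader.read(1, window=w).

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    """Same field order as rasterio.windows.Window: (col_off, row_off, width, height)."""
    col_off: int
    row_off: int
    width: int
    height: int


def iter_windows(rows: int, cols: int, block_rows: int, block_cols: int) -> Iterator[Window]:
    """Tile a rows x cols raster into windows of at most block_rows x block_cols, row-major.

    Edge windows are clipped so the tiles cover every cell exactly once.
    """
    for row_off in range(0, rows, block_rows):
        height = min(block_rows, rows - row_off)
        for col_off in range(0, cols, block_cols):
            width = min(block_cols, cols - col_off)
            yield Window(col_off, row_off, width, height)


def window_nbytes(window: Window, bytes_per_cell: int = 4) -> int:
    """Buffer size needed to read one window of a float32 band."""
    return window.width * window.height * bytes_per_cell
```

For a 21600 × 43200 global 30 arc-second grid tiled at 1024 × 1024, this yields 946 windows whose largest float32 buffer is 4 MiB. That buffer is the same for a 5 GiB file as for a 5 MiB one.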


Provisioning

Local (dev MinIO)

  1. Bring up the dev stack, then confirm MinIO is reachable on http://localhost:9000:
     docker compose up -d minio postgres redis
     curl -f http://localhost:9000/minio/health/live
  2. Pull or build the fixture file. Example using a pre-staged copy:
     curl -fL -o /tmp/rf-5gib.nc \
       "https://<your-fixture-mirror>/rf-5gib.nc"
     sha256sum /tmp/rf-5gib.nc   # record the hash
  3. Upload the file to the dev climate-lama bucket under a stable key:
     mc alias set local http://localhost:9000 minioadmin minioadmin
     mc cp /tmp/rf-5gib.nc local/climate-lama/fixtures/memory-bound/rf-5gib.nc
  4. Export the env var pointing at the key and run the slow test:
     export INGEST_MEMORY_TEST_FIXTURE_KEY="fixtures/memory-bound/rf-5gib.nc"
     uv run pytest tests/worker/ingest/test_memory_bound.py -m slow -s

-s disables pytest's output capture, so the INGEST_MEMORY_PEAK_MIB=<value> line printed by the test stays visible in the terminal.
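MinIO reachability can also be confirmed from Python, for example as a preflight before the slow test. The stdlib sketch below probes MinIO's liveness endpoint, GET /minio/health/live, which answers 200 while the server is up. minio_is_live is a hypothetical helper name, and the default endpoint is the dev stack address used above.

```python
import urllib.request


def minio_is_live(endpoint: str = "http://localhost:9000", timeout_s: float = 2.0) -> bool:
    """True if MinIO's liveness probe (GET /minio/health/live) answers 200."""
    url = endpoint.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (both OSError subclasses): refused, DNS failure,
        # timeout, or a non-2xx status.
        return False
```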

CI (GitHub Actions slow job)

The slow job is triggered on workflow_dispatch or by adding the run-slow-tests label to a PR. It does not run on the standard push / pull_request fast path.

The CI job pulls the fixture from the project's release-asset mirror (private GHCR-hosted blob) into a job-scoped MinIO container before exporting INGEST_MEMORY_TEST_FIXTURE_KEY. The provisioning steps are:

  1. Start an ephemeral MinIO service container.
  2. Download the fixture using gh release download (or aws s3 cp against a read-only mirror) into the workflow's tmp dir.
  3. Verify SHA-256 against the pinned hash; abort on mismatch.
  4. mc cp the file into the bucket.
  5. Export INGEST_MEMORY_TEST_FIXTURE_KEY=fixtures/memory-bound/rf-5gib.nc.
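Step 3 must hash a 5 GiB file without reading it into memory. Below is a minimal sketch, assuming the pinned size and hash are passed in from the workflow. verify_fixture and FixtureMismatch are hypothetical names. In a shell step, sha256sum -c against the pinned value would serve the same purpose.

```python
import hashlib
import os


class FixtureMismatch(RuntimeError):
    """Raised when the downloaded fixture does not match the pinned size or hash."""


def verify_fixture(path: str, expected_sha256: str, expected_size: int,
                   chunk_size: int = 8 * 1024 * 1024) -> None:
    """Raise FixtureMismatch unless `path` matches the pinned byte size and SHA-256.

    The cheap size check runs first; the hash is computed in fixed-size chunks
    so the file never has to fit in memory.
    """
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise FixtureMismatch(f"size {actual_size} != pinned {expected_size}")
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        raise FixtureMismatch(f"sha256 {digest.hexdigest()} != pinned {expected_sha256}")
```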

The job runs pytest -m slow -s against tests/worker/ingest/ so the peak RSS log line is visible in the GitHub Actions log.


Environment variable contract

| Variable | Purpose |
|---|---|
| INGEST_MEMORY_TEST_FIXTURE_KEY | MinIO object key (within the configured MINIO_BUCKET, default climate-lama) pointing at the 5 GiB fixture. The test skips when this is unset. |

The test uses the same MinIO client configuration as the worker (MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, MINIO_BUCKET); no new variables are introduced.
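The contract above, including the skip rule, fits in one lookup. The sketch below uses hypothetical names. Treating an empty string the same as an unset variable is an assumption; this runbook does not specify that case.

```python
import os
from typing import Mapping, NamedTuple, Optional

DEFAULT_BUCKET = "climate-lama"


class FixtureLocation(NamedTuple):
    bucket: str
    key: str


def resolve_fixture_location(env: Mapping[str, str] = os.environ) -> Optional[FixtureLocation]:
    """Where the 5 GiB fixture lives, or None when the slow test should skip."""
    key = env.get("INGEST_MEMORY_TEST_FIXTURE_KEY", "").strip()
    if not key:
        return None  # unset (or empty, by assumption): skip rather than fail
    bucket = env.get("MINIO_BUCKET") or DEFAULT_BUCKET
    return FixtureLocation(bucket=bucket, key=key)
```

In the test module, the result could drive pytest.mark.skipif(resolve_fixture_location() is None, reason="INGEST_MEMORY_TEST_FIXTURE_KEY not set").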


CI runtime + cost notes

  • Subprocess wall time on the standard ubuntu-latest runner: ~3–5 minutes for the 5 GiB stream + chunked read.
  • Total job time including MinIO provisioning + fixture download: ~6–9 minutes.
  • Frequency: opt-in only (label or manual dispatch). Typical use is pre-merge for PRs touching src/climate_lama/worker/ingest/.
  • Cost: each opt-in run consumes ~10 minutes of standard runner time.

Reporting

On every run (success or failure) the test emits a single parseable line:

INGEST_MEMORY_PEAK_MIB=<integer>

The slow CI job's log can be grepped for this line so the team can track the peak over time. No artifact upload is required for v1; the log line is enough to detect regressions.
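The sketch below pulls the peak out of a saved job log so it can be tracked over time. parse_peaks and flags_regression are hypothetical names. The regex matches anywhere in a line, so a timestamp prefix in the log does not matter. flags_regression treats ≥ 1024 as a failure, matching the Failure modes table.

```python
import re
from typing import List

PEAK_LINE = re.compile(r"INGEST_MEMORY_PEAK_MIB=(\d+)")


def parse_peaks(log_text: str) -> List[int]:
    """Every INGEST_MEMORY_PEAK_MIB value found in a log, in order of appearance."""
    return [int(m.group(1)) for m in PEAK_LINE.finditer(log_text)]


def flags_regression(peaks: List[int], limit_mib: int = 1024) -> bool:
    """True if any reported peak is at or over the §6.6 ceiling."""
    return any(p >= limit_mib for p in peaks)
```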


Failure modes

| Symptom | Likely cause | Action |
|---|---|---|
| INGEST_MEMORY_PEAK_MIB ≥ 1024 | A worker code path materialised the full source instead of streaming it | Bisect the diff against src/climate_lama/worker/ingest/; check write_chunk._stream_to_disk and _read_window |
| Test reports "fixture key set but object missing" | Bucket or key drift | Re-run provisioning and confirm with mc ls local/climate-lama/fixtures/memory-bound/ |
| Subprocess crashes before any sample is taken | MinIO connection or auth failure | Verify the MINIO_* env vars in the job |