Ingest memory-bound test fixture: runbook
Status: Open · Owner: backbone maintainers · Tracking issue: #297 · Parent epic: #293 (Phase 6 §6.6)
This runbook documents how to provision the 5 GiB river-flood fixture that
backs the memory-bound gating test (tests/worker/ingest/test_memory_bound.py).
The §6.6 specification calls this the gating bar: a 5 GiB-class hazard source
must ingest end-to-end without any worker process exceeding 1024 MiB RSS at
any 250 ms sample point.
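The gating bar amounts to polling resident set size on a fixed interval and keeping the maximum. The following is a minimal sketch of that loop, not the code in test_memory_bound.py. It assumes a Linux runner where /proc/<pid>/status exposes a VmRSS line; the helper names (parse_vm_rss_mib, peak_rss_mib, subprocess_reader) are hypothetical. It tracks one process only, so a real check covering every worker process would need to repeat this per child.

```python
import subprocess
import time
from typing import Callable, Optional

SAMPLE_INTERVAL_S = 0.25  # 250 ms between samples, as in the gating bar


def parse_vm_rss_mib(status_text: str) -> Optional[float]:
    """Return VmRSS in MiB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The kernel reports this field in units of 1024 bytes.
            return int(line.split()[1]) / 1024
    return None


def peak_rss_mib(
    read_status: Callable[[], Optional[str]],
    interval_s: float = SAMPLE_INTERVAL_S,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until read_status() returns None, then return the peak in MiB."""
    peak = 0.0
    while (text := read_status()) is not None:
        rss = parse_vm_rss_mib(text)
        if rss is not None:
            peak = max(peak, rss)
        sleep(interval_s)
    return peak


def subprocess_reader(proc: subprocess.Popen) -> Callable[[], Optional[str]]:
    """Hypothetical reader for a child process (Linux only)."""
    def read() -> Optional[str]:
        if proc.poll() is not None:  # child has exited and been reaped
            return None
        with open(f"/proc/{proc.pid}/status") as fh:
            return fh.read()
    return read
```

Passing the reader and the sleep function in as arguments keeps the sampling loop testable without spawning a real 5 GiB ingest.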
The test is @pytest.mark.slow and gated on the
INGEST_MEMORY_TEST_FIXTURE_KEY environment variable. It skips silently when
that variable is unset, so the standard fast-path CI never runs it (the §6.6
spec is explicit about that). The slow job in ci.yml
provisions the fixture and exports the variable before invoking
pytest -m slow.
Source dataset
| Field | Value |
|---|---|
| Layer | River flood depth, single-band, multi-RP (return periods 10 / 50 / 100 / 250 / 500 yr) |
| Format | netCDF4 (CF-1.8), float32, EPSG:4326 |
| Footprint | Global, 30 arc-second resolution |
| Approx. size | 5.0 – 5.2 GiB on disk |
| Suggested origin | JRC Global River Flood Hazard Maps (Aqueduct-compatible) — pre-merged into one netCDF |
A canonical fixture is stored as a single .nc file. Hash and exact size are
recorded once provisioned (see Provisioning below) and pinned in the slow CI
job so a silently-mutated fixture cannot pass the gate.
Why netCDF and not GeoTIFF? §6.6 specifies netCDF because it stresses the worker's stream-from-MinIO path with a single large object, which is the worst case for memory accounting. The chunked-read path inside
write_chunk exercises the same rasterio window APIs regardless of container format.
Provisioning
Local (dev MinIO)
- Bring up the dev stack and confirm MinIO is reachable on
http://localhost:9000:
- Pull or build the fixture file. Example using a pre-staged copy:
curl -L -o /tmp/rf-5gib.nc \
"https://<your-fixture-mirror>/rf-5gib.nc"
sha256sum /tmp/rf-5gib.nc # record the hash
- Upload to the dev
climate-lama bucket under a stable key:
mc alias set local http://localhost:9000 minioadmin minioadmin
mc cp /tmp/rf-5gib.nc local/climate-lama/fixtures/memory-bound/rf-5gib.nc
- Export the env var pointing at the key and run the slow test:
export INGEST_MEMORY_TEST_FIXTURE_KEY="fixtures/memory-bound/rf-5gib.nc"
uv run pytest tests/worker/ingest/test_memory_bound.py -m slow -s
-s keeps the INGEST_MEMORY_PEAK_MIB=<value> line visible in the
terminal.
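The reachability check in the first step can be scripted before any upload is attempted. The sketch below is an illustration rather than part of the project's tooling: it probes MinIO's liveness endpoint (/minio/health/live) with the standard library only. The function name minio_is_live and the 2-second timeout are assumptions.

```python
import urllib.error
import urllib.request


def minio_is_live(endpoint: str, timeout_s: float = 2.0) -> bool:
    """Return True if the MinIO liveness probe answers with HTTP 200."""
    url = endpoint.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or a non-2xx response.
        return False


if __name__ == "__main__":
    # Dev-stack endpoint from the steps above.
    print("MinIO reachable:", minio_is_live("http://localhost:9000"))
```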
CI (GitHub Actions slow job)
The slow job is triggered on workflow_dispatch or by adding the
run-slow-tests label to a PR. It does not run on the standard
push / pull_request fast path.
The CI job pulls the fixture from the project's release-asset mirror (private
GHCR-hosted blob) into a job-scoped MinIO container before exporting
INGEST_MEMORY_TEST_FIXTURE_KEY. The provisioning step is:
- Start an ephemeral MinIO service container.
- Download the fixture using gh release download (or aws s3 cp against a read-only mirror) into the workflow's tmp dir.
- Verify SHA-256 against the pinned hash; abort on mismatch.
- mc cp the file into the bucket.
- Export INGEST_MEMORY_TEST_FIXTURE_KEY=fixtures/memory-bound/rf-5gib.nc.
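The hash-verification step is worth doing in a streaming fashion so the check itself does not load 5 GiB into memory. The following is a minimal sketch under that assumption; the function names (sha256_of, verify_fixture) and the 1 MiB read size are hypothetical, and the pinned hash would come from wherever the CI job stores it.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops at EOF, when read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixture(path: str, pinned_hex: str) -> None:
    """Abort with a non-zero exit status if the file does not match the pin."""
    actual = sha256_of(path)
    expected = pinned_hex.strip().lower()
    if actual != expected:
        raise SystemExit(
            f"fixture hash mismatch: expected {expected}, got {actual}"
        )
```

Raising SystemExit with a message makes the CI step fail and prints both digests, which helps tell a truncated download from a deliberately replaced fixture.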
The job runs pytest -m slow -s against tests/worker/ingest/ so the peak
RSS log line is visible in the GitHub Actions log.
Environment variable contract
| Variable | Purpose |
|---|---|
| INGEST_MEMORY_TEST_FIXTURE_KEY | MinIO object key (within the configured MINIO_BUCKET, which defaults to climate-lama) pointing at the 5 GiB fixture. The test skips when this is unset. |
The test uses the same MinIO client configuration as the worker
(MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, MINIO_BUCKET); no
new variables are introduced.
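One way to read this contract in test code is to resolve everything from the environment in a single place and return nothing when the fixture key is absent, leaving the caller to skip. This is a hedged sketch: FixtureLocation and resolve_fixture are hypothetical names, treating an empty string as unset is an assumption, and the endpoint default simply mirrors the dev-stack URL used earlier in this runbook.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

FIXTURE_KEY_VAR = "INGEST_MEMORY_TEST_FIXTURE_KEY"


@dataclass(frozen=True)
class FixtureLocation:
    endpoint: str
    bucket: str
    key: str


def resolve_fixture(
    env: Mapping[str, str] = os.environ,
) -> Optional[FixtureLocation]:
    """Return where the fixture lives, or None when the test should skip."""
    key = env.get(FIXTURE_KEY_VAR, "").strip()
    if not key:  # assumption: an empty value counts as unset
        return None
    return FixtureLocation(
        # Assumed default: the dev MinIO endpoint from this runbook.
        endpoint=env.get("MINIO_ENDPOINT", "http://localhost:9000"),
        # Documented default bucket.
        bucket=env.get("MINIO_BUCKET", "climate-lama"),
        key=key,
    )
```

A test could call resolve_fixture() at the top and skip when it returns None.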
CI runtime + cost notes
- Subprocess wall time on the standard ubuntu-latest runner: ~3–5 minutes for the 5 GiB stream + chunked read.
- Total job time including MinIO provisioning + fixture download: ~6–9 minutes.
- Frequency: opt-in only (label or manual dispatch). Typical use is pre-merge for PRs touching src/climate_lama/worker/ingest/.
- Cost: each opt-in run consumes ~10 minutes of standard runner time.
Reporting
On every run (success or failure) the test emits a single parseable line of the form INGEST_MEMORY_PEAK_MIB=<value>.
The slow CI job's log can be grepped for this line so the team can track the peak over time. No artifact upload is required for v1; the log line is enough to detect regressions.
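For tracking the peak over time, the INGEST_MEMORY_PEAK_MIB=<value> line can be pulled out of a saved job log with a small parser. The sketch below is illustrative only: peak_from_log is a hypothetical helper, and it assumes the value is a plain integer or decimal number, which this runbook does not specify.

```python
import re
from typing import Iterable, Optional

# Assumption: the value is an integer or decimal, e.g. 812 or 812.4.
PEAK_LINE = re.compile(r"INGEST_MEMORY_PEAK_MIB=(\d+(?:\.\d+)?)")


def peak_from_log(lines: Iterable[str]) -> Optional[float]:
    """Return the reported peak RSS in MiB, or None if the line is missing."""
    for line in lines:
        # search() rather than match() tolerates log prefixes like timestamps.
        found = PEAK_LINE.search(line)
        if found:
            return float(found.group(1))
    return None
```

A missing line is reported as None rather than zero, so a run that crashed before the first sample is not mistaken for a very good result.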
Failure modes
| Symptom | Likely cause | Action |
|---|---|---|
| INGEST_MEMORY_PEAK_MIB ≥ 1024 | A worker code path materialised the full source (no streaming) | Bisect the diff against src/climate_lama/worker/ingest/; check write_chunk._stream_to_disk and _read_window |
| Test reports "fixture key set but object missing" | Bucket / key drift | Re-run provisioning and confirm mc ls local/climate-lama/fixtures/memory-bound/ |
| Subprocess crashes before any sample is taken | MinIO connection / auth | Verify MINIO_* env vars in the job |