Logan Jones · updated jul 2026

Scrounged

2026 · self-hosted · runs pullfirst.com

pullfirst.com runs on scrapes of 50+ permit systems that change without notice. Ops is the control plane that makes that survivable: 50+ jobs scheduled, chained, retried, and audited from one dashboard. Nothing runs by hand.

What it runs

  • 50+ scraper, import, and sync jobs, each with typed parameters and its own cron cadence.
  • A chainer fires downstream imports the moment upstream collection lands: scrape finishes, import starts, nobody watches it happen.
  • Retry policies decide what a failure means before a human has to. Scrapes resume from checkpoints, so a source that dies mid-run costs a resume, not a dataset.
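A minimal sketch of how chaining, retries, and checkpoint resume can fit together. Everything here is an assumption made for illustration (the `Job` shape, `TransientError`, `run_chain`, the checkpoint dict); it is not the actual Ops code.

```python
from dataclasses import dataclass, field
from typing import Callable


class TransientError(Exception):
    """Hypothetical: a failure worth retrying, e.g. a portal timeout."""


@dataclass
class Job:
    name: str
    # The run function records progress in its checkpoint dict as it goes.
    run: Callable[[dict], None]
    max_attempts: int = 3
    downstream: list[str] = field(default_factory=list)


def run_chain(name: str, jobs: dict[str, Job], checkpoints: dict, log: list[str]) -> bool:
    """Run a job with retries; on success, fire its downstream jobs."""
    job = jobs[name]
    state = checkpoints.setdefault(name, {})
    for attempt in range(1, job.max_attempts + 1):
        try:
            job.run(state)  # resumes from whatever the last attempt saved
        except TransientError:
            log.append(f"{name}: attempt {attempt} failed")
            continue
        log.append(f"{name}: ok")
        for child in job.downstream:
            run_chain(child, jobs, checkpoints, log)
        return True
    log.append(f"{name}: gave up")
    return False


fetched: list[int] = []


def scrape(state: dict) -> None:
    # Start at the checkpointed page rather than page 0.
    for page in range(state.get("page", 0), 4):
        if page == 2 and not state.get("failed_once"):
            state["failed_once"] = True  # simulate one mid-run failure
            raise TransientError("portal timed out")
        fetched.append(page)
        state["page"] = page + 1


def import_rows(state: dict) -> None:
    state["imported"] = len(fetched)


jobs = {
    "scrape": Job("scrape", scrape, downstream=["import"]),
    "import": Job("import", import_rows),
}
checkpoints: dict = {}
log: list[str] = []
run_chain("scrape", jobs, checkpoints, log)
```

The simulated failure on page 2 costs one retry: pages 0 and 1 are not fetched again, and the import fires only after the scrape succeeds.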

fig. 1 · the fleet: every job, every run, 30 days

The dashboard

One screen over the whole fleet. Every job, every run, logs streaming live over SSE. Every run keeps its parameters, logs, and outcome.

  • Materialization tracking: every table traces back to the run that built it, staleness on display.
  • One briefing endpoint summarizes the fleet: what ran, what failed, what’s stale. The first thing checked every morning.
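One way a briefing like this could be computed, as a sketch over in-memory run records. The `Run` fields, the single `max_age` threshold, and the `briefing` function are assumptions for illustration; the real endpoint reads from Postgres and may define staleness differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Run:
    job: str
    finished_at: datetime
    ok: bool


def briefing(runs: list[Run], max_age: timedelta, now: datetime) -> dict[str, list[str]]:
    """Summarize the fleet, leading with what is stale."""
    latest: dict[str, Run] = {}
    last_ok: dict[str, datetime] = {}
    for run in runs:
        if run.job not in latest or run.finished_at > latest[run.job].finished_at:
            latest[run.job] = run
        if run.ok and (run.job not in last_ok or run.finished_at > last_ok[run.job]):
            last_ok[run.job] = run.finished_at

    names = sorted(latest)
    return {
        # Stale: no successful run inside the allowed window.
        "stale": [j for j in names if j not in last_ok or now - last_ok[j] > max_age],
        # Failed: the most recent run did not succeed.
        "failed": [j for j in names if not latest[j].ok],
        "ran": [j for j in names if latest[j].ok],
    }


now = datetime(2026, 7, 1, 8, 0)
runs = [
    Run("scrape_a", now - timedelta(hours=2), True),
    Run("scrape_b", now - timedelta(days=3), True),
    Run("scrape_b", now - timedelta(hours=1), False),
    Run("import_a", now - timedelta(hours=1), True),
]
summary = briefing(runs, timedelta(days=1), now)
```

Here `scrape_b` shows up as both failed and stale: its latest run failed, and its last good run is three days old.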

fig. 2 · one run: parameters, delta vs previous, stored logs

Shipping to production

The pipeline runs locally against a local Postgres; pullfirst.com reads from managed Postgres in the cloud. A branch-swap sync moves finished datasets between them: copy into a fresh branch of the production database, then swap. The site never reads a half-written import.
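The copy-then-swap idea can be shown with a toy model. This is an in-memory stand-in, not the managed Postgres provider's branching API: `BranchedStore`, `sync`, and the branch names are all hypothetical.

```python
class BranchedStore:
    """In-memory stand-in for a database with named branches (hypothetical)."""

    def __init__(self) -> None:
        self.branches: dict[str, dict[str, list]] = {"main": {}}
        self.live = "main"

    def read(self, table: str) -> list:
        # The site only ever reads through the live pointer.
        return self.branches[self.live].get(table, [])


def sync(store: BranchedStore, datasets: dict, branch: str) -> None:
    """Copy finished datasets into a fresh branch, then swap it live."""
    # Start the fresh branch from the current production tables.
    fresh = dict(store.branches[store.live])
    for table, rows in datasets.items():
        fresh[table] = list(rows)  # a failure here leaves the live branch untouched
    store.branches[branch] = fresh
    old, store.live = store.live, branch  # the swap: one pointer assignment
    del store.branches[old]


def interrupted_copy():
    """Simulates a copy that dies partway through."""
    yield {"id": 99}
    raise RuntimeError("copy interrupted")


store = BranchedStore()
sync(store, {"permits": [{"id": 1}, {"id": 2}]}, "sync-1")

try:
    sync(store, {"permits": interrupted_copy()}, "sync-2")
except RuntimeError:
    pass  # the half-written branch was never made live
```

After the interrupted second sync, `store.read("permits")` still returns the two rows from `sync-1`, which is the property the swap is there to provide.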

How it’s built

Python end to end: Flask API, cron scheduling, SSE streaming, Postgres state with materialized views behind the briefing. The dashboard is a Preact app, bundled and served by the ops server itself.
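For the SSE piece, the wire format is simple enough to sketch with the standard library alone. The function names and the `log` / `end` event names below are assumptions, not the ones Ops uses.

```python
from typing import Iterable, Iterator, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one server-sent event; a blank line terminates the frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return "\n".join(lines) + "\n\n"


def stream_log(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield one frame per log line, then a final frame marking the end."""
    for line in log_lines:
        yield sse_frame(line, event="log")
    yield sse_frame("done", event="end")


frames = list(stream_log(["fetching page 1", "fetching page 2"]))
```

In Flask, a generator like `stream_log` can be returned as `Response(stream_log(lines), mimetype="text/event-stream")`, and the browser's `EventSource` handles reconnection on its side.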

Underneath, the ETL layer the jobs drive: collection, normalization, the address grammar, entity resolution, imports. Local tooling, production data; the same runs that build pullfirst.com.

The hard parts

  • Sources break silently. A jurisdiction redesigns its portal and the scrape returns plausible-looking nothing. Hence audit trails, staleness tracking, and a briefing that leads with what’s stale.
  • The whole fleet runs on one desktop. Checkpoints and resumable runs mean a crash mid-scrape is an inconvenience, not an incident.
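A sketch of one check that can catch "plausible-looking nothing": compare a run's row count against the median of recent runs. The function name and the 0.5 threshold are assumptions chosen for illustration, not values taken from Ops.

```python
from statistics import median


def looks_broken(row_count: int, previous_counts: list[int], min_ratio: float = 0.5) -> bool:
    """Flag a run whose row count collapses relative to recent history."""
    if not previous_counts:
        return False  # no baseline yet, nothing to compare against
    baseline = median(previous_counts)
    return row_count < baseline * min_ratio


history = [120, 130, 110]
empty_run = looks_broken(0, history)     # run "succeeded" but returned nothing
normal_run = looks_broken(115, history)  # within the usual range
first_run = looks_broken(0, [])          # no history yet
```

A count-based check is deliberately crude. It will not catch a portal that returns the right number of wrong rows, which is why it complements the audit trail and staleness tracking rather than replacing them.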