Ruby Field Notes

Lesson 1 — A richer dataset

The dispatch deck works. The role-scoped dashboards run cleanly, the maps render correctly, the queries return what they should. But the data is thin. Five thousand jobs and a hundred field officers across nine depots is enough to prove the architecture; it’s not enough to reveal what real operations look like or to stress what we’ve built.

Module 6 is about reporting — choropleths, trend charts, performance comparisons, SLA tracking. Reporting reveals patterns, and patterns need volume. Before any new code, we need data that reflects a working operation at scale.

What we’re scaling to

A new seed task for Module 6:

Roughly 500,000 jobs spanning two years, spread across the country with seasonal variation
3,000 field officers distributed in proportion to depot population: Sydney metro carries roughly 700 of them, Darwin closer to 50
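
Splitting a fixed head count in proportion to population raises a small rounding problem: the fractional shares have to add back up to exactly the total. The sketch below shows one common way to handle it, the largest-remainder method. It is an illustration only, not the seed task’s actual code; the method name, depot names and weights are placeholders.

```ruby
# Split a fixed head count across depots in proportion to a weight,
# using the largest-remainder method so the parts sum exactly to the total.
def allocate_proportionally(total, weights)
  weight_sum = weights.values.sum.to_f
  exact = weights.transform_values { |w| total * w / weight_sum }
  counts = exact.transform_values(&:floor)

  # Give the leftover places to the depots with the largest fractional parts.
  leftover = total - counts.values.sum
  exact.sort_by { |_, share| -(share - share.floor) }
       .first(leftover)
       .each { |depot, _| counts[depot] += 1 }
  counts
end

# Placeholder depots and weights, chosen only to keep the arithmetic visible.
p allocate_proportionally(10, { "depot_a" => 5, "depot_b" => 3, "depot_c" => 1 })
```

With these placeholder weights the exact shares are about 5.56, 3.33 and 1.11, so the floors account for nine places and the tenth goes to depot_a, giving 6, 3 and 1.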

The job distribution isn’t uniform. It is shaped along five axes:

Customers: population-weighted
Day of week: Monday to Friday busy, Sundays quiet
Month: autumn busy, midwinter steady, December slow
SLA: a per-SA bias, with some areas reliably fast and some reliably slow
Status: recency-weighted, so old jobs are mostly closed while recent jobs include real volumes of pending and scheduled work

The shape isn’t random. Every choice exists to make the upcoming reports tell honest stories.
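
The weekday and monthly shaping described above comes down to weighted random sampling: each candidate date gets a weight, and dates are drawn in proportion to it. The sketch below is one minimal way to do that in plain Ruby. The weight values, constant names and method names are all assumptions made for illustration; the real seed task may shape its dates quite differently.

```ruby
require "date"

# Hypothetical relative weights, keyed by Date#wday (0 = Sunday).
WEEKDAY_WEIGHTS = { 0 => 0.2, 1 => 1.0, 2 => 1.0, 3 => 1.0, 4 => 1.0, 5 => 1.0, 6 => 0.5 }.freeze
# Hypothetical monthly weights: Australian autumn (March to May) busy, December slow.
MONTH_WEIGHTS = { 3 => 1.3, 4 => 1.3, 5 => 1.3, 12 => 0.6 }.freeze

def date_weight(date)
  WEEKDAY_WEIGHTS.fetch(date.wday) * MONTH_WEIGHTS.fetch(date.month, 1.0)
end

# Pick one date with probability proportional to its weight.
def sample_weighted_date(dates, rng: Random.new)
  total = dates.sum { |d| date_weight(d) }
  target = rng.rand * total
  dates.each do |d|
    target -= date_weight(d)
    return d if target <= 0
  end
  dates.last # guard against floating-point leftovers
end

dates = (Date.new(2025, 6, 1)..Date.new(2025, 6, 28)).to_a
rng = Random.new(42)
sample = Array.new(1_000) { sample_weighted_date(dates, rng: rng) }
puts "Mondays: #{sample.count(&:monday?)}, Sundays: #{sample.count(&:sunday?)}"
```

For half a million rows you would precompute cumulative weights once rather than re-summing on every draw; the linear scan here is kept for readability.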

Running the seed

A new rake task ships in lib/tasks/seed_module_6.rake:

bin/rails vera:seed_module_6

It wipes existing jobs and field officers (Module 5’s are no longer fit for the scale we want) and regenerates fresh ones. Expect 4-7 minutes depending on your machine. The task prints progress per batch.

The implementation is straightforward bulk-insert work — if you’re curious about distribution shaping or batching patterns, the file’s worth a read. For learning GIS and reporting, you don’t need to know the seed’s internals.
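
For readers who do want a feel for the batching pattern, here is a minimal sketch in plain Ruby. It is not the contents of the rake file. The method name insert_in_batches and the row shape are hypothetical, and the block stands in for the actual write, which in a Rails app might be a bulk call such as insert_all on an assumed Job model.

```ruby
# Hand rows to the caller in fixed-size batches and print progress per batch.
# The block performs the write, e.g. { |batch| Job.insert_all(batch) } in Rails,
# where Job is an assumed model name.
def insert_in_batches(rows, batch_size: 10_000, out: $stdout)
  inserted = 0
  rows.each_slice(batch_size).with_index(1) do |batch, batch_number|
    yield batch
    inserted += batch.size
    out.puts "batch #{batch_number}: #{inserted} rows written"
  end
  inserted
end

# Stand-in for generated job attributes.
rows = (1..25).map { |i| { reference: "JOB-#{i}" } }
written = []
insert_in_batches(rows, batch_size: 10) { |batch| written.concat(batch) }
```

Writing in batches keeps each statement a manageable size and gives a natural place to report progress, which is the behaviour the task’s per-batch output suggests.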

When it finishes, you’ll see something like:

Done. Final counts:
  Jobs:           499,847
  Field officers: 3,000
  Date range:     2024-04-29 .. 2026-04-29
  Job status distribution:
    cancelled       18,742
    complete       452,103
    in_progress      4,201
    pending          8,514
    scheduled       16,287

A real-shaped operation. Most jobs done, some active, some queued, a small fraction cancelled — proportions of a business that’s been running for two years.
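
To put numbers on those proportions, the counts from the sample output above can be turned into percentages. This is plain arithmetic on the printed figures; your own run will produce slightly different counts, and the constant names are only for this illustration.

```ruby
# Counts copied from the sample output above.
STATUS_COUNTS = {
  "cancelled" => 18_742,
  "complete" => 452_103,
  "in_progress" => 4_201,
  "pending" => 8_514,
  "scheduled" => 16_287
}.freeze

TOTAL_JOBS = STATUS_COUNTS.values.sum
STATUS_SHARES = STATUS_COUNTS.transform_values { |n| (n * 100.0 / TOTAL_JOBS).round(1) }

STATUS_SHARES.each { |status, pct| puts format("%-12s %5.1f%%", status, pct) }
```

The statuses sum to the 499,847 jobs reported, with about 90% complete and just under 4% cancelled.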

What changed in the dashboards

No code changes; just refresh and look.

The manager’s dashboard carries through unchanged at the national scale — same SA-level structure, same red outlines on 340 polygons. The stats reflect non-trivial volume. (We never overlaid jobs or FOs onto the manager’s map, which is its own gap to revisit; for now the manager view stays as it was.)

The dispatcher’s dashboard for a smaller depot — try Hobart or Darwin — shows a workable picture. A handful of pending jobs, a real day’s schedule, FO trucks scattered visibly across the busy SAs. The map reads cleanly.

The dispatcher’s dashboard for Sydney is a different story. Sign in as the Sydney dispatcher and refresh. The map fills with truck icons — hundreds of them, stacked on top of each other in the metro core. Job circles bleed together into amorphous clusters. The popup machinery still works on individual features, but visually the map has stopped communicating. There are too many things in the same place.

This is a real problem, and it’s not one we created with bad data. A nine-depot operation across Australia really does have this distribution: Sydney metro carries the bulk of the workforce because that’s where most of the work is. The dataset is honest; the visualisation has hit its limit.

The lesson the data is teaching us

When the same kind of feature stacks densely enough that individual instances stop being readable, the map needs a different rendering technique. Each point is no longer the unit that matters — the density is. A Sydney dispatcher doesn’t need to see every truck individually at the city-wide zoom level. They need to see where the trucks are and roughly how many, with the option to drill in for detail.

That technique is clustering. The map renders dense regions as bubbles labelled with their count, and the bubbles split apart as the user zooms in until individual features become visible again.
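
The idea can be sketched with simple grid binning: bucket points into square cells, show one bubble per cell with its count, and shrink the cells as the zoom level rises. This is a deliberately simplified stand-in written in Ruby for consistency with the rest of the course. The clustering used on the actual map may work differently and would normally run in the browser. The struct names, the cluster_points method and the 10-degree base cell size are all assumptions.

```ruby
Point = Struct.new(:lat, :lng)
Cluster = Struct.new(:lat, :lng, :count)

# Cell size in degrees halves with every zoom level (base size is arbitrary).
def cell_size_for(zoom, base_degrees: 10.0)
  base_degrees / (2**zoom)
end

# Group points by grid cell; each cell becomes one bubble at the mean position.
def cluster_points(points, zoom:)
  size = cell_size_for(zoom)
  points
    .group_by { |pt| [(pt.lat / size).floor, (pt.lng / size).floor] }
    .map do |_cell, members|
      Cluster.new(
        members.sum(&:lat) / members.size,
        members.sum(&:lng) / members.size,
        members.size
      )
    end
end

# Illustrative coordinates: three points around Sydney, one in Darwin.
SAMPLE_POINTS = [
  Point.new(-33.870, 151.210),
  Point.new(-33.872, 151.208),
  Point.new(-33.815, 151.003),
  Point.new(-12.460, 130.840)
].freeze

[0, 6, 12].each do |zoom|
  counts = cluster_points(SAMPLE_POINTS, zoom: zoom).map(&:count).sort
  puts "zoom #{zoom}: bubble counts #{counts.inspect}"
end
```

At zoom 0 the three Sydney points collapse into a single bubble of 3; by zoom 12 every point sits in its own cell. No point is ever dropped, only grouped.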

Lesson 2 introduces clustering, applies it to both the FO and job layers on the dispatcher’s map, and demonstrates it with the exact dataset you just seeded. Sydney becomes legible again without losing any data.

What this also enables

Clustering is the immediate need. The dataset enables much more.

Choropleths reveal real patterns. With job density varying across SAs by an order of magnitude, density-coloured polygons show visible structure — Sydney lights up bright, remote NT sits quiet. Coming in Lesson 3.

Aggregations have meaningful variation. Counting jobs by SA, by depot, by month — every group has different numbers. Reports tell stories rather than reading uniformly.
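
As a sketch of what “counting by group” means, the plain-Ruby example below tallies a few hand-written rows by depot and by month. The row shape, SA codes and method name are hypothetical. In the Rails app the same idea would more likely be a grouped count query against the jobs table, whose column names are not shown in this lesson.

```ruby
require "date"

# Hypothetical row shape; the real jobs table has its own columns.
JobRow = Struct.new(:sa_code, :depot, :created_on)

# Count rows per group, where the block returns each row's group key.
def count_by(rows)
  rows.map { |row| yield row }.tally
end

SAMPLE_JOBS = [
  JobRow.new("SA-001", "Sydney", Date.new(2025, 3, 4)),
  JobRow.new("SA-001", "Sydney", Date.new(2025, 3, 18)),
  JobRow.new("SA-002", "Sydney", Date.new(2025, 4, 2)),
  JobRow.new("SA-900", "Darwin", Date.new(2025, 12, 9))
].freeze

p count_by(SAMPLE_JOBS, &:depot)
p count_by(SAMPLE_JOBS) { |job| job.created_on.strftime("%Y-%m") }
```

Here the depot tally is 3 for Sydney and 1 for Darwin, and the monthly tally has 2 jobs in 2025-03 and 1 each in 2025-04 and 2025-12.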

Trends show seasonal shape. With two years of data, weekly cycles and monthly patterns become visible on charts. Lesson 6 introduces trend visualisations.

Performance pressure becomes real. Some queries that ran in milliseconds at 5,000 records take seconds at 500,000. The right indexes and query patterns matter substantially. Lesson 8 makes this concrete — drop an index, watch a query collapse.
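
As a rough in-memory analogy for why an index matters, the sketch below compares scanning every row with looking rows up in a prebuilt hash. It is only an analogy: a database index is a different structure from a Ruby hash, and the row shape, SA codes and row count here are invented for the illustration. Timings will vary by machine, so none are quoted.

```ruby
require "benchmark"

# Hypothetical rows: 100,000 records spread evenly over 250 made-up SA codes.
IndexedRow = Struct.new(:id, :sa_code)
ROWS = Array.new(100_000) { |i| IndexedRow.new(i, "SA#{i % 250}") }.freeze

# Without an index: inspect every row on every query.
def scan(rows, sa_code)
  rows.select { |row| row.sa_code == sa_code }
end

# With an index: pay once to build a lookup structure, then jump to the rows.
def build_index(rows)
  rows.group_by(&:sa_code)
end

def lookup(index, sa_code)
  index.fetch(sa_code, [])
end

index = build_index(ROWS)
scan_time = Benchmark.realtime { 10.times { scan(ROWS, "SA42") } }
lookup_time = Benchmark.realtime { 10.times { lookup(index, "SA42") } }
puts format("scan: %.4fs, indexed lookup: %.6fs", scan_time, lookup_time)
```

Both approaches return the same rows. The difference is that the scan’s cost grows with the size of the table, while the lookup’s cost stays roughly flat once the index exists.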

The architecture we built in Module 5 was designed for this scale all along. This lesson hasn’t changed any of it. It’s just turned up the volume so we can hear what the system plays — including the parts that need fixing.

Where this leaves us

A working dispatch deck with real data. Most of the dashboards are richer; some now have problems that need solving. From here the work is about both extracting insight (the reports proper) and making the visualisations scale to the data. The two threads weave together over the rest of the module.

Lesson 2 starts with the immediate problem: rendering thousands of points on a map without losing the user.

Clustering and spiderfy