Lesson 4 — Importing ABS SA3 boundaries
This is the lesson where the dispatch database stops being a
toy. We add a ServiceArea model and import about 360 real
polygon boundaries from the Australian Bureau of Statistics —
the SA3 statistical regions, which divide the entire country
into administratively meaningful areas. By the end of the
lesson, every part of Australia is covered by a ServiceArea
record with a real polygon, real population data, and a real
SA3 code that you can look up on ABS publications.
This data becomes the foundation for everything spatial in the rest of the tutorial. Module 4 renders these polygons on the map. Module 6 uses them for click-to-drilldown and choropleth views. Module 7 stress-tests them when we add half a million job records weighted by SA3 population.
The lesson is mostly typing — generating a model, writing a small rake task, running it. The conceptual content from Lessons 1, 2, and 3 is what makes the typing make sense.
What an SA3 is
The Australian Bureau of Statistics divides Australia into a hierarchy of statistical areas:
- SA1 — about 60,000 small areas, each containing roughly 200-800 people. Used for fine-grained demographic analysis.
- SA2 — about 2,500 communities, neighbourhoods, or rural regions of 3,000-25,000 people.
- SA3 — about 360 regional groupings, often corresponding to a council area or major suburb cluster. Population typically 30,000-130,000.
- SA4 — about 100 broader regions, typically corresponding to a labour market.
For the dispatch deck, SA3 is the right scale. Smaller than SA4 (which would lump entire states together) but larger than SA2 (which would give us 2,500 polygons to render — too many for a national overview map). Each SA3 is a plausible service region for a dispatcher to manage.
The data files for this lesson aren’t committed to the chassis. Tutorial data lives in the appendix — separate from the codebase, with each lesson’s data linked from there. Download the files for this lesson now:
sa3.geojson.tgz(≈2 MB) — ABS SA3 boundaries with population data, ready to import
Extract it into db/data/:
|
|
You should now have db/data/sa3.geojson (≈6 MB uncompressed).
Each feature in the file has:
- A polygon or multipolygon
geometry— the SA3 boundary - A
propertiesobject with the SA3 code, the SA3 name, and a population number
We’ll write the import to consume this file directly.
Generating the ServiceArea model
|
|
This produces the migration and an empty app/models/service_area.rb.
Adding the geometry column
Open the generated migration and edit it:
|
|
Three things worth noting:
t.st_multi_polygon rather than t.st_polygon. The
postgis adapter’s st_polygon type accepts only single
polygons. ABS SA3 boundaries are multipolygons because some
SA3s are made of more than one disconnected piece — coastal
regions that include offshore islands, or council areas that
wrap around enclaves. A multipolygon is one or more polygons
treated as a single feature. Using st_multi_polygon accepts
both single-polygon and multi-polygon shapes (a single polygon
is a degenerate multipolygon with one ring set).
null: false, default: 0 on population. Most SA3s have
population data; the few that don’t (mostly remote
administrative regions) get a sensible default.
The GiST index on boundary for the same reasons we
indexed Depot#location. We’ll use this index in Module 7;
adding it now is the right discipline.
Run the migration:
|
|
Writing the import task
Create lib/tasks/import.rake:
|
|
A few details worth understanding.
The task is idempotent. It uses upsert keyed by the SA3
code, so re-running the task updates existing records rather
than duplicating them. Idempotent imports are valuable —
during development you’ll inevitably run the import several
times.
The geometry conversion. GeoJSON’s geometry object isn’t
WKT, and PostGIS’s column accepts WKT (or RGeo objects, or
EWKT — extended WKT that includes an SRID prefix). The RGeo::GeoJSON.decode
call (from the rgeo-geojson gem, which the activerecord-postgis-adapter brings in
transitively) parses the GeoJSON shape into an RGeo geometry object,
which we then convert to WKT with .as_text. Prefixing
SRID=4326; makes it EWKT, which PostGIS accepts directly into
the column.
The transaction wrapper. All 360 inserts happen inside one database transaction. If anything goes wrong partway through — malformed feature, network error, anything — the whole import rolls back rather than leaving you with a half-populated table.
Progress output every 50 features. Imports of this size run in a few seconds, but seeing progress is reassuring.
Running the import
|
|
You should see progress output:
|
|
If anything errors out, the transaction rolls back. Common
issues: the GeoJSON file is missing (re-pull from chassis),
the SRID 4326 isn’t matching (check your t.st_multi_polygon
declaration), or the boundary column rejects something
(check the WKT conversion).
Activity 1 — Verify the import
In the Rails console:
|
|
The names and numbers will match real Australian regions because the data is real. The largest-population SA3s are typically outer-suburban areas of the major capitals — places that have grown rapidly in the last decade.
Inspecting a polygon
Each ServiceArea#boundary is now a real polygon. Let’s look
at one in detail.
In psql:
|
|
You should see something like:
|
|
A few things to notice:
ST_NPoints counts the vertices in the polygon. Our dataset
has reduced the vertex_count to make the data more manageable (the raw
dataset was around 312Mb)
ST_Area returns square degrees when called on a
geometry column with SRID 4326, because the underlying math
doesn’t know what unit the coordinates are in. Square degrees
are not a useful unit; they vary in physical size depending on
latitude. The number is just there.
ST_Area(boundary::geography) returns square metres —
real area on Earth’s surface, computed using spheroidal math.
Dividing by 1,000,000 converts to square kilometres. Sydney’s
Inner City is around 25 km² — verifiable against any reference.
This is the analogue of the ST_DistanceSphere pattern from
Lesson 1 for area: when you need real-world area in square
metres, cast to geography. We won’t reach for this often
either, but it’s good to know the cast pattern works for area
the same way ST_DistanceSphere works for distance.
Activity 2 — Find the SA3 containing a point
The classic spatial query: which SA3 contains this point?
|
|
You should get back the SA3 covering Sydney CBD —
something like Brisbane Inner. The query is
the spatial equivalent of “which row has this id” — except
instead of matching an integer, it’s testing geometric
containment.
ST_Contains(polygon, point) returns true if the point is
inside the polygon. PostGIS computes this fast on indexed
geometry columns — even faster after Module 7’s tuning. We’ll
use this query shape extensively in Module 6 when we build
click-to-drilldown UI on the map.
Try a few other locations:
- Hobart CBD:
(147.3257, -42.8826)→Hobart - South and Westor similar - Perth CBD:
(115.8575, -31.9523)→Perth Cityor similar
Each returns the SA3 covering that point. The same query works for any latitude/longitude pair anywhere in Australia. This is the kind of spatial query a dispatch app’s “what service area is this customer in?” feature would run.
Where this leaves us
Module 3 is complete. The database now has:
- A
Depotmodel with two rows (Sydney Central, Brisbane North) - A
ServiceAreamodel with about 340 rows covering all of Australia, each with a real polygon boundary, an SA3 code, and 2021 census population - GiST indexes on both geometry columns
- A repeatable rake task for re-running the import
You’ve also seen:
- The
t.multi_polygonmigration syntax for multipolygon columns - How to use
ST_Containsto find polygons containing a point - Why
ST_Areaon a geometry column returns square degrees and how to get real square metres
Module 4 takes the next step: connecting this data to the
map. We’ll write a controller action that returns
ServiceArea boundaries as GeoJSON, wire it up as a source on
a Vera::Map, and watch Australia’s polygons appear on the
screen for the first time.