Skip to content

Lesson 4 — Importing ABS SA3 boundaries

This is the lesson where the dispatch database stops being a toy. We add a ServiceArea model and import about 360 real polygon boundaries from the Australian Bureau of Statistics — the SA3 statistical regions, which divide the entire country into administratively meaningful areas. By the end of the lesson, every part of Australia is covered by a ServiceArea record with a real polygon, real population data, and a real SA3 code that you can look up on ABS publications.

This data becomes the foundation for everything spatial in the rest of the tutorial. Module 4 renders these polygons on the map. Module 6 uses them for click-to-drilldown and choropleth views. Module 7 stress-tests them when we add half a million job records weighted by SA3 population.

The lesson is mostly typing — generating a model, writing a small rake task, running it. The conceptual content from Lessons 1, 2, and 3 is what makes the typing make sense.

What an SA3 is

The Australian Bureau of Statistics divides Australia into a hierarchy of statistical areas:

  • SA1 — about 60,000 small areas, each containing roughly 200-800 people. Used for fine-grained demographic analysis.
  • SA2 — about 2,500 communities, neighbourhoods, or rural regions of 3,000-25,000 people.
  • SA3 — about 360 regional groupings, often corresponding to a council area or major suburb cluster. Population typically 30,000-130,000.
  • SA4 — about 100 broader regions, typically corresponding to a labour market.

For the dispatch deck, SA3 is the right scale. Smaller than SA4 (which would lump entire states together) but larger than SA2 (which would give us 2,500 polygons to render — too many for a national overview map). Each SA3 is a plausible service region for a dispatcher to manage.

The data files for this lesson aren’t committed to the chassis. Tutorial data lives in the appendix — separate from the codebase, with each lesson’s data linked from there. Download the files for this lesson now:

  • sa3.geojson.tgz (≈2 MB) — ABS SA3 boundaries with population data, ready to import

Extract it into db/data/:

1
2
mkdir -p db/data
tar -xzf sa3.geojson.tgz -C db/data/

You should now have db/data/sa3.geojson (≈6 MB uncompressed). Each feature in the file has:

  • A polygon or multipolygon geometry — the SA3 boundary
  • A properties object with the SA3 code, the SA3 name, and a population number

We’ll write the import to consume this file directly.

Generating the ServiceArea model

1
bin/rails generate model ServiceArea name:string code:string population:integer

This produces the migration and an empty app/models/service_area.rb.

Adding the geometry column

Open the generated migration and edit it:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
class CreateServiceAreas < ActiveRecord::Migration[8.0]
  def change
    create_table :service_areas do |t|
      t.string             :name,       null: false
      t.string             :code,       null: false
      t.integer            :population, null: false, default: 0
      t.multi_polygon      :boundary,   srid: 4326, null: false

      t.timestamps
    end

    add_index :service_areas, :code,     unique: true
    add_index :service_areas, :boundary, using: :gist
  end
end

Three things worth noting:

t.st_multi_polygon rather than t.st_polygon. The postgis adapter’s st_polygon type accepts only single polygons. ABS SA3 boundaries are multipolygons because some SA3s are made of more than one disconnected piece — coastal regions that include offshore islands, or council areas that wrap around enclaves. A multipolygon is one or more polygons treated as a single feature. Using st_multi_polygon accepts both single-polygon and multi-polygon shapes (a single polygon is a degenerate multipolygon with one ring set).

null: false, default: 0 on population. Most SA3s have population data; the few that don’t (mostly remote administrative regions) get a sensible default.

The GiST index on boundary for the same reasons we indexed Depot#location. We’ll use this index in Module 7; adding it now is the right discipline.

Run the migration:

1
bin/rails db:migrate

Writing the import task

Create lib/tasks/import.rake:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
namespace :vera do
  namespace :import do
    desc "Import ABS SA3 boundaries from the vendored GeoJSON file"
    task service_areas: :environment do
      path = Rails.root.join("db/data/sa3.geojson")
      raise "Source file not found at #{path}" unless File.exist?(path)

      data = JSON.parse(File.read(path))
      total = data["features"].size

      puts "Importing #{total} SA3 boundaries..."

      ServiceArea.transaction do
        data["features"].each_with_index do |feature, i|
          properties = feature["properties"]
          geometry   = feature["geometry"]

          ServiceArea.upsert(
            {
              name:       properties["sa3_name"],
              code:       properties["sa3_code"],
              population: properties["population"] || 0,
              boundary:   "SRID=4326;#{geometry_to_wkt(geometry)}",
              created_at: Time.current,
              updated_at: Time.current
            },
            unique_by: :code
          )

          if (i + 1) % 50 == 0
            puts "  #{i + 1} / #{total}"
          end
        end
      end

      puts "Done. ServiceArea count: #{ServiceArea.count}"
    end

    private

    def geometry_to_wkt(geometry)
      RGeo::GeoJSON.decode(geometry.to_json).as_text
    end
  end
end

A few details worth understanding.

The task is idempotent. It uses upsert keyed by the SA3 code, so re-running the task updates existing records rather than duplicating them. Idempotent imports are valuable — during development you’ll inevitably run the import several times.

The geometry conversion. GeoJSON’s geometry object isn’t WKT, and PostGIS’s column accepts WKT (or RGeo objects, or EWKT — extended WKT that includes an SRID prefix). The RGeo::GeoJSON.decode call (from the rgeo-geojson gem, which the activerecord-postgis-adapter brings in transitively) parses the GeoJSON shape into an RGeo geometry object, which we then convert to WKT with .as_text. Prefixing SRID=4326; makes it EWKT, which PostGIS accepts directly into the column.

The transaction wrapper. All 360 inserts happen inside one database transaction. If anything goes wrong partway through — malformed feature, network error, anything — the whole import rolls back rather than leaving you with a half-populated table.

Progress output every 50 features. Imports of this size run in a few seconds, but seeing progress is reassuring.

Running the import

1
bin/rails vera:import:service_areas

You should see progress output:

1
2
3
4
5
6
7
8
9
Importing 358 SA3 boundaries...
  50 / 340
  100 / 340
  150 / 340
  200 / 340
  250 / 340
  300 / 340
  340 / 340
Done. ServiceArea count: 340

If anything errors out, the transaction rolls back. Common issues: the GeoJSON file is missing (re-pull from chassis), the SRID 4326 isn’t matching (check your t.st_multi_polygon declaration), or the boundary column rejects something (check the WKT conversion).

Activity 1 — Verify the import

In the Rails console:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
ServiceArea.count
# => 358 (or close to it; ABS occasionally revises)

ServiceArea.first
# => <ServiceArea id: 1, name: "...", code: "...",
#                 population: ..., boundary: <RGeo::...>, ...>

# Population spot-check
ServiceArea.order(population: :desc).limit(5).pluck(:name, :population)
# => [["Dandenong", 178000], ["Wyndham", 173000], ...]

The names and numbers will match real Australian regions because the data is real. The largest-population SA3s are typically outer-suburban areas of the major capitals — places that have grown rapidly in the last decade.

Inspecting a polygon

Each ServiceArea#boundary is now a real polygon. Let’s look at one in detail.

In psql:

1
2
3
4
5
6
7
8
SELECT
  name,
  ST_NPoints(boundary) AS vertex_count,
  ST_Area(boundary)    AS area_in_square_degrees,
  ST_Area(boundary::geography) / 1000000 AS area_in_sq_km
FROM service_areas
WHERE name = 'Sydney Inner City'
LIMIT 1;

You should see something like:

1
2
3
       name        | vertex_count | area_in_square_degrees |   area_in_sq_km   
-------------------+--------------+------------------------+-------------------
 Sydney Inner City |           37 |   0.002461573555478633 | 25.25763072281401

A few things to notice:

ST_NPoints counts the vertices in the polygon. Our dataset has reduced the vertex_count to make the data more manageable (the raw dataset was around 312Mb)

ST_Area returns square degrees when called on a geometry column with SRID 4326, because the underlying math doesn’t know what unit the coordinates are in. Square degrees are not a useful unit; they vary in physical size depending on latitude. The number is just there.

ST_Area(boundary::geography) returns square metres — real area on Earth’s surface, computed using spheroidal math. Dividing by 1,000,000 converts to square kilometres. Sydney’s Inner City is around 25 km² — verifiable against any reference.

This is the analogue of the ST_DistanceSphere pattern from Lesson 1 for area: when you need real-world area in square metres, cast to geography. We won’t reach for this often either, but it’s good to know the cast pattern works for area the same way ST_DistanceSphere works for distance.

Activity 2 — Find the SA3 containing a point

The classic spatial query: which SA3 contains this point?

1
2
3
4
5
6
SELECT name, population
FROM service_areas
WHERE ST_Contains(
  boundary,
  ST_SetSRID(ST_MakePoint(153.0251, -27.4698), 4326)
);

You should get back the SA3 covering Sydney CBD — something like Brisbane Inner. The query is the spatial equivalent of “which row has this id” — except instead of matching an integer, it’s testing geometric containment.

ST_Contains(polygon, point) returns true if the point is inside the polygon. PostGIS computes this fast on indexed geometry columns — even faster after Module 7’s tuning. We’ll use this query shape extensively in Module 6 when we build click-to-drilldown UI on the map.

Try a few other locations:

  • Hobart CBD: (147.3257, -42.8826)Hobart - South and West or similar
  • Perth CBD: (115.8575, -31.9523)Perth City or similar

Each returns the SA3 covering that point. The same query works for any latitude/longitude pair anywhere in Australia. This is the kind of spatial query a dispatch app’s “what service area is this customer in?” feature would run.

Where this leaves us

Module 3 is complete. The database now has:

  • A Depot model with two rows (Sydney Central, Brisbane North)
  • A ServiceArea model with about 340 rows covering all of Australia, each with a real polygon boundary, an SA3 code, and 2021 census population
  • GiST indexes on both geometry columns
  • A repeatable rake task for re-running the import

You’ve also seen:

  • The t.multi_polygon migration syntax for multipolygon columns
  • How to use ST_Contains to find polygons containing a point
  • Why ST_Area on a geometry column returns square degrees and how to get real square metres

Module 4 takes the next step: connecting this data to the map. We’ll write a controller action that returns ServiceArea boundaries as GeoJSON, wire it up as a source on a Vera::Map, and watch Australia’s polygons appear on the screen for the first time.