Areal Interpolation of Attributes with QGIS
Turn-in for grading: This lab includes material that must be turned in for grading. Complete the required deliverables and submit them as instructed by the course.
Overview
This lab introduces areal interpolation in QGIS.
The main problem is a common one in GIS: we want to summarize population by watershed basin, but the population data is stored by Census block group. Those two boundary systems do not line up neatly. Some block groups fall entirely within one watershed, while others cross watershed boundaries.
To handle that mismatch, you will:
- calculate the original area of each Census block group
- split the block groups where they intersect watershed boundaries
- calculate the area of the resulting child polygons
- compute a weight based on the proportion of overlap
- use that weight to estimate the share of population assigned to each watershed piece
- summarize the weighted population by watershed
Concept note: Areal interpolation is used when attributes are available for one set of polygons, but the analysis needs to be reported for a different set of polygons. In this lab, the key assumption is simple area weighting: if 40% of a block group's area falls inside a watershed, we will estimate that 40% of the block group's population belongs to that watershed piece.
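The 40% example above can be written as a small calculation. This is a minimal sketch of simple area weighting, not part of the QGIS workflow itself; the function name and the sample numbers are hypothetical.

```python
def area_weighted_estimate(parent_value, parent_area, overlap_area):
    """Estimate the part of parent_value that falls inside the overlap area."""
    if parent_area <= 0:
        raise ValueError("parent_area must be positive")
    weight = overlap_area / parent_area  # share of the parent polygon's area
    return parent_value * weight

# Hypothetical block group: 1,500 people, 2.0 sq km in total,
# with 0.8 sq km (40%) inside one watershed.
estimate = area_weighted_estimate(1500, 2.0, 0.8)
print(round(estimate))
```

The rest of the lab builds the same calculation field by field: the two areas, then the weight, then the weighted population.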
Getting Ready
You will need:
- the workshop dataset: CT_Watershed_Data.gdb.zip
- the Group Stats QGIS plugin
Download and unpack the data
- Download CT_Watershed_Data.gdb.zip.
- Unzip it somewhere stable on your computer.
- Create a new project folder for this lab.
- Save a new QGIS project in that folder as areal_interpolation.qgz.
Install the Group Stats plugin
You will use Group Stats at the end of the lab to summarize weighted population by watershed.
- In QGIS, go to Plugins > Manage and Install Plugins.
- Search for Group Stats.
- Install the plugin.
Concept note: Group Stats works like a pivot table. It is useful when you need to group records by one field and summarize the values in another field.
Data for This Exercise
From the geodatabase, you will use these layers:
- CT_State_Boundary
- CT_Block_Groups
- CT_Major_Basins
The important relationship is this:
- CT_Block_Groups contains the population attribute you want to redistribute
- CT_Major_Basins contains the watershed boundaries you want to report to
Part 1: Add the Layers and Get Oriented
- In the Browser panel, browse to the unzipped CT_Watershed_Data.gdb.
- Drag these layers into the map canvas in this order:
  - CT_State_Boundary
  - CT_Block_Groups
  - CT_Major_Basins
- Open the Layer Styling panel.
- Style CT_Block_Groups and CT_Major_Basins with Transparent Fill and a visible outline so you can compare the two boundary systems easily.
Concept note: Before doing any geoprocessing, it is worth taking a minute to visually confirm the analytical problem. You should be able to see that many block groups do not align with watershed boundaries.
Placeholder image: QGIS map canvas showing Connecticut block groups and major basins with transparent fills so the overlapping boundary systems are visible.
Part 2: Calculate the Parent Area of the Block Groups
Before splitting the polygons, calculate the original area of each block group.
- Open the Attribute Table for CT_Block_Groups.
- Look through the available fields, especially the population field you will use later.
- Open the Field Calculator.
- Create a new virtual field named P_AREA.
- Set the field type to Whole number (integer) or Decimal number (real).
- Use the expression:
$area
- Click OK.
Concept note: P_AREA stands for parent area. This is the area of each intact source polygon before it gets split by the watershed boundaries.
Why a virtual field here? For this workflow, a virtual field is a good choice because it lets you calculate and use the value immediately without altering the original source layer inside the geodatabase.
Part 3: Use Union to Split the Block Groups by Watershed
Now create the overlap geometry that makes the weighting possible.
- Open the Processing Toolbox.
- Search for Union and open the tool.
- Set:
  - Input layer: CT_Block_Groups
  - Overlay layer: CT_Major_Basins
- Save the output to your project folder as union.gpkg.
- Click Run.
This will produce a new layer containing the combined boundaries and attributes of both inputs.
Concept note: The Union tool is doing the key spatial transformation in this lab. It creates new polygons wherever the source boundaries intersect. Those new polygons are the smaller units on which you can calculate overlap proportions.
Workflow note: The union may take a little time because the block group polygons are fairly detailed.
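To see what the overlay produces, here is a toy sketch that uses axis-aligned rectangles in place of real polygons. It is an illustration only: the Rect class, the overlap function, and the coordinates are hypothetical, and unlike the QGIS Union tool it builds only the overlapping pieces, not the non-overlapping remainders.

```python
from typing import NamedTuple, Optional

class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

def overlap(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the rectangle shared by a and b, or None if they do not overlap."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin >= xmax or ymin >= ymax:
        return None
    return Rect(xmin, ymin, xmax, ymax)

# One hypothetical block group that straddles two hypothetical basins.
block_groups = {"BG1": Rect(0, 0, 4, 2)}
basins = {"West": Rect(-1, -1, 3, 3), "East": Rect(3, -1, 6, 3)}

pieces = []
for bg_id, bg in block_groups.items():
    for basin_id, basin in basins.items():
        piece = overlap(bg, basin)
        if piece is not None:
            # Each piece keeps the identifiers of both parents, as the union layer does.
            pieces.append({"bg": bg_id, "basin": basin_id, "geom": piece})

for p in pieces:
    print(p["bg"], p["basin"], p["geom"].area())
```

The single block group becomes two child pieces, one per basin, and each piece carries attributes from both source layers.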
Part 4: Calculate the Child Area
Now measure the area of the new polygons created by the union.
- Open the Attribute Table for the union layer.
- Open the Field Calculator.
- Create a new field named CH_AREA.
- Use a numeric field type.
- Use the expression:
$area
- Click OK.
Concept note: CH_AREA stands for child area. These are the smaller pieces created after the original block groups were cut by watershed boundaries.
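One way to sanity-check the two area fields is to confirm that the child areas belonging to each block group add back up to its parent area. The sketch below assumes you have exported the union attribute table to a list of records; the BG_ID field name and the values are hypothetical stand-ins for whatever block group identifier your data carries.

```python
from collections import defaultdict
import math

# Hypothetical records from the union layer.
records = [
    {"BG_ID": "A", "P_AREA": 8.0, "CH_AREA": 6.0},
    {"BG_ID": "A", "P_AREA": 8.0, "CH_AREA": 2.0},
    {"BG_ID": "B", "P_AREA": 5.0, "CH_AREA": 5.0},
]

child_totals = defaultdict(float)
parent_area = {}
for rec in records:
    child_totals[rec["BG_ID"]] += rec["CH_AREA"]
    parent_area[rec["BG_ID"]] = rec["P_AREA"]

# Block groups whose pieces do not add back up to the parent area.
mismatched = [
    bg for bg, total in child_totals.items()
    if not math.isclose(total, parent_area[bg], rel_tol=1e-6)
]
print(mismatched)
```

An empty list means every parent is fully accounted for. Mismatches may point to overlapping polygons in the source data or to areas calculated under different measurement settings.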
Part 5: Exclude Records with Null Parent Area
Some polygons in the union result will not represent meaningful source polygons for the interpolation step. Remove those from the weight calculation.
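- In the Attribute Table for the union layer, open Select Features Using an Expression.
- Enter the expression:
"P_AREA" IS NULL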
- Click Select Features.
- Inspect the selected polygons on the map.
- In the attribute table, click Invert Selection so that the selected records are the polygons where P_AREA is not null.
Concept note: The null records are usually slivers or pieces that do not carry a valid source-area value from the original block group layer. If you leave them in the weight calculation, you risk dividing by null and producing invalid results.
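The select-then-invert logic is the same as splitting the records into two groups. A minimal sketch, using hypothetical records and a hypothetical id field, with None standing in for NULL:

```python
# Hypothetical union records; piece 2 has no parent area.
pieces = [
    {"id": 1, "P_AREA": 8.0, "CH_AREA": 6.0},
    {"id": 2, "P_AREA": None, "CH_AREA": 0.3},
    {"id": 3, "P_AREA": 8.0, "CH_AREA": 2.0},
]

# Equivalent of selecting with "P_AREA" IS NULL ...
null_parent = [p for p in pieces if p["P_AREA"] is None]
# ... and then inverting the selection.
selected = [p for p in pieces if p["P_AREA"] is not None]

print([p["id"] for p in null_parent], [p["id"] for p in selected])
```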
Part 6: Calculate the Area Weight
Now calculate the proportion of each child polygon relative to its original parent polygon.
- With the non-null records still selected, open the Field Calculator.
- Create a new field named WEIGHT.
- Use Decimal number (real) as the output type.
- If available, check Only update selected features.
- Use the expression:
"CH_AREA" / "P_AREA"
- Set a precision that keeps several decimal places.
- Click OK.
Concept note: A value of 1 means the original block group was not split at all for that record. A value smaller than 1 means only part of the block group's area falls in that watershed polygon.
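As a rough sketch of what the WEIGHT expression does record by record, the function below divides child area by parent area and leaves the weight empty wherever the parent area is missing. The add_weight name and the sample values are hypothetical.

```python
def add_weight(piece):
    """Set WEIGHT = CH_AREA / P_AREA, or None when the parent area is unusable."""
    p_area = piece.get("P_AREA")
    if p_area is None or p_area == 0:
        piece["WEIGHT"] = None  # mirrors a record left out of the selection
    else:
        piece["WEIGHT"] = piece["CH_AREA"] / p_area
    return piece

# Hypothetical union records.
pieces = [
    {"P_AREA": 8.0, "CH_AREA": 6.0},   # split block group, larger part
    {"P_AREA": 8.0, "CH_AREA": 2.0},   # split block group, smaller part
    {"P_AREA": None, "CH_AREA": 0.3},  # no parent area
    {"P_AREA": 5.0, "CH_AREA": 5.0},   # block group that was not split
]

weights = [add_weight(p)["WEIGHT"] for p in pieces]
print(weights)
```

The two pieces of the split block group have weights that sum to 1, which is what keeps the parent's population from being double counted or lost.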
Part 7: Calculate Weighted Population
Now apply the area weight to the population attribute.
- Open the Field Calculator again for the union layer.
- Create a new field named WT_POP.
- Use Decimal number (real) as the output type.
- Use the expression:
"WEIGHT" * "POP2004"
- Click OK.
- Save your edits if QGIS prompts you to do so.
- Clear the selection.
Concept note: WT_POP is the estimated population assigned to each child polygon after the area-based redistribution. This is the actual interpolated value you will summarize by watershed.
Important caution: This method assumes population is spread evenly within each block group. That is rarely perfectly true in real life. Areal interpolation is an estimate, not a direct observation.
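The weighted population step can be sketched the same way. The records below are hypothetical; only the field names WEIGHT, POP2004, and WT_POP come from the lab, and BG_ID is an assumed block group identifier.

```python
# Hypothetical union records after the weight has been calculated.
pieces = [
    {"BG_ID": "A", "POP2004": 1200, "WEIGHT": 0.75},
    {"BG_ID": "A", "POP2004": 1200, "WEIGHT": 0.25},
    {"BG_ID": "B", "POP2004": 800, "WEIGHT": 1.0},
    {"BG_ID": None, "POP2004": None, "WEIGHT": None},  # piece with no parent
]

for p in pieces:
    if p["WEIGHT"] is None or p["POP2004"] is None:
        p["WT_POP"] = None  # nothing to redistribute for this piece
    else:
        p["WT_POP"] = p["WEIGHT"] * p["POP2004"]

total_estimated = sum(p["WT_POP"] for p in pieces if p["WT_POP"] is not None)
print(total_estimated)
```

In this toy data the estimates add up to 2,000, the combined population of block groups A and B. Area weighting moves people between reporting units but should not change the overall total.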
Part 8: Summarize Weighted Population by Watershed
Now aggregate the interpolated values to the watershed level.
- Open Vector > Group Stats.
- Use the union layer as the input table if prompted.
- Build the summary as follows:
  - Rows: MAJOR
  - Values: WT_POP
  - Columns or statistic: sum
- Click Calculate.
The result should be a grouped summary table showing the estimated total population for each major basin.
Concept note: This final step is where the interpolation becomes useful. Up to now, you have been preparing weighted pieces. Group Stats recombines those pieces by watershed so the results can be interpreted at the reporting geography you actually care about.
Placeholder image: Group Stats window showing MAJOR watershed names in rows and the summed WT_POP values as the summary result.
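The grouping that Group Stats performs here amounts to a sum of WT_POP for each value of MAJOR. A minimal sketch, assuming the union attributes are available as a list of records; the basin labels and numbers are hypothetical placeholders, not real results.

```python
from collections import defaultdict

# Hypothetical union records; the last one has no basin or estimate.
pieces = [
    {"MAJOR": "Basin 1", "WT_POP": 900.0},
    {"MAJOR": "Basin 2", "WT_POP": 300.0},
    {"MAJOR": "Basin 1", "WT_POP": 800.0},
    {"MAJOR": None, "WT_POP": None},
]

totals = defaultdict(float)
for p in pieces:
    if p["MAJOR"] is None or p["WT_POP"] is None:
        continue  # skip pieces that cannot be assigned to a basin
    totals[p["MAJOR"]] += p["WT_POP"]

for basin, pop in sorted(totals.items()):
    print(basin, pop)
```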
Optional Export Step
If you want to save the table for later use:
- Select the results in Group Stats.
- Use the plugin's export or save option to write the table to CSV.
- Save it in your project folder with a clear name such as ct_major_basins_weighted_population.csv.
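If you ever need to produce the same kind of table outside the plugin, a small script can write one. This sketch uses Python's standard csv module; the write_summary_csv function, the column names, the totals, and the choice of the system temp folder are all hypothetical and do not reflect the plugin's own export format.

```python
import csv
import os
import tempfile

def write_summary_csv(totals, path):
    """Write one row per basin with its summed weighted population."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["MAJOR", "WT_POP_SUM"])
        for basin in sorted(totals):
            writer.writerow([basin, round(totals[basin], 1)])

# Hypothetical totals; replace with your own summary values.
totals = {"Basin 2": 300.0, "Basin 1": 1700.0}
out_path = os.path.join(
    tempfile.gettempdir(), "ct_major_basins_weighted_population.csv"
)
write_summary_csv(totals, out_path)
print(out_path)
```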
Deliverable
Submit the following:
- a screenshot of the Group Stats results table showing the weighted population summarized by watershed
What You Should Understand After This Lab
By the end of the exercise, you should be able to explain:
- why areal interpolation is needed when two polygon systems do not align
- why the union step creates the geometry needed for weighting
- why CH_AREA / P_AREA produces the area share used in the estimate
- why the final watershed population values are estimates rather than direct Census counts