How to hold an imaginary election
Many of Australia's political contests are closer than you'd think.
As part of building our ‘Contact Your MP’ widget, we wanted to include a way to check how close each seat was. We figured that’s useful context when you’re writing to someone who represents you. If they’re in a really safe seat, you should know that, and maybe you can write to your Senators instead! If their seat is really close, you can tell them that!
But to actually tell how close a seat was, we needed to model it, because you can’t rely on the 2PP margin (for reasons we’ll go into later). So we built a model. And this is how it works.
Our model
This Python script simulates our federal election process, focusing on something called the “tipping point”. That’s the number of votes that need to change to flip the election result.
How it works
Let’s break down how this simulator works:
- Data crunching:
The script starts by loading election data from a CSV file. This could be from past elections or simulated data.
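import csv
from collections import defaultdict

def load_election_data(file_path):
    # division -> count number -> candidate -> calculation type -> value
    election_data = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float))))
    candidates = set()
    with open(file_path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # Process each row of the CSV file
            # ...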
- Preference flows:
Remember how your preferences flow to other candidates when your first choice is eliminated? The script models this using something called a “transfer matrix”.
transfer_matrix = np.zeros((len(candidates), len(candidates)))
for count in range(1, len(election_data)):
    for i, from_candidate in enumerate(candidates):
        total_transfer = sum(election_data[count][to_candidate]['Transfer Count'] for to_candidate in candidates)
        if total_transfer > 0:
            for j, to_candidate in enumerate(candidates):
                transfer_matrix[i, j] = election_data[count][to_candidate]['Transfer Count'] / total_transfer
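To make the idea of a transfer matrix concrete, here is a minimal sketch in plain Python (no NumPy). The input shape is an assumption made for illustration: each excluded candidate's index maps to the number of ballots each other candidate received from them. Neither `build_transfer_matrix` nor this input format comes from the original script.

```python
def build_transfer_matrix(num_candidates, eliminations):
    """Return a row-normalised matrix of preference flows.

    `eliminations` maps the index of an excluded candidate to a dict of
    {receiving candidate index: ballots received}. This input shape is
    hypothetical, chosen to keep the example small.
    """
    matrix = [[0.0] * num_candidates for _ in range(num_candidates)]
    for from_idx, received in eliminations.items():
        total = sum(received.values())
        if total <= 0:
            continue  # nothing was transferred, leave the row as zeros
        for to_idx, ballots in received.items():
            matrix[from_idx][to_idx] = ballots / total
    return matrix


# Candidates: 0 = Labor, 1 = Liberal, 2 = Greens, 3 = One Nation
# Ballot numbers are made up for illustration.
example_matrix = build_transfer_matrix(
    4,
    {
        3: {0: 500, 1: 500},      # One Nation splits evenly
        2: {0: 24000, 1: 8000},   # Greens flow 75/25 towards Labor
    },
)
print(example_matrix[2])
```

Each row answers one question: when this candidate is excluded, what share of their ballots lands with each of the others? Rows for candidates who were never excluded stay at zero.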
- Election simulation:
The script then simulates the whole preference distribution process, just like on election night.
def simulate_election(votes_tuple):
    votes = list(votes_tuple)
    total_votes = sum(votes)
    majority = total_votes / 2 + 1
    while len(votes) > 2:
        if max(votes) >= majority:
            return votes.index(max(votes))
        min_index = votes.index(min(votes))
        for i, transfer_percent in enumerate(transfer_matrix[min_index]):
            if i != min_index:
                votes[i] += votes[min_index] * transfer_percent
        votes[min_index] = 0
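For readers who want something they can run end to end, here is a self-contained sketch of the same kind of count. It is a variant, not the script's own code: it keeps an explicit set of candidates still in the count, and rescales each excluded candidate's flows over whoever remains (an assumption of this sketch). The name `run_count` and the Labor and Liberal rows of `flows` are made up for illustration. The Greens and One Nation rows use the splits from the worked example later in this post.

```python
def run_count(votes, flows):
    """Return the index of the winner of a preferential count.

    `flows[i][j]` is the share of candidate i's votes that moves to
    candidate j when i is excluded. Shares are rescaled over the
    candidates still in the count, which is an assumption of this sketch.
    """
    votes = [float(v) for v in votes]
    active = set(range(len(votes)))
    while len(active) > 1:
        total = sum(votes[i] for i in active)
        leader = max(active, key=lambda i: votes[i])
        if votes[leader] > total / 2:
            return leader  # more than half of the live vote
        # Exclude the lowest-placed candidate still in the count
        loser = min(active, key=lambda i: votes[i])
        active.remove(loser)
        share_total = sum(flows[loser][i] for i in active)
        for i in active:
            if share_total > 0:
                votes[i] += votes[loser] * flows[loser][i] / share_total
        votes[loser] = 0.0
    return next(iter(active))


# Candidates: 0 = Labor, 1 = Liberal, 2 = Greens, 3 = One Nation
flows = [
    [0.0, 0.2, 0.8, 0.0],    # Labor (illustrative numbers)
    [0.3, 0.0, 0.7, 0.0],    # Liberal (illustrative numbers)
    [0.75, 0.25, 0.0, 0.0],  # Greens
    [0.5, 0.5, 0.0, 0.0],    # One Nation
]
winner = run_count([34000, 33000, 32000, 1000], flows)
print(winner)
```

Tracking the live candidates explicitly means an already-excluded candidate, sitting on zero votes, is never picked as the lowest again.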
- Finding the ‘tipping point’:
The script takes votes away from the winner one at a time and redistributes them, then runs a completely fresh count after every change. If moving one vote (and only one) doesn’t change the result, we move another. And so on, and so on. We keep doing that until we get a new election result. The number of votes it took to get there is our tipping point.
def find_tipping_point(election_data, candidates):
    # ... (setup code)
    votes_removed = 0
    while True:
        votes_removed += 1
        current_votes = original_votes.copy()
        current_votes[original_winner] -= votes_removed
        # Reassign removed votes
        reassignments = np.random.choice(len(candidates), votes_removed, p=primary_percentages)
        for reassigned_to in reassignments:
            if reassigned_to != original_winner:
                current_votes[reassigned_to] += 1
        new_winner = simulate_election(tuple(current_votes))
        if new_winner != original_winner:
            return votes_removed, margin
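Here is a simplified, runnable sketch of the same search. It swaps the random reassignment for a deterministic one: removed votes are shared among the other candidates in proportion to their primary vote, so the answer is repeatable. That substitution, the helper names `run_count` and `find_tipping_point_sketch`, and the Labor and Liberal preference flows are all assumptions made for illustration.

```python
def run_count(votes, flows):
    """Compact preferential count: exclude the lowest candidate each round."""
    votes = [float(v) for v in votes]
    active = set(range(len(votes)))
    while len(active) > 1:
        total = sum(votes[i] for i in active)
        leader = max(active, key=lambda i: votes[i])
        if votes[leader] > total / 2:
            return leader
        loser = min(active, key=lambda i: votes[i])
        active.remove(loser)
        share_total = sum(flows[loser][i] for i in active)
        for i in active:
            if share_total > 0:
                votes[i] += votes[loser] * flows[loser][i] / share_total
        votes[loser] = 0.0
    return next(iter(active))


def find_tipping_point_sketch(primaries, flows):
    """Smallest number of votes taken from the winner that changes the result."""
    winner = run_count(primaries, flows)
    others_total = sum(v for i, v in enumerate(primaries) if i != winner)
    removed = 0
    while removed < primaries[winner]:
        removed += 1
        trial = [float(v) for v in primaries]
        trial[winner] -= removed
        # Share the removed votes among the other candidates, pro rata
        for i in range(len(trial)):
            if i != winner:
                trial[i] += removed * primaries[i] / others_total
        if run_count(trial, flows) != winner:
            return removed
    return None  # the result never changed


# Candidates: 0 = Labor, 1 = Liberal, 2 = Greens, 3 = One Nation
flows = [
    [0.0, 0.2, 0.8, 0.0],    # Labor (illustrative numbers)
    [0.3, 0.0, 0.7, 0.0],    # Liberal (illustrative numbers)
    [0.75, 0.25, 0.0, 0.0],  # Greens
    [0.5, 0.5, 0.0, 0.0],    # One Nation
]
primaries = [3400, 3300, 3200, 100]
tipping_point = find_tipping_point_sketch(primaries, flows)
print(tipping_point)
```

With these made-up inputs, the sketch reports a tipping point of 170 votes out of 10,000. That is the point where Labor slips below the Greens and is excluded, even though the untouched count finishes 58.5 to 41.5.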
The theory behind it
The idea here is to understand how stable an election result is. In a tight race like we often see in marginal seats, a small shift in votes could change everything. This simulator helps us put a number on just how close an election is.
It’s based on the assumption that we can model voting patterns using data from past elections. The transfer matrix is key here – it captures how preferences typically flow between different candidates and parties.
Potential limitations
Now, no model is perfect, and this one has a few limitations to keep in mind:
- It doesn’t project forward: You can’t necessarily predict how votes will change in the future from how preferences flowed in the past. These things are somewhat sticky, but they do change between elections and they really do depend on campaign effects. Also, there’s a good chance the candidates and parties contesting each election will differ, so the distribution of preferences will be different.
- It’s got to be interpreted conservatively: The script assumes that a vote “shaken loose” from the winner’s 2PP pile goes to anybody other than that party, in proportion to the distribution of non-winner votes. This is a potential issue because it doesn’t account for individual voter preferences: if the winner was Labor, there’s perhaps less of a chance that a vote removed from their pile would go to the Liberals, and vice versa. The assumption we’re making is that this will all come out in the wash. Plus, while the script does account for seat-by-seat preference flows, it doesn’t model how changes in one seat might influence others. In reality, political trends often cross electorate boundaries.
- It doesn’t factor in strategic voting: Some voters might change their preferences if they knew how close the race was in their electorate. You can’t really say that it’s going to play out this way, and that’s okay.
The upsides of this approach
Despite these limitations, this simulator has some pretty cool benefits:
- It’s fast and can handle large datasets, thanks to some clever use of NumPy and caching.
- It gives us a concrete number (the tipping point) for how close an election is, which tells us more than just looking at the final two-party preferred margin. (This number is also remarkably stable, considering the inherent uncertainty that comes with assigning votes by probability over thousands of iterations.)
- It can be easily updated with new data, so we could use it for different elections or even hypothetical scenarios (so long as we’ve got a transfer matrix available, either based on historical seat-specific preference flows, or based on geographic trends in the absence of one).
- It models our actual preferential voting system pretty closely, including seat-by-seat preference flows, making it more accurate than simpler models.
- By using seat-specific preference flows, it captures the nuances of different electorates, recognizing that voting patterns can vary significantly across the country.
Let’s look at the part of the script that handles this seat-specific data:
import csv
from collections import defaultdict

def load_election_data(file_path):
    # division -> count number -> candidate -> calculation type -> value
    election_data = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float))))
    candidates = set()
    with open(file_path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            division = row['DivisionNm']  # This captures the specific electorate
            candidate = (row['Surname'], row['GivenNm'], row['PartyAb'])
            count_number = int(row['CountNumber'])
            calculation_type = row['CalculationType']
            calculation_value = float(row['CalculationValue'])
            candidates.add(candidate)
            election_data[division][count_number][candidate][calculation_type] = calculation_value
    return election_data, candidates
This function loads data for each division (electorate) separately, allowing the simulation to use seat-specific preference flows. This is a strength of the model, as it recognizes the diversity of voting patterns across different parts of Australia.
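To see the shape of the resulting structure, here is a small sketch that runs the same kind of parsing over an in-memory CSV. The column names match the ones used above. Everything else is made up for illustration: the division, the candidates, the numbers, the `'Preference Count'` value and the helper name `load_rows`.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for a fictional division, for illustration only
SAMPLE_CSV = """DivisionNm,Surname,GivenNm,PartyAb,CountNumber,CalculationType,CalculationValue
Exampleton,SMITH,Alex,ALP,0,Preference Count,34000
Exampleton,JONES,Sam,LP,0,Preference Count,33000
Exampleton,SMITH,Alex,ALP,1,Transfer Count,500
Exampleton,JONES,Sam,LP,1,Transfer Count,500
"""


def load_rows(handle):
    """Nest rows as division -> count number -> candidate -> type -> value."""
    data = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for row in csv.DictReader(handle):
        candidate = (row["Surname"], row["GivenNm"], row["PartyAb"])
        count_number = int(row["CountNumber"])
        value = float(row["CalculationValue"])
        data[row["DivisionNm"]][count_number][candidate][row["CalculationType"]] = value
    return data


data = load_rows(io.StringIO(SAMPLE_CSV))
smith = ("SMITH", "Alex", "ALP")
print(data["Exampleton"][1][smith]["Transfer Count"])
```

Keying on the division first is what lets each seat carry its own preference flows.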
Why do any of this?
You might be wondering, “Why go through all this trouble? Can’t we just look at the two-party preferred (2PP) margin to see how close a seat is?”
Basically, no. While the 2PP margin is useful, it doesn’t tell the whole story in our preferential voting system. Here’s why:
The order of elimination matters: In preferential voting, the final 2PP result can mask how close the race really was. A seat might have a seemingly comfortable 65/35 2PP split, but it could have been incredibly tight in earlier preference distributions. For example, imagine a scenario like this:
First preferences: Labor 34%, Liberal 33%, Greens 32%, One Nation 1%
Assuming One Nation’s preferences don’t flow to the Greens (which seems reasonable), but split 50/50 between Labor and Liberal (which seems less reasonable, but it’s close enough), you get this:
After One Nation preferences: Labor 34.5%, Liberal 33.5%, Greens 32%
Nobody’s over 50% yet, so you eliminate the candidate with the fewest votes, which is the Greens. Let’s say that 75% of the Greens’ preferences go to Labor, and 25% go to the Libs:
Final 2PP after Greens preferences: Labor 58.5%, Liberal 41.5%
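The final 2PP looks like a landslide, but it was actually a nail-biter! Just a small shift in first preferences could have changed the outcome entirely. Only 1.5 percentage points separated the Greens and the Liberals when the Greens were eliminated. Had the Greens finished ahead instead, the Liberals would have been eliminated and the final contest would have been Labor versus the Greens, decided by Liberal preferences. But looking at that 2PP, you’d say that’s a very safe Labor seat.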
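The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the worked example, using the percentages and preference splits given above; the `distribute` helper is a name made up for illustration.

```python
first_preferences = {"Labor": 34.0, "Liberal": 33.0, "Greens": 32.0, "One Nation": 1.0}


def distribute(tallies, excluded, shares):
    """Return new tallies after sharing out the excluded candidate's votes."""
    moved = tallies[excluded]
    return {
        name: votes + moved * shares.get(name, 0.0)
        for name, votes in tallies.items()
        if name != excluded
    }


# One Nation splits 50/50 between Labor and Liberal
after_one_nation = distribute(first_preferences, "One Nation", {"Labor": 0.5, "Liberal": 0.5})
# Greens preferences flow 75/25 to Labor and Liberal
final_count = distribute(after_one_nation, "Greens", {"Labor": 0.75, "Liberal": 0.25})
# How close the Greens came to staying in the count
gap_at_exclusion = after_one_nation["Liberal"] - after_one_nation["Greens"]

print(after_one_nation)
print(final_count)
print(gap_at_exclusion)
```

The 2PP gap is 17 points, but the gap that decided who got eliminated was 1.5.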
Minor party influence: The 2PP margin doesn’t capture the influence of minor parties and independents. In some seats, these candidates can significantly impact the result, even if they don’t win.
Strategic voting: Knowing only the 2PP doesn’t tell us about potential strategic voting. Voters might change their preferences if they knew how close the race was at each stage of counting.
This is where our simulator shines. By modeling the entire preference distribution process, it can identify these knife-edge scenarios that might look deceptively comfortable in the final 2PP figures.
Let’s look at how the script simulates this process:
def simulate_election(votes_tuple):
    votes = list(votes_tuple)
    total_votes = sum(votes)
    majority = total_votes / 2 + 1
    while len(votes) > 2:
        if max(votes) >= majority:
            return votes.index(max(votes))
        min_index = votes.index(min(votes))
        for i, transfer_percent in enumerate(transfer_matrix[min_index]):
            if i != min_index:
                votes[i] += votes[min_index] * transfer_percent
        votes[min_index] = 0
This function goes through each round of preference distribution, capturing those crucial moments where the race could tip one way or the other. It’s this detailed simulation that allows us to find the true “tipping point” of an election, beyond what the final 2PP margin might suggest.
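The function takes a tuple rather than a list, and tuples are hashable, so results for a given set of tallies can be remembered instead of recounted. Whether the script does its caching exactly this way is an assumption. The sketch below shows the general technique with Python's `functools.lru_cache`, using a made-up stand-in function called `count_leader`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_leader(votes_tuple):
    """Stand-in for an expensive count: returns the index of the top tally."""
    return max(range(len(votes_tuple)), key=lambda i: votes_tuple[i])


first = count_leader((34, 33, 32, 1))
second = count_leader((34, 33, 32, 1))  # same tallies, answered from the cache
cache_stats = count_leader.cache_info()
print(cache_stats)
```

A list argument would raise a `TypeError` here, because `lru_cache` needs hashable arguments to use as cache keys.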
So there you have it! There are probably quite a few uses for this that we haven’t even thought of yet.
But if all it does is show how much closer some seats are than they appear, and how easy it would be for the winner to lose their seat, that’ll make a few pollies nervous. That alone makes it worth it.