Explaining STV from Scratch
This post is going to be less mathematical than usual for me – this is essentially how I explain Single Transferable Vote elections to people who have never heard about them before. There are many great explanations of STV already out there; two of my favorites are this series by CGP Grey, and this voter education video out of Portland.
My angle for this post is that I’m going to explain STV the same way as I would to my mother. Rather than just explain the mechanics of the rules, I’ll take my time to motivate everything, and explain why we would want to make elections so complicated in the first place, with plenty of examples along the way.
The least bad way to run a single-winner election.
Ok, let’s start at the beginning. Most people, when they imagine an election, imagine a paper ballot that everyone has to select a candidate’s name from, then we count (a.k.a. tally up) the number of paper ballots that went for every candidate, and whoever got the most votes wins the election. We call that a plurality election. It’s simple, it’s elegant, and it’s something that an elementary school class can wrap their heads around. It’s also perverse and deceptive – exactly how so is maybe a story for another time (it’s gerrymandering) – but the simplicity of plurality has the effect of making people wrongfully believe that all democracies are born equal. Surely, as long as everyone got to cast a vote, and they were tallied correctly, we live in a democracy?
Let’s talk about spoiler candidates. Pick any American presidential election you’re familiar with; it doesn’t matter which because it will necessarily have a candidate that might have spoiled the election. Two notable examples are Teddy Roosevelt and William Howard Taft splitting the Republican vote in 1912 (giving Woodrow Wilson an easy win), and Ross Perot running as an independent in 1992 against George H. W. Bush and Bill Clinton (both of whom he appears to have undermined in equal measure). I’ll use the 2016 election between Hillary Clinton and Donald Trump, with Jill Stein running for the Green Party as a possible spoiler for Clinton – although factually Stein’s bid had no impact on the outcome of the race – because it’s an election I’m familiar with; it was the first one that happened after I moved to the USA.
If you voted in an election like this one, you might have had to make an unpleasant choice: to vote either ideologically or strategically. In 2016, for example, many supporters of Jill Stein’s platform ended up voting for Clinton, because of the perception that Stein stood no serious chance of winning, and that voting for her was tantamount to ‘throwing a vote away.’ This has a pretty toxic effect on American democracy – it makes it so third parties have to overcome much more adversity in order to be viable, and as a result many elections are decided at partisan primaries. But surely we could come up with a solution where Green Party voters could still indicate their support for Stein, while also making sure that their vote ‘falls back’ to Clinton if the race comes down to Clinton vs. Trump?
Some countries – including my native France – use a two-round system for these high-profile elections (and we are very very proud of it, and similarly we never conceive that democracy could look any different). We run the first round of the election in exactly the same way as the USA, except we use it to determine the two candidates with the most support, rather than a single winner (unless a candidate has >50% support, in which case they just win – but the French electorate would never be so straightforward). Two weeks later, we ask people to vote again in a runoff, picking one out of the two remaining candidates, and the winner of that round is our president. This does seem to help with ideological diversity; currently, France has around 5 viable mainstream parties that receive non-trivial amounts of support in each election (and the most popular parties frequently change). It would also fix our Jill Stein dilemma; in our 2016 example the second round would have probably narrowed down the race to Clinton vs. Trump, but voters could still have safely cast ideological ballots for Stein in the first round.
But ok, you could still imagine situations where this system could suffer from spoiler candidates, if there are many parties, and one of the major parties splits its ticket. In our 2016 example, if the Green Party somehow had 45% of the support, but ran two equally popular candidates receiving 22.5% of the vote, and both the other parties ran a single candidate receiving 27.5% of the vote, then neither Green candidate would make it to the runoff.
This isn’t just a hypothetical problem, either. In both the 1969 and the 2002 French presidentials, the political left failed to get a candidate into the runoff, because they had too many candidates competing on similar platforms. In the 2002 case, this opened the door for far-right candidate Jean-Marie Le Pen to make it to the runoff for the first time – this was quite shocking back then – where he was handily defeated by the ‘mainstream’ Jacques Chirac, who might have had to put on a more serious campaign had he faced off against a more progressive candidate (although, just making it to the runoff already did a lot of work to legitimize Le Pen’s at the time taboo ideology to the mainstream).
This also happens in the USA, where California is notable for its ‘jungle primary’ top-two runoff elections, some of which have gone similarly astray. For example, the CA SD-4 state senate election in 2022 was considered a safely Republican-leaning race – around 60% of the electorate was Republican – but no Republican candidate made it to the runoff because too many candidates split the ticket:
| | Tim Robertson | Marie Alvarado-Gil | George Radanovich | Steven C. Bailey | Jeff McKay | Jack Griffith | Michael Gordon | Jolene Rehana Daly |
|---|---|---|---|---|---|---|---|---|
| Party | DEM | DEM | REP | REP | REP | REP | REP | REP |
| Vote share | 22.1% | 18.7% | 17.1% | 16.8% | 15.7% | 4.7% | 2.8% | 2.1% |
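To make the vote-splitting concrete, here is a small sketch that adds up the shares in the table by party and then picks the top two finishers. The numbers are copied from the table above; the variable names are my own, and the only rule assumed is "the two highest first-round shares advance."

```python
from collections import defaultdict

# (candidate, party, first-round vote share in %), copied from the table above
results = [
    ("Tim Robertson", "DEM", 22.1),
    ("Marie Alvarado-Gil", "DEM", 18.7),
    ("George Radanovich", "REP", 17.1),
    ("Steven C. Bailey", "REP", 16.8),
    ("Jeff McKay", "REP", 15.7),
    ("Jack Griffith", "REP", 4.7),
    ("Michael Gordon", "REP", 2.8),
    ("Jolene Rehana Daly", "REP", 2.1),
]

# Total support per party across all of its candidates
party_totals = defaultdict(float)
for _, party, share in results:
    party_totals[party] += share

# Top-two rule: only the two highest individual shares advance to the runoff
finalists = sorted(results, key=lambda r: r[2], reverse=True)[:2]

for party, total in sorted(party_totals.items()):
    print(f"{party}: {total:.1f}%")
print("Runoff:", [(name, party) for name, party, _ in finalists])
```

The Republican candidates add up to roughly 59% of the vote, yet both runoff slots go to Democrats, because no single Republican beat 18.7%.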
One solution to this problem is to set up the election with as many rounds as there are candidates. Every round, you eliminate exactly one candidate – the person with the fewest votes – until a candidate has more than 50% of the vote. In the SD-4 example, this would have led to the elimination of several Republican candidates one-by-one, and their voters would have probably added their weight to another Republican’s pile, until eventually one of them would have overcome the two Democrats.
But ok, this would get old pretty quickly; you’d be asked to go back to the polling booth 6 times for every election, and you’ve got places to be. So, rather than have you vote once in every round, we might have you fill out a single ranked-choice vote, where you rank every candidate in the race from most to least favorite. This way, whoever we eliminate from the race, we can still know who your favorite candidate is among the remaining (so-called ‘hopeful’) candidates. Everyone casts a single (slightly longer) vote, and we can use them to run every elimination round of our election in a single go. We’ve just invented Instant Runoff Voting, or IRV; this is what New York City uses for its mayoral race, for example.
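The elimination loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any real jurisdiction tabulates: the function name `irv_winner`, the ballot format, and the alphabetical tie-break are all my own assumptions, and the vote counts are invented for the Stein dilemma (they are not real 2016 results).

```python
def irv_winner(ballots):
    """Instant Runoff sketch. `ballots` is a list of (ranking, count) pairs,
    where ranking is a tuple of names from most to least favorite."""
    hopefuls = {c for ranking, _ in ballots for c in ranking}
    while True:
        # Each ballot counts for its highest-ranked candidate still in the race
        tally = {c: 0 for c in hopefuls}
        for ranking, count in ballots:
            top = next((c for c in ranking if c in hopefuls), None)
            if top is not None:
                tally[top] += count
        leader = max(sorted(hopefuls), key=lambda c: tally[c])
        # Stop once someone holds a majority of the ballots still in play
        if 2 * tally[leader] > sum(tally.values()) or len(hopefuls) == 1:
            return leader
        # Otherwise eliminate the candidate with the fewest votes
        # (ties broken alphabetically, an arbitrary choice for this sketch)
        loser = min(sorted(hopefuls), key=lambda c: tally[c])
        hopefuls.remove(loser)

# Invented numbers: Stein voters list Clinton as their fallback
ballots = [
    (("Trump",), 47),
    (("Clinton", "Stein"), 45),
    (("Stein", "Clinton"), 8),
]
print(irv_winner(ballots))
```

With these made-up ballots nobody has a majority in the first round, Stein is eliminated, her 8 ballots fall back to Clinton, and Clinton wins 53 to 47, even though Trump led the first round.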
The least bad way to run a multi-winner election.
The thing is, no matter how much lipstick you put on this pig, there’s still no saving how bad single-winner elections are. Whether you are using plurality, top-two runoff, or IRV, the bottom line is still the same: if you control 50+$\varepsilon$% of the vote, you win, and that means that up to 50% of the electorate can be disenfranchised. This can lead to polarization – why appeal to everyone if you can win more easily by othering half of the electorate – and it also makes single-winner systems vulnerable to gerrymandering. With the wonders of modern data science, we can make pretty good models of how geographic areas are going to vote on average, which means that linedrawers can often decide the election before it ever happens: just pack some of your opponent’s supporters into districts where they’ll have overwhelming support, and crack the rest of their supporters into districts where they hover right below a majority. And even IRV suffers from this vulnerability to gerrymandering (see for example this 2022 Alaska Supreme Court case about their State Senate district K, which uses IRV).
Some elections, like presidentials, gubernatorials, and mayorals, must inherently be single-winner elections, and there’s nothing we can do there; but thankfully there is usually no line-drawing process for these elections, since they are run in fixed geographical units with historical significance. But other elections, like congressionals, parliamentaries, city councilors, and school board councils, have no reason to be inherently single-winner, and in practice we do see these elections being gerrymandered all the time. This seems to throw a wrench into the ‘everyone gets to vote, so it’s a democracy’ ideology – sure, everyone got to vote, but in practice the election was decided by the lawmakers who drew the map.
For political bodies with many seats, some of which come from adjacent geographical units, it can make sense to merge several adjacent single-winner districts together into a bigger multi-winner district. This doesn’t inherently fix any problems on its own (there are still wrong ways of doing multi-winner elections), but it opens up the door for some more proportional electoral systems.
Here’s a cautionary tale to show that multi-winner elections aren’t inherently better if the election rule you use isn’t chosen well. Many city councils – including my current one in Boulder – use a multi-winner system where each voter casts as many votes as there are seats, and the winners are still decided using simple plurality. This might seem innocent enough, but it can get pretty pathological. Boulder’s census data says the city is about 77% white & non-hispanic (this doesn’t correlate exactly with registered voter demographics, but let’s work with that number for the sake of argument). Every two years, four of our councilor seats are up for election. If that whole 77% bloc of voters voted for the same four candidates in every election, those four candidates would systematically win, and the remaining 23% of Boulder would never have an opportunity for representation – despite seemingly ‘deserving’ about one seat’s worth of representation. If you asked me to design the worst possible multi-winner election system, I would have a pretty hard time doing worse than this.
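Here is a tiny sketch of that pathology, under the (unrealistic) assumption that both blocs vote perfectly cohesively. The candidate names `Maj1`…`Min4` and the function `block_plurality` are hypothetical placeholders; the 77/23 split is the one from the paragraph above.

```python
from collections import Counter

def block_plurality(ballots, seats):
    """Block voting sketch: each ballot gives one vote to up to `seats`
    candidates, and the `seats` highest tallies win."""
    tally = Counter()
    for chosen, count in ballots:
        for candidate in chosen[:seats]:
            tally[candidate] += count
    return [c for c, _ in tally.most_common(seats)]

# Two perfectly cohesive blocs (an assumption), each with its own slate of four
ballots = [
    (("Maj1", "Maj2", "Maj3", "Maj4"), 77),
    (("Min1", "Min2", "Min3", "Min4"), 23),
]
winners = block_plurality(ballots, seats=4)
print(winners)
```

Every majority-slate candidate ends up with 77 votes and every minority-slate candidate with 23, so the majority bloc takes all four seats.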
The typical example of how disproportional single-winner districts can get is the UK parliament: each of their members of parliament comes from a single-winner district using a plurality rule. This gives a so-called winner bonus to the nationally most popular party; basically, if a party gets 55% of the vote, but they do that in every single-member district across the country, then they would win 100% of the seats. This is how Labour ended up with twice as many seats proportionally as they got votes in 2024 (although the Conservatives were pulling the same magic trick for years before them). It’s also how Reform UK only won 5 out of 650 seats (<1% of the seats), despite earning 14.3% of the votes – those votes were spread out across many single-winner districts, and rarely represented a plurality in any given race.
| | Labour | Conservative | Reform UK | Liberal Democrat | Green | SNP | Plaid Cymru |
|---|---|---|---|---|---|---|---|
| UK vote share | 33.7% | 23.7% | 14.3% | 12.2% | 6.4% | 2.5% | 0.7% |
| Seat share | 63.2% | 18.6% | 0.8% | 11.1% | 0.6% | 1.4% | 0.6% |
| Seat/vote ratio | 1.88× | 0.79× | 0.05× | 0.91× | 0.10× | 0.55× | 0.88× |
If instead of using single-member districts, the UK combined 3-5 adjacent parliamentary districts and ran their elections with 3-5 winners, then this winner bonus would be a lot less pronounced, and parties like Reform UK would stand much more of a chance: as we will see below, in a 5-seat contest, it would only take 16.7% of the vote per district for them to be guaranteed a seat. Labour would still be the most represented party – winners are still winners – but we invite other parties to the dinner table as well. We can see this illustrated in Australia’s senate elections, which use such a multi-member proportional election system. If I sketch out what the same table as above looks like in Australia’s case, we get something like this:
| | Labor | Coalition* | Greens | One Nation | David Pocock | Jacqui Lambie Network |
|---|---|---|---|---|---|---|
| National vote share | 35.11% | 29.89% | 11.72% | 5.67% | 0.72% | 1.05% |
| Seat share (of 40) | 40.0% | 32.5% | 15.0% | 7.5% | 2.5% | 2.5% |
| Seat/vote ratio | 1.14× | 1.09× | 1.28× | 1.32× | 3.47× | 2.38× |
Disclaimer: measuring proportionality for Ranked Choice Voting elections is notoriously difficult – even now there are still academic papers being written about the correct way to do this. What I did is very basic, and fails to capture a lot of subtleties of STV. Basically, I just used the first-place vote of each ballot to flag which party bloc they belonged to. So, for instance, if a voter ranked an Animal Justice Party candidate first, and a Labor candidate second, they would not have been counted as a member of the Labor bloc – which is questionable. But this gets us numbers.

One effect of this methodology is that many voters get sorted into minor parties, which is why the numbers in the ‘National vote share’ row don’t add up to 100%, and why every party seems to be over-represented. Around 15.84% of voters get sorted into one of the parties that won no seats, the largest of which was Legalise Cannabis Australia, with 3.49% of the electorate. But as mentioned above, some of these voters may have been casting an ideological vote, with the knowledge that their vote would ‘fall back’ onto one of the major parties they listed afterwards.

Australia has many other quirks. For instance, each of their states (not including the Northern Territory and the Australian Capital Territory) elects the same number of senators; but these states have very different numbers of voters! Because of this, a vote in Tasmania is worth about 15 votes in New South Wales (Sydney), or 12 votes in Victoria (Melbourne). This is how David Pocock manages to get a seat in the senate despite his comparatively low national vote share – he comes from a territory which is proportionally overrepresented, not because of the election rule, but because of the apportionment of seats. Despite this quirkiness, the numbers still manage to look pretty proportional!

*Finally, the “Coalition” is Australia’s center-right bloc; it is primarily made up of the Liberal Party of Australia and the National Party of Australia.
When your election has two winners instead of one, the smallest share of the vote that makes a candidate sure to win a seat (in a plurality setting) is just over a third, i.e. $\tfrac{1}{3}+\varepsilon$ of the vote; in other words, it is not possible for three candidates to each get 334 votes in an election with 1000 voters. For three seats, that number goes down to just over 25% of the vote, for four seats it’s just over 20%, etc. We call this critical number of votes that a candidate must get before they are certain to win a seat the quota for the election, and it is most common to use the so-called Droop quota, which for an election with $N$ voters and $m$ seats is given by
$$ q = \left\lfloor\frac{N}{m+1}\right\rfloor+1$$

To make this more familiar, below are the first few values of the Droop quota for an election with 1000 voters. To completely lock out a group from representation in the multi-winner setting, you have to make sure they control less than that number of votes – which can be quite hard in practice.
| m | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Droop quota | 501 | 334 | 251 | 201 | 167 | 143 | 126 |
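The table can be reproduced with a one-line function. The name `droop_quota` is my own; the formula is the one given above, with integer division playing the role of the floor.

```python
def droop_quota(num_voters, num_seats):
    """Droop quota: floor(N / (m + 1)) + 1."""
    return num_voters // (num_seats + 1) + 1

# First few quotas for an election with 1000 voters, m = 1..7
quotas = [droop_quota(1000, m) for m in range(1, 8)]
print(quotas)  # [501, 334, 251, 201, 167, 143, 126]

# Sanity check: m + 1 candidates can never all reach the quota at once
for m in range(1, 8):
    assert (m + 1) * droop_quota(1000, m) > 1000
```

The sanity check at the end is the reason the quota works: there are never enough votes for more than $m$ candidates to reach it.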
But ok, if we just ran these multi-winner elections with a straight-up plurality rule, we would see some frustrating spoiler-type effects again; imagine a three-seat election where a party commands a full 70% of the vote, but one of their two candidates is way more charismatic than the other, so the 70% is split as 60% + 10%, and only one of their candidates gets elected, because the other party did a good job of splitting their 30% of the electorate evenly among their two candidates.
One solution is to ask voters to fill out ranked choice ballots again – rank every candidate from most to least favorite – and find a way for winners to ’transfer’ their surplus votes to the next preferences of their voters. In our previous example, if there were 1000 voters, maybe the vote breaks down like this:
| Full ranking on ballot | (A,B) | (A,C) | (B,A) | (C,D) | (D,C) |
|---|---|---|---|---|---|
| Number of times cast | 500 | 100 | 100 | 150 | 150 |
It’s clear in this election that candidate A deserves a seat; she has 600 votes initially, a lot more than the quota of 251. After we elect her, we need to decide how to transfer her surplus of 349 votes. Most of her 600 votes (500/600) listed B as their next preference, but some (100/600) listed C. We can’t transfer all of A’s votes to their next preferences – at least 251 votes should be ‘spent’ to elect her, otherwise we would be back in proportionality hell where the dominant party steamrolls the others – so which ones of the 600 votes should we allow to go to their next preferences?
One answer to that question is to just pick 349 votes at random from candidate A’s pile and transfer only those. This is the solution that was used for most of the 20th century, when these elections had to be tabulated by hand by picking up physical ballots from the winner’s pile and moving them to their next preference’s pile; and it was still in use in Cambridge, Massachusetts until 2025, when the city voted to move to a modern variant of this process.
But there’s nothing fundamentally wrong with random transfers; if we used that method, we would see, on average, 58 of the 349 surplus votes transfer to C, and the remaining 291 go to B. In this case, it is impossible for the randomness of the transfer to change the outcome of the election, because the ‘least probable outcome’ would be for all 100 of the votes listing C to transfer, and even in this case the 249 votes transferred to B would be enough to get her quota. And even in elections where the randomness could change the outcome, the law of large numbers tells us that, on average, there are some outcomes that we should expect more than others (except in smaller elections with even smaller transfers, which has occasionally been Cambridge’s position). But ok, random transfers do make the election a lot more difficult to tabulate (a lot has to go into making sure the source of randomness isn’t biased), and even more difficult to independently verify or audit; so we might want a more sophisticated solution.
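If you want to convince yourself of those averages, here is a quick simulation sketch. It assumes the simplest possible random-transfer rule (draw 349 ballots uniformly without replacement from A’s pile), which is my own simplification and not necessarily the exact procedure any city used; the function name and the fixed seed are arbitrary.

```python
import random
from collections import Counter

def random_transfer(pile, surplus, rng):
    """Draw `surplus` ballots uniformly at random (without replacement)
    and count which next preference each drawn ballot goes to."""
    return Counter(rng.sample(pile, surplus))

# Next preference on each of A's 600 ballots: 500 list B, 100 list C
pile = ["B"] * 500 + ["C"] * 100
rng = random.Random(0)  # fixed seed so the sketch is repeatable

trials = [random_transfer(pile, 349, rng) for _ in range(2000)]
mean_to_c = sum(t["C"] for t in trials) / len(trials)
min_to_b = min(t["B"] for t in trials)

print(f"average transfer to C: {mean_to_c:.2f} (expected {349 * 100 / 600:.2f})")
print(f"smallest transfer to B seen: {min_to_b}")
```

However the draw goes, B can never receive fewer than 249 of the transferred ballots, which on top of her own 100 first-place votes always clears the quota of 251.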
One such solution is to transfer all of A’s 600 votes, but with a decreased weight: instead of counting for one full vote, they now count for a fraction of a vote. This seems like the fairest solution because, after all, all 600 of those votes contributed to A’s quota of 251, so all of them should be spent in equal part to fulfill that quota.
In our example, if we transferred all of the votes onwards with a new weight of $t = 349/600$, we would make sure that, on the whole, the equivalent of 251 votes have disappeared from the election. There would remain 400 votes with their full weight of 1, and another 600 votes with a weight of $349/600$ each, representing the equivalent of 349 votes. So in total, there is the equivalent of 749 = 1000-251 votes left in the election; it’s just that the way we get to that number is by adding up a bunch of fractional votes.
We can extrapolate the general pattern from this example: if a winner gets quota $q$ with a surplus of $s$ votes, her “transfer value” $t$ is calculated via $t=s/(q+s)$. We multiply the weight of each of the votes that got her quota by this transfer value $t$, and then transfer them onwards to their next preference. We’ve just invented (a version of) STV!
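As a quick check of the arithmetic, here is the transfer value computed with exact fractions. The helper name `transfer_value` is hypothetical, and the sketch only covers the case in this example, where every ballot in the winner’s pile still has its full weight of 1.

```python
from fractions import Fraction

def transfer_value(votes_for_winner, quota):
    """t = s / (q + s), where s is the surplus and q + s the winner's total."""
    surplus = votes_for_winner - quota
    return Fraction(surplus, votes_for_winner)

t = transfer_value(600, 251)
print(t)  # 349/600

# 400 untouched ballots at weight 1, plus A's 600 ballots at weight t
remaining = 400 + 600 * t
print(remaining)  # 749, i.e. 1000 - 251
```

Using `Fraction` instead of floats keeps the bookkeeping exact, so the weights left in the election always add up to precisely the total minus the quotas already spent.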
In our specific example, candidate A would transfer a weight of $500\cdot349/600\approx 290.83$ votes from her pile to candidate B, and $100\cdot349/600\approx58.17$ of her surplus votes would go to candidate C; none go to D, because nobody who voted for A listed D as their next preference. As a result of this transfer, candidate B gets quota in the second round of the election, and in the last round, neither C nor D has quota, but C has more votes than D (because of the votes she received from A), so we eliminate D and elect C – this seems fair! Notice, by the way, how similar the transfers from A’s election were to the average transfers we would get if we used random transfers – that’s the law of large numbers putting in some work!
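Putting the pieces together, the whole count can be sketched as a short loop. This is a simplified illustration of the version of STV described in this post, not a faithful implementation of any real statute: the function name `stv`, the ballot format, electing one candidate per round, and the alphabetical tie-breaks are all my own assumptions.

```python
from fractions import Fraction

def stv(ballots, seats):
    """STV sketch with fractional surplus transfers.
    `ballots` is a list of (ranking, count) pairs."""
    total = sum(count for _, count in ballots)
    quota = total // (seats + 1) + 1  # Droop quota
    # Each pile is [ranking, weight]; weights shrink when a surplus is transferred
    piles = [[ranking, Fraction(count)] for ranking, count in ballots]
    hopefuls = {c for ranking, _ in ballots for c in ranking}
    winners = []

    def top_choice(ranking):
        # Highest-ranked candidate who is still hopeful, if any
        return next((c for c in ranking if c in hopefuls), None)

    while len(winners) < seats and hopefuls:
        if len(hopefuls) <= seats - len(winners):
            # No more hopefuls than open seats: everyone left is elected
            winners.extend(sorted(hopefuls))
            break
        tally = {c: Fraction(0) for c in hopefuls}
        for ranking, weight in piles:
            choice = top_choice(ranking)
            if choice is not None:
                tally[choice] += weight
        leader = max(sorted(hopefuls), key=lambda c: tally[c])
        if tally[leader] >= quota:
            # Elect the leader and scale her ballots by t = surplus / total
            t = (tally[leader] - quota) / tally[leader]
            for pile in piles:
                if top_choice(pile[0]) == leader:
                    pile[1] *= t
            winners.append(leader)
            hopefuls.remove(leader)
        else:
            # Nobody has quota: eliminate the candidate with the fewest votes
            loser = min(sorted(hopefuls), key=lambda c: tally[c])
            hopefuls.remove(loser)
    return winners

# The 1000-voter example from the table above
ballots = [
    (("A", "B"), 500), (("A", "C"), 100), (("B", "A"), 100),
    (("C", "D"), 150), (("D", "C"), 150),
]
print(stv(ballots, seats=3))
```

On the example ballots this elects A, then B, then C, matching the walkthrough above, whereas a straight plurality count of first preferences would have elected A, C, and D.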
There are other proportional representation methods than STV, but there are several really nice things about STV:
- It’s apartisan: STV manages to achieve proportionality without needing to know which candidates come from which party. This is how Australia is able to see a non-trivial number of “independent” candidates – like David Pocock – still have a chance at winning a seat, compared to systems like Germany’s party list elections. This is also helpful for smaller elections, like city councils, where candidates don’t necessarily run on a partisan platform. This is not to say that STV is incompatible with partisan systems! Australia is still very politically partisan – they are notorious for their use of “Above the Line Voting,” which allows voters just to vote for parties instead of candidates, and the parties then decide how to fill out these above-the-line votes.
- It’s naturally insulated from gerrymandering: as I mentioned previously, it’s a lot harder to marginalize a group when the quota $q$ they need to get represented is 25% or lower. It’s often not possible to draw geographic units with less than this quota’s worth of a group, and it’s also difficult to make predictions with that degree of precision from the data we have.
- It favors sincere voting: even though it is famously impossible for any election rule to be strategy-proof, strategic voting in STV is very difficult to figure out in a low-information setting, and voters are often better off just filling out their honest preferences. The cases where you are better off filling out your ballot differently than your actual preferences are rare, and usually hard to predict – certainly harder to predict than in plurality.
- It punishes polarizing candidates: compared to single-winner systems, where it can be viable to alienate up to 50% of the electorate, candidates with broad appeal are more competitive in STV. If a candidate can earn the favor of a second ideological bloc beyond their own, just getting those voters to rank them second can make a difference to the election outcome, since they can receive transfers after that bloc gets their own candidate(s) elected.
Running an STV Election Yourself
[WIP]