Edouard Heitzmann

Recursive Stochastic Counting: the Nesting Doll Method

8 min read

Ok, I know I sound like a broken record, but I really really want to figure out a way to count the districtings of the 10 x 10 grid graph. And this problem feels within reach!

See here for context and a previous attempt at this.

Genealogy of this Idea.

The idea for this chain was partially inspired by section 14.4 of Levin & Peres [1], which itself attributes the original idea to Jerrum & Sinclair [2]. These publications gave a method to “approximately count” the number of proper \(q\)-colorings of a graph \(G\) of \(n\) vertices with maximal degree \(Δ\). They name \(𝒳\) the set of all proper \(q\)-colorings of \(G\), and they name the vertices of \(G\) as \( \lbrace v_1,…,v_n\rbrace\). Then, fixing a coloring \(x_0 \in 𝒳\), they define

\[𝒳_k = \lbrace x \in 𝒳 : x(v_j) = x_0(v_j) \text{ for } j>k\rbrace. \]

Plainly, \(𝒳_k\) is the set of proper colorings which agree with our privileged coloring \(x_0\) on the last \(n-k\) vertices; in particular \(𝒳_0 = \lbrace x_0\rbrace\) and \(𝒳_n=𝒳\).

Their scheme is then to run Glauber dynamics chains to estimate each ratio \(|𝒳_{k-1}|/|𝒳_{k}|\), and multiply these estimates to get a numerical approximation of \(|𝒳_n|\):

\[|𝒳_n| = \frac{|𝒳_{n}|}{|𝒳_{n-1}|} \cdot \frac{|𝒳_{n-1}|}{|𝒳_{n-2}|}\cdots \frac{|𝒳_{1}|}{|𝒳_{0}|}.\]
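To make the telescoping concrete, here's a brute-force toy (no MCMC involved) that checks the identity on a 3-vertex path with \(q = 3\) colors; the graph and choice of \(x_0\) are my own arbitrary picks for illustration:

```python
from itertools import product
from math import prod

# Brute-force check of the telescoping identity on a 3-vertex path with q = 3
# colors: enumerate everything, no MCMC.
edges = [(0, 1), (1, 2)]
q, n = 3, 3

def is_proper(x):
    return all(x[u] != x[v] for u, v in edges)

X = [x for x in product(range(q), repeat=n) if is_proper(x)]
x0 = X[0]

def X_k(k):
    # Proper colorings agreeing with x0 on the last n - k vertices.
    return [x for x in X if all(x[j] == x0[j] for j in range(k, n))]

sizes = [len(X_k(k)) for k in range(n + 1)]             # |X_0|, ..., |X_n|
ratios = [sizes[k] / sizes[k - 1] for k in range(1, n + 1)]
assert prod(ratios) == len(X)  # the ratios multiply back to |X_n| = 12
```

In the real scheme each ratio is of course estimated by running Glauber dynamics inside \(𝒳_k\), not computed exactly.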

Here are some ways that we’ll need to modify this chain to work on \(q\)-districtings rather than \(q\)-colorings:

  1. Glauber dynamics don’t work as well for districtings; an analogous chain technically exists (the so-called “flip chain”), but it has a prohibitively large mixing time, and it does not preserve strict population balance. We’ll fix this by using a Metropolized version of the Swap Step chain instead.
  2. It’s not viable to fix the assignment of certain nodes for a districting, because the requirements of contiguity and population balance are too strict. We can fix this by instead using a metric on the set of districtings. We’ll fix \(x_0\) as before and calculate, for each pair of nested balls centered on \(x_0\), the fractional volume \(f = V_{i}/V_{i+1}\). If we call \(M_q\) the number of \(q\)-districtings of our graph \(G\), the resulting calculation is then:

\[M_q = \frac{M_q}{V_n} \cdot \frac{V_n}{V_{n-1}}\cdots \frac{V_2}{V_1}V_1.\]

A few more technical details.

2a. You’ll notice the trailing \(V_1\) at the end of this computation; this is because we’ll need to enumerate the size of our smallest metric ball by hand. (Compare this to estimating \(|𝒳_{1}|/|𝒳_{0}|= |𝒳_{1}|\) in the original version of the chain: you could do this via probabilistic integration – walking inside \(𝒳_{1}\) and counting how many times you hit \(𝒳_0 = \lbrace x_0\rbrace\) – but it is much more sensible to enumerate \(𝒳_{1}\) directly if at all possible.)

2b. Technically what we’ll actually be calculating is the expectation of each fractional volume \(f\), to account for the fact that these fractional volumes might look slightly different depending on which starting point \(x_{0}\) you pick. So there should really be expectation symbols everywhere in my last computation.

I call this chain the “Nesting Doll” method, because I think of it as estimating the volume of a large babushka doll by first measuring the volume of the smallest doll, and then dividing it by the fractional volume of each doll within the next biggest one.

Implementation.

Recall that the pencil metric – which I defined here – measures the distance between two districtings as the number of edges in the symmetric difference of their cut edge sets. This metric is similar but not equivalent to the graph metric on the set of districtings connected under swap steps, and crucially it is very cheap to compute.
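For reference, a minimal sketch of this computation, assuming a districting is stored as a dict from node to district label and the graph as an edge list:

```python
# Minimal sketch of the pencil metric. Representation assumptions: a districting
# is a dict mapping node -> district label; the graph is a list of edge tuples.
def cut_edges(districting, edges):
    """Edges whose endpoints are assigned to different districts."""
    return frozenset(e for e in edges if districting[e[0]] != districting[e[1]])

def pencil_distance(a, b, edges):
    """Size of the symmetric difference of the two cut edge sets."""
    return len(cut_edges(a, edges) ^ cut_edges(b, edges))
```

The `frozenset` makes a cut edge set usable as a dictionary key, which comes in handy for deduplication later.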

To calculate our fractional volumes \(V_{i}/V_{i+1}\), the strategy is to take a Leashed Simple Random Walk in \(V_{i+1}\) using a swap step proposal, and then metropolize it to get the uniform distribution. Then we simply count the number of times the walk landed inside of \(V_{i}\), and divide by the number of steps in the walk; this is probabilistic integration.
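Here's what probabilistic integration looks like on a deliberately silly state space – the integers leashed to \(\lbrace -r_2, …, r_2\rbrace\), where every interior state has the same degree, so the Metropolis correction is trivial:

```python
import random

# Probabilistic integration on a toy state space: a walk leashed to the integer
# "ball" {-r2, ..., r2}, counting how often it lands in the smaller ball
# {-r1, ..., r1}. Out-of-leash proposals are rejected, in which case the walk
# samples its current state again.
def fractional_volume(r1, r2, steps, seed=0):
    rng = random.Random(seed)
    x, hits = 0, 0
    for _ in range(steps):
        y = x + rng.choice((-1, 1))
        if abs(y) <= r2:
            x = y
        hits += abs(x) <= r1
    return hits / steps

f = fractional_volume(3, 10, 200_000)  # true value: 7 / 21 = 1 / 3
```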

A warning about this.

It turns out that these metric balls grow exponentially with respect to their radius – cf. here. This means most of their volume is contained near their boundary.

I mention this because it seems like this boundary contains a lot of districtings which are within radius \(r\) of the center, but are not reachable from the rest of the ball via swap steps. This leads the Leashed SRW to underestimate the volume of the large ball; compounded across the several fractional volumes being multiplied together, this error led my initial estimate of \(M\) to undershoot by a factor of 4, which caused me quite a bit of despair.

The way I fixed this was by relaxing the leash on the SRW: allow the walk to go ~4 units farther away from the center \(x_0\), and divide the number of samples inside \(V_i\) by the number inside \(V_{i+1}\). The only problem with this relaxation is that it drastically increases the computational cost of sampling one of these fractional volumes, from 2 to ~13 hours per ball on my machine. This is again because of the exponential growth of these balls; relaxing the radius even a little means the chain will waste a lot of time in the new boundary region outside of \(V_{i+1}\).

My version of working this out in Python is in this repo, but I’ll go over the process in case someone more technically savvy than I (a.k.a. Peter) wants to try their hand at it.

The Swap Step SRW + Leashed SRW.

You’ll want a way to enumerate all the swap step neighbors of a given districting \(x\). This is the most computationally expensive part of the chain, but if we’re Metropolizing things I don’t think there’s any way around it. This means:

  1. Listing all legal swap steps from \(x\), and
  2. Eliminating redundancies from this list if necessary (I found the pencil metric useful for this).

You might also want to make a leashed version of this process, where only neighbors within a given radius \(r\) of a privileged center districting \(c\) are considered legal.
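A hedged sketch of how the dedup and the leash might fit together; `swap_neighbors` here is a stand-in for whatever routine lists the legal swap steps from \(x\) (contiguity and population checks included), which I’m not reproducing:

```python
# Sketch of leashed neighbor enumeration. `swap_neighbors(x)` is a placeholder
# for a routine listing all legal swap steps from x; deduplication uses the
# cut edge set as a cheap pencil-metric fingerprint.
def cut_edges(districting, edges):
    return frozenset(e for e in edges if districting[e[0]] != districting[e[1]])

def pencil_distance(a, b, edges):
    return len(cut_edges(a, edges) ^ cut_edges(b, edges))

def leashed_neighbors(x, center, r, edges, swap_neighbors):
    seen, out = set(), []
    for y in swap_neighbors(x):
        key = cut_edges(y, edges)
        if key in seen:          # redundant swap reaching the same districting
            continue
        seen.add(key)
        if pencil_distance(y, center, edges) <= r:  # leash: stay near center
            out.append(y)
    return out
```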

Metropolizing these SRWs.

Simple Random Walks spend too much time at higher-degree nodes, so to target the uniform distribution we accept a move from \(x\) to \(y\) with probability

\[a(x,y) = \min\left(1,\frac{\text{deg}(x)}{\text{deg}(y)}\right).\]
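In code, one Metropolized step might look like this, with `neighbors(x)` standing in for the (leashed) swap-step neighbor enumeration:

```python
import random

# One Metropolized step targeting the uniform distribution; `neighbors(x)`
# stands in for the (leashed) swap-step neighbor enumeration.
def metropolis_step(x, neighbors, rng):
    nbrs = neighbors(x)
    y = rng.choice(nbrs)
    # accept with probability a(x, y) = min(1, deg(x) / deg(y))
    if rng.random() < min(1.0, len(nbrs) / len(neighbors(y))):
        return y
    return x  # rejected: the walk samples x again

# Toy check on a 5-node path graph, whose endpoints have degree 1: the
# corrected walk should visit each node about 1/5 of the time.
rng = random.Random(0)
path = lambda i: [j for j in (i - 1, i + 1) if 0 <= j <= 4]
x, visits = 0, [0] * 5
for _ in range(50_000):
    x = metropolis_step(x, path, rng)
    visits[x] += 1
```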

If we’re running continuous sampling to estimate the fractional volume of balls of radius \(r_1< r_2 \), the process is then:

  1. Pick a center districting \(c\), by walking with the Metropolized SRW for a long time;
  2. Pick a starting districting \(x\), by taking a walk with leash \(r_2 +4 \) for a long time;
  3. Repeat the following steps until you’ve gathered sufficient samples (~200,000 is what I’ve been doing, although you can gather fewer samples for smaller radii):
    • Pick a neighbor \(y\) uniformly at random from the leashed neighbors of \(x\) (the leash in question being \(r_2+4\)).
    • With probability \(a(x,y)\), replace \(x\) with \(y\). Crucially, if the move is rejected, you act as though the walk had just sampled \(x\) again, and you forget about \(y\).
    • Determine if \(x\) is within radius \(r_1\) or \(r_2\) of \(c\), and update the corresponding statistics.
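The whole loop, sketched on the same toy integer state space as before (so \(|x-c|\) stands in for the pencil metric, and the center is fixed at 0 rather than chosen by a preliminary walk):

```python
import random

# Sketch of the continuous-sampling procedure on a toy state space (the
# integers, with |x - c| standing in for the pencil metric). The leash is
# relaxed to r2 + 4, and f is the ratio of samples inside the two balls
# rather than hits divided by total steps.
def estimate_f(r1, r2, n_samples=200_000, seed=0):
    rng = random.Random(seed)
    c, leash, x = 0, r2 + 4, 0
    for _ in range(10_000):           # step 2: burn-in inside the relaxed leash
        y = x + rng.choice((-1, 1))
        if abs(y - c) <= leash:
            x = y
    in_r1 = in_r2 = 0                 # step 3: gather samples
    for _ in range(n_samples):
        y = x + rng.choice((-1, 1))
        if abs(y - c) <= leash:
            x = y                     # rejected moves re-sample x, as above
        in_r1 += abs(x - c) <= r1
        in_r2 += abs(x - c) <= r2
    return in_r1 / in_r2

f = estimate_f(3, 10)  # true value: 7 / 21 = 1 / 3
```

On real districtings, `rng.choice((-1, 1))` and the distance checks would of course be replaced by the leashed swap-step proposal and the pencil metric.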

Results and Discussion.

It’s still a bit early to say, but here’s the data I have so far for the 8x8 grid graph:

| \(r_1\) | \(r_2\) | Values of \(f\) | Average |
|---|---|---|---|
| 0 | 12 | [0.00223] | 0.00223 |
| 12 | 20 | [0.00438] | 0.00438 |
| 20 | 29 | [0.00515, 0.00447, 0.00517, 0.00516, 0.0039] | 0.00477 |
| 29 | 37 | [0.01472, 0.0138, 0.014315, 0.014515, 0.014705, 0.01333] | 0.01423 |
| 37 | 46 | [0.026955, 0.030515, 0.040715, 0.033855] | 0.03301 |
| 46 | \(\infty\) | [0.1116, 0.1252, 0.128, 0.1274] | 0.12305 |

This gives the estimate \(M \approx 371,383,124,984 \), which is about twice as large as the known value of \(M=187,497,290,034\). This overshoot could just be due to the still-too-small number of datapoints for some of these values of \(f\) – if the averages in each of the first two rows were undershooting the true values by a factor of \(\sqrt{2}\), that alone would account for the error – or it could be due to some systematic error in the chain. We would be looking for reasons why the leashed chain either overestimates the volume of the large ball or underestimates the volume of the small ball.
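For what it's worth, multiplying the table's averages back together reproduces this number up to the rounding of the averages, taking the leading 1 to play the role of the radius-0 ball's volume:

```python
from math import prod

# Multiplying the table's averages back together; the leading 1 is the volume
# of the radius-0 ball, so M ≈ 1 / (f_1 · f_2 ⋯ f_6).
f_avgs = [0.00223, 0.00438, 0.00477, 0.01423, 0.03301, 0.12305]
M_hat = 1 / prod(f_avgs)
print(f"M ≈ {M_hat:.4g}")  # roughly 3.71e11
```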

One possible source of error is the continuous sampling I’m using for these chains. As long as the number of samples gathered is much larger than the mixing time of the chain, this should not be an issue; but there’s currently no good way to estimate this mixing time, and Jeanne and I have very different opinions about its order of magnitude (Jeanne thinks it could be at least of order \(10^5\)), which might make this sample size problematic.

Another thing I need to redo is the enumeration of the volume of balls of radius 12; the current estimate I’m using comes from a previous project, which used a completely different framework to navigate the state space.

Still, I’m overall pretty happy with these results. I used to be undershooting by a factor of 4, and just relaxing the leash a little got me to this 2x overshoot.

I’ll finish by saying that I’m optimistic that this method could work in principle; it was viable for proper colorings, and you can run toy versions of this nested ball process by randomly generating nodes on the unit square \([0,1]\times[0,1]\), where it performs quite well. So there’s every reason to believe that it could work here, if enough iterative improvements are made to the chain.
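Here's one way to set up that unit-square toy: enumerate the smallest ball exactly, estimate each fractional volume by uniform sampling inside the next ball, and multiply. The specific radii, point count, and sample sizes below are arbitrary choices of mine:

```python
import random
from math import dist, prod

# Toy Nesting Doll run on the unit square: scatter N points, enumerate the
# smallest ball directly, estimate each fractional volume f_i = V_i / V_{i+1}
# by uniform sampling inside the bigger ball, and recover N from the product.
rng = random.Random(42)
N = 5000
points = [(rng.random(), rng.random()) for _ in range(N)]
center = (0.5, 0.5)
radii = [0.05, 0.1, 0.2, 0.4, 0.8]  # the last "ball" contains every point

def ball(r):
    return [p for p in points if dist(p, center) <= r]

v_small = len(ball(radii[0]))  # enumerate the smallest doll by hand
f_hats = []
for r_in, r_out in zip(radii, radii[1:]):
    outer = ball(r_out)
    # uniform samples inside V_{i+1}, standing in for the Metropolized walk
    samples = [rng.choice(outer) for _ in range(20_000)]
    f_hats.append(sum(dist(s, center) <= r_in for s in samples) / len(samples))

estimate = v_small / prod(f_hats)  # should land near N = 5000
```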

Bibliography.

1. Levin DA, Peres Y. Markov Chains and Mixing Times, second edition.
2. Jerrum M, Sinclair A. The Markov Chain Monte Carlo Method: An Approach to Approximate Counting and Integration.