Week(ly quick)ies: Random walks

Introduction

Since my blog has been experiencing a shortage of content (mostly due to the fact that I am busy with my current research work), I decided to try out a new format that will hopefully motivate me to post more often. “Weekies” i.e. “weekly quickies” are going to be weekly posts of small puzzles that I come up with or come across on the web. I plan to follow up each “weeky” with a longer write-up of solution and discussion roughly within a month or so of the original post. In the meantime do not hesitate to post solutions and discuss the puzzle in the comments under the original post.

Random walks

Consider the K4 graph/tetrahedron depicted below. The following questions concern different random walks starting at the vertex A.

Q1. Assume that at any time step we will move to a random neighboring vertex with the equal probability of 1/3. What’s the probability that we will visit all vertices of the graph before returning to A?

Q2. Given the same walk as in question 1, what’s the probability that we will visit any one vertex out of {B, C, D} at least n times before returning to A?

Q3. Now, let’s change our walk to have a 1/2 probability of remaining at the current vertex and probabilities of 1/6 to move to each of the neighboring vertices. How do answers to questions 1 and 2 change for this walk?

Q4. With the walk described in question 3, what is the expected number of steps we need to take to visit all vertices at least once?

Q5. Let us record the sequence of visited vertices as a string of characters A, B, C, D. Consider a random walk in question 3 and record its first 10 vertices. What’s the probability that the string “ABCD” occurs as a subsequence in the recorded sequence?

Q6. We can generalize the walk described in question 3 using the parameter q ∈ [0, 1] by letting the probability of staying at the current vertex be q and probability of going to any given neighbor be (1-q)/3. Additionally, let the start vertex be picked uniformly at random. Let n be the length of the sequence of vertices that we record, analogously to question 5. Let Pq,n be the distribution on the Ω={A, B, C, D}n generated by the described process. Let Un be the uniform probability distribution on the Ω. Let D be the total variation distance function and let us denote d(q,n) = D(Pq,n, Un). Investigate behavior of the function d : [0, 1] ⨉ ℕ → [0, 1].

Monte Carlo your way out of a puzzle: Why events of measure 0 matter.

I was sitting in my office, a couple of weeks ago, minding my regular bioinformatics flavored business with a cup of oolong tea, when Senthil stopped by and told me that he has a puzzle he’d like some help with. Now, both me and Senthil enjoy math and we especially enjoy small nifty tricky problems. Thus, the puzzle he shared was the following.

Consider a unit grid on a plane. Pick a non-negative real number L. Now, we can place a random segment of length L in the plane by picking a point and an angle uniform at random and drawing a segment of length L originating at the chosen point and oriented based on the chosen angle. Which value of L maximizes the probability that this segment will intersect exactly one line of the grid?

We spent about 5 minutes chatting about different placements of the segment, probabilities, and our lack of desire to compute a nasty triple integrals. We had to wrap up quickly since, I was short on time and had to finish up some other tasks.

Last Thursday I was co-organizing our department’s happy hour, and Senthil reminded me of the outstanding problem. I made a pinky promise that I’ll help on Friday, which brings us to the story I am about to tell you.

Do not integrate, just simulate!

Since, I was not sold on the idea of computing integrals first thing on a Friday morning, I decided to quickly write up a simulation that would do exactly what the problem statement described. After about 20 minutes of typing away, I had my tiny C program up and running and spitting out probabilities with 6 sweet significant digits. Senthil came by quite soon, and within 40 minutes of back and forth of computer assisted linear search we were quite certain our answer for L was 1. Yay, problem solved, L = 1, probability is approximately 0.63774, and we can all go drink coffee and do whatever Computer Science PhD students do on a Friday. Well, no.

Of course we were not happy, I meant what is 0.63774? Have you heard of this number before? Does it have any meaning? Is this the number of goats you must sacrifice to Belial before you learn the great mysteries of the world? We took to OEIS and found (among other amazing things) that 0, 6, 3, 7, 7 is a subsequence in the decimal expansion of Mill’s constant, assuming the Riemann hypothesis is true. Other than that the number didn’t quite give us an immediate answer. We also checked the magic of WolframAlpha, which gave us some neat ideas, like (3 π)/(2 e^2). However, as you can guess reverse engineering this fraction was 1) stupid; 2) hard; and finally 3) something we didn’t do.

In the meantime, we decided to put our theory hats back on and figure out what is happening, and how we can get an analytical expression of this probability. Senthil recalled that in fact there is a very similar well-known question, called Buffon’s needle problem, which has an amazingly clean solution. However, we could not see how one can easily translate the setting of that problem to ours. After some amount of drawing, complaining about integrals and debating which events are measure zero, we decided to take a coffee break and walk outside for a bit. This is when the major theoretical breakthrough happened.

Senthil noticed that instead of thinking about the grid as a whole, we can consider two sets of parallel lines independently. The advantage of this view was clear, now we have the Buffon’s needle problem embedded into ours directly! We multiplied some probabilities, got wrong numbers and complained about futility of theoretical efforts. Then we of course reimagined how stuff should be multiplied and counted things again, and again, and again. Are the events independent? Do we count this in or subtract it out? Do we go back to the integral formula?

Clearly the two intersection events are not independent, since the orientation (based on angle) that works well for horizontal intersection, is not favorable for the vertical one. Thus we can’t just multiply stuff out and subtract from one. Thus, we went back to the integral. This is where things got tangled up. We recomputed the triple integral a few times, but something was not tying out. We computed integral of a product of integrals and something still didn’t tie out. We have been running in circles, seeing numbers like 0.202 and 0.464 all over the place. Finally, the original answer to Buffon’s needle, the 2/π, was temptingly close to our own, since it was 0.636619. Finally, around 4pm we threw in the towel and parted ways.

Simulation clearly saved the day, and even if we were not sure of the theory behind this number, we knew the answer with high confidence, and that was already a win. But…

Back to the drawing and integration

I took out a piece of paper, folded it in four, and started sketching. There is a nice video out there that explains how to derive solution to the Buffon’s needle problem. Thus, I just went through the steps, drawing out the circle, and thinking what are the exact conditions we need to satisfy to get our segment intersect exactly one grid line.

The shaded region above represents the range of admissible positions for the needle (black arrow) when the angle formed by needle and x-axis does not exceed 90 degrees. Note that we only consider intersection with the line A_0A_1, since the intersection with the line B_0B_1 is identical by symmetry.

Now, considering the angle of the needle to x-axis to be \theta\leq\frac{\pi}{2} and distances to horizontal and vertical lines to be y\phantom{\frac{1}{1}} and x\phantom{\frac{1}{1}} respectively, we can define the conditions for the admissible position as 0\leq\theta\leq\frac{\pi}{2},\,y\leq\frac{L}{2}\sin\theta,\,x\geq\frac{L}{2}\cos\theta. Thus, we can write down our desired integral, and evaluate it as \displaystyle\int_0^{\frac{\pi}{2}}\displaystyle\int_0^{\frac{L}{2}\sin\theta}\displaystyle\int_{\frac{L}{2}\cos\theta}^{\frac{1}{2}}dx\,dy\,d\theta=-\frac{1}{8}(L-2)L. Now, dividing through by the total area and multiplying by 2, due to symmetry for the intersection to the second line, we arrive at the coveted \frac{2}{\pi}\approx 0.636619. Case can be rested, we now have a neat theoretical solution to the question, and a beautiful mathematical expression as an answer. However, I am still unhappy, because I cannot ascribe the error we see in the simulation to pure numerical precision at this scale.

The devil is in the details: wording and measure zero events

One of the first things you learn in a probability theory class is that events of measure zero don’t matter. Picking a specific point on a line, or even picking a rational point on a line are measure zero events, hence they don’t really matter when you calculate probabilities. Nevertheless simulations are not only countable, they are finite! Which means accidentally dropping in an event of measure zero that shouldn’t be there can trip you up. This is exactly what happened in our case. We want to count the number of times a random segment intersects exactly one line of the grid. Now, I assumed this condition meant instead: segment intersects the grid at exactly one point! The difference shines for the points that are intersections of the two sets of parallel lines forming the grid, because they are a single grid point, but a segment crossing this point in fact intersects not one, but two lines!

I have adjusted my counting scheme to correctly reflect this edge case as intersecting two lines, and therefore, not being a favorable outcome. Guess what happened next… Exactly, the sweet shiny 0.6366313 popped out of my simulation as the predicted probability of the event. Of course, this number is closer to \frac{2}{\pi} than our previous candidate, and now all the doubts are gone, the theory and simulation agree!

Discussion and lessons learned

First of all, if my simulation runs in about 2 minutes (going through 500,000,000 random segments), why not run a longer simulation to counter the effect of nasty accidental event? Well, numbers have their limits on computers, and overflows happen. I tried running the simulation for 2,000,000,000 and 20,000,000,000 segments. The first run gave a slightly smaller number than the original 0.6377, but still not quite 0.6366. The second one, gave a whooping probability of -8.648, which even those not too familiar with measure theoretic probability would call out as absurd. Can I spend more time bugfixing this, and get a great simulator, that can scale to 1,000,000,000,000 segment trial? Yes. Is it reasonable to pour time into this? No!

Second, why not do the theory neatly from the start? At the end of the day the problem really takes about 30 minutes of careful drawing and setup and 3 minutes of WolframAlpha-ing the integral. Well, first this involves dissecting the problem theoretically which takes time. Second, writing up the simulator took 20 minutes, for contrast understanding how the Buffon’s needle problem is solved took about the same 15-20 minutes. The latter on its own didn’t give a solution, the former at least provided a good numerical estimate. Finally, recall how the problem is actually about finding the optimal length L? Well, that was quite a block to developing nice theoretical answer, once we were quite certain L is 1, we could actually focus on the theory that mattered.

Third, why not read the problem statement carefully and not jump to wild assumptions? Well, this one goes into lessons learned bucket, alongside the entire finiteness of simulations shenanigans. You would expect that years of problem solving would teach me that, and yet again, I make the same mistakes.

Overall I can say that this was an amazing exercise in basic probability, geometry and mocking up simulations to solve puzzle problems. I am happy to learn that writing a quick program can help in figuring out theory behind a question. Furthermore, I am also quite delighted to experience nuances of probability first hand, without putting important questions and results on the line.

Final notes and resources

Many thanks to Senthil Rajasekaran for bringing this problem to my attention and guiding me through the crucial steps on getting the theoretical answer in the scopes. Looking forward to more puzzles to solve!

My code for the simulator is available in this (zipped) file. Note that the random number generator was lifted from a course by Benoit Roux at UChicago, and the rest of the code is mine. The commented out portion of the intersect function is exactly what caused the annoying of by 0.0011 error.

Thanks for the attention, and happy bug catching y’all!