Using Probability to Make Difficult Limits Obvious

05 Jan 2022 - Tags: quick-analysis-tricks

Hands down one of my worst subjects is probability, which is a shame because it’s the source of a lot of powerful arguments, both formal and heuristic. I remember struggling with the analysis of randomized algorithms, and the closely related probabilistic method in combinatorics was no easier. Of course, the more you work with these things the easier they become, and I’m starting to feel a bit better about the basics at least1. One pattern that I’ve seen a few times recently was so slick that I wanted to write something up about it in my first “quick analysis tricks” post in a little while!

For the uninitiated, “quick analysis tricks” is a series I do where I state analysis facts that are probably super obvious to anyone who knows them, despite these facts not being obvious to me. It’s easy to forget what is and isn’t obvious to newcomers in an area, and oftentimes these little “obvious” facts aren’t actually written down anywhere. You’re supposed to just pick them up over time. I know seeing some of these written out would have helped me, so I’m writing them up for the students of the future (and also in a desperate bid to try and remember them myself :P).

Now, let’s take a look at the mse question that inspired this post2:

Say $f : \mathbb{R} \to \mathbb{R}$ is continuous and bounded.

Prove that for every $x \in (0, \infty)$ we have

\(\displaystyle \lim_{n \to \infty} e^{-nx} \sum_{k=0}^\infty \frac{(nx)^k}{k!} f \left ( \frac{k}{n} \right ) = f(x)\)


If you’re anything like me, your first question upon seeing this is something of the form “How do people come up with this stuff3?”.

Your next question is probably “How can we tackle this?” For me, at least, it’s entirely opaque how we might proceed.

As you might have guessed from the title of this post, the answer to both of these questions lies in probability theory. So let’s take a brief detour into the world of Poisson Distributions.

We know that there’s no uniform distribution on $\mathbb{N}$ (which is a cute exercise if you’re not familiar with this theorem), but obviously we want to be able to randomly choose some $n \in \mathbb{N}$. The solution is to weight higher numbers as less likely, with the probabilities decaying quickly enough for the sum to converge. One of the most obvious ways to do this is with the Poisson Distribution, where we say

\[\mathbb{P}(X = k) = e^{-1} \frac{1}{k!}\]

The $e^{-1}$ out front is chosen so that $\sum_k \mathbb{P}(X = k) = 1$. It turns out that a useful generalization4 of this obvious distribution is to add a parameter $\lambda > 0$:

\[\mathbb{P}(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}\]

Now we need a quick fact about Poisson distributions. Namely:

Say $X_1, \ldots, X_n$ are independent and generated by Poisson distributions with parameters $\lambda_1, \ldots, \lambda_n$.

Then $X_1 + \ldots + X_n$ is also Poisson, with parameter $\lambda_1 + \ldots + \lambda_n$.

If you’ve not seen this before, it’s a (fun?) exercise, and you really just have to follow your nose.
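If you'd rather see it numerically than prove it, here's a quick Python sketch (with a `poisson_pmf` helper I'm naming for illustration): the distribution of $X_1 + X_2$ is the convolution of the two pmfs, and it matches $\text{Poisson}(\lambda_1 + \lambda_2)$ term by term.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam1, lam2 = 1.0, 2.0
N = 60  # how far out we check

# P(X1 + X2 = k), computed by direct convolution of the two pmfs
conv = [sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2)
            for j in range(k + 1))
        for k in range(N)]

# ... and it agrees with Poisson(lam1 + lam2) at every k
for k in range(N):
    assert abs(conv[k] - poisson_pmf(k, lam1 + lam2)) < 1e-12
```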

So now here’s the idea. It’s intuitively clear that if we keep picking random $X_k$s from the same distribution, we would expect the average $\frac{1}{n} \sum_{k \leq n} X_k$ to approach $\mathbb{E}[X_k]$. This is basically the law of large numbers, and now we’re ready to see how someone might have come up with the theorem we’re proving:

The clever trick is to choose our point of interest $x$ to be the parameter in a Poisson variable! Then by the law of large numbers5, we’ll know that, if the $X_k$ are iid Poisson with parameter $x$:

\[\frac{1}{n} \sum_{k \leq n} X_k \to \mathbb{E}[X_k] = x\]

So, intuitively then, we would expect

\[f \left ( \frac{1}{n} \sum_{k \leq n} X_k \right ) \to f(x).\]

In fact, if we take expectations of both sides6, we can make this dream a reality!

Since the $X_k$ are all Poisson with parameter $x$, the sum $S_n = \sum_{k \leq n} X_k$ is Poisson with parameter $nx$. That is

\[\begin{align} \underset{X_k \sim \text{Poisson}(x)}{\mathbb{E}} \left [ f \left ( \frac{1}{n} \sum_{k \leq n} X_k \right ) \right ] &= \underset{S_n \sim \text{Poisson}(nx)}{\mathbb{E}} \left [ f \left ( \frac{S_n}{n} \right ) \right ] \\ &= \sum_{j = 0}^\infty f \left ( \frac{j}{n} \right ) \mathbb{P}(S_n = j) \\ &= \sum_{j = 0}^\infty f \left ( \frac{j}{n} \right ) e^{-nx} \frac{(nx)^j}{j!} \end{align}\]

and now some measure theoretic jiggery-pokery7 shows that this indeed converges to $f(x)$ as $n \to \infty$.

Which is exactly what we set out to show!
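To watch the limit happen, here's a numerical sketch in Python (the function `poisson_average` and the choice $f = \sin$ are mine, purely for illustration). The weights are accumulated in log-space so that $e^{-nx}$ and $(nx)^k$ don't underflow or overflow for large $n$:

```python
import math

def poisson_average(f, x, n):
    """e^{-nx} * sum_{k >= 0} (nx)^k / k! * f(k/n), truncated once the tail is negligible."""
    lam = n * x
    terms = int(lam + 10 * math.sqrt(lam)) + 50  # Poisson(lam) mass lives near lam
    total = 0.0
    log_w = -lam  # log of the k = 0 weight, e^{-lam} * lam^0 / 0!
    for k in range(terms):
        total += math.exp(log_w) * f(k / n)
        log_w += math.log(lam) - math.log(k + 1)  # step from weight k to weight k + 1
    return total

x = 2.0
for n in [1, 10, 100, 1000]:
    print(n, poisson_average(math.sin, x, n))  # creeps toward sin(2) ≈ 0.909
```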

Let’s see another example in the same vein which I learned years ago8:

Recall the Stone-Weierstrass Theorem says that any continuous function on $[0,1]$ can be (uniformly) approximated by polynomials9.

Wouldn’t it be nice to know how to find these polynomials explicitly? Well, probability comes to the rescue again!

Let $f : [0,1] \to \mathbb{R}$ be a continuous function. Then as $n \to \infty$, the polynomials

\[p_n(x) \triangleq \sum_{k=0}^n \binom{n}{k} x^k (1-x)^{n-k} f \left ( \frac{k}{n} \right )\]

converge (uniformly!) to $f$.

Notice the similarities between this limit and the previous one. Both are limits of the form

\[\displaystyle \lim_{n \to \infty} \sum_k \text{gross stuff}(x) \ f \left ( \frac{k}{n} \right ) = f(x)\]

where the “gross stuff” looks like some kind of probability distribution. In the previous example it was Poisson, and now it’s binomial10.

We ask why $p_n \to f$, and in the context of this post the idea should be clear!

If $X_n$ is binomially distributed (over $n$ trials) with parameter $x$, then

\[\mathbb{E}[X_n] = nx\]

This is saying that, given a coin which lands heads with probability $x$, if we flip it $n$ times we expect $nx$ many heads (which is entirely obvious). Moreover, by the law of large numbers, $\frac{X_n}{n} \to x$.

Again, we now look at $f \left ( \frac{X_n}{n} \right )$, which we expect to converge to $f \left ( \frac{nx}{n} \right ) = f(x)$. Indeed, we find

\[\begin{align} \underset{X_n \sim \text{Binomial}(x,n)}{\mathbb{E}} \left [ f \left ( \frac{X_n}{n} \right ) \right ] &= \sum_{j=0}^n f \left ( \frac{j}{n} \right ) \mathbb{P}(X_n = j) \\ &= \sum_{j=0}^n f \left ( \frac{j}{n} \right ) \binom{n}{j} x^j (1-x)^{n-j} \end{align}\]

And again, I’ll leave out the finer points of the derivation, where we actually show this converges to $f(x)$. This is a quick application of Chebyshev’s Inequality (which, you’ll notice, can be made uniform in $x$ since the variance of $\frac{X_n}{n}$ is $\frac{x(1-x)}{n} \leq \frac{1}{4n}$, no matter what $x$ is)11.
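These Bernstein polynomials are easy to play with numerically. A quick Python sketch (the name `bernstein` and the test function $f(t) = \lvert t - \tfrac{1}{2} \rvert$ are my choices for illustration):

```python
import math

def bernstein(f, n, x):
    """The n-th Bernstein polynomial of f, evaluated at x."""
    return sum(math.comb(n, k) * x**k * (1 - x)**(n - k) * f(k / n)
               for k in range(n + 1))

f = lambda t: abs(t - 0.5)  # continuous (but not smooth at 1/2)

for n in [5, 50, 500]:
    # worst-case error over a grid of 101 sample points in [0, 1]
    err = max(abs(bernstein(f, n, i / 100) - f(i / 100)) for i in range(101))
    print(n, err)  # the uniform error shrinks as n grows
```

The slow spot is the kink at $t = \tfrac{1}{2}$, which is exactly where the Chebyshev argument's $\frac{1}{\sqrt{n}}$-scale averaging bites hardest.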

Next time you’re faced with a tricky limit that looks like some kind of “weighted average” of points of $f$ in order to approximate $f(x)$, try viewing it as an expectation with respect to some probability distribution! Then the law of large numbers will basically hand you the desired claim ^_^.

Of course, this is only one way that probability can show up in unexpected places. There’s a whole mathoverflow thread dedicated to this, which might be worth browsing through if you’re looking for other ones!

Now, though, I’m off to bed. See you soon!

  1. Next on my list is martingales. I’ve been told these really aren’t that complicated, but there’s so much machinery involved! Once I sort it out I’m definitely going to have to write a post which is basically a “user’s manual” for martingale techniques in, say, combinatorics, that doesn’t get bogged down in filtrations of $\sigma$-algebras. 

  2. Because of course this was inspired by an mse question 

  3. Though if you’re anything like me, you probably didn’t use the word “stuff” 

  4. I’m not going to go into why this is a useful distribution here, but a quick answer is this:

    If something happens with low probability, but we try a lot of times, how many times do we expect it to occur? The answer turns out to be (approximately) Poisson.

    You can read more about this in Rozanov’s cute book Probability Theory: A Concise Course. This book really is concise, but unlike May’s book on algebraic topology, this one is a fairly easy read! I actually read the whole thing on my plane ride from NY back to California, with time to do a smattering of exercises too. 

  5. I guess we technically need to know that the expected value of a Poisson distribution is the parameter, but that’s on the Wikipedia page. It’s also another quick exercise if you prefer. 

  6. Why do we need to take expectations? Remember random variables are secretly functions from some probability space $\Omega$. So when we say

    \[\frac{1}{n} \sum X_k \to x\]

    we’re really saying these functions converge (in one of any number of senses) to the constant function $x$.

    Then we compose these functions with $f$ to see that the functions

    \[\left ( f \circ \frac{1}{n} \sum X_k \right ) (\omega) = f \left ( \frac{1}{n} \sum X_k(\omega) \right )\]

    converge to the constant function $\omega \mapsto f(x)$.

    In order to get an actual number out of this constant function, we need to take expectations of both sides. 

  7. I’m not going to show the jiggery-pokery here. It’s entirely routine (though I guess saying that is against the spirit of quick analysis tricks…) and I don’t want to spend too long on this post.

    I’m especially inclined to not write it up, because I might write a follow-up post all about the nitty-gritty details of the derivation.

    Whenever I see a result like this, I want to know how fast the convergence is. I think with a smidge of effort I can get error bounds for this limit, but I don’t feel like doing it right now (after all, these are supposed to be quick analysis tricks!). With that in mind, I’ll probably make a post soon where I go through this derivation in detail and try to keep track of error bounds. 

  8. In Po-Shen Loh’s putnam seminar, maybe? 

  9. In fact there are more abstract versions, which say that any subalgebra of the continuous functions on a compact Hausdorff space is uniformly dense, provided it separates points and contains the constants.

    We won’t need that extra strength here, though. 

  10. It’s not lost on me that the Poisson distribution is in some sense the limit of binomial distributions as $n \to \infty$ (and $p_n \to 0$ “fast enough”). I would love to know if there’s some relationship between these two examples, because it feels like there should be. I just can’t seem to make it precise. 

  11. I probably won’t be working this out in full in a follow-up blog post, so I’ll point you towards the (short and eminently readable) paper A Probabilistic Proof of the Weierstrass Approximation Theorem by Levasseur.