
Quantitative Cesaro Sums

18 Nov 2020 - Tags: quick-analysis-tricks

To the surprise of no one, I was on math stackexchange earlier and saw an interesting analysis question. I have a weird fascination with tricky limit questions because I feel like I’ve always been bad at them. I like working on them for the same reason I like practicing the difficult parts of pieces of music – it makes me feel like I’m improving (in a “no pain no gain” kind of way).

Anyways, the question was as follows (paraphrased):

Compute the following limit

$$\lim_{n \to \infty} \left(1 + \sqrt{2} + \sqrt[3]{3} + \cdots + \sqrt[n]{n}\right) \ln\left(\frac{2n+1}{2n}\right)$$

The beginning of the solution makes sense: We want to approximate the $\ln$, so we rewrite it as $\ln\left(1 + \frac{1}{2n}\right)$ and approximate that[^1] as $\frac{1}{2n} - O\left(\frac{1}{n^2}\right)$.
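If you want to sanity check this expansion, here's a minimal Python sketch (Python isn't part of the original solution, just a convenient way to double check): the error $\frac{1}{2n} - \ln\left(1 + \frac{1}{2n}\right)$, rescaled by $n^2$, should stay bounded. In fact, the next Taylor term predicts it approaches $\frac{1}{8}$.

```python
import math

# ln(1 + 1/(2n)) = 1/(2n) - O(1/n^2), so the error times n^2
# should stay bounded (the next Taylor term predicts 1/8).
for n in [10, 100, 1000, 10000]:
    err = 1 / (2 * n) - math.log(1 + 1 / (2 * n))
    print(n, err * n**2)
```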

Then (rewriting as a summation as well) we get

$$\lim_{n \to \infty} \left(\sum_{k \leq n} \sqrt[k]{k}\right)\left(\frac{1}{2n} - O\left(\frac{1}{n^2}\right)\right)$$

Now here’s the clever idea: We use Cesàro Averages backwards! This is mentioned in a comment on the original mse question, and it’s a trick I’ll absolutely have to remember!

To explain what I mean, let’s take a quick detour into the world of Cesàro Averages:


If $(x_n)$ is a sequence, then the Cesàro Average $a_k$ is the average of the first $k$ terms:

$$a_k = \frac{1}{k} \sum_{j \leq k} x_j$$

People care about Cesàro averages because it's possible for them to converge even if the original sequence doesn't. Moreover, the notion of "averaging" in this way comes up very naturally in Fourier Analysis (see here) and Ergodic Theory (see here).

The fundamental theorem in this area is this:

Whenever $x_n$ is already convergent, the sequence of Cesàro averages converges to the same limit.

So the notion of Cesàro convergence is a true generalization of the original notion of convergence. It lets us assign values to certain limits that would otherwise diverge, but it doesn't mess up any limits that already converged.
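For a concrete example (a minimal Python sketch): the sequence $x_n = (-1)^n$ has no limit at all, but its Cesàro averages converge to $0$.

```python
# The sequence x_n = (-1)^n diverges, but its Cesàro averages
# a_k = (1/k) * sum_{j <= k} x_j converge to 0.
def cesaro_averages(xs):
    total = 0
    for k, x in enumerate(xs, start=1):
        total += x
        yield total / k

xs = [(-1) ** n for n in range(1, 1001)]
print(list(cesaro_averages(xs))[-5:])  # the last few averages hug 0
```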


And now we get to the application here:

We recognize $\frac{1}{n} \sum_{k \leq n} \sqrt[k]{k}$ as a Cesàro average of the sequence $\sqrt[k]{k}$. Since we know $\sqrt[k]{k} \to 1$, we conclude the same is true of the Cesàro averages. So

$$\begin{aligned}
\lim_{n \to \infty} \left(\sum_{k \leq n} \sqrt[k]{k}\right)\left(\frac{1}{2n} - O\left(\frac{1}{n^2}\right)\right)
&= \lim_{n \to \infty} \frac{\sum_{k \leq n} \sqrt[k]{k}}{n}\left(\frac{1}{2} - O\left(\frac{1}{n}\right)\right) \\
&= \lim_{n \to \infty} 1 \cdot \left(\frac{1}{2} - O\left(\frac{1}{n}\right)\right) \\
&= \frac{1}{2}
\end{aligned}$$
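Before we go any further, a quick numerical check (a minimal Python sketch, where `lhs` is just my name for the left hand side of the original limit):

```python
import math

def lhs(n):
    """(1 + sqrt(2) + 3**(1/3) + ... + n**(1/n)) * ln((2n+1)/(2n))."""
    s = sum(k ** (1 / k) for k in range(1, n + 1))
    return s * math.log((2 * n + 1) / (2 * n))

for n in [10, 100, 1000, 10000]:
    print(n, lhs(n))  # slowly creeps down toward 0.5
```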

So we did properly figure out what the limit should be… But I did a little bit of sleight of hand in the above manipulation. Did you catch it?

As people interested in computer science, we have to be slightly pickier than a lot of mathematicians when it comes to computing limits. It's often important exactly how quickly we get convergence[^2], and so we kept track of the big-Oh for the whole problem.

But above we just substituted $\sqrt[n]{n} \to 1$. If we're careful, it's not too hard to get error bounds for the convergence of this sequence… But who's to say what the error bounds will be for the Cesàro averages? These averages keep track of terms from earlier in the sequence. It's reasonable to worry that the convergence might be slower, and this worry turns out to be legitimate:

Say $x_n = L \pm O(e(n))$. That is, $x_n$ converges to a limit $L$, and we can bound the error by $O(e)$. Then

$$\frac{\sum_{k \leq n} x_k}{n} = \frac{\sum_{k \leq n} L \pm O(e(k))}{n} = \frac{nL \pm O\left(\sum_{k \leq n} e(k)\right)}{n} = L \pm O\left(\frac{1}{n}\sum_{k \leq n} e(k)\right)$$
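For a concrete instance: if $x_n = L \pm O\left(\frac{1}{n}\right)$, then since the harmonic sum is $\sum_{k \leq n} \frac{1}{k} = O(\log n)$, the best we can say about the Cesàro averages is

$$\frac{\sum_{k \leq n} x_k}{n} = L \pm O\left(\frac{\log n}{n}\right)$$

so the averaging really can cost us a log factor.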

So the error in the Cesàro averages is the average of the errors. Let’s see how this comes up in our analysis of this particular problem.


First, we need to know the error bounds on $\sqrt[n]{n}$. This isn't too hard to figure out:

$$\sqrt[n]{n} = e^{\frac{1}{n}\ln(n)} = 1 + \frac{\ln(n)}{n} + O\left(\left(\frac{\ln(n)}{n}\right)^2\right) = 1 + O\left(\frac{\ln(n)}{n}\right)$$

Now we can find the error of the Cesàro averages:

$$\frac{1}{n}\sum_{k \leq n} \sqrt[k]{k} = \frac{1}{n}\sum_{k \leq n}\left(1 + O\left(\frac{\ln(k)}{k}\right)\right) = 1 + O\left(\frac{1}{n}\sum_{k \leq n}\frac{\ln(k)}{k}\right)$$

While this is technically true, I would argue it isn’t useful yet. We can clean it up.

The important observation is that, for $k \geq 4$, $\frac{\ln(k)}{k}$ is monotone decreasing. Then we do the old "approximate by the nearest power of $2$" trick that we use to prove the harmonic series diverges[^3]:

$$\begin{aligned}
&\frac{\ln(4)}{4} + \frac{\ln(5)}{5} + \frac{\ln(6)}{6} + \frac{\ln(7)}{7} + \frac{\ln(8)}{8} + \frac{\ln(9)}{9} + \cdots + \frac{\ln(15)}{15} + \frac{\ln(16)}{16} + \cdots \\
\leq\ &\frac{\ln(4)}{4} + \frac{\ln(4)}{4} + \frac{\ln(4)}{4} + \frac{\ln(4)}{4} + \frac{\ln(8)}{8} + \frac{\ln(8)}{8} + \cdots + \frac{\ln(8)}{8} + \frac{\ln(16)}{16} + \cdots \\
=\ &\ln(4) + \ln(8) + \ln(16) + \cdots
\end{aligned}$$

Notice we can safely assume $n$ is a power of $2$ by adding some extra terms. Since we're trying to upper bound the error anyways, this is no issue. Also, since we're not interested in additive constants, we can go ahead and approximate $\frac{\ln(3)}{3}$ by $\frac{\ln(2)}{2}$ to make the sum more uniform.

$$\begin{aligned}
O\left(\frac{1}{n}\sum_{k \leq n}\frac{\ln(k)}{k}\right)
&= O\left(\frac{1}{n}\left(\frac{\ln(2)}{2} + \frac{\ln(3)}{3} + \ln(4) + \ln(8) + \cdots + \ln(n)\right)\right) \\
&= O\left(\frac{1}{n}\sum_{i \leq \log_2 n}\ln\left(2^i\right)\right) \\
&= O\left(\frac{1}{n}\sum_{i \leq \log_2 n} i \ln(2)\right) \\
&= O\left(\frac{1}{n}\sum_{i \leq \log_2 n} i\right) \\
&= O\left(\frac{(\log n)^2}{n}\right)
\end{aligned}$$
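To see this bound in action numerically (another minimal Python sketch): $\sum_{k \leq n} \frac{\ln(k)}{k}$ should grow like $(\ln n)^2$. Comparing with the integral $\int \frac{\ln x}{x} \, dx = \frac{(\ln x)^2}{2}$, the ratio below should actually stabilize near $\frac{1}{2}$.

```python
import math

# sum_{k <= n} ln(k)/k grows like (ln n)^2 / 2
# (compare with the integral of ln(x)/x), so this ratio
# should settle down near 0.5.
for n in [10, 100, 1000, 10000, 100000]:
    s = sum(math.log(k) / k for k in range(2, n + 1))
    print(n, s / math.log(n) ** 2)
```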

Importantly, this error bound is different from the error bound of our original sequence! As expected, the averages converge slightly slower (by a log factor) because they're being weighed down by earlier terms of the sequence.

Why go through this pain, though? Because now we can finally put a nice bow on things: How quickly does the limit converge?

$$\left(1 + \sqrt{2} + \sqrt[3]{3} + \cdots + \sqrt[n]{n}\right)\ln\left(\frac{2n+1}{2n}\right) = \frac{1}{2}\left(1 \pm O\left(\frac{(\log n)^2}{n}\right)\right)$$
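As one last sanity check (a final Python sketch, with the same hypothetical `lhs` helper as before): if the $O\left(\frac{(\log n)^2}{n}\right)$ rate is right, then the rescaled error below should stay bounded. Numerically it seems to hover a little above $\frac{1}{4}$.

```python
import math

def lhs(n):
    s = sum(k ** (1 / k) for k in range(1, n + 1))
    return s * math.log((2 * n + 1) / (2 * n))

# |lhs(n) - 1/2| rescaled by n / (log n)^2 should stay bounded
# if the O((log n)^2 / n) error bound is correct.
for n in [10, 100, 1000, 10000]:
    print(n, abs(lhs(n) - 0.5) * n / math.log(n) ** 2)
```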
[^1]: A notational quirk I picked up from Ryan O'Donnell that my brain likes: I will only ever write $O(f)$ when $f$ is a positive function. So I meaningfully distinguish between $\text{blah} + O\left(\frac{1}{n}\right)$ and $\text{blah} - O\left(\frac{1}{n}\right)$.

    This is a pretty minor thing, but it's led to notably more comfort on my part, since my brain tends to implicitly make positivity assumptions when I'm not looking. Explicitly writing down when things can be negative forces my brain to pay extra attention to it.

[^2]: In fact, since I have Ryan O'Donnell on the brain: I audited his Theorist's Toolkit class during my year off. In one of the lectures, he asserted that the Central Limit Theorem was the most useless theorem in mathematics. Of course, he was being intentionally inflammatory. He prefaced that statement with some history: the Central Limit Theorem got its name not (as I originally thought) because sums of random variables "tend to lie at the center" of a Gaussian, but because Pólya thought it was the most important result in probability. It was central to the field.

    Ryan went on to tell us about the Berry-Esseen Theorem, which gives good error bounds on the convergence guaranteed by the Central Limit Theorem.

[^3]: As a quick exercise, why do we get an upper bound here? We prove the harmonic series diverges by lower bounding it…