Hilbert Spaces<p>Hilbert Spaces are banach spaces whose norms come from an inner product.
This is fantastic, because inner product spaces are a very minimal amount
of structure for the amount of geometry they buy us. Because of the new
geometric structure, many of the pathologies of banach spaces are absent from
the theory of hilbert spaces, and the rigidity of the category of hilbert
spaces (there is a complete cardinal invariant describing hilbert spaces up to
isometric isomorphism) makes it extremely easy to make an abstract hilbert
space concrete. Moreover, this “concretization” is exactly the genesis of
the fourier transform!</p>
<p>With that introduction out of the way, let’s get to it!</p>
<hr />
<p>First, a word on conventions. I’ll be working with the inner product
$\langle x,y \rangle$ which is linear in the <em>second</em> slot, and
conjugate linear in the first. This is often called the “physicist” notation,
but I have a reason for it:</p>
<div class="boxed">
<p><span class="defn">Riesz Representation Theorem</span></p>
<p>If $\mathcal{H}$ is a hilbert space, then the map</p>
\[x \mapsto \langle x, \cdot \rangle : \mathcal{H} \to \mathcal{H}^*\]
<p>is a conjugate linear isometry $\mathcal{H} \cong \mathcal{H}^*$.</p>
<p>In particular, every linear functional on a hilbert space is of the
form $\langle x, \cdot \rangle$ for some $x$<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">1</a></sup>.</p>
</div>
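<p>As a quick sanity check of this convention, here's a sketch in $\mathbb{C}^3$ with a hypothetical functional (note numpy's <code>vdot</code> conjugates its <em>first</em> argument, matching our inner product):</p>

```python
import numpy as np

# A hypothetical functional f on C^3: f(v) = v_0 + i*v_2.
# Riesz says f = <x, -> for some x; conjugating the coefficients gives
# x = (1, 0, -i), since our inner product is conjugate-linear in slot one.
x = np.array([1, 0, -1j])

def f(v):
    return v[0] + 1j * v[2]

v = np.array([2 + 1j, 5, 3 - 2j])
# np.vdot conjugates its first argument, so np.vdot(x, v) = <x, v>
assert np.isclose(f(v), np.vdot(x, v))
```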
<p>Since we write function application on the left<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup>, it makes sense for the
associated functional $\langle x, \cdot \rangle$ to also be “on the left”.
With the standard “mathematician” convention, we instead have the linear map
$\langle \cdot, y \rangle$ which acts on $x$ “on the right” by
$x \mapsto \langle x, y \rangle$.</p>
<p>Obviously this doesn’t matter at all, but I feel the need to draw attention to it<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup>.</p>
<p>There are two key examples of hilbert spaces:</p>
<ul>
<li>
<p>$L^2(X,\mu)$ is a hilbert space, with the inner product</p>
\[\langle f, g \rangle \triangleq \int \overline{f} g \ d\mu\]
</li>
<li>
<p>A special case of the above, if $A$ is any set and $\#$ is the counting measure,
then $\ell^2(A) = L^2(A, \#)$ is a hilbert space with the inner product</p>
\[\langle (x_\alpha), (y_\alpha) \rangle \triangleq \sum \overline{x_\alpha} y_\alpha\]
</li>
</ul>
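<p>In coordinates, the $\ell^2$ inner product (conjugate-linear in the first slot) looks like the following sketch, with hypothetical vectors in $\ell^2(\{1, \ldots, 5\}) \cong \mathbb{C}^5$:</p>

```python
import numpy as np

# Hypothetical vectors in a small l^2 space
x = np.array([1 + 1j, 2, 0, -1j, 3])
y = np.array([2, 1j, 1, 1, -1])

# <x, y> = sum of conj(x_a) * y_a, conjugate-linear in the FIRST slot
inner = np.sum(np.conj(x) * y)

# np.vdot conjugates its first argument, so it computes the same thing
assert np.isclose(inner, np.vdot(x, y))
```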
<p>In fact, as we will see, <em>every</em> hilbert space is isometrically isomorphic to
some $\ell^2(A)$, and just as we can distinguish vector spaces by the cardinality
of their basis, we can distinguish hilbert spaces by the cardinality of their
“hilbert basis”.</p>
<p>First, though, why should we care about inner products? How much extra structure
does it <em>really</em> buy us? The answer is: lots!</p>
<p>As is so often the case in mathematics, <em>theorems</em> in a concrete setting become
<em>definitions</em> in a more general setting, and once we do this much of the
intuition for the concrete setting can be carried over. For us, then, let’s
see what we can do with inner products.</p>
<hr />
<p>First, it’s worth remembering that an inner product defines a norm</p>
\[\lVert f \rVert \triangleq \sqrt{\langle f, f \rangle}\]
<p>so every inner product preserving function <em>also</em> preserves the norm.</p>
<p>It turns out we can go the other way as well, and the <a href="https://en.wikipedia.org/wiki/Polarization_identity">polarization identity</a>
lets us write the inner product in terms of the norm<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. This is fantastic,
as it means any norm preserving function automatically preserves the inner product!</p>
<p>We can also define the <em>angle</em> between two vectors by</p>
\[\cos \theta \triangleq \frac{\langle f, g \rangle}{\lVert f \rVert \lVert g \rVert}\]
<p>and when $\theta = \pm \frac{\pi}{2}$ (that is, when $\langle f, g \rangle = 0$)
we say that $f$ and $g$ are <span class="defn">orthogonal</span>, written $f \perp g$.</p>
<p>Of course, once we have orthogonality, we have a famous theorem from antiquity:</p>
<div class="boxed">
<p><span class="defn">The Pythagorean Theorem</span></p>
<p>If $f_1, \ldots, f_n$ are pairwise orthogonal, then</p>
<p>\(\lVert \sum f_k \rVert^2 = \sum \lVert f_k \rVert^2\)</p>
</div>
<div class="boxed">
<p>As a quick exercise, you should prove the <span class="defn">Law of Cosines</span>:</p>
<p>For <em>any</em> $f$ and $g$, if the angle between $f$ and $g$ is $\theta$, then:</p>
<p>\(\lVert f-g \rVert^2 =
\lVert f \rVert^2 + \lVert g \rVert^2 - 2 \lVert f \rVert \lVert g \rVert \cos \theta\)</p>
</div>
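<p>A quick numeric check of the law of cosines in $\mathbb{R}^2$ (hypothetical vectors; note the cross term enters with a <em>minus</em> sign for $\lVert f-g \rVert$, and with a <em>plus</em> sign for $\lVert f+g \rVert$):</p>

```python
import numpy as np

f = np.array([1.0, 2.0])
g = np.array([3.0, -1.0])

nf, ng = np.linalg.norm(f), np.linalg.norm(g)
cos_theta = np.dot(f, g) / (nf * ng)   # cos of the angle between f and g

# Law of cosines: the squared length of the "third side" f - g
lhs = np.linalg.norm(f - g) ** 2
rhs = nf ** 2 + ng ** 2 - 2 * nf * ng * cos_theta
assert np.isclose(lhs, rhs)
```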
<p>Once we have orthogonality, we <em>also</em> have the notion of orthogonal complements.
You might remember from the finite dimensional setting that there’s (a priori)
no distinguished complement to a subspace. For instance, any two distinct
$1$-dimensional subspaces are complements in $\mathbb{R}^2$.
But if we decomposed $\mathbb{R}^2$ as a direct sum of the subspaces shown
below, we would likely feel there’s a “better choice” of complement for
the orange subspace:</p>
<p><img src="/assets/images/hilbert-spaces/complement.png" width="50%" /></p>
<p>The blue subspace is one of many complements of the orange, so why should
we choose it over anything else? Instead, we can use the inner product
(in particular the notion of orthogonality) to choose a <em>canonical</em>
complement:</p>
<p><img src="/assets/images/hilbert-spaces/orthogonal-complement.png" width="50%" /></p>
<p>Why is this complement canonical? Because it is the <em>unique</em> complement
so that every blue vector is orthogonal to every orange vector.</p>
<p>In the banach space setting (where we don’t have access to an inner product)
recall there are subspaces which have no complement. In the hilbert space
setting this problem vanishes – the orthogonal complement always exists,
and is a complement<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>!</p>
<div class="boxed">
<p>If $U$ is a subspace of $\mathcal{H}$,
the <span class="defn">orthogonal complement</span> of $U$ is</p>
\[U^\perp \triangleq \{ x \mid \forall u \in U . \langle u, x \rangle = 0 \}.\]
<p>$U^\perp$ is always a closed subspace of $\mathcal{H}$, and moreover</p>
\[(U^\perp)^\perp = \overline{U}\]
<p>so if $U$ is a <em>closed</em> subspace of $\mathcal{H}$, then $(U^\perp)^\perp = U$
and $\mathcal{H} = U \oplus U^\perp$.</p>
</div>
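<p>In coordinates this decomposition is easy to compute. A sketch in $\mathbb{R}^2$, with a hypothetical choice of subspace $U$:</p>

```python
import numpy as np

# U = span of u inside R^2; build the orthogonal projections onto U and U-perp
u = np.array([2.0, 1.0])
P = np.outer(u, u) / np.dot(u, u)   # projection onto U
Q = np.eye(2) - P                   # projection onto U-perp

x = np.array([3.0, 4.0])

# Every x decomposes uniquely as a U-part plus a U-perp-part...
assert np.allclose(P @ x + Q @ x, x)
# ...and the two parts really are orthogonal
assert np.isclose(np.dot(P @ x, Q @ x), 0.0)
```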
<p>You might also remember from the finite case that we can find particularly
nice bases for inner product spaces (called <a href="https://en.wikipedia.org/wiki/Orthonormal_basis">orthonormal bases</a>). It’s
then natural to ask if there’s an analytic extension of this concept to
the hilbert space setting.</p>
<p>The answer, of course, is “yes”:</p>
<div class="boxed">
<p>If \(\{u_\alpha\} \subseteq \mathcal{H}\) is orthonormal
(that is, pairwise orthogonal and of norm $1$), the following are equivalent:</p>
<ol>
<li>
<p>(Completeness) If $\langle u_\alpha, x \rangle = 0$ for every $\alpha$, then $x = 0$</p>
</li>
<li>
<p>(Parseval’s Identity) $\lVert x \rVert^2 = \sum_\alpha \lvert \langle u_\alpha, x \rangle \rvert^2$</p>
</li>
<li>
<p>(Density) $x = \sum_\alpha \langle u_\alpha, x \rangle u_\alpha$, and the sum converges
no matter how the terms are ordered<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></p>
</li>
</ol>
<p>If any (and thus all) of the above are satisfied, we say that
\(\{ u_\alpha \}\) is a <span class="defn">Hilbert Basis</span> for $\mathcal{H}$.</p>
<p>Moreover, <em>every</em> hilbert space admits a hilbert basis!</p>
</div>
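<p>In finite dimensions we can check these conditions directly: the columns of any unitary matrix form a (finite) hilbert basis for $\mathbb{C}^3$. A quick numeric sketch:</p>

```python
import numpy as np

# The columns of a (random) unitary matrix are an orthonormal basis of C^3
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(M)   # Q from the QR factorization is unitary

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
coeffs = np.array([np.vdot(U[:, k], x) for k in range(3)])   # <u_k, x>

# Parseval's Identity: ||x||^2 = sum of |<u_k, x>|^2
assert np.isclose(np.vdot(x, x).real, np.sum(np.abs(coeffs) ** 2))

# Density: x = sum of <u_k, x> u_k
assert np.allclose(sum(c * U[:, k] for k, c in enumerate(coeffs)), x)
```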
<p>These conditions are <em>very</em> foundational to hilbert space theory, and are worth
remembering. The way that I like to remember is</p>
<div class="boxed">
<p>Let \(\{u_\alpha\}_{\alpha \in A}\) be a hilbert basis for $\mathcal{H}$.</p>
<p>The map $x \mapsto (\langle u_\alpha, x \rangle)_{\alpha \in A}$ is a
<a href="https://en.wikipedia.org/wiki/Unitary_operator">unitary</a> map witnessing the (isometric!) isomorphism</p>
\[\mathcal{H} \cong \ell^2(A).\]
<p>Here Parseval’s Identity tells us that this map is isometric,
and Density tells us that the obvious inverse map
$(c_\alpha) \mapsto \sum c_\alpha u_\alpha$ really is an inverse.</p>
</div>
<p>In fact, one can show that the size of a hilbert basis is a complete
invariant for hilbert spaces. That is, any two hilbert bases for $\mathcal{H}$
have the same cardinality
(often called the <span class="defn">Hilbert Dimension</span>), and
$\mathcal{H}_1 \cong \mathcal{H}_2$ if and only if they have the same
hilbert dimension<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>
<hr />
<p>That’s a lot of information about hilbert spaces in the abstract. But why
should we <em>care</em> about any of this? Let’s see how to solve some problems
using this machinery!</p>
<p>Let’s work with $L^2(S^1)$, the hilbert space of square-integrable functions
on the unit circle. Classically, we would extend these to be <em>periodic</em>
functions on $\mathbb{R}$, but that turns out to be the wrong point of view<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</p>
<p>A hilbert space is separable if and only if it has a countable
hilbert basis. This is nice since most hilbert spaces arising in practice
(including $L^2(S^1)$) <em>are</em> separable, so we can work with a <em>countable</em> sum
(even though the theorem is true in more generality).
For us, we note that \(\{ e^{inx} \}_{n \in \mathbb{Z}}\) is a hilbert
basis for $L^2(S^1)$, so we would expect periodic functions to decompose
into an infinite sum of these basic functions.</p>
<p>Historically, it was a very important problem to understand the convergence
of “fourier series”. That is, if we define</p>
\[\hat{f}(n) \triangleq \int e^{-inx} f\]
<p>(which we, with the benefit of hindsight, recognize as $\langle e^{inx}, f \rangle$)</p>
<p>when is it the case that we can recover $f$ as the sum
$\displaystyle \sum_{-\infty}^\infty \hat{f}(n) e^{inx}$?
That is, as the $n \to \infty$ limit of the <em>partial</em> fourier series</p>
\[S_n f \triangleq \sum_{k=-n}^n \hat{f}(k) e^{ikx}\]
<p>In particular, for “nice” functions $f$, is it the case that
$\displaystyle \lim_{n \to \infty} S_n f(x) = f(x)$ pointwise?
If not, is there <em>some</em> sense in which $S_n f \to f$?</p>
<p>This problem was fundamental for well over a century, with Fourier publishing
his treatise on the theory of heat in $1822$, and there are textbooks written
as recently as $1957$ which say it is unknown whether the fourier series
of a continuous function has to converge at even <em>one</em> point! See the
historical survey <a href="https://golem.ph.utexas.edu/category/2013/01/carlesons_theorem.html">here</a> for more information, as well as the full course
notes <a href="https://www.maths.ed.ac.uk/~tl/fa/fa_notes.pdf">here</a> by the same author.</p>
<p>The language of $L^p$ and Hilbert spaces wasn’t developed until the $1910$s,
and they are <em>integral</em> (pun intended) in phrasing the solution of the
fourier series convergence problem (<a href="https://en.wikipedia.org/wiki/Carleson%27s_theorem">Carleson’s Theorem</a>).</p>
<p>Carleson’s theorem is famously hard to prove, but we can get a partial solution
for free using the theory of hilbert spaces!</p>
<div class="boxed">
<p>For any $L^2$ function $f$, we have</p>
<p>\(S_n f \overset{L^2}{\longrightarrow} f\)</p>
</div>
<p>This is <em>exactly</em> the “density” part of the equivalence above!
With some work, one can show that $S_n f \to f$ in the $L^p$ norm for any
$p \neq 1, \infty$<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>.</p>
<p>Hilbert spaces are <em>also</em> useful in the world of <a href="https://en.wikipedia.org/wiki/Ergodic_theory">ergodic theory</a>.
Say we have a function $T$ from some space $X$ to itself, which we should
consider as describing how $X$ evolves over one time-step.
One might expect that if we <em>average</em> the position of a point $x$ over time,
we should converge on a fixed point of the transformation<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</p>
<p>A result in hilbert space theory tells us we aren’t off base!</p>
<div class="boxed">
<p><span class="defn">Von Neumann Ergodic Theorem</span></p>
<p>If $U : \mathcal{H} \to \mathcal{H}$ is a unitary operator on a separable
hilbert space, then for every $x \in \mathcal{H}$ we have</p>
\[\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} U^k x = \pi(x)\]
<p>where $\pi$ is the orthogonal projection onto the subspace of $U$-fixed points
\(\{ x \mid Ux = x \}\).</p>
</div>
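<p>A finite dimensional sketch makes the theorem believable: take $U$ to be a rotation of the $xy$-plane in $\mathbb{R}^3$ fixing the $z$-axis, so the fixed subspace is exactly the $z$-axis (a hypothetical example):</p>

```python
import numpy as np

# A unitary on R^3: rotate the xy-plane, fix the z-axis
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

x = np.array([1.0, 2.0, 3.0])

# Cesaro averages (1/n) * sum of U^k x for k = 0, ..., n-1
n = 50_000
avg, v = np.zeros(3), x.copy()
for _ in range(n):
    avg += v
    v = U @ v
avg /= n

# The rotating components average out; only the fixed part pi(x) survives
assert np.allclose(avg, [0.0, 0.0, 3.0], atol=1e-3)
```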
<p>This gives us the <span class="defn">Mean Ergodic Theorem</span> as a corollary!</p>
<div class="boxed">
<p>If $T : X \to X$ is measure preserving and $f \in L^2(X)$, then</p>
\[\frac{1}{n} \sum_{k=0}^{n-1} T^k f \to \pi(f)\]
<p>where $\pi$ is (orthogonal) projection onto the $T$-invariant functions.</p>
</div>
<hr />
<p>Alright! We’ve seen some of the foundational results in hilbert space theory,
and it’s worth remembering our techniques from the banach space world still
apply. Hilbert spaces are very common in analysis, with applications in
PDEs, Ergodic Theory, Fourier Theory, and more. The ability to
basically do algebra as we would expect, and leverage our geometric intuition,
is extremely useful in practice.</p>
<p>Next time, we’ll give a quick tour of applications of the
<a href="https://en.wikipedia.org/wiki/Baire_category_theorem">Baire Category Theorem</a>, and then it’s on to the Fourier Transform
on $\mathbb{R}$!</p>
<p>The qual is a week from today, but I’m starting to feel better about the
material. This has been super useful in organizing my thoughts, and if I’m
lucky, you all might find them helpful as well.</p>
<p>If you have been, thanks for reading! And I’ll see you next time ^_^.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:3" role="doc-endnote">
<p>The proof is pretty easy once we have a bit more machinery.</p>
<p>Take a functional $f$. If $f = 0$, then $x=0$ works.
Otherwise, $\text{Ker}(f)$ is a proper closed subspace of
$\mathcal{H}$, and has a nontrivial orthogonal complement.
If we take a unit vector $x$ in the orthogonal complement, then
$\overline{fx}x$ works (as you should verify). <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>Though we <em>really</em> shouldn’t, it seems to be unchangeable at this point.</p>
<p>I tried switching over a few years ago, but it made communication
terribly confusing. Monomorphisms, for instance, cancel on the left
with the usual notation, but cancel on the right with the other notation.
I tried to get around this by remembering monos “cancel after” and epis
“cancel before”, but I got horribly muddled up anytime I tried to talk
with another mathematician.</p>
<p>Teaching and communication are <em>extremely</em> important to me, so I sacrificed
my morals and went back to functions on the left. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>And to evangelize. Until we switch over to functions on the right, this
really <em>is</em> the correct convention.</p>
<p>So I suppose the “mathematician” convention is correct, but inconsistent
with how we (incorrectly) write function application on the left, while
the “physicist” convention is incorrect, but consistent with the rest of
our (incorrect) notation… What a world to live in :P.</p>
<p>If you happen to have a convincing argument for using the other convention,
though, I would love to hear it! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>It’s worth taking a moment to ask yourself why we can’t use polarization
to turn <em>every</em> normed space into an inner product space. The answer has
to do with the <a href="https://en.wikipedia.org/wiki/Parallelogram_law">parallelogram law</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>In fact, I mentioned this last time as well, but this feature characterizes
hilbert spaces! If every subspace of a given banach space is complemented,
then that banach space is actually a hilbert space! <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>This does <em>not</em> mean the terms are absolutely summable! <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>This isomorphism is in the category of hilbert spaces and unitary maps,
so it is automatically an isometry. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>The natural domain of a periodic function really is $S^1$, and the
mathematics reflects this.</p>
<p>There is a notion of fourier transform on arbitrary (locally compact)
abelian groups. This “<a href="https://en.wikipedia.org/wiki/Pontryagin_duality">pontryagin duality</a>” swaps “compactness” and
“discreteness” (amongst other things) and we see this already. $S^1$ is
compact and $\mathbb{Z}$ is discrete, and they are pontryagin duals of
each other.</p>
<p>Abstract harmonic analysis
(the branch of math that studies this duality theory)
seems <em>really</em> interesting, and I want to learn more about it
when I have the time. It seems to have a lot of connections to
representation theory, which is also on my to-learn list. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>The case of $L^1$ and $L^\infty$ are both interesting, and the disproofs
proceed by the <a href="https://en.wikipedia.org/wiki/Uniform_boundedness_principle">uniform boundedness principle</a>!</p>
<p>If $S_N f \to f$ in $L^1$ or $L^\infty$, we would know the maps
$S_N$ are bounded pointwise.
But then by banach space-ness we have the uniform boundedness principle,
and we would know that the $S_N$s would need to be <em>uniformly</em> bounded.
But we can find functions $f$ for which $S_N f$ has arbitrarily high $L^1$
and $L^\infty$ norm, giving us the contradiction. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>This idea of averaging to land on a fixed point is both common and powerful.
It is the key idea behind <a href="https://en.wikipedia.org/wiki/Maschke%27s_theorem">Maschke’s Theorem</a> and many other results.
I’ve actually been meaning to write a blog post on these kinds of
averaging arguments, but I haven’t gotten around to it yet… <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 15 Sep 2021 00:00:00 +0000
Banach Spaces and Preserving Finite Dimensional Theorems<p>Banach Spaces are ubiquitous in analysis, and they let us rein in analytic
objects using algebra (in particular, vector spaces) and completeness.
Infinite dimensional vector spaces can be pathological, but by restricting
attention to <em>continuous</em> operations for our topology, we can recover
analogues for a lot of the finite dimensional theory!</p>
<hr />
<p>Recall a Banach Space is a vector space equipped with a norm so that
the resulting metric space is <a href="https://en.wikipedia.org/wiki/Complete_metric_space">complete</a>. We have already seen some
examples of this, but let’s explicitly list some to get a sense of <em>just</em>
how common Banach Spaces are!</p>
<ul>
<li>
<p>$L^p$ spaces, with the $L^p$ norm</p>
\[\lVert f \rVert_p \triangleq \left ( \int |f|^p \right )^{1/p}\]
</li>
<li>
<p>Bounded continuous functions with the $\sup$ norm</p>
\[\lVert f \rVert_\infty \triangleq \sup_{x \in X} |fx|\]
<p>(A particular case is <em>all</em> continuous functions on a compact space<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">1</a></sup>)</p>
</li>
<li>
<p>Complex valued measures with the <a href="https://en.wikipedia.org/wiki/Total_variation#Total_variation_norm_of_complex_measures">total variation</a> norm<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
\[\lVert \mu \rVert \triangleq |\mu|(X)\]
</li>
<li>
<p>$k$-times differentiable functions on $[0,1]$ with the norm</p>
\[\lVert f \rVert \triangleq \sum_{j=0}^k \left \lVert \frac{d^j}{dx^j} f \right \rVert_\infty\]
</li>
<li>
<p>Lipschitz functions on $[0,1]$ with the norm</p>
\[\lVert f \rVert \triangleq C_f + \lVert f \rVert_\infty\]
<p>(where $C_f$ is the lipschitz constant for $f$)</p>
</li>
<li>
<p><a href="https://en.wikipedia.org/wiki/Sobolev_space">Sobolev Spaces</a>, which I’m told are extremely important.</p>
</li>
</ul>
<p>Of course, we can also build new Banach Spaces from old, and these work in
much the same way as in classical (by which I mean finite dimensional)
linear algebra.</p>
<ul>
<li>
<p>If $X$ and $Y$ are banach spaces, then so is $X \times Y$ with the norm</p>
\[\lVert (x, y) \rVert \triangleq \max \{ \lVert x \rVert, \lVert y \rVert \}\]
<p>moreover this is the categorical product in $\mathsf{Ban}_1$<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">3</a></sup>.</p>
</li>
<li>
<p>More generally, if $(X_\alpha)$ is a family of banach spaces, then by
$\prod X_\alpha$, we mean
\(\{ (x_\alpha) \mid \exists C . \forall \alpha . \lVert x_\alpha \rVert \leq C \}\).</p>
<p>We define $\lVert (x_\alpha) \rVert$ to be the least such $C$
(which necessarily equals $\sup_\alpha \lVert x_\alpha \rVert$).
This is the categorical product in $\mathsf{Ban}_1$, and since one can
show $\mathsf{Ban}_1$ also has equalizers, we see it is complete.</p>
</li>
<li>
<p>If $X$ and $Y$ are banach spaces, then so is $X \oplus Y$ with the norm</p>
\[\lVert (x,y) \rVert \triangleq \lVert x \rVert + \lVert y \rVert\]
<p>moreover this is the categorical coproduct in \(\mathsf{Ban}_1\).
Notice, as in the finite dimensional case, that
$X \times Y \cong X \oplus Y$, and the difference only becomes relevant
for <em>infinite</em> products/coproducts<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
</li>
<li>
<p>More generally, if $(X_\alpha)$ is a family of banach spaces, then by
$\bigoplus X_\alpha$ we mean
\(\{ (x_\alpha) \mid \sum \lVert x_\alpha \rVert \lt \infty \}\)<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">5</a></sup>.</p>
<p>Unsurprisingly, we define the norm to be</p>
\[\lVert (x_\alpha) \rVert \triangleq \sum \lVert x_\alpha \rVert.\]
<p>Again, this is the categorical coproduct in $\mathsf{Ban}_1$, and since
one can show $\mathsf{Ban}_1$ has coequalizers, we see it is cocomplete.</p>
</li>
<li>
<p>If $A$ is a closed subspace of $X$, then $A$ is itself a banach space
with the induced norm<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">6</a></sup>.</p>
</li>
<li>
<p>If $A$ is a closed subspace of $X$, then $X / A$ is a banach space with
the norm</p>
\[\lVert x + A \rVert \triangleq \inf_{a \in A} \lVert x + a \rVert\]
<p>the topology generated by this norm agrees with the quotient topology,
so we find this does indeed satisfy the universal property of quotients<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">7</a></sup>.</p>
</li>
</ul>
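<p>To see the product/coproduct distinction concretely, here's a small numeric sketch with a truncation of the (hypothetical) constant family $(1, 1, 1, \ldots)$: it lives in the product, since the sup of the norms stays bounded, but not in the coproduct, since the sum of the norms diverges.</p>

```python
import numpy as np

# Norms of a truncation of the constant family (1, 1, 1, ...)
norms = np.ones(10_000)

sup_norm = norms.max()            # product norm: stays 1 no matter the length
partial_sums = np.cumsum(norms)   # coproduct norm: partial sums diverge

assert sup_norm == 1.0
assert partial_sums[-1] == 10_000  # grows without bound as we add factors
```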
<div class="boxed">
<p>⚠ Short exact sequences of banach spaces do <em>not</em> split in general!
That is, if $A$ is a closed subspace of $X$, it is <em>not</em> the case that
$X \cong A \oplus (X / A)$!</p>
<p>See, for instance, <a href="https://arxiv.org/pdf/math/0501048v1.pdf">this survey</a> by Mohammad Sal Moslehian<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</p>
</div>
<ul>
<li>
<p>If $X$ and $Y$ are banach spaces, then $\mathcal{L}(X,Y)$, the space
of continuous linear maps $X \to Y$ is a banach space with the norm<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup></p>
\[\lVert T \rVert \triangleq \sup_{\lVert x \rVert = 1} \lVert Tx \rVert\]
</li>
</ul>
<div class="boxed">
<p>As a quick exercise, prove</p>
\[\lVert Tx \rVert \leq \lVert T \rVert \lVert x \rVert\]
<p>This is one of the most important inequalities in the subject.</p>
</div>
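<p>In finite dimensions the operator norm is computable (it's the largest singular value), so we can watch this inequality hold. A quick sketch with a hypothetical random matrix:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 4))

# The operator norm: sup of ||Tx|| over the unit sphere = top singular value
op_norm = np.linalg.norm(T, 2)

# ||Tx|| <= ||T|| ||x|| for every x (tiny slack for floating point)
for _ in range(100):
    x = rng.standard_normal(4)
    assert np.linalg.norm(T @ x) <= op_norm * np.linalg.norm(x) + 1e-9
```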
<ul>
<li>As a special case of the previous example, $\mathcal{L}(X,\mathbb{C})$
is always a banach space. We denote it by $X^*$, the
<span class="defn">dual</span> of $X$<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</li>
</ul>
<hr />
<p>Now that we have a wide array of examples of banach spaces, we should start
asking how much of our intuition from finite dimensional linear algebra carries
over. After all, a lot of these constructions looked really familiar, but
then we got blindsided by the lack of complements.</p>
<p>Thankfully, there are lots of foundational theorems in banach space theory
which tell us that certain things work exactly as we’d like!</p>
<p>For instance, in the finite dimensional case, if we’ve defined a functional
on some subspace, then we can always extend it to the whole space. But the
proof crucially relies on a choice of basis, so in the infinite dimensional case,
we need to be a bit careful to guarantee that the extension is still continuous.</p>
<p>Thankfully, everything works out:</p>
<div class="boxed">
<p><span class="defn">The Hahn-Banach Theorem</span><sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup></p>
<p>If $f$ is a continuous linear functional defined on a subspace of $X$,
then $f$ extends to a continuous linear functional $F$ defined on all of $X$.</p>
<p>Moreover, $\lVert F \rVert = \lVert f \rVert$.</p>
</div>
<p>Another piece of intuition from the finite dimensional case is that a
bijective linear map is automatically an isomorphism
(that is, the inverse map is automatically linear). Is it the case that
a <em>continuous</em> bijective linear map is automatically an isomorphism?
That is, must its inverse <em>also</em> be continuous? Again, the answer is “yes”!</p>
<div class="boxed">
<p><span class="defn">The Open Mapping Theorem</span></p>
<p>Let $X$ and $Y$ be banach spaces. If $T \in \mathcal{L}(X,Y)$ is surjective,
then it is <a href="https://en.wikipedia.org/wiki/Open_and_closed_maps">open</a>.</p>
<p>In particular, if $T$ is bijective, then $T^{-1}$ is continuous.</p>
</div>
<p>There’s another nice corollary of the open mapping theorem too. Topologically
we expect a quotient map to be open, and we know from the homomorphism theorems
that we can factor any surjection $T : X \to Y$ as</p>
\[X \to X / \text{Ker}(T) \cong Y\]
<p>the open mapping theorem says that this quotient map is open, as we would expect.
In fact, the projection $\pi : X \to X / A$ (for $A$ closed, of course) always
has norm $1$.</p>
<p>Continuing with our examples, in the finite dimensional case, we think of
subspaces as being “much smaller” than the ambient space. For instance, the
$xy$-plane is measure $0$ inside $\mathbb{R}^3$ (with lebesgue measure) because
it has no thickness. One can ask if sub-banach spaces (that is, closed subspaces)
must be “small” in some sense. Again, the answer is “yes”<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">12</a></sup>, but we
have to use a different notion of “small”<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">13</a></sup>:</p>
<div class="boxed">
<p><span class="defn">The (Strong) Open Mapping Theorem</span></p>
<p>If $X$ and $Y$ are banach spaces<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">14</a></sup> and $T \in \mathcal{L}(X,Y)$, then if
the image $T[X]$ is nonmeagre in $Y$, we automatically have surjectivity
and open-ness.</p>
<p>As a simple corollary, every proper closed subspace is meagre.</p>
</div>
<hr />
<p>Of course, we can’t talk about banach spaces without talking about a theorem
which honestly feels like magic.</p>
<div class="boxed">
<p><span class="defn">The Uniform Boundedness Principle</span></p>
<p>Say \(\{ T_\alpha \}\) is a family of continuous linear maps between
banach spaces<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">15</a></sup> $X$ and $Y$. If this family is bounded <em>pointwise</em>
in the sense that every $x$ satisfies</p>
\[\sup \lVert T_\alpha x \rVert \lt \infty\]
<p>(where the precise bound is allowed to depend on $x$)</p>
<p>then we actually get a <em>uniform</em> bound for free!</p>
<p>\(\sup \lVert T_\alpha \rVert \lt \infty\)</p>
</div>
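<p>Completeness is doing real work here. A sketch of the standard counterexample on the (non-complete) space of finitely supported sequences, where $T_n x \triangleq n x_n$: each fixed $x$ has finite support, so the family is pointwise bounded, yet $\lVert T_n \rVert = n$ grows without limit.</p>

```python
import numpy as np

def T(n, x):
    """The functional T_n(x) = n * x_n on finitely supported sequences
    (represented here as finite arrays, implicitly padded with zeros)."""
    return n * x[n] if n < len(x) else 0.0

x = np.array([1.0, 0.5, 0.25])   # a finitely supported sequence

# Pointwise bound: only finitely many T_n(x) are nonzero for this fixed x
pointwise = [abs(T(n, x)) for n in range(100)]
assert max(pointwise) == 0.5

# But ||T_n|| = n (attained at the basis vector e_n), so no uniform bound:
op_norms = [float(n) for n in range(100)]
assert op_norms[-1] == 99.0
```

The uniform boundedness principle says this cannot happen on a banach space, so completeness is not a removable hypothesis.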
<p>The ability to boost pointwise results to uniform results is often important
(this is one of many reasons to care about compactness), and there are
innumerable applications of this theorem. Here’s one that seems to show up
on a lot of practice quals:</p>
<div class="boxed">
<p>Show that a weakly convergent sequence $(x_n)$ is bounded.</p>
<p>Recall $(x_n)$ is weakly convergent if $(T x_n)$ is convergent in $\mathbb{C}$
for every continuous functional $T$.</p>
</div>
<p>And here’s one that says a certain pathology you might remember from an
undergraduate analysis class doesn’t happen in the banach space setting:</p>
<div class="boxed">
<p>Show that if $X$, $Y$, and $Z$ are banach spaces, then every <em>separately</em>
continuous bilinear map $X \times Y \to Z$ is automatically <em>jointly</em>
continuous<sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote" rel="footnote">16</a></sup>.</p>
<p>Here $X \times Y$ is meant as merely the product of topological spaces,
rather than as the product of banach spaces defined earlier.</p>
</div>
<p>As with the open mapping principle, it’s actually enough to know that you’re
pointwise bounded on some set that isn’t small:</p>
<div class="boxed">
<p><span class="defn">The (Strong) Uniform Boundedness Principle</span></p>
<p>If $X$ and $Y$ are normed vector spaces<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote" rel="footnote">17</a></sup> and $(T_\alpha)$ is a
sequence of continuous linear maps from $X$ to $Y$, then
as long as the family is <em>pointwise</em> bounded on a nonmeagre set, it’s
actually uniformly bounded!</p>
<p>More formally, if $E \subseteq X$ is nonmeagre, and we have a bound</p>
\[\sup \lVert T_\alpha x \rVert \lt \infty\]
<p>for each $x \in E$</p>
<p>then we actually get a uniform bound for free!</p>
<p>\(\sup \lVert T_\alpha \rVert \lt \infty\)</p>
</div>
<p>This is fairly indicative of working with meagre and nonmeagre sets.
We frequently get a kind of dichotomy where things are either bad
almost everywhere or good almost everywhere (by which I mean on a comeagre set).</p>
<p>So if you can show that something good/bad happens on a set that isn’t small
(meagre) then you often get for free that good/bad things happen almost everywhere!</p>
<p>For instance, let’s look at the contrapositive of the above theorem. It says
that if $\sup \lVert T_\alpha \rVert = \infty$, then actually for
<em>comeagrely many</em> choices of $x$, we must have
$\sup \lVert T_\alpha x \rVert = \infty$!</p>
<hr />
<p>Next time<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">18</a></sup>, we’ll talk about <a href="https://en.wikipedia.org/wiki/Hilbert_space">Hilbert Spaces</a>, where we’ll require
even more algebraic structure, and in exchange we’ll gain better structure
theorems telling us about our spaces. These structure theorems will lead us
into the beautiful world of Fourier Analysis, which we’ll discuss afterwards.</p>
<p>In the meantime, you should definitely read Terry Tao’s post about banach
spaces <a href="https://terrytao.wordpress.com/2009/02/01/245b-notes-9-the-baire-category-theorem-and-its-banach-space-consequences/">here</a>.</p>
<p>See you soon! ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:3" role="doc-endnote">
<p>In fact, this accounts for <em>all</em> banach spaces!</p>
<p>The <a href="https://en.wikipedia.org/wiki/Banach%E2%80%93Mazur_theorem">Banach-Mazur Representation Theorem</a> says that every banach space
is isometric to a closed subspace of $C(K)$ for some compact space $K$
(in fact, the unit ball in the dual with the weak-* topology works).</p>
<p>In case your banach space is separable, we can do better – it is a closed
subspace of $C([0,1])$! <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Intuitively this makes sense, but formally it is far from obvious
(at least to me!). That said, you can find a smattering of proofs
<a href="https://math.stackexchange.com/questions/178921/space-of-complex-measures-is-banach-proof">here</a>. I like the proof by Radon-Nikodym, if you want to do it directly.</p>
<p>The most conceptual way to see this is by
citing the <a href="https://en.wikipedia.org/wiki/Riesz%E2%80%93Markov%E2%80%93Kakutani_representation_theorem">Riesz Representation Theorem</a>, which says that this
space of measures is actually isometric to $C_0^*$, and thus is banach.
Of course, that only works when $X$ is locally compact hausdorff. The
theorem as proven in that mse link works more generally. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>The category $\mathsf{Ban}$ of banach spaces with <em>all</em> continuous linear
maps turns out to be somewhat badly behaved.
But if we restrict to <em>contracting</em> maps, we get a much nicer category,
$\mathsf{Ban}_1$.</p>
<p>Notice we can rescale any bounded linear transformation $T$ by a constant to
make it a contraction, so this is not really a limitation. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>In fact, for finite products we can take $\lVert (x,y) \rVert$ to be any of</p>
<ul>
<li>$\max \{ \lVert x \rVert_X, \lVert y \rVert_Y \}$</li>
<li>$\sqrt{ \lVert x \rVert_X^2 + \lVert y \rVert_Y^2 }$</li>
<li>$\lVert x \rVert_X + \lVert y \rVert_Y$</li>
</ul>
<p>and we’ll get the same banach space up to isomorphism (but NOT isometry!).</p>
<p>If you’ve not seen this before, you should prove it!
It’s a fairly quick exercise to show these are all equivalent norms,
and that they are complete. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
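<p>If you want to convince yourself numerically before proving it, here is a small sketch (writing $a = \lVert x \rVert_X$ and $b = \lVert y \rVert_Y$, which are just nonnegative reals) checking the chain of inequalities $\max \leq \ell^2 \leq \ell^1 \leq 2\max$ that witnesses the equivalence:</p>

```python
import math, random

# Writing a = ||x||_X and b = ||y||_Y (nonnegative reals), the three
# candidate product norms from the footnote are:
def n_max(a, b): return max(a, b)
def n_two(a, b): return math.hypot(a, b)
def n_one(a, b): return a + b

# The chain  max <= l2 <= l1 <= 2 * max  shows all three are equivalent,
# so they induce the same topology (and the same completeness).
random.seed(0)
for _ in range(1000):
    a, b = random.uniform(0, 10), random.uniform(0, 10)
    assert n_max(a, b) <= n_two(a, b) <= n_one(a, b) <= 2 * n_max(a, b)
```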
</li>
<li id="fn:6" role="doc-endnote">
<p>Notice the coproduct you might expect</p>
\[\{ (x_\alpha) \mid x_\alpha \neq 0 \text{ for only finitely many $\alpha$ } \}\]
<p>is not complete, and thus not banach (do you see why?). The actual definition
of the coproduct is exactly the completion of this space in the coproduct
norm, though. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Notice we need $A$ to be closed so that it is itself complete, and is thus
a sub-banach space. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>Again, we need $A$ to be closed so that we’re working with sub-banach spaces.
Notice this is not a serious issue – as a quick exercise, you might show
that the kernel of a continuous linear map is always a closed subspace. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>In fact, there’s more to say! A (nontrivial) theorem of Lindenstrauss and
Tzafriri says that if a banach space satisfies “every closed subspace has a
complement” then it must actually be (isomorphic to) a hilbert space!</p>
<p>So every non-hilbert banach space must contain a noncomplemented closed subspace!</p>
<p>See <a href="https://math.stackexchange.com/questions/2176497/banach-space-with-non-complemented-subspace">here</a> (as well as the linked article), for more. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>Recall a linear function is continuous if and only if it is bounded in
the sense that the norm we’re defining is finite. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>In fact, $X^*$ is complete even when $X$ <em>isn’t</em>.
One slick corollary of this is the construction of the completion of
a normed space.</p>
<p>Since $X$ (isometrically!) embeds into its double dual
$X \hookrightarrow X^{**}$, we can define the completion of $X$ to be the
closure of the image of $X$ under this embedding. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>The proof makes use of the fact that any vector space is
the filtered colimit of its finite dimensional subspaces. We show how to
extend by one basis element at a time in a way that preserves the norm,
then we apply Zorn’s Lemma to the partial order of these extensions.</p>
<p>It turns out this appeal to Zorn’s Lemma is somewhat unavoidable. There
are models of $\mathsf{ZF}$ where Hahn-Banach fails in full generality,
so we need <em>some</em> amount of choice to prove it. However it’s strictly
weaker than full AC (see <a href="https://mathoverflow.net/questions/5351/whats-an-example-of-a-space-that-needs-the-hahn-banach-theorem">here</a> for more discussion).</p>
<p>Thankfully, in many concrete situations, we <em>don’t</em> need choice! If $X$
is <a href="https://en.wikipedia.org/wiki/Separable_space">separable</a>, then we can extend one dimension at a time, making
sure we eventually choose each element of our countable dense subset.
At the end of this (countable length!) process, we can extend to the whole
space by continuity. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>Interestingly, we can ask about more general subspaces
(which are necessarily not closed). It turns out the answer here is
a firm “no”.</p>
<p>Every infinite dimensional banach space has a proper nonmeagre subspace
(which is necessarily not closed). It turns out such a subspace must be
dense and cannot have <a href="https://en.wikipedia.org/wiki/Property_of_Baire">BP</a>.</p>
<p>These subspaces arise as kernels of discontinuous functionals, so the
next question is “does every discontinuous functional work?”, and the
answer here is “it’s subtle”.</p>
<p>Using <a href="https://en.wikipedia.org/wiki/Martin%27s_axiom">Martin’s Axiom</a> one can show that every separable
banach space has a discontinuous functional whose kernel is still meagre.
See <a href="https://mathoverflow.net/questions/3188/are-proper-linear-subspaces-of-banach-spaces-always-meager">here</a> for more info. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>The post about the baire category theorem is coming up! <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>Actually we only need $X$ to be banach –
a priori $Y$ can be any normed vector space.</p>
<p>Interestingly, though, as soon as we know that $T[X] = Y$, we also know
that $Y \cong X / \text{Ker}(T)$ is banach. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>Again, we only <em>really</em> need $X$ to be banach. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:18" role="doc-endnote">
<p>It might be helpful to recall that a bilinear operator is jointly
continuous if and only if it is “jointly bounded” in the sense that
\(\sup \{ \lVert f(x,y) \rVert \ \mid \ \lVert x \rVert = 1, \lVert y \rVert = 1 \} \lt \infty\).</p>
<p>As a bigger hint, you might try applying the uniform boundedness principle
to the family of maps $f(x,-) : Y \to Z$.</p>
<p>Also, as a fun ~ bonus game ~ for particularly enthusiastic readers,
it turns out we don’t need all of $X$, $Y$, and $Z$ to be banach spaces!
How weak can you make the assumptions and still prove this theorem? <a href="#fnref:18" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p>In particular neither needs to be complete. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>I didn’t forget about the <a href="https://en.wikipedia.org/wiki/Closed_graph_theorem_(functional_analysis)">closed graph theorem</a>, I just couldn’t
find a way to make it fit into the narrative of this blog post
(comparing finite dimensional and infinite dimensional banach spaces).</p>
<p>I have some interesting stuff to say, though, since it’s analogous to a theorem
in <a href="https://en.wikipedia.org/wiki/Universal_algebra">universal algebra</a>.
Terry Tao talks some about this in one of his <a href="https://terrytao.wordpress.com/2012/11/20/the-closed-graph-theorem-in-various-categories/">blog posts</a>
(and gives a <em>very</em> interesting application in <a href="https://terrytao.wordpress.com/2016/04/22/a-quick-application-of-the-closed-graph-theorem/">another</a>), but the
idea behind the closed graph theorem is true in very high generality:</p>
<div class="boxed">
<p>Show that $f : A \to B$ is a homomorphism of algebras if and only if
its graph is a subalgebra of $A \times B$.</p>
</div>
<p>Moreover, if $Y$ is compact and hausdorff, then $f : X \to Y$ is continuous
if and only if its graph is closed in $X \times Y$. This should make some
vague sense, since compact hausdorff spaces behave a lot like algebras
(in fact, they <em>are</em> a category of algebras for the ultrafilter monad.
See <a href="https://ncatlab.org/nlab/show/compactum">here</a>, for instance), so $f$ is a “homomorphism” (is continuous)
exactly when its graph is a subalgebra (sub-compact-hausdorff space)
of $X \times Y$. Of course, to be sub-compact-hausdorff, it suffices to
check closedness.</p>
<p>Now the closed graph theorem says that this is true of banach spaces as well.
$f : X \to Y$ is a continuous linear map if and only if its graph is a
sub-banach space of $X \times Y$. Linearity comes from the “space”
part of “sub-banach space” and continuity comes from the “banach” part.
Of course, as in the compact hausdorff case, it suffices to check closedness.</p>
<p><em>Unlike</em> the case of compact hausdorff spaces, though, I don’t know of
any categorical justification for this theorem! If anyone happens to have
one, I would love to hear about it! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 09 Sep 2021 00:00:00 +0000
https://grossack.site/2021/09/09/banach-spaces.html

Examples of Syntax/Semantics Theorems Throughout Math

<p>I’ve been promising a blog post on syntax and semantics for a long time now.
There’s a <em>lot</em> to say, as this duality underlies much of mathematical logic,
but I want to focus on one particular instance of the syntax/semantics
connection which shows up <em>everywhere</em> in mathematics. I talked about this
briefly in a talk last year (my blog post debriefing from the talk is <a href="https://terrytao.wordpress.com/2020/09/08/zarankiewiczs-problem-for-semilinear-hypergraphs/">here</a>)
but it’s a kind of squishy and imprecise observation. Because of that, this
post is going to be less expository than my usual ones, and is instead going
to be a “definition by examples” of sorts. I’ll try to show examples from as
many branches of math as possible, and hopefully by the end it becomes clear
what the flavor of these theorems is, as well as how ubiquitous they are!</p>
<p>This post has been in the pipeline for a few months now, and I’m glad to
have it finished! It was pretty tiring to revise, but was also a welcome
break from all the analysis I’ve been reading lately. I hope you find it
interesting ^_^!</p>
<p>As a reminder, <span class="defn">Syntax</span> is the part of mathematics that
deals with symbols, and rules for manipulating them. So the <em>syntax</em> associated
to groups consists of variables and the symbols $1$, $\cdot$, and ${}^{-1}$. There
are also rules for manipulating the syntax. For instance, $1 \cdot x$ can always
be replaced by $x$ (and vice versa).</p>
<p>Syntax, it turns out, gives us a way of <em>asking questions</em> or <em>talking about</em>
an object. For instance we can ask the question
“$\forall x . \forall y . xy = yx$”.
The answer to this question, of course, depends on which group we’re talking about.</p>
<p>Dually, <span class="defn">Semantics</span> tell us what the syntax <em>means</em>.
If you fix an actual group $G$, then you can <em>interpret</em> the syntax in $G$,
and answer the questions that get asked. We typically denote this with the
symbol $\models$ which is read “models” or “satisfies”.</p>
<p>As an example, $\mathbb{Z} \models \forall x . \forall y . xy = yx$, but
$\mathfrak{S}_3 \not \models \forall x . \forall y . xy=yx$.</p>
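<p>Since the $\models$ relation is completely concrete for a finite structure, we can even check it by brute force. Here is a quick sketch (my own toy encoding) verifying that $\mathfrak{S}_3$ really does fail the commutativity axiom:</p>

```python
from itertools import permutations

# Represent an element of S_3 as a tuple p, i.e. the permutation i |-> p[i].
S3 = list(permutations(range(3)))

def compose(p, q):
    """(p * q)(i) = p(q(i))"""
    return tuple(p[q[i]] for i in range(3))

# Does S_3 |= forall x . forall y . xy = yx ?  Just check all 36 pairs.
s3_is_abelian = all(compose(p, q) == compose(q, p) for p in S3 for q in S3)
# s3_is_abelian is False: a transposition and a 3-cycle fail to commute.
```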
<p>This is a very <a href="https://en.wikipedia.org/wiki/Model_theory">model theoretic</a> notion of syntax and semantics, but I think
it’s a good fundamental example<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. The kind of theorem that I’m going to focus
on in this post, though, are <a href="https://en.wikipedia.org/wiki/Descriptive_set_theory">descriptive</a> in nature. The idea is this:</p>
<div class="boxed">
<p>If your object is simple syntactically, then it must be simple semantically as well.</p>
</div>
<p>Another way to phrase this is</p>
<div class="boxed">
<p>If your object has a simple description, it can’t be too complicated.</p>
</div>
<p>Now let’s see some examples of this idea in action!
I would love to hear any other examples that you can think of as well ^_^.</p>
<hr />
<h2 id="definable-sets-in-a-topological-space">Definable Sets in a Topological Space</h2>
<p>Let’s start simple. Say $f, g : X \to \mathbb{R}$ are continuous. Then</p>
<ul>
<li>\(\{x \mid fx = gx \}\) and \(\{ x \mid fx \leq gx \}\) are closed</li>
<li>\(\{x \mid fx \neq gx \}\) and \(\{ x \mid fx \lt gx \}\) are open</li>
</ul>
<p>This is the kind of theorem that you probably know intuitively, but you
might have never thought about formally. Notice that just by looking at the
<em>syntax</em> (that is, how the set is defined) we can learn something nontrivial
about the set itself (in this case, its topological complexity).</p>
<div class="boxed">
<p>As a simple exercise, if you’ve never seen this before you should prove this!</p>
<p>You should recall that $\mathbb{R}$ is hausdorff, so
\(\Delta = \{ (x,x) \}\) is closed in $\mathbb{R}^2$.
Similarly, notice \(\{ (x,y) \mid x \leq y \}\) is closed in $\mathbb{R}^2$.</p>
<p>Do you see how to use this to show the claim?</p>
</div>
<p>Let’s make things a little more complicated. Again, say $f, g : X \to \mathbb{R}$
are continuous. Then we can bound the complexity of various sets defined by
referencing $f$ and $g$ by counting quantifiers.</p>
<p>First, let’s look at things that are quantifier free. We can turn logical
symbols in the definition of a set into boolean operations on sets themselves.
So</p>
<ul>
<li>
\[\{x \mid fx = 0 \land gx = 0\} = \{ x \mid fx = 0 \} \cap \{ x \mid gx = 0 \}\]
</li>
<li>
\[\{x \mid fx = 0 \lor gx = 0 \} = \{ x \mid fx = 0 \} \cup \{ x \mid gx = 0 \}\]
</li>
<li>
\[\{x \mid \lnot (fx = 0) \} = \{x \mid fx = 0 \}^c\]
</li>
</ul>
<p>Using the fact from earlier that $fx=0$ is a closed condition, we can immediately
see that the first two sets are closed, and the third is open. Any connectives
in the definition can be handled in this way.</p>
<p>That actually <em>includes</em> infinite connectives! For instance, say we have a
sequence of functions $(f_n)_{n \in \mathbb{N}}$. Then</p>
<ul>
<li>\(\{x \mid \bigvee_{n \in \mathbb{N}} f_n x = 0\} = \bigcup_{n \in \mathbb{N}} \{ x \mid f_n x = 0\}\) is $F_\sigma$.</li>
</ul>
<p>A countable conjunction/disjunction is often viewed as a <em>countable quantifier</em> since</p>
<ul>
<li>$\exists n \in \mathbb{N} . \varphi(n) \iff \bigvee_{n \in \mathbb{N}} \varphi(n)$</li>
<li>$\forall n \in \mathbb{N} . \varphi(n) \iff \bigwedge_{n \in \mathbb{N}} \varphi(n)$</li>
</ul>
<p>Now we’ve extended our syntax to be more expressive. We can now use
countable conjunctions/disjunctions, or equivalently countable quantifiers.
Since our syntax is slightly more complicated, we should be able to describe
more complex sets in this way.</p>
<div class="boxed">
<p>As another fun exercise, you should check that every set definable with
countable quantifiers is borel. In fact, there’s a <a href="https://en.wikipedia.org/wiki/Borel_hierarchy">hierarchy</a> of complexity
for borel sets, and the position of \(\{x \mid \varphi \}\) in this hierarchy is
<em>exactly</em> in correspondence with the (countable) quantifier complexity of
$\varphi$.</p>
</div>
<p>A big part of <a href="https://en.wikipedia.org/wiki/Descriptive_set_theory">descriptive set theory</a> is trying to turn real-valued
quantifiers into countable quantifiers. For instance,
let’s look at the following classic analysis exercise:</p>
<div class="boxed">
<p>Let $(f_n : X \to \mathbb{R})_{n \in \mathbb{N}}$ be a sequence of measurable functions.
Show the set of $x$ where $f_n(x)$ converges is measurable.</p>
</div>
<p>We can solve this by writing down what it means to be convergent, and converting
this syntactic definition into a semantic one. Since we’re gunning for borel,
we know we’re only allowed to use natural number quantifiers. This leads to
the following argument:</p>
\[\begin{aligned}
\{ x \mid f_n x \text{ converges} \}
&= \left \{ x \ \middle | \ \forall k . \exists N . \forall m, n \geq N . | f_n x - f_m x| \leq \frac{1}{k} \right \} \\
&= \bigcap_k \bigcup_N \bigcap_{m,n \geq N} \left \{ x \ \middle | \ | f_n x - f_m x| \leq \frac{1}{k} \right \} \\
&= \bigcap_k \bigcup_N \bigcap_{m,n \geq N} \left \{ x \ \middle | \ x \in | f_n - f_m |^{-1} \left [ 0, \frac{1}{k} \right ] \right \}
\end{aligned}\]
<p>Since $|f_n - f_m|$ is measurable, the set at the end is measurable, which
makes our whole set measurable too. If the $f_n$ are assumed to be continuous
instead, then we can get a more precise bound: The set is $\mathbf{\Pi^0_3}$.</p>
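<p>It can be clarifying to see this quantifier-counting done by a machine. Below is a sketch (with truncation bounds of my own choosing, since a computer can’t search all of $\mathbb{N}$) deciding membership in a finite approximation of $\bigcap_k \bigcup_N \bigcap_{m,n \geq N} \{ x \mid |f_n x - f_m x| \leq 1/k \}$ for the family $f_n(x) = x^n$:</p>

```python
# Truncated version of the countable-quantifier description of the set of x
# where f_n(x) converges, for f_n(x) = x^n.  The bounds K and N_MAX are
# artifacts of this sketch: the true set needs the full quantifiers over N.

def f(n, x):
    return x ** n

K, N_MAX = 20, 100

def in_convergence_set(x):
    # x  in  /\_k  \/_N  /\_{m,n >= N}  { |f_n x - f_m x| <= 1/k }
    return all(
        any(
            all(abs(f(n, x) - f(m, x)) <= 1.0 / k
                for m in range(N, N_MAX)
                for n in range(N, N_MAX))
            for N in range(N_MAX // 2)  # keep the tail {N, ..., N_MAX} long
        )
        for k in range(1, K)
    )

# (x^n) converges exactly for x in (-1, 1], and on sample points the
# truncated test agrees: 0.5 and 1.0 are in the set, 1.1 is not.
```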
<p>You might be wondering what happens if we <em>do</em> allow real valued quantifiers.
It turns out there are <em>still</em> theorems bounding the complexity of our sets!</p>
<p>Sets with only existential real quantifiers are called
<span class="defn">Analytic</span>, and sets with only universal real quantifiers
are called <span class="defn">Coanalytic</span>. A (very!) nontrivial theorem
in descriptive set theory says that both classes of sets are
<a href="https://en.wikipedia.org/wiki/Universally_measurable_set">universally measurable</a>, have the <a href="https://en.wikipedia.org/wiki/Property_of_Baire">property of baire</a>, etc.
See <a href="http://www.personal.psu.edu/jsr25/Spring_11/Lecture_Notes/dst_lecture_notes_2011_Regularity-Analytic.pdf">here</a> for a proof.</p>
<p>But <em>now</em> you might be wondering if we allow alternating quantifiers! For instance,
what if we have a set of the form
\(\{ z \mid \forall x . \exists y . \varphi(x,y,z) \}\)
where $x$ and $y$ are reals? This turns out to be independent of $\mathsf{ZFC}$!</p>
<p>The relevant search term is <a href="https://en.wikipedia.org/wiki/Axiom_of_projective_determinacy">projective determinacy</a>, which follows from
certain large cardinal axioms. I was planning to write up a blog post about
this, but I’ve been beaten to the punch! There’s a great introduction at
the blog <strong>Complex Projective $4$ Space</strong> (which you can read <a href="https://cp4space.hatsya.com/2021/07/10/determinacy/">here</a>),
and if you find set theory and large cardinals interesting, you might want to
read Tom Leinster’s take on set theory <a href="https://golem.ph.utexas.edu/category/2021/06/large_sets_1.html">here</a>. It’s a great introduction
so far, and is phrased in a way that I suspect will be a bit more accessible
to the generic mathematician than a traditional set theoretic reference might be.</p>
<hr />
<h2 id="restricting-the-syntax-of-an-algebraic-structure">Restricting the Syntax of an Algebraic Structure</h2>
<p>Say we have a group $G = \langle g_1, \ldots, g_n \mid R_1, \ldots, R_m \rangle$
defined by generators and relations. This is a <em>syntactic</em> description of the
group, since it tells us what symbols to use and what the rules are for
pushing those symbols around.</p>
<p>It’s often the case in algebra that we have some syntactic description of an
algebraic object like this. A general life pro tip in this situation is to
try to make restrictions on the syntax. Oftentimes this will give you
surprisingly powerful restrictions on how complicated your object itself is.
That is, powerful restrictions on the <em>semantics</em>.</p>
<p>Again, let’s start easy. The <span class="defn">Word Problem</span> for a
group $G = \langle S \mid R \rangle$ is the
set of words<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> $w \in S^*$ which equal the identity in $G$.
What can we say about $G$ if we assume the word problem isn’t too complicated?</p>
<p>The word problem is a subset $W \subseteq S^*$, which we can view as a
<a href="https://en.wikipedia.org/wiki/Formal_language">formal language</a>. There are a few well known classes of languages,
of increasing complexity, and we can ask what must be true of our group if we
assume $W$ falls into each of these classes.</p>
<ul>
<li>$W$ is <a href="https://en.wikipedia.org/wiki/Regular_language">regular</a> if and only if $G$ is finite.</li>
<li>$W$ is <a href="https://en.wikipedia.org/wiki/Context-free_language">context free</a> if and only if $G$ is <a href="https://en.wikipedia.org/wiki/Virtually">virtually free</a></li>
<li>$W$ is <a href="https://en.wikipedia.org/wiki/Computable_set">computable</a> (that is, the word problem is decidable, or solvable)
if and only if $G$ embeds into every <a href="https://en.wikipedia.org/wiki/Algebraically_closed_group">algebraically closed group</a>.</li>
<li>$W$ is <em>always</em> <a href="https://en.wikipedia.org/wiki/Recursively_enumerable_language">computably enumerable</a>, so this does not actually
pose any restrictions on $G$.</li>
</ul>
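<p>One direction of the first equivalence is easy to see concretely: for a finite group, the word problem is recognized by a finite automaton whose states are the group elements. Here is a sketch for $\mathbb{Z}/3$ (my own toy encoding, with <code>A</code> standing for $a^{-1}$):</p>

```python
# For a finite group the word problem is a regular language: take a DFA whose
# states are the group elements, whose transitions multiply by a generator,
# and whose only accepting state is the identity.  Sketch for Z/3 = <a>.

STEP = {'a': 1, 'A': 2}  # 'A' denotes the inverse a^{-1} = a^2

def word_problem_z3(word):
    """Accept exactly the words over {a, A} equal to the identity in Z/3."""
    state = 0  # start (and accept) at the identity
    for letter in word:
        state = (state + STEP[letter]) % 3
    return state == 0

# 'aaa', 'aA', and the empty word are in the word problem; 'aa' is not.
```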
<p>Notice how natural restrictions on the syntax of $G$ correspond to
(relatively) natural restrictions on the semantics of $G$.</p>
<p>One particularly famous result in this vein is Gromov’s theorem on groups of
polynomial growth. This theorem says that a group $G$ has “polynomial growth”
(a syntactic condition) if and only if $G$ is virtually nilpotent.</p>
<p>This is also leaving aside all of the work done on <a href="https://en.wikipedia.org/wiki/One-relator_group">one relator groups</a>.
Mostly I’m leaving this aside because I’m not particularly familiar with
this area. I know that there <em>are</em> theorems of this form, though. For instance,
if the one relator is not a proper power, then $G$ is torsion free. If anyone
is more familiar than me with one relator groups (and this is not a high bar to clear)
I would love to hear about other examples!</p>
<p>But why stop at group theory? The notion of “presentation” exists in
commutative algebra as well. Can we find similar theorems there<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">3</a></sup>?</p>
<p>Obviously the answer is “yes”, but I’m still <em>very</em> new to commutative algebra,
and I wasn’t familiar with any theorems in this area.
I knew that there should be some examples in the theory
of <a href="https://en.wikipedia.org/wiki/Monomial_ideal">monomial ideals</a>, but I didn’t know of any concrete examples.
Thankfully, in my first year I got to know
<a href="https://eloisagrifo.github.io/">Eloísa Grifo</a> and <a href="https://sites.google.com/view/alessandracostantini/home">Alessandra Costantini</a>, and even though
they’re both leaving<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">4</a></sup>, they were nice enough to give me some examples!</p>
<p>First, what <em>is</em> a monomial ideal? Well, it’s more or less what it says on the
tin. If we work in a ring $k[x_1, \ldots, x_n]$ of polynomials over a field $k$,
then a <span class="defn">Monomial Ideal</span> is an ideal
$I = (f_1, \ldots, f_m)$ where each $f_i$ is a monomial.</p>
<p>There are lots of algorithms in commutative algebra
(<a href="https://en.wikipedia.org/wiki/Buchberger%27s_algorithm">Buchberger’s algorithm</a> and <a href="https://en.wikipedia.org/wiki/Gr%C3%B6bner_basis">gröbner bases</a> come to mind) which
treat polynomials as formal symbols to be pushed around. So putting
restrictions on what kinds of polynomials we have to work with is a syntactic
condition, which we should expect to give us nice semantic theorems!</p>
<p>Indeed, monomial ideals admit particularly simple <a href="https://en.wikipedia.org/wiki/Primary_decomposition">primary decompositions</a>.
Moreover, there is an <a href="https://math.stackexchange.com/a/628586/655547"><em>extremely</em> simple algorithm</a> to compute the
primary decomposition of a monomial ideal, which starkly contrasts the
difficulty of the general computation<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">5</a></sup>.</p>
<p>Monomial ideals (in particular the squarefree monomial ideals) can be studied
with combinatorial tools using the dictionary of <a href="https://en.wikipedia.org/wiki/Stanley%E2%80%93Reisner_ring">Stanley-Reisner Theory</a>.
This theory associates simplicial complexes to (squarefree) monomial ideals,
and geometric conditions on this complex get turned into syntactic conditions
on the monomials generating the ideal (and also into semantic theorems on the
ideal and the quotient ring).</p>
<p>One deep open problem in commutative algebra is the question</p>
<div class="boxed">
<p>When do the <a href="https://en.wikipedia.org/wiki/Symbolic_power_of_an_ideal">symbolic powers</a> $I^{(n)}$ of an ideal
agree with the ordinary powers $I^n$?</p>
</div>
<p>In the special case of squarefree monomial ideals whose simplicial complex is
a graph, we completely understand this problem! It is a theorem of Gitler,
Valencia, and Villarreal that the following are equivalent for the edge ideal
of a (simplicial) graph $G$:</p>
<ol>
<li>$G$ is bipartite</li>
<li>$I_G^{(n)} = I_G^n$</li>
<li>$I_G$ is “packed<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">6</a></sup>”</li>
</ol>
<p>I’m sure there are other syntax/semantics theorems in commutative algebra,
and I would love to hear about them!</p>
<hr />
<h2 id="some-miscellany">Some Miscellany</h2>
<p>There are <em>way</em> more results of this form, and I feel like I could keep
listing them forever. But this blog post is already getting long, and the
thought of revising it is becoming daunting. Plus, I want to include at least
a bit of exposition regarding each example so that people with less familiarity
can still get something out of it.</p>
<p>With that in mind, I really <em>should</em> stop. But I want to show some examples
that don’t quite fit the mold of any of the examples we’ve seen so far.
This miscellany section is here to scratch that itch!</p>
<p>This one is a bit more overtly logical in nature, as it’s a corollary of the
<a href="https://en.wikipedia.org/wiki/O-minimal_theory">$o$-minimality</a> of <a href="https://en.wikipedia.org/wiki/Real_closed_field">real closed fields</a>. But the statement itself
doesn’t require any logical terminology, so I figure it counts:</p>
<div class="boxed">
<p>If $f : \mathbb{R} \to \mathbb{R}$ is a <a href="https://en.wikipedia.org/wiki/Semialgebraic_set">semialgebraic function</a>,
then we can partition $\mathbb{R} = I_1 \cup I_2 \cup \ldots \cup I_n \cup X$,
where $X$ is finite and each $I_k$ is an open interval, so that $f$ is
continuous on each $I_k$.</p>
</div>
<p>Here semialgebraicity says that the graph \(\Gamma_f = \{ (x, fx) \mid x \in \mathbb{R} \}\)
of $f$ can be carved out by polynomial inequalities.
This theorem says that semialgebraicity guarantees continuity away from
finitely many points!
This is one manifestation of Brouwer’s folklore result that
(for the right definition of “definable”) all definable functions are continuous.
You can find a discussion of this “theorem” <a href="https://math.stackexchange.com/questions/176279/all-real-functions-are-continuous">here</a>.</p>
<p>Unrelatedly, we have definable combinatorics. The idea here is that by
putting restrictions on how complicated a combinatorial object is allowed to
be, we can get sharper extremal bounds for the complexity of that object.
I don’t know much about this myself, but Terry Tao has talked about it
on his blog (see <a href="https://terrytao.wordpress.com/2020/09/08/zarankiewiczs-problem-for-semilinear-hypergraphs/">here</a>).</p>
<p>Lastly, to end on a logical theorem close to my heart, lots of families of
logical formulas are <span class="defn">decidable</span>. For an easy example,
if you promise to only ask questions with bounded quantifiers, I can always
tell you whether your formula is true or false in $\mathbb{N}$. There are lots
of results which say that every formula satisfying some syntactic condition
is decidable, and so a computer program can tell you whether that formula is
true or false! I talked about this a bit in <a href="/2021/01/23/why-think.html">a talk of mine</a>,
but I’m planning to write up a full blog post on it at some point.</p>
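<p>For a taste of what such a decision procedure looks like, here’s a toy sketch of my own (not from the talk): since bounded quantifiers range over finite sets, truth in $\mathbb{N}$ is decided by exhaustive search.</p>

```python
# Toy decision procedure for bounded-quantifier arithmetic: every quantifier
# ranges over a finite set, so we decide truth by checking all the cases.

def forall_below(bound, phi):
    """Decide  forall x < bound . phi(x)."""
    return all(phi(x) for x in range(bound))

def exists_below(bound, phi):
    """Decide  exists x < bound . phi(x)."""
    return any(phi(x) for x in range(bound))

# "Every n < 30 is a sum of four squares (each summand's root < 6)":
four_squares = forall_below(30, lambda n:
    exists_below(6, lambda a:
        exists_below(6, lambda b:
            exists_below(6, lambda c:
                exists_below(6, lambda d:
                    a*a + b*b + c*c + d*d == n)))))
# four_squares is True, and the search would also hand us witnesses.

# "Every n < 10 is a perfect square" is (decidably!) false:
all_squares = forall_below(10, lambda n: exists_below(10, lambda a: a*a == n))
```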
<p>Maybe that’s something to look forward to!</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Like any good example, this is true but far from the whole story.</p>
<p>For instance, free groups
(and free objects more generally) tend to be made out of pure syntax. We
are able to interpret the strings of symbols as themselves having a
(purely formal) group structure. This method of building semantics out of
syntax is a great way to make sure that you get a “universal” object in
some sense. Oftentimes it turns out that something is true of <em>every</em>
model if and only if it’s true of this “syntactic” model, though this takes
some work to formalize.</p>
<p>We can go the other way as well, and make our syntax so rich it forces
the semantics to have some properties. This trick of turning semantics
into syntax is <em>extremely</em> useful throughout model theory, because our
big tool, <a href="https://en.wikipedia.org/wiki/Compactness_theorem">compactness</a> is syntactic. We do this using a gadget called
the <a href="https://en.wikipedia.org/wiki/Diagram_(mathematical_logic)">diagram</a> of a model, and you can see a simple example of this
in <a href="/2020/10/09/model-theory-and-you.html">my talk</a> about model theory. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Let’s assume $S = S^{-1}$ is closed under inverses for simplicity. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Arguably certain homological conditions are “syntactic” too, since we’re putting
restrictions on the number of generators/relators/higher syzygies
(or, more commonly, on how long we need a resolution to be). This feels a
little bit more abstract than the rest of the post, though, so I’m relegating
it to a footnote. The stuff in the body is a little bit more down to earth,
and is also obviously “syntactic” in a way that homological conditions aren’t. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>A fact which is extremely sad for me and the department, though I’ll put
aside my selfishness and wish them the best! Goodness knows they deserve it
^_^. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>See Decker, Greuel, and Pfister’s
<em>Primary Decomposition: Algorithms and Comparisons</em> for more details. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>The significance of the definition of “packed” is not one I’ll pretend to
understand, but at the very least I can recount it for the interested reader.</p>
<p>A (squarefree monomial) ideal is called <span class="defn">König</span>
if it contains a regular sequence of monomials of the same length as its
height.</p>
<p>Now a (squarefree monomial) ideal is called <span class="defn">Packed</span>
if every ideal obtained by setting any of the variables to $0$ or $1$ is
König.</p>
<p>There is a conjecture that for <em>all</em> squarefree monomial ideals,
the symbolic and ordinary powers coincide
if and only if $I$ is packed. This is called the Packing Problem, and
it’s a topic of much interest. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 07 Sep 2021 00:00:00 +0000
https://grossack.site/2021/09/07/examples-of-sns-theorems.html
https://grossack.site/2021/09/07/examples-of-sns-theorems.html$L^p$ Spaces<p>On to day $2$ of qual prep boogaloo. <s>Yesterday</s> Tuesday<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">1</a></sup> was all about measures
and integrable functions. Today, then, let’s talk about spaces of integrable
functions. It’s time for $L^p$ spaces!</p>
<p>Thankfully, this should be a comparatively short post, since $L^p$ spaces
are already pretty concrete, and (at least in my mind) don’t need a ton of
motivation. We want to integrate things, and $L^p$ spaces
(for $p \lt \infty$) collect the functions which we <em>can</em> integrate.</p>
<p>The lower values of $p$, such as $p=1$, put emphasis on the <em>decay</em> of a
function near $\infty$. The benefit we gain is that they place relatively
little emphasis on singularities.</p>
<p>Conversely, large values of $p$, such as $p=\infty$, put little emphasis on
the decay of a function, but are more sensitive to (potentially mild)
singularities.</p>
<p>As an example, think of $\chi_{(0,\infty)} \frac{e^{-x}}{\sqrt{x}}$:</p>
<p><img src="/assets/images/Lp-spaces/singular.png" /></p>
<p>This has a singularity at $0$, but is still in $L^1(\mathbb{R})$
because it decays rapidly. This function is <em>not</em> in $L^2(\mathbb{R})$
(and <em>certainly</em> not in $L^\infty$), because its singularity is too sharp.</p>
<p>Conversely, consider the constant $1$ function, shown here for completeness
more than anything else:</p>
<p><img src="/assets/images/Lp-spaces/bounded.png" /></p>
<p>This function doesn’t decay <em>at all</em>, so is certainly not $L^1$.
It turns out this function is <em>only</em> $L^\infty$, which doesn’t care at
all about decay. It only cares about boundedness.</p>
<p>A less silly example might be $\chi_{[1,\infty)} \frac{1}{x}$:</p>
<p><img src="/assets/images/Lp-spaces/bounded2.png" /></p>
<p>This function decays too slowly to be $L^1$, but it <em>does</em> decay quickly
enough to be $L^2$.</p>
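<p>If you like numerical sanity checks, here's a quick sketch (in python with numpy — an illustration of mine, not something from the analysis itself) of the singularity side of this tradeoff. We estimate $\int_\epsilon^1 |f|^p$ for $f = e^{-x}/\sqrt{x}$ as $\epsilon \to 0$: the $p=1$ mass stabilizes, while the $p=2$ mass keeps growing like $\log(1/\epsilon)$.</p>

```python
import numpy as np

def lp_mass(f, a, b, p, n=200_000):
    # crude midpoint rule for the integral of |f|^p over [a, b]
    h = (b - a) / n
    x = np.linspace(a, b, n, endpoint=False) + h / 2
    return np.sum(np.abs(f(x)) ** p) * h

f = lambda x: np.exp(-x) / np.sqrt(x)

# L^1 mass near the singularity converges as epsilon shrinks...
l1 = [lp_mass(f, eps, 1.0, p=1) for eps in (1e-2, 1e-4, 1e-6)]

# ...but the L^2 mass keeps growing, so the singularity is too sharp for L^2
l2 = [lp_mass(f, eps, 1.0, p=2) for eps in (1e-2, 1e-4, 1e-6)]
```

<p>(The decay side of the story would be the same computation on $[1, N]$ for growing $N$.)</p>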
<p>The most important cases are $L^1$, $L^2$, and $L^\infty$,
but <a href="https://mathoverflow.net/questions/28147/why-do-we-care-about-lp-spaces-besides-p-1-p-2-and-p-infty">there are reasons</a> to care about other choices of $p$ as well.
Psychologically, they let us trade off how much singularity we
want to allow against how slowly we allow our functions to decay.</p>
<hr />
<p>To start, let’s remind ourselves what $L^p$ spaces even <em>are</em>. We fix a set
$X$ with a measure $\mu$. Then (for $1 \leq p \lt \infty$), we define the
$p$-norm to be</p>
\[\lVert f \rVert_p \triangleq \left ( \int |f|^p \ d \mu \right )^{1/p}\]
<p>As is often the case, we need to treat $L^\infty$ specially. We write<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">2</a></sup></p>
\[\lVert f \rVert_\infty \triangleq \inf \Big \{ M \ \Big | \ |f| \leq M \text{ a.e.} \Big \}\]
<p>Now we can define (for $1 \leq p \leq \infty$)</p>
\[L^p(X, \mu) \triangleq
\bigg \{
f : X \to \mathbb{C} \ \bigg | \lVert f \rVert_p \lt \infty
\bigg \}\]
<p>As usual, we will consider two functions the same if they agree
almost everywhere. We will also often write $L^p(X)$, $L^p(\mu)$, or even
$L^p$ if the measure or space is clear from context.</p>
<p>It turns out $L^p(\mu)$ comes equipped with a <em>ton</em> of structure!
The norm $\lVert f \rVert_p$ really <em>is</em> a norm<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">3</a></sup>,
and this norm makes $L^p(\mu)$ into a <a href="https://en.wikipedia.org/wiki/Banach_space">banach space</a>
(so we can talk about convergence without worry<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">4</a></sup>).</p>
<p>Let’s see some important examples of $L^p$ spaces before we move on:</p>
<ol>
<li>
<p>If $m$ is the lebesgue measure on $\mathbb{R}^n$, then $L^p(\mathbb{R}^n, m)$ is
one of the most common examples used in practice. This forms the basis
for a lot of intuition about $L^p$ spaces.</p>
</li>
<li>
<p>Sometimes $\mathbb{R}^n$ is too big. We might think of working
with $L^p([a,b])$. Common choices are $[0,1]$ and $[0,2\pi]$.</p>
</li>
<li>
<p>Another important example is the circle $S^1$. The space
$L^2(S^1)$ is the archetypal first example for fourier theory<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup>.</p>
</li>
<li>
<p>An example in a different vein is $L^p(\mathbb{N})$ with the counting measure.
This is often called $\ell^p$. Here our functions can be thought of as
sequences of complex numbers, which makes this a particularly appealing
setting for small examples. We’ll find $\ell^2$ extremely useful when we
talk about <a href="https://en.wikipedia.org/wiki/Hilbert_space">hilbert spaces</a>.</p>
</li>
<li>
<p>The counterpoint to example (3) is $L^p(\mathbb{Z})$, again with the
counting measure. These are bi-infinite sequences of complex numbers,
often written $\ell^p(\mathbb{Z})$, and these will be related to functions
on the circle by means of the fourier transform.</p>
</li>
</ol>
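<p>Example (4) is small enough to play with directly. Here's a quick numpy check (mine, not part of the post's content) that the sequence $a_n = \frac{1}{n}$ is in $\ell^2(\mathbb{N})$ but not $\ell^1(\mathbb{N})$: the partial sums of $|a_n|^2$ converge to $\frac{\pi^2}{6}$, while the partial sums of $|a_n|$ (the harmonic series) grow without bound.</p>

```python
import numpy as np

n = np.arange(1, 1_000_001)
a = 1.0 / n  # the sequence (1/n), a function on N with counting measure

# p = 1: harmonic series, diverges (grows like log N), so a is not in l^1
l1_partial = np.sum(np.abs(a))

# p = 2: converges to pi^2 / 6, so a IS in l^2
l2_partial = np.sum(np.abs(a) ** 2)
```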
<hr />
<p>One of the most important tools we have for proving properties about $L^p$
functions is the wide variety of dense subsets. If we want to prove something
for all $L^p$ functions, we can start out by proving that thing for some
subclass where we have more available tools, then argue by continuity that
the result is actually true for all $L^p$ functions.</p>
<p>These sets of functions are dense in $L^p(\mathbb{R}^n)$ and $L^p([0,1])$,
where $p \neq \infty$. More generally, they are true in $L^p(X,\mu)$
for radon measures on locally compact hausdorff spaces. See Rudin’s
<em>Real and Complex Analysis</em> for more details.</p>
<ul>
<li>
<p>Simple Functions (with finite-measure support).
That is, functions of the form \(\sum_{k=1}^n c_k \chi_{E_k}\).
These are dense almost by definition<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">6</a></sup> and form the base for proving other,
more exciting density results. After all, to show a class of functions is dense
in $L^p$, it suffices to show it can approximate the simple functions.
If we allow infinite measure support, then these functions are dense in
$L^\infty$ as well.</p>
</li>
<li>
<p>Continuous Functions (with compact support).
Obviously these are great. We don’t always need the
extra power of compact support, but it’s good to know it’s there!</p>
</li>
</ul>
<p>These families of dense functions require a notion of differentiation.
They’re definitely true in $\mathbb{R}^n$ and $[0,1]$, and I suspect they’re
true over any manifold. I don’t have a reference on hand, though.</p>
<ul>
<li>
<p>Smooth Functions (with compact support).
We love differentiation in this household.</p>
</li>
<li>
<p><a href="https://en.wikipedia.org/wiki/Schwartz_space">Schwartz Functions</a>.
As long as we’re discussing things that only work in $\mathbb{R}^n$,
these are functions which are not only smooth, but which vanish more quickly
than any polynomial, and all their derivatives vanish that quickly too.
We’ll talk more about these in the post about fourier theory.</p>
</li>
</ul>
<p>These families of functions also require some compactness condition
(because we’re using the <a href="https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem">stone-weierstrass theorem</a>).</p>
<ul>
<li>
<p>Polynomials.
On $[0,1]$, we can approximate by polynomials.</p>
</li>
<li>
<p>Polynomials in $e^{ix}$.
On $S^1$, we can approximate by “trigonometric polynomials”. This will be
extremely important when we start talking about fourier theory.</p>
</li>
</ul>
<p>There’s actually a kind of master theorem for density results in $L^p$
spaces ($p \neq \infty$), which takes some basic properties of a class of
functions and tells you it’s dense. There are also generalizations of the
theorems above to much more broad classes of measure spaces. For each of these
results, see <a href="http://www.math.ucsd.edu/~bdriver/240A-C-03-04/Lecture_Notes/Older-Versions/chap22.pdf">this</a> pdf from Bruce Driver’s UCSD lecture notes.</p>
<p>Before we move on, these density results are one potential source of
motivation for $L^1$.
We’re obviously interested in continuous functions, and if we want to
integrate things, the $L^1$ norm seems like a fairly reasonable metric to
look at. The issue is that $L^1$ limits of continuous functions don’t need
to be continuous! That is, $CX$ is <em>not</em> complete with respect to the
$L^1$ norm. We like to know we can take limits with reckless abandon<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">7</a></sup>,
so we should take its completion. When we do this, we get exactly the
$L^1$ functions.</p>
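<p>To see the incompleteness concretely, here's a little numpy sketch (my illustration, with a hypothetical grid discretization standing in for the actual integral): continuous ramps converge in the $L^1([0,1])$ norm to the discontinuous step function $\chi_{(1/2, 1]}$, so the limit escapes the continuous functions.</p>

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_001)

def ramp(n):
    # continuous: rises linearly from 0 to 1 on a width-1/n window around 1/2
    return np.clip(n * (x - 0.5) + 0.5, 0.0, 1.0)

step = (x > 0.5).astype(float)  # the discontinuous L^1 limit

# approximate L^1 distances ||ramp(n) - step||_1; they shrink like 1/(4n)
dists = [np.mean(np.abs(ramp(n) - step)) for n in (10, 100, 1000)]
```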
<p>This is one reason that $L^\infty$ behaves “pathologically” compared to
the other $L^p$ spaces. When you take the completion of continuous functions
with compact support in the $p$-norm for $p \lt \infty$, you get exactly $L^p$.
<em>However</em>, when you take the completion in the $\infty$-norm, you get
the functions <a href="https://en.wikipedia.org/wiki/Vanish_at_infinity">vanishing at $\infty$</a>. You might try to solve this by
looking at the bounded continuous functions, rather than those of compact
support<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">8</a></sup>, but these <em>already</em> form a banach space! It seems that,
unavoidably, $L^\infty$ has a bunch of extra stuff in it, and it’s this
extra stuff that is responsible for many of the differences between
$L^\infty$ and the other $L^p$ spaces.</p>
<hr />
<p>Perhaps the most important inequality associated with $L^p$ spaces is
<a href="https://en.wikipedia.org/wiki/H%C3%B6lder%27s_inequality">Hölder’s inequality</a>, which generalizes the <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality">Cauchy-Schwarz</a> inequality.</p>
<div class="boxed">
<p>If $\frac{1}{p} + \frac{1}{q} = 1$, then</p>
\[\lVert fg \rVert_1 \leq \lVert f \rVert_p \lVert g \rVert_q.\]
<p>This includes the formal case where $\frac{1}{1} + \frac{1}{\infty} = 1$:</p>
\[\lVert fg \rVert_1 \leq \lVert f \rVert_1 \lVert g \rVert_\infty.\]
<p>This condition on $p$ and $q$ shows up frequently, and we call $q$ the
<span class="defn">conjugate</span> of $p$ for convenience.</p>
</div>
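<p>Hölder's inequality is easy to sanity check numerically. Here's a quick sketch (mine, using numpy and the counting measure on a finite set) for $p = 3$ and its conjugate $q = \frac{3}{2}$:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
f, g = rng.standard_normal(1000), rng.standard_normal(1000)

def norm(v, p):
    # the p-norm with respect to counting measure on {1, ..., 1000}
    return np.sum(np.abs(v) ** p) ** (1 / p)

p = 3.0
q = p / (p - 1)  # the conjugate exponent, so 1/p + 1/q = 1

lhs = norm(f * g, 1)          # ||fg||_1
rhs = norm(f, p) * norm(g, q) # ||f||_p ||g||_q
```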
<p>Here we might think of $g$ as defining a functional $\phi_g$, where
$\phi_g f \triangleq \int f g$. Then Hölder’s inequality says that
(with the operator norm) $\lVert \phi_g \rVert \leq \lVert g \rVert_q$.</p>
<p>From this, one might ask if we actually have equality above. If you’re feeling
particularly optimistic, you might even wonder if we can characterize which
functionals in $(L^p)^*$ arise as $\phi_g$ for some $g \in L^q$.</p>
<p>Now for the magic part:</p>
<div class="boxed">
<p>For $p \neq 1, \infty$ the dual space
\((L^p)^* \triangleq \{ T : L^p \to \mathbb{C} \mid T \text{ is continuous and linear} \}\)
has an explicit characterization:</p>
\[(L^p)^* \cong L^q\]
<p>where $q$ is the conjugate of $p$.</p>
<p>When $\mu$ is $\sigma$-finite<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">9</a></sup>, we moreover have $(L^1)^* \cong L^\infty$.
Unfortunately, though, $(L^\infty)^*$ is almost never $L^1$
(at least, assuming the axiom of choice<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">10</a></sup>).</p>
</div>
<p>The proof of this fact actually goes through complex valued measures!
If $\phi$ is a functional on $L^p$, then it’s not hard to see that
$\nu E \triangleq \phi \chi_E$ is a (complex valued) measure! Moreover,
if $E$ is $\mu$-null, then $\chi_E = 0$ a.e., and so
$\nu E = \phi \chi_E = 0$. This means $\nu \ll \mu$, and so by radon-nikodym
$\nu = \mu_g$ for some function $g$. Now some simple bookkeeping is all we
need to do to show that $g \in L^q$ and $\phi f = \int fg$.</p>
<p>Notice this means that for $1 \lt p \lt \infty$ we have</p>
\[(L^p)^{**} = (L^q)^* = L^p\]
<p>so $L^p$ is <a href="https://en.wikipedia.org/wiki/Reflexive_space">reflexive</a>. It’s still not entirely obvious to me why
reflexive spaces are as interesting as they are. It’s obviously a natural
question (and a fairly algebraic one<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">11</a></sup>), and it gives you quite a bit of
bonus information (see <a href="https://math.stackexchange.com/questions/887189/properties-of-reflexive-banach-spaces">here</a>, for instance). I suspect that I’ll come to
appreciate it more as I do more functional analysis<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">12</a></sup>. At the very least,
it’s nice to have a full characterization of the dual space, and it’s
satisfying to get back where you started when you dualize twice.</p>
<hr />
<p>Frequently one has a function $f \in L^p$, which we would <em>like</em> to know
is in $L^q$. More generally, we would like to relate functions in $L^p$
to functions in $L^q$ with $q$ on either side of $p$.</p>
<p>For instance, when doing fourier analysis, our hilbert space techniques only
work for $L^2(S^1)$ functions, which get mapped by the fourier transform
isometrically onto $\ell^2(\mathbb Z)$. However, we can show that
$\ell^1(\mathbb Z) \subseteq \ell^2(\mathbb Z)$, and thus we see that if
$f$ is $L^1(S^1)$ and $\hat{f}$ is $\ell^1(\mathbb Z)$, then the results
(which we’ll talk about soon) still hold.</p>
<p>With that said, here are the main theorems for useful inclusions:</p>
<div class="boxed">
<p>If $X$ has finite measure, then <em>going down</em> is allowed.
That is, if $q \geq p$, then every $L^q$ function is automatically $L^p$.</p>
<p>Formally, if $q \geq p$, then $L^q \subseteq L^p$.</p>
<p>(As an easy exercise, you should prove this)</p>
</div>
<div class="boxed">
<p>If $X$ is “discrete” in the sense that it does not have nonnull sets of arbitrarily
small measure, then <em>going up</em> is allowed.
That is, if $p \leq q$, then every $L^p$ function is automatically $L^q$.</p>
<p>Formally, if $p \leq q$, then $L^p \subseteq L^q$.</p>
<p>(Again, this is a fairly easy exercise)</p>
</div>
<p>In fact, both of the above statements are equivalences.
So going down is allowed <em>if and only if</em> $X$ has finite measure,
and going up is allowed <em>if and only if</em> the measures of the nonnull subsets of $X$ are bounded away from $0$<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">13</a></sup>.</p>
<p>Notice how these line up with the singularity/rapid decay tradeoff we
mentioned earlier. If $X$ has finite measure, then
(intuitively) there’s no “infinity” to worry about decaying towards.
So the decay penalization is irrelevant, and we get an inclusion of
“lower singularity” functions into “higher singularity” functions with
no further qualifications<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">14</a></sup>.</p>
<p>Conversely, if there are sets of minimal measure, then (intuitively)
<em>any</em> singularity on one of your atoms can’t be avoided, and we get the
inclusion of “rapidly decaying” functions into “slowly decaying” functions
with no further qualifications<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">15</a></sup>.</p>
<p>Of course, once we know that, say $L^2([0,1]) \subseteq L^1([0,1])$, it’s natural
to ask “how many $L^1$ functions are actually $L^2$?”.
The answer is “almost none”!</p>
<p>It turns out that $L^2([0,1])$ is <a href="https://en.wikipedia.org/wiki/Meagre_set">meagre</a> in $L^1([0,1])$. This is a
topological notion of “smallness” which is roughly akin to being a nullset<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">16</a></sup>.
This is a cute application of the <a href="https://en.wikipedia.org/wiki/Baire_category_theorem">baire category theorem</a>, which is
the topic of an upcoming blog post.</p>
<p>We also have <em>interpolation</em> theorems which, when $p \lt q \lt r$,
let us relate $L^q$ functions to those on either side.</p>
<div class="boxed">
<p>If $p \lt q \lt r \lt \infty$, then $L^q \subseteq L^p + L^r$.</p>
<p>That is, every $L^q$ function can be written as a sum of an $L^p$ function
and an $L^r$ function.</p>
</div>
<div class="boxed">
<p>If $p \lt q \lt r \lt \infty$, then $L^p \cap L^r \subseteq L^q$.</p>
<p>That is, the $L^p$ classes are “connected”. If you’re $L^p$ and you’re
$L^r$, then you’re automatically everywhere in between as well.</p>
<p>In fact, we have the bound</p>
\[\lVert f \rVert_q \leq \lVert f \rVert_p^\lambda \lVert f \rVert_r^{(1 - \lambda)}\]
<p>for $\lambda = \frac{q^{-1} - r^{-1}}{p^{-1} - r^{-1}}$.</p>
</div>
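<p>That interpolation bound is another one worth checking numerically. A quick sketch (mine, with numpy and counting measure — note $\lambda$ is chosen exactly so that $\frac{1}{q} = \frac{\lambda}{p} + \frac{1-\lambda}{r}$):</p>

```python
import numpy as np

def norm(v, p):
    # p-norm with respect to counting measure
    return np.sum(np.abs(v) ** p) ** (1 / p)

rng = np.random.default_rng(1)
v = rng.standard_normal(500)

p, q, r = 1.0, 2.0, 4.0
lam = (1 / q - 1 / r) / (1 / p - 1 / r)  # so 1/q = lam/p + (1 - lam)/r

lhs = norm(v, q)
rhs = norm(v, p) ** lam * norm(v, r) ** (1 - lam)
```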
<hr />
<p>Alright, that was another huge information dump! This post felt less like a
motivated tour through $L^p$ functions and more like a copy of a study guide,
haha. I think that’s reasonable, though, since in my experience $L^p$ functions
are pretty motivated by themselves. I’m mainly writing this so that when I
talk about Banach spaces soon we’ll have all these examples to draw from.
Plus, selfishly, it <em>was</em> nice to organize all this information while
planning out this post.</p>
<p>I’ve learned from my mistakes, and I’m not going to try and give a concrete
date for the next post. All I’ll say is: See you soon ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:13" role="doc-endnote">
<p>I’ve been busier than expected… So much for one post a day, haha. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>The notation comes from a (non-obvious) theorem that</p>
\[\lVert f \rVert_\infty = \lim_{p \to \infty} \lVert f \rVert_p.\]
<p>See <a href="https://math.stackexchange.com/questions/242779/limit-of-lp-norm">here</a> for a proof. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>This is one pressing reason to quotient by almost-everywhere equivalence.
Do you see why $\lVert f \rVert_p$ would <em>not</em> be a norm if we didn’t
identify equivalent functions?</p>
<p>One reasonable question might be whether we can always find a canonical
representative for a given equivalence class modulo nullsets. That is,
given a “function” in $L^p$ (which is really an equivalence class of functions)
can we select an honest-to-goodness function from each class? Obviously
the axiom of choice trivializes this, but we would like to do so in a way
that’s actually implementable.</p>
<p>This is an extremely interesting question, and is the fundamental question
in <a href="https://en.wikipedia.org/wiki/Lifting_theory">lifting theory</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Don’t worry – we’ll be talking a <em>lot</em> about banach spaces in an upcoming
post. But $L^p$ spaces are very fundamental examples, so it’s worth talking
about them first. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>We’ll talk about Fourier theory in an upcoming post too! <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Recall we <em>define</em> the lebesgue integral of a positive function $f$
to be the supremum of the integrals of simple functions below $f$. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>This is in line with an apocryphal quote of Grothendieck
(which is probably <a href="https://mathoverflow.net/questions/133020/what-is-the-source-of-this-famous-grothendieck-quote">not actually his</a>) that it’s better to have
<em>good</em> categories with <em>bad</em> objects, than <em>bad</em> categories with <em>good</em>
objects.</p>
<p>That is, we want our categories to have all limits, colimits, exponentials, etc.
even if it means introducing “pathological” objects. It’s easier to know you
can take limits and worry about what the result looks like later than it is
to constantly worry about whether you can take the limit at all.</p>
<p>Similarly, from an analytic lens, it’s better to have a <em>complete</em> space
(which admits potentially ugly functions) than it is to have an
<em>incomplete</em> space in which every function is nice. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>You need boundedness in order to guarantee that the sup norm
$\lVert \cdot \rVert_\infty$ is finitely valued. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Actually, we can get by with slightly less.</p>
<p>We always have a natural map $L^\infty \to (L^1)^*$ which sends
$g$ to the functional $f \mapsto \int fg$.</p>
<p>It turns out this map is an isometric embedding if and only if $\mu$
is semifinite. That is, when any infinite measure set contains a set of
finite measure. This is obviously a <em>super</em> mild assumption.</p>
<p>Moreover, this map is <em>surjective</em> (and thus an isometric isomorphism) when $\mu$
is “localizable”. See <a href="https://math.stackexchange.com/questions/405357/when-exactly-is-the-dual-of-l1-isomorphic-to-l-infty-via-the-natural-map">here</a>, for instance. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>$L^1$ embeds isometrically inside $(L^\infty)^*$, but in $\mathsf{ZFC}$
this is <em>not</em> surjective.</p>
<p>To build a functional on $L^\infty$ that doesn’t come from some $L^1$
function, we look at the subspace of continuous functions, and
consider evaluation at a point. We can extend this functional to the
whole space via the <a href="https://en.wikipedia.org/wiki/Hahn%E2%80%93Banach_theorem">Hahn-Banach Theorem</a>, and it’s pretty quick to
see that this can’t come from an $L^1$ function
(since a dirac delta function doesn’t <em>really</em> exist).</p>
<p>We’ll talk more about the Hahn-Banach theorem in a future post, but
importantly it relies on the axiom of choice! There are actually models
of set theory where Hahn-Banach fails, and which think that
$L^1$ really <em>is</em> $(L^\infty)^*$! See Martin Väth’s
<a href="https://www.sciencedirect.com/science/article/pii/S0019357798800396?via%3Dihub"><em>The Dual Space of $L^\infty$ is $L^1$</em></a>, as well as the excellent
discussion <a href="https://math.stackexchange.com/questions/103476/ell1-vs-continuous-dual-of-ell-infty-in-zfad">here</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>The question of which $R$-modules are reflexive has also been asked.
It’s a more subtle question, but finitely generated projective modules
always have this property.</p>
<p>As an aside that I can’t help but mention, apparently there is a banach
space which is isomorphic to its double dual, but for which the canonical
evaluation map is <em>not</em> the isomorphism! It’s called <a href="https://en.wikipedia.org/wiki/James%27_space">James’s Space</a>,
for the interested. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>For instance, an integral that takes values in a banach space might not
satisfy the radon-nikodym theorem. If your banach space is reflexive, though,
then it <em>does</em>. This came from the linked mse question, so I might be
misquoting the result or dropping some hypotheses, but this already makes
the notion of reflexivity seem more interesting. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>For proofs of these exercises, and indeed proofs of the stronger equivalences,
see <a href="https://math.stackexchange.com/questions/66029/lp-and-lq-space-inclusion">here</a> <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>This may require some meditation.</p>
<p>The idea is that we don’t care if a function is singular as long as the
infinity can be contained to a set of small enough measure.
But if there <em>are</em> no sets of “small enough” measure, then <em>any</em> singularity
is bad. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>In the motivational section before this, we mention that
$\ell^1(\mathbb Z) \subseteq \ell^2(\mathbb Z)$. Now we see why: $\mathbb{Z}$
has no (nonnull) sets of measure $\lt 1$. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>In the sense that meagre sets and nullsets both form <a href="https://en.wikipedia.org/wiki/Sigma-ideal">$\sigma$-ideals</a>.
Note, however, that they are <em>very</em> different notions of “small”, and
a set which is small in one sense need not be small in the other.</p>
<p>For instance, <a href="https://en.wikipedia.org/wiki/Smith%E2%80%93Volterra%E2%80%93Cantor_set">fat cantor sets</a> can have arbitrarily large measure
(in particular, they are <em>not</em> null) yet they are always meagre. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 02 Sep 2021 00:00:00 +0000
https://grossack.site/2021/09/02/Lp-spaces.html
https://grossack.site/2021/09/02/Lp-spaces.htmlMeasure Theory and Differentiation (Part 2)<p>This post has been sitting in my drafts since Feb 22, and has been mostly
done for a <em>long</em> time. But, with my upcoming analysis qual, I’ve finally
been spurred into finishing it. My plan is to put up a new blog post every
day this week, each going through some aspect of the analysis that’s going to
be on the qual. Selfishly, this will be great for my own preparation
(I definitely learn through teaching) but hopefully this will also help future
students who want to see a motivated treatment of the standard
analysis curriculum.</p>
<p>The first half of this post is available <a href="/2021/02/21/lebesgue-ftc-1.html">here</a>,
as well as at the measure theory and differentiation
<a href="/tags/measure-theory-and-differentiation">tag</a>. I’m also going to make a new
<a href="/tags/analysis-qual-prep">tag</a> for this series of analysis qual prep posts,
and I’ll retroactively add part 1 to that tag.</p>
<p>With that out of the way, let’s get to the content!</p>
<hr />
<p>In <a href="/2021/02/21/lebesgue-ftc-1.html">part 1</a> we talked about two ways of
associating (regular, borel) measures to functions on $\mathbb{R}$:</p>
<ul>
<li>
<p>To an increasing, right continuous $F$ we associate the measure $\mu_F$
defined by $\mu_F((a,b]) \triangleq F(b) - F(a)$. In the special case where
$F$ is the identity function, we get Lebesgue Measure $m$ from this construction.</p>
</li>
<li>
<p>To a positive, locally $L^1$ function $f$ we associate the measure $m_f$
defined by $m_f(E) \triangleq \int_E f\ dm$.</p>
</li>
</ul>
<p>Perhaps surprisingly, we can go the <em>other</em> way too!</p>
<ul>
<li>
<p>Given a measure $\lambda$, we can define an increasing, right continuous
function $F_\lambda$ so that $\mu_{F_\lambda} = \lambda$</p>
</li>
<li>
<p>Given a measure $\lambda \ll m$, we can find a function $f_\lambda$ so that $m_{f_\lambda} = \lambda$</p>
</li>
</ul>
<p>These facts together give us a correspondence</p>
<div class="boxed">
\[\bigg \{ \text{increasing, right-continuous functions $F$} \bigg \}
\longleftrightarrow
\bigg \{ \text{regular borel measures $\mu_F$} \bigg \}\]
\[\bigg \{ \text{positive locally $L^1$ functions $f$} \bigg \}
\longleftrightarrow
\bigg \{ \text{regular borel measures $m_f \ll m$} \bigg \}\]
</div>
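<p>The first correspondence is simple enough to play with directly. Here's a little sketch (my own toy example, not from the post): take $F(x) = x + \chi_{[0,\infty)}(x)$, which is increasing and right continuous with a unit jump at $0$. The jump shows up in $\mu_F$ as an atom at $0$.</p>

```python
def F(x):
    # increasing and right continuous: slope 1, plus a unit jump at x = 0
    return x + (1 if x >= 0 else 0)

def mu_F(a, b):
    # the Stieltjes measure of the half-open interval (a, b]
    return F(b) - F(a)

# finite additivity across the jump: (-1, 1] = (-1, 0] u (0, 1]
total = mu_F(-1, 1)
parts = mu_F(-1, 0) + mu_F(0, 1)

# the jump of F at 0 contributes an atom: mu_F((-eps, 0]) -> 1 as eps -> 0
atom = mu_F(-1e-9, 0)
```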
<p>You should think of the increasing, right-continuous functions $F$ as being
antiderivatives of the positive locally $L^1$ functions $f$,
and theorems like the <a href="https://en.wikipedia.org/wiki/Lebesgue_differentiation_theorem">Lebesgue Differentiation Theorem</a> link
the (measure theoretic) <a href="https://en.wikipedia.org/wiki/Radon%E2%80%93Nikodym_theorem#Radon%E2%80%93Nikodym_derivative">Radon-Nikodym Derivative</a>
of $\mu_F$ with the classical derivative of $F$.</p>
<div class="boxed">
<p>As an exercise to recap what we did in the last post, prove that
every monotone function $F : \mathbb{R} \to \mathbb{R}$ is
differentiable almost everywhere<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
</div>
<p>This result <em>can</em> be proven without the machinery of measure theory
(see, for instance, Botsko’s
<em>An Elementary Proof of Lebesgue’s Differentiation Theorem</em>),
but the proof is much more delicate, and certainly less conceptually obvious.
Also, <em>some</em> sort of machinery seems to be required. See <a href="https://math.stackexchange.com/q/1523829/655547">here</a>, for instance.</p>
<p>This should feel somewhat restrictive, though. There’s more to life than
increasing, right continuous functions, and it would be a shame if all this
machinery were limited to functions of <em>such</em> a specific form. Can we push
these techniques further, and ideally get something that works for a large
class of functions? Moreover, can we <em>use</em> these techniques to prove
interesting theorems about this class of functions?
Obviously I wouldn’t be writing this post if the answer
were “no”, so let’s see how to proceed!</p>
<hr />
<p>Differentiation is a nice motivation, but integration
is theoretically much simpler. We can’t expect to be able to differentiate
most functions, but it is reasonable to want to integrate them. With this
in mind, rather than trying to guess the class of functions we’ll be able
to differentiate, let’s try to guess the class of functions we’ll be able
to <em>integrate</em>. Then we can work backwards to figure out what we can differentiate.</p>
<p>Previously we were restricting ourselves to positive locally $L^1$ functions.
Since we want to meaningfully integrate our new class, it seems
unwise to try and lift the $L^1$ condition. Positivity, however,
seems like a natural thing to drop. Let’s be optimistic and see what happens if
we work with <em>all</em> (complex valued) $L^1$ functions!</p>
<p>The correspondence says to take $f$ and send it to the measure
$m_f(E) \triangleq \int_E f \ dm$. Of course, now that $f$ is complex valued,
this integral might take complex values as well! To that end, let’s introduce
the idea of <a href="https://en.wikipedia.org/wiki/Complex_measure">Complex Valued Measures</a>
and see how much of measure theory we’re able to recover.</p>
<p>If we meditate on what properties $m_f$ will have, we land on the following
definition<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">2</a></sup>:</p>
<div class="boxed">
<p>A <span class="defn">Complex Measure</span> on a $\sigma$-algebra $\mathcal{A}$
is a function $\nu : \mathcal{A} \to \mathbb{C}$ so that</p>
<ol>
<li>$\nu \ \emptyset = 0$</li>
<li>$\nu \left ( \bigcup E_n \right ) = \sum \nu E_n$ for any disjoint $E_n$.
Importantly, this sum automatically converges absolutely<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">3</a></sup>.</li>
</ol>
<p>Notice $\nu E$ is never allowed to be $\infty$! This is an important difference
between complex and positive measures<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">4</a></sup>.</p>
</div>
<p>⚠ Be careful! Now that our measures allow nonpositive values, we might
“accidentally” have $\nu E = 0$. If $E$ is the disjoint union of
$E_+$ and $E_-$, where $\nu E_+ = 3$ and $\nu E_- = -3$ (say), then
$\nu E = 0$, even though we really shouldn’t call it a nullset!</p>
<p>Because of this, we redefine the notion of nullset to be more restrictive:
We say $E$ is <span class="defn">$\nu$-null</span> if and only if
$\nu A = 0$ for every $A \subseteq E$.</p>
<div class="boxed">
<p>As an exercise, can you come up with a concrete signed measure $\nu$ for
which $\nu E = 0$ even though $E$ is <em>not</em> null?</p>
<p>As another exercise, why does this agree with our original definition of
nullsets when we restrict to positive measures?</p>
</div>
<p>Now, we <em>could</em> try to build measure theory entirely from scratch in this
setting. But that seems like a waste, since we’ve already done so much measure
theory… It would be nice if there were a way to relate complex measures
to ordinary (unsigned) measures and leverage our previous results!</p>
<p>We know that $m_{f+g} = m_f + m_g$ in the unsigned case. So in the complex
case, it’s natural to try and get this linearity to go further! But we know
we can write any complex function $f : X \to \mathbb{C}$ as a linear combination
of $4$ positive functions, by breaking up into real and imaginary parts,
then positive and negative parts:</p>
\[f = (f_R^+ - f_R^-) + i (f_I^+ - f_I^-)\]
<p>So we should expect</p>
\[m_f = m_{f_R^+} - m_{f_R^-} + i (m_{f_I^+} - m_{f_I^-})\]
<p>and the <a href="https://en.wikipedia.org/wiki/Hahn_decomposition_theorem#Jordan_measure_decomposition">Jordan Decomposition Theorem</a> says<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> that we can decompose
every complex measure $\nu$ in exactly this way!</p>
<p>Formally, it says that every complex measure $\nu$ decomposes uniquely into
a linear combination of finite positive measures
$\nu = (\nu_R^+ - \nu_R^-) + i (\nu_I^+ - \nu_I^-)$ with the bonus property
that $\nu_R^+ \perp \nu_R^-$ and $\nu_I^+ \perp \nu_I^-$. Here, as usual,
$\perp$ means that two measures are <a href="https://en.wikipedia.org/wiki/Singular_measure">mutually singular</a>, which we should
intuitively think of as having disjoint support.</p>
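Since the decomposition of $f$ into four positive parts is so concrete, it's easy to play with directly. Here's a small Python sketch (my own illustration, not part of the theorem) that splits a complex value into its four nonnegative parts and reassembles it, exactly as in the formula above:

```python
# Pointwise decomposition f = (f_R⁺ − f_R⁻) + i (f_I⁺ − f_I⁻),
# where all four parts are nonnegative.
def four_parts(z):
    fRp = max(z.real, 0.0)    # positive part of the real part
    fRm = max(-z.real, 0.0)   # negative part of the real part
    fIp = max(z.imag, 0.0)    # positive part of the imaginary part
    fIm = max(-z.imag, 0.0)   # negative part of the imaginary part
    return fRp, fRm, fIp, fIm

z = -3.0 + 2.5j
fRp, fRm, fIp, fIm = four_parts(z)

# the four parts are nonnegative and reassemble to z
assert min(fRp, fRm, fIp, fIm) >= 0.0
assert (fRp - fRm) + 1j * (fIp - fIm) == z
```

Notice also that $f_R^+$ and $f_R^-$ have disjoint supports, which is exactly the pointwise shadow of the mutual singularity $\nu_R^+ \perp \nu_R^-$ in the Jordan decomposition.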
<p>It can still be nice to work with an unsigned measure directly sometimes,
rather than having to split our measure into $4$ parts. Thankfully
we have a convenient way of doing so!</p>
<p>There is a positive measure $|\nu|$, called the
<a href="https://en.wikipedia.org/wiki/Total_variation#Total_variation_norm_of_complex_measures">Total Variation</a> of $\nu$, which is defined so that</p>
\[|m_f| = m_{|f|}.\]
<p>This possesses all the amenities the notation suggests, including:</p>
<div class="boxed">
<ol>
<li>(Triangle Inequality) $\lvert \nu + \mu \rvert \leq \lvert \nu \rvert + \lvert \mu \rvert$</li>
<li>(Operator Inequality) $\lvert \nu E \rvert \leq \lvert \nu \rvert E$</li>
<li>(Continuity) $\nu \ll \lvert \nu \rvert$</li>
</ol>
<p>In fact, the collection of complex measures on $X$ assembles into a
Banach Space under the norm $\lVert \nu \rVert \triangleq \lvert \nu \rvert X$.</p>
</div>
<hr />
<p>Ok. This has been a lot of information. How do we actually <em>compute</em>
with a complex measure? Thankfully, the answer is easy: We use the
Jordan Decomposition. We <em>define</em></p>
\[\int f \ d\nu \triangleq
\left ( \int f \ d\nu^+_R - \int f \ d\nu^-_R \right )
+ i \left ( \int f \ d\nu^+_I - \int f \ d\nu^-_I \right ).\]
<p>In particular, in order to make sense of this integral, we need to know
that $f$ is in $L^1$ for each of these measures. So again, we just <em>define</em></p>
\[L^1(\nu) \triangleq L^1(\nu^+_R) \cap L^1(\nu^-_R) \cap L^1(\nu^+_I) \cap L^1(\nu^-_I).\]
<div class="boxed">
<p>As an easy exercise, show that the dominated convergence theorem is true
when we’re integrating against $\nu$!</p>
</div>
<p>We <em>can</em> split up $\nu$ if we need to, but oftentimes we don’t. Remember
that if $\nu = m_f$ (which is the whole reason we embarked on this journey!)
we should have $\int g \ d\nu = \int gf \ dm$.</p>
<div class="boxed">
<p>Show using the definition of $\int g \ d\nu$ that we gave that
$\int g \ dm_f = \int gf \ dm$ actually holds.</p>
</div>
<p>So, as a quick example computation:</p>
\[\int_0^{2\pi} x^2 \ dm_{e^{ix}} = \int_0^{2\pi} x^2 e^{ix} \ dm = 4 \pi - 4 \pi^2 i\]
<p>Notice, as usual, that once we’ve phrased the integral in terms of $dm$,
we can simply use integration by parts, or any other tricks we know
(such as asking <a href="https://sagemath.org">sage</a>) to compute the integral.</p>
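If you'd rather not bother sage, here's a quick numerical sanity check of that computation in plain Python (a sketch of my own, using a crude trapezoid sum rather than symbolic integration):

```python
import cmath
import math

# Numerically check ∫_0^{2π} x² e^{ix} dx = 4π − 4π² i
# using a plain trapezoid sum (a sketch, not a rigorous computation).
N = 200_000
a, b = 0.0, 2 * math.pi
h = (b - a) / N

total = 0 + 0j
for k in range(N + 1):
    x = a + k * h
    w = 0.5 if k in (0, N) else 1.0   # trapezoid endpoint weights
    total += w * (x ** 2) * cmath.exp(1j * x)
total *= h

expected = 4 * math.pi - 4 * math.pi ** 2 * 1j
assert abs(total - expected) < 1e-6
```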
<hr />
<p>Now, that example might have felt overly simplistic. After all, it was
mainly a matter of moving the $e^{ix}$ from downstairs below the $m$ to
upstairs inside the integral. What if we needed to integrate against a more
complicated complex measure? Thankfully, up to singular measures,
<em>every</em> measure is of this simple form!</p>
<p>Remember last time, we had a structure theorem that told us <em>every</em> measure is
of the form $m_f$, possibly plus a “singular” part $\lambda$. Moreover, the function
$f$ so that $\nu = m_f + \lambda$ was the “derivative” of $\nu$, and this
led us to the fruitful connection between measure theoretic and classical
derivatives. Thankfully, the same theorem is still true in the complex
setting!</p>
<div class="boxed">
<p><span class="defn">Lebesgue-Radon-Nikodym Theorem</span></p>
<p>If $\nu$ is a $\sigma$-finite signed measure and $\mu$ is a
$\sigma$-finite <em>positive</em> measure<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>, then $\nu$ decomposes uniquely as</p>
\[\nu = \lambda + \mu_f\]
<p>for a $\sigma$-finite measure $\lambda \perp \mu$ and
$\mu_f(E) \triangleq \int_E f\ d\mu$.</p>
<p>As in the unsigned case, we write $f = \frac{d\nu}{d\mu}$.</p>
</div>
<p>I realized while writing this post that last time I forgot to mention an
important aspect of the Radon-Nikodym derivative! It satisfies the obvious
laws you would expect a “derivative” to satisfy<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">7</a></sup>. For instance:</p>
<div class="boxed">
<ul>
<li>(Linearity) $\frac{d(a\nu_1 + b\nu_2)}{d\mu} = a\frac{d\nu_1}{d\mu} + b\frac{d\nu_2}{d\mu}$</li>
<li>(Chain Rule) If $\nu \ll \mu \ll \lambda$, then
$\frac{d \nu}{d\lambda} = \frac{d \nu}{d\mu} \frac{d\mu}{d\lambda}$</li>
</ul>
<p>It’s worth proving each of these. The second is harder than the first, but
it’s not too bad.</p>
</div>
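On a finite set these rules are easy to see in action, since a measure is just a finite list of point masses and the Radon-Nikodym derivative is literally a pointwise quotient. A toy Python check of the chain rule (my own illustration, with made-up point masses):

```python
# Toy Radon–Nikodym calculus on X = {0, 1, 2}: each measure is a dict of
# point masses, and dν/dμ (x) = ν({x}) / μ({x}) whenever μ({x}) > 0.
X = [0, 1, 2]
lam = {0: 1.0, 1: 1.0, 2: 1.0}            # "base" measure λ
mu  = {0: 2.0, 1: 0.5, 2: 3.0}            # μ ≪ λ
nu  = {0: 1.0 + 2.0j, 1: -0.5j, 2: 6.0}   # a complex measure ν ≪ μ

# pointwise quotient of point masses
d = lambda a, b: {x: a[x] / b[x] for x in X}

dnu_dmu, dmu_dlam, dnu_dlam = d(nu, mu), d(mu, lam), d(nu, lam)

# chain rule: dν/dλ = (dν/dμ)(dμ/dλ), pointwise
for x in X:
    assert abs(dnu_dlam[x] - dnu_dmu[x] * dmu_dlam[x]) < 1e-12
```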
<p>At last, we have a complex measure theoretic notion of “derivative”,
as well as half of the correspondence we’re trying to generalize:</p>
<p>Given an $L^1$ function $f : \mathbb{R} \to \mathbb{C}$
we can build a (complex) measure $m_f$, and given a complex measure
$m_f \ll m$, we can recover $f$ as the derivative $\frac{d m_f}{dm}$.</p>
<p>But which functions will generalize the increasing right continuous ones?
The answer is Functions of <a href="https://en.wikipedia.org/wiki/Bounded_variation">Bounded Variation</a>!</p>
<hr />
<p>To see why bounded variation functions are the right things to look at,
let’s remember how the correspondence went in the unsigned case:
We took an unsigned measure $\mu$ and looked at (up to sign of $x$)
the (increasing, right continuous) function $F_\mu(x) = \mu \left ( (0,x] \right )$.</p>
<p>Now, for a <em>complex</em> measure $\nu$, we know we can write it as a combination
of unsigned functions, $\nu = (\nu_R^+ - \nu_R^-) + i (\nu_I^+ - \nu_I^-)$.
So what would happen if we just… did our old construction to each of these
individually, then put them back together?</p>
<p>Then if we look at the real part, we would be looking at functions
$F_{\nu_R^+} - F_{\nu_R^-}$, where each $F_{\nu_R^\pm}$ is increasing and
right continuous. Moreover, since complex measures are finite, we know that
each of these functions is <em>bounded</em> as well.
We could define bounded variation functions as exactly
this class! That is, $f$ is bounded variation if and only if its real and
imaginary parts are both a difference of bounded increasing functions!</p>
<p>Of course, nothing in life is so simple, and for what I assume are historical
reasons, this is not the definition you’re likely to see
(despite it being equivalent).</p>
<p>The more common definition of bounded variation is slightly technical,
and is best looked up in a reference like Folland. The idea, though, is
that $F$’s ability to wiggle should be bounded – whence the name<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">8</a></sup>.</p>
<p>For instance, $F(x) = x^2\sin \left ( \frac{1}{x} \right )$ is bounded variation,
while $G(x) = x^2 \sin \left ( \frac{1}{x^2} \right )$ is <em>not</em>. A picture
is worth a thousand words, and indeed $F$ has a maximum rate of wiggle:</p>
<p><img src="/assets/images/lebesgue-ftc-2/F.png" /></p>
<p>Even though $F$ wiggles faster and faster as $x \to 0$, the vertical distance
it travels gets small fast enough to compensate. Contrast that with $G$:</p>
<p><img src="/assets/images/lebesgue-ftc-2/G.png" /></p>
<p>whose wiggle-density is obviously less controlled<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">9</a></sup>.</p>
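We can even see this numerically. Here's a rough Python sketch (my own, and only a heuristic: partition sums underestimate the true variation) comparing the two functions on $(0, 1]$:

```python
import math

# Estimate the variation of F(x) = x² sin(1/x) and G(x) = x² sin(1/x²)
# on (0, 1] by summing |differences| over a fine partition.
def variation(f, n):
    xs = [k / n for k in range(1, n + 1)]   # partition of (0, 1], avoiding 0
    return sum(abs(f(xs[i + 1]) - f(xs[i])) for i in range(len(xs) - 1))

F = lambda x: x * x * math.sin(1 / x)
G = lambda x: x * x * math.sin(1 / (x * x))

vF = variation(F, 100_000)
vG_coarse, vG_fine = variation(G, 10_000), variation(G, 100_000)

# F's estimate is bounded: |F'| ≤ 2x + 1 on (0, 1], so its variation is ≤ 2.
assert vF < 3
# G's estimate keeps creeping upward as the partition refines,
# reflecting the fact that G's true variation is infinite.
assert vG_fine > vG_coarse
```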
<p>We also get some results that will be familiar from the last post.
These are akin to the properties of monotone functions that relate them
to increasing right continuous functions, and thus measures.</p>
<div class="boxed">
<ol>
<li>If $F$ is bounded variation, then it has at most countably many discontinuities</li>
<li>If $F$ is bounded variation, then it can be made right continuous by
looking at $F^+(x) \triangleq \lim_{y \to x^+} F(y)$. By (1) these are
equal almost everywhere.</li>
</ol>
</div>
<p>Lastly, if $F$ is bounded variation, then
$\displaystyle \lim_{x \to - \infty} F(x)$ exists
and is finite. We say $F$ is <span class="defn">Normalized</span> if
$\displaystyle \lim_{x \to -\infty} F(x) = 0$.
We can always normalize $F$ by replacing it
with $\displaystyle F^N = F - \lim_{x \to -\infty} F(x)$.
Notice $F$ and $F^N$ have the same derivative (since they differ by a constant).</p>
<p>This brings us to our punchline!</p>
<div class="boxed">
<p>If $\nu$ is a complex borel measure on $\mathbb{R}$, then</p>
\[F_\nu(x) \triangleq \nu \left ( (-\infty, x] \right )\]
<p>is normalized bounded variation.</p>
<p>Conversely, if $F$ is normalized bounded variation, then there exists a
unique complex borel measure $\nu_F$ so that $F = F_{\nu_F}$.</p>
</div>
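For a discrete complex measure this correspondence is completely concrete. A toy Python sketch (my own, with made-up point masses):

```python
# ν = 2δ_0 − i·δ_1, a discrete complex borel measure on ℝ,
# and its "CDF-style" function F_ν(x) = ν((−∞, x]).
masses = {0.0: 2.0, 1.0: -1.0j}

def F_nu(x):
    return sum(w for a, w in masses.items() if a <= x)

assert F_nu(-5) == 0             # normalized: F_ν vanishes near −∞
assert F_nu(0) == 2.0            # right continuous: the atom at 0 is included
assert F_nu(2) == 2.0 - 1.0j     # eventually F_ν(x) = ν(ℝ), the total mass
```

Here the jumps of $F_\nu$ are exactly the atoms of $\nu$, and the $(-\infty, x]$ convention is what buys right continuity.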
<p>So we have correspondences:</p>
<div class="boxed">
\[\bigg \{ \text{normalized bounded variation functions $F$} \bigg \}
\longleftrightarrow
\bigg \{ \text{regular complex borel measures $\nu_F$} \bigg \}\]
<p>\(\bigg \{ \text{$L^1$ functions $f$} \bigg \}
\longleftrightarrow
\bigg \{ \text{regular complex borel measures $m_f \ll m$} \bigg \}\)</p>
</div>
<p>Here, $F_{m_f}$ is the antiderivative of $f$, and each $F$ is differentiable
with $F’ = \frac{d \nu_F}{dm}$ almost everywhere. Moreover, $F’$ is $L^1$,
and if $\nu_F \ll m$ we have $F(x) = \int_{-\infty}^x F’$.</p>
<p>In fact, the class of functions $F$ so that $\nu_F \ll m$ is the largest
class of functions making the fundamental theorem of calculus true<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>:</p>
<div class="boxed">
<p><span class="defn">Lebesgue Fundamental Theorem of Calculus</span></p>
<p>The Following Are Equivalent for a function $F : [a,b] \to \mathbb{C}$:</p>
<ol>
<li>$F$ is differentiable almost everywhere on $[a,b]$, $F’$ is in $L^1([a,b])$,
and $F(x) - F(a) = \int_a^x F’ \ dm$</li>
<li>$F(x) - F(a) = \int_a^x f \ dm$ for some $f \in L^1([a,b])$</li>
<li>$F$ is bounded variation and $\nu_F \ll m$ on $[a,b]$.</li>
</ol>
</div>
<div class="boxed">
<p>As one last exercise for the road, you should use this machinery to prove
<a href="https://en.wikipedia.org/wiki/Rademacher%27s_theorem">Rademacher’s Theorem</a>:</p>
<p>If $F : \mathbb{R} \to \mathbb{C}$ is locally lipschitz,
then $F$ is differentiable almost everywhere<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>.</p>
</div>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The idea is to show that $F$ is basically increasing and right continuous,
then apply the results from part $1$ of this blog post. We can get
increasing-ness by possibly replacing $F$ by $-F$. We can get right
continuity by replacing $F$ with $F^+(x) = \lim_{y \to x^+} F(y)$,
and checking that $F = F^+$ almost everywhere. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>It’s easy to wonder what complex measures are good for. One justification
is the one we’re giving in this post: they provide a clean way of extending
the fundamental theorem of calculus to a broader class of functions.</p>
<p>There’s actually much more to say, though. Complex measures are linear
functionals, where we take a function $f$ and send it to the number
$\int f\ d\nu$. One version of the <a href="https://en.wikipedia.org/wiki/Riesz%E2%80%93Markov%E2%80%93Kakutani_representation_theorem#The_representation_theorem_for_the_continuous_dual_of_C0(X)">Riesz Representation Theorem</a>
says that <em>every</em> linear functional on $C_0 X$ is of this form
(when $X$ is locally compact, hausdorff).</p>
<p>This is great, because it means we can bring measure theory to bear on
problems in pure functional analysis. As one quick corollary, this lets
us transfer the dominated convergence theorem to the setting of functionals.
Of course, without the knowledge of complex measures, we wouldn’t be able
to talk about this, since most functionals are complex valued!</p>
<p>At this point, the category theorist in me <em>needs</em> to mention a cute
result from Emily Riehl’s <em>Category Theory in Context</em>. There are two
functors $\mathsf{cHaus} \to \mathsf{Ban}$
(that is, from compact hausdorff spaces with continuous maps
to (real) banach spaces with bounded linear maps). The first sends $X$
to its banach space of signed measures, and the second sends $X$ to the
(real) dual of its continuous functions $X \mapsto CX^*$. It turns out
these functors are naturally isomorphic
(see also Hartig’s <em>The Riesz Representation Theorem Revisited</em>).</p>
<p>It seems reasonable to me that there would also be a natural isomorphism
between the complex dual of the functions vanishing at infinity and the
space of complex valued measures, but I can’t find a reference and I’m
feeling too lazy to work it out myself right now… Maybe one day a kind
reader will leave a comment letting me know? <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>By the <a href="https://en.wikipedia.org/wiki/Riemann_series_theorem">Riemann Rearrangement Theorem</a>. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>In fact, since $\nu$ can never be infinite, we will focus our attention
on functions $f$ which are $L^1$, rather than just <em>locally</em> $L^1$
as we were able to do before. If $f$ is real valued, then we can relax
this a little bit by using <a href="https://en.wikipedia.org/wiki/Signed_measure">signed measures</a>, but I won’t be going
into that in this post. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Really it’s a statement about real valued signed measures, and so it
allows for either $\pm \infty$ (but not both!) to occur. We won’t need
this extra flexibility, though.</p>
<p>I went back and forth for a long time on whether to include a discussion
about signed measures in this post. Eventually, I decided it made the
post too long, and it encouraged me to include details that obscure the
main points. I want these posts to show the forest rather than the trees,
and here we are. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Notice for us $\mu$ will almost always be lebesgue measure. The theorem
is true much more broadly, though, and we might ask
“why do we care about more general measures?”. The answer is that there
are other measures which are <em>also</em> easy to compute in practice
(<a href="https://en.wikipedia.org/wiki/Haar_measure">haar measures</a>, and <a href="https://en.wikipedia.org/wiki/Counting_measure">counting measures</a> on countable sets come to mind).</p>
<p>With this generality, we can know for sure that we can work in any
space which admits an effectively computable ($\sigma$-finite) measure
(and there are lots of such spaces besides $\mathbb{R}^n$).</p>
<p>Any space with a computable notion of integration <em>also</em> admits a computable
notion of integrating complex measures by application of Radon-Nikodym! <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>If you’re familiar with <a href="https://en.wikipedia.org/wiki/Product_measure">product measures</a>, there’s actually a
useful fact:</p>
<p>If $\nu_1 \ll \mu_1$ and $\nu_2 \ll \mu_2$ and everything in sight is
$\sigma$-finite, then we have $\nu_1 \times \nu_2 \ll \mu_1 \times \mu_2$,
and</p>
\[\frac{d(\nu_1 \times \nu_2)}{d(\mu_1 \times \mu_2)}(x_1, x_2) =
\frac{d \nu_1}{d \mu_1}(x_1) \frac{d \nu_2}{d \mu_2}(x_2)\]
<p>This is great, since it means to integrate against some product measure
on $\mathbb{R}^2$, we can separately integrate against each component. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>Mathematicians like to use words like “variation” and “oscillation”
rather than “wiggliness”. I can’t imagine why. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>Another way of viewing the issue (particularly in light of the upcoming theorem)
is that $G’$ is <em>not</em> $L^1$. It turns out that a $C^1$ function is bounded
variation if and only if its derivative is $L^1$. That is, if and only if
$\int \lvert G’ \rvert \lt \infty$. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>There is actually a characterization of $\nu_F \ll m$ that doesn’t refer
to $\nu_F$. If $F$ satisfies this condition, then $F$ is called
absolutely continuous. Thankfully, any $F$ which satisfies this is
automatically bounded variation. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>For this, it might be useful to look up the more technical definitions of
“bounded variation” and “absolutely continuous” that I’ve omitted in this
post. Both can be found in Chapter $3.6$ of Folland’s
<em>Real Analysis: Modern Techniques and their Applications</em>. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 31 Aug 2021 00:00:00 +0000
https://grossack.site/2021/08/31/lebesgue-ftc-2.html
Showing There is No Element of Some Order in a Group<p>I took a practice algebra qual the other day, and was totally stumped by a
pretty basic group theory question:</p>
<div class="boxed">
<p>Assume that $G$ is a simple group of order $168$.</p>
<p>(a) How many elements of order $7$ are there in $G$?</p>
<p>(b) Show that $G$ does not contain any elements of order $14$.</p>
</div>
<p>Part (a) is a pretty routine application of the Sylow theorems<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, but part (b)
stumped me. I don’t think I’ve ever seen a problem like this before, and it’s
(at least to me) entirely unobvious what to do. Of course, we discussed it
after the mock qual, and after meditating on the solution for a bit, I’m
writing this up so I’ll hopefully remember the idea.</p>
<p>$\ulcorner$</p>
<p>Let $H \leq G$ with $|H| = 14$. Then $H$ has a $7$-sylow subgroup $P$,
and since $[H:P] = 2$, we must have $P \vartriangleleft H.$ So $H$
is contained in $N_G(P)$ (the normalizer<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> of $P$).</p>
<p>Now we use a fact that I had entirely forgotten:
\([G : N_G(P)] = n_p = \# p\text{-sylow subgroups}\).
Once you see this, it becomes obvious (exercise: why?<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>),
and I’m going to try to not forget it again.</p>
<p>With this in hand, though, the rest comes fairly quickly.
If $[G : N_G(P)] = n_7 = 8$, then</p>
\[|N_G(P)| = \frac{|G|}{[G : N_G(P)]} = \frac{168}{8} = 21.\]
<p>But we know $H \leq N_G(P)$, so \(|H| \, \bigg \vert \, |N_G(P)|\) and we find
$14 \, \big \vert \, 21$, giving us the desired contradiction.</p>
<p><span style="float:right">$\lrcorner$</span></p>
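Before moving on, here's a concrete sanity check (my own addition; the problem never names a specific group): $PSL(2,7)$ is a simple group of order $168$, and brute force over its matrix representatives confirms both parts of the problem:

```python
from itertools import product

# PSL(2,7) is a simple group of order 168. Check it has 48 elements of
# order 7 and none of order 14, by brute force over SL(2,7).
p = 7

def mul(A, B):   # 2x2 matrix product mod p, row-major tuples (a, b, c, d)
    return ((A[0]*B[0] + A[1]*B[2]) % p, (A[0]*B[1] + A[1]*B[3]) % p,
            (A[2]*B[0] + A[3]*B[2]) % p, (A[2]*B[1] + A[3]*B[3]) % p)

I = (1, 0, 0, 1)
negI = (p - 1, 0, 0, p - 1)

# SL(2,7): matrices over F_7 with determinant 1 (there are 336 of them)
sl = [M for M in product(range(p), repeat=4)
      if (M[0]*M[3] - M[1]*M[2]) % p == 1]
assert len(sl) == 336

def psl_order(M):
    # order of the coset {M, −M} in PSL(2,7) = SL(2,7)/{±I}
    k, X = 1, M
    while X != I and X != negI:
        X = mul(X, M)
        k += 1
    return k

orders = [psl_order(M) for M in sl]   # each PSL element is counted twice
count7 = orders.count(7) // 2
count14 = orders.count(14) // 2
assert count7 == 48 and count14 == 0
```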
<hr />
<p>The most obvious way to show that $H \not \leq G$ is to use lagrange’s theorem,
and this is still using lagrange’s theorem, just in a slightly sharper way.</p>
<p>Normally, we show that $|H|$ does not divide $|G|$, but if we can control
$H$ enough to guarantee $H \leq K \leq G$, then we can actually apply lagrange’s
theorem to this subgroup instead! As we saw above, $|G|$ was not restrictive
enough to prevent a subgroup of order $14$, but the smaller group $N_G(P)$
was able to get the job done!</p>
<p>Now, how could we have been clued into thinking about $H \leq N_G(P)$?
Well, primed by the previous paragraph we notice that $14$ <em>does</em> divide the
order of $G$. So if we’re going to show the nonexistence of this subgroup,
we’ll need something stronger. Without knowing how $H$ embeds into $G$,
though, we have very few tools for finding a $K$ with $H \leq K \leq G$.
In fact, we really only have two tools which let us move up in the subgroup
lattice.</p>
<ol>
<li>Taking normalizers (note $H \leq N_G(H)$ for every $H$)</li>
<li>Working with $p$-subgroups</li>
</ol>
<p>The normalizer angle isn’t promising, since we don’t know how $H$ should
sit inside $G$, so we don’t know anything about how it conjugates.
The $p$-subgroup idea seems reasonable, though. After all, every $p$-subgroup
is contained in a Sylow $p$-subgroup. So we might be able to pass information
from the bottom of the subgroup lattice to the top.</p>
<p>This is doubly effective in our current case, since $7$ is the maximal power
of $7$ in both $H$ <em>and</em> $G$. So actually the (unique!) $7$-sylow subgroup of
$H$ is already one of the $7$-sylow subgroups of $G$. A very common idea with
$p$-groups is to take normalizers<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, indeed if you’ve done much finite group
theory, you’re surely familiar with the <a href="https://en.wikipedia.org/wiki/Frattini%27s_argument">frattini argument</a>. The standard
example of this argument is showing</p>
<div class="boxed">
<p>If $H \vartriangleleft G$, and $P$ is a $p$-sylow subgroup of $H$,
then $G = N_G(P) H$.</p>
</div>
<p>So when we see $p$-sylow subgroups, we might be trained to think about
their normalizers. But as soon as we look at the normalizer of $P$, we
see it contains $H$, and we can finish the proof from there.</p>
<hr />
<p>As a quick epilogue, I <a href="https://math.stackexchange.com/q/4232945/655547">asked mse</a> for general techniques for showing
$G$ does not have a subgroup of order $k$. There have been some comments,
but nothing substantial (at least at time of writing), which makes me
wonder if there <em>are</em> any general techniques for this.</p>
<p>One idea which was brought up (twice) is letting $G$ act on the cosets of
$H$. Then if $G$ is simple, $G$ acts faithfully on the cosets, and so there
is an injection \(G \hookrightarrow \mathfrak{S}_{[G:H]}.\) In particular, if
the order of $G$ doesn’t divide $[G:H]!$, there can be no subgroup of order $|H|$.</p>
<p>I seem to remember seeing this idea on an undergraduate algebra
midterm, but I can’t find the problem…</p>
<p>There’s a lot of results describing how the prime factorization of $|G|$
relates to the subgroups that it <em>does</em> have
(I’ll discuss this some more in a footnote<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>), but I’m still struggling to come
up with more ways to show $G$ <em>omits</em> a subgroup of some order. If anyone
has any ideas, I would love to hear about it, either at my <a href="https://math.stackexchange.com/q/4232945/655547">mse question</a>
or in the comments here ^_^.</p>
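For what it's worth, the classic example of a group omitting a subgroup, $A_4$ with no subgroup of order $6$ (see footnote 5), is small enough to verify by brute force. A Python sketch of my own:

```python
from itertools import combinations, permutations

# Brute-force check that A4 (order 12) has no subgroup of order 6:
# a finite subset closed under composition is automatically a subgroup,
# so it suffices to test all C(12, 6) = 924 six-element subsets.
def compose(s, t):          # (s ∘ t)(i) = s[t[i]]
    return tuple(s[t[i]] for i in range(4))

def sign(s):                # parity via inversion count
    return (-1) ** sum(1 for i in range(4)
                         for j in range(i + 1, 4) if s[i] > s[j])

A4 = [s for s in permutations(range(4)) if sign(s) == 1]
assert len(A4) == 12

found = any(
    all(compose(a, b) in S for a in S for b in S)
    for S in map(set, combinations(A4, 6))
)
assert found is False   # no 6-element subset is closed under composition
```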
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>For completeness, we know each element of order $7$ is in exactly one
$7$-sylow subgroup. How many $7$-sylow subgroups are there? Well, it’s
$1$ mod $7$ and divides $\frac{168}{7} = 24$. It’s quick to see that
the only choices are $1$ and $8$, but we know it cannot be $1$
(otherwise the $7$-sylow subgroup would be normal, contradicting
simplicity of $G$). So there are $8$ of them. Each contributes $6$ elements
of order $7$ (since they each contain the identity), and we get $48$
elements total. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Recall the normalizer of $P$ is the <em>largest</em> subgroup of $G$ in which $P$
is normal. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>As a little hint, consider the orbit-stabilizer theorem.</p>
<p>As a big hint, let $G$ act on the set of sylow subgroups by conjugation,
then consider orbit-stabilizer. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>In fact, the mantra for proving things about finite $p$-groups is
“normalizers grow”. This mantra extends to finite nilpotent groups,
since a finite nilpotent group is the direct product of its $p$-sylow
subgroups.</p>
<p>This doesn’t immediately seem helpful, since there are groups where
some $p$-sylow subgroup satisfies $N_G(P) = P$.
Though since looking it up, it seems like this is actually a pretty safe
bet. <em>Self Normalizing Sylow Subgroups</em> by Guralnick, Malle, and Navarro
(available from the <a href="https://www.ams.org/journals/proc/2004-132-04/S0002-9939-03-07161-2/S0002-9939-03-07161-2.pdf">ams</a>) shows that if $P$ is a $p$-sylow subgroup for
$p \gt 3$ then $P = N_G(P)$ implies $G$ is solvable!</p>
<p>Since our group is simple (and $7 \gt 3$), we could retroactively justify
this idea. I like the Frattini Argument angle more, though, so that’s
what I put in the main body of this post. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>The Sylow Theorems tell you that we can always find
subgroups of prime power order, and more generally <a href="https://en.wikipedia.org/wiki/Hall_subgroup">Hall’s Theorem</a> tells
us that if $G$ is solvable, then $G$ has a subgroup of order $k$ whenever
$k$ and $\frac{|G|}{k}$ are coprime<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>
<p>But we know that lots of groups are solvable! For instance, the famed
<a href="https://en.wikipedia.org/wiki/Feit%E2%80%93Thompson_theorem">Feit-Thompson Theorem</a>, says that every group of odd order is
solvable! Citing another famous result, we know by
<a href="https://en.wikipedia.org/wiki/Burnside%27s_theorem">Burnside’s $p^aq^b$ Theorem</a> that any group with only two prime divisors
(that is, every group of order $p^aq^b$) is solvable too.</p>
<p>Moreover, it’s easy to see that every group of squarefree order is solvable
(this is a cute exercise), and thus satisfies the converse of lagrange’s
theorem (here we heavily use squarefree-ness of $|G|$). So any group which
omits a subgroup of some order <em>must</em> have a factor of $p^2$ for some prime
$p$.</p>
<p>There may be odd order groups (or groups with only two prime divisors) which
omit subgroups, but we see that they can’t omit <em>many</em> subgroups by Hall’s
Theorem. By the remark about squarefree order groups, the simplest case is
groups of order $p^2 q$, and indeed there are already groups of this
order which omit a subgroup.
We see this with $A_4$, of order $12 = 2^2 \cdot 3$, for instance,
which famously has <a href="https://math.stackexchange.com/questions/582658/a-4-has-no-subgroup-of-order-6">no subgroup of order $6$</a>.</p>
<p>Groups which have subgroups of all possible orders are called
<span class="defn">CLT Groups</span> for “Converse to Lagrange’s Theorem”.
For more information about the relation between the prime factors of $|G|$
and the CLT-ness (or indeed the non-CLT-ness) of $G$, you should check out
the (extremely readable!) thesis
<em>Groups Satisfying the Converse to Lagrange’s Theorem</em> by Jonah N. Henry.
You can find a copy <a href="https://bearworks.missouristate.edu/cgi/viewcontent.cgi?article=4484&context=theses">here</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 26 Aug 2021 00:00:00 +0000
https://grossack.site/2021/08/26/no-subgroups-of-some-order.html
A geometric proof that $D_{2m} \leq \mathfrak{S}_n$ is possible for $m > n$<p>To nobody’s surprise, I was on MSE tonight, and saw a <a href="https://math.stackexchange.com/q/4225528/655547">simple question</a> about
group theory. The original question doesn’t matter as much as a question it
made me wonder to myself:</p>
<div class="boxed">
<p>Is there an $m > n$ so that the dihedral group $D_{2m}$ is a subgroup of the
symmetric group $S_n$?</p>
</div>
<p>Intuitively it feels like the answer should be “yes”, but I wasn’t able to
come up with a proof myself. Thankfully it didn’t take much googling to find
an excellent example due to pjs36. I’ll show it for completeness, but you can
find the original post <a href="https://math.stackexchange.com/a/1710743/655547">here</a>.</p>
<p>The idea is to embed $D_{12}$ in \(\mathfrak{S}_5\) by working with <em>subpolygons</em>
instead of <em>vertices</em>. This is analogous to showing the symmetry group of a cube
is \(\mathfrak{S}_4\) by looking at the <em>diagonals</em> inside the cube, rather than
looking at the vertices/edges/faces individually.</p>
<p>Since a picture is worth a thousand words, I’ll steal pjs36’s picture:</p>
<p><img src="/assets/images/embedding-dihedral-groups-efficiently/stolen-image.png" /></p>
<p>If you know where these $5$ subpolygons get sent, you actually know where the
whole hexagon gets sent! This witnesses $D_{12} \leq \mathfrak{S}_5$ in a
starkly beautiful way.</p>
<p>It got me wondering, though. Can we run a similar argument to get
$D_{2m} \leq \mathfrak{S}_n$ for other choices of $m$? Since you’ve read
the title of this blog post, you know the answer is “yes”<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. But at this point
I should issue a quick clarification: This (very clever!) idea is not actually
my own – I found it in yet another <a href="https://math.stackexchange.com/a/3740706/655547">mse post</a>. I was already planning on
writing up a blog post about this problem, but when I found the solution I knew
I had to talk about it. Now, let’s talk through how we
might have solved this problem ourselves:</p>
<p>Obviously if $p$ is prime, then $D_{2p}$ first shows up as a subgroup of
$\mathfrak{S}_p$ in the natural way (by permuting the vertices). We can’t do
any better than this since for $p$ prime, $p \not \mid n!$ for any $n \lt p$.</p>
<p>A moment’s thought (or more likely, quite a few moments’ thoughts) shows that
actually \(D_{2p^k} \not \leq \mathfrak{S}_n\) for $n \lt p^k$. The idea here is
that $D_{2p^k}$ has a $p^k$ cycle. The order of an element
$\sigma \in \mathfrak{S}_n$ is the $\text{lcm}$ of the cycle lengths in $\sigma$,
so even though $p^k$ might divide $n!$, there’s no way to get an $\text{lcm}$
of things $\lt p^k$ to equal $p^k$.</p>
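<p>The “lcm of cycle lengths” argument is easy to check by brute force. Here’s a small sketch in plain Python (not sage) that enumerates the cycle types of $\mathfrak{S}_n$, i.e. the partitions of $n$, and confirms that no element of $\mathfrak{S}_7$ has order $8$ (so $D_{16}$ can’t embed in $\mathfrak{S}_7$):</p>

```python
from math import lcm

def partitions(n, max_part=None):
    """Yield the partitions of n as tuples of parts (the cycle types of S_n)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest

def element_orders(n):
    """The set of orders of elements of S_n: lcms of the cycle types."""
    return {lcm(*p) for p in partitions(n)}

# An lcm of 8 forces a part divisible by 8, so no cycle type of S_7 works.
print(sorted(element_orders(7)))
# [1, 2, 3, 4, 5, 6, 7, 10, 12]
```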
<p>Pjs36’s solution <em>feels</em> like it should generalize, and
looking more carefully at the pictures above, we’re considering the $2$-gons
and $3$-gons living inside of our $6$-gon… This idea links up well with our
counterexamples from earlier, since the subpolygons of an $m$-gon are exactly
the $\ell$-gons for $\ell \mid m$ and prime powers are special in the
divisibility order.</p>
<p>We want to make $n$ (the number of generators of our symmetric group) as small
as possible, so we should make the subpolygons we look at as <em>big</em> as possible
(so that there aren’t many). Since we know from our earlier
investigation that $p^k$-gons are the obstruction to “shrinking” $n$,
whatever construction we do should give us $p^k$ objects to permute when we look
at a $p^k$-gon.</p>
<p>Eventually, this might lead us to consider the $p^k$ many $\frac{m}{p^k}$-gons
living inside our $m$-gon. In the $p^k$-gon case, this means we’re considering
the $p^k$ many $1$-gons (that is, just the vertices), which is exactly what we
expect. In the $6$-gon case, this means we consider the $\frac{6}{2}$-gons and
the $\frac{6}{3}$-gons, but this is exactly pjs36’s original example! In the
case of a $2^3 \cdot 3^2 \cdot 5$-gon, this means we’re looking at the
$45$-gons (there’s $8$ of them), the $40$-gons (there’s $9$ of them),
and the $72$-gons (there’s $5$ of them). Notice here how we’re able to keep
the number of objects small (there’s only $8+9+5 = 22$ things we’re permuting)
by keeping our subpolygons big.</p>
<p>In general, let’s write $m = \prod P_i$ where each $P_i = p_i^{k_i}$ is a (maximal)
prime power. We can find $P_i$ many $\frac{m}{P_i}$-gons living inside our $m$-gon,
and every symmetry of our $m$-gon permutes these subpolygons amongst themselves.
That is, we get a permutation in \(\mathfrak{S}_{P_i}\) for each $i$.</p>
<p>Next, we can glue these together into a map</p>
\[D_{2m} \to \prod \mathfrak{S}_{P_i}\]
<p>which I claim is actually injective.</p>
<p>To see why, say that some $g \in D_{2m}$ is in the kernel of the above map.
Then it’s in the kernel of each \(D_{2m} \to \mathfrak{S}_{P_i}\), so $g$
fixes each of our subpolygons. But this can only happen if $g$ fixes the
entire $m$-gon, so $g = 1$ and our map is injective.</p>
<p>Now we see the light at the end of the tunnel! We have an embedding
\(D_{2m} \hookrightarrow \prod \mathfrak{S}_{P_i}\), and we want an embedding
$D_{2m} \hookrightarrow \mathfrak{S}_n$ for some $n$. But there’s an “obvious”
way to do this! If you have a permutation of $k$ objects and a permutation of
$\ell$ objects, we can just put them next to each other and call it a permutation
of $k + \ell$ objects.</p>
<p>Using this “put them next to each other” embedding, we see that</p>
\[D_{2m} \hookrightarrow \prod \mathfrak{S}_{P_i} \hookrightarrow \mathfrak{S}_{\sum P_i}.\]
<p>So $D_{12}$ embeds in \(\mathfrak{S}_{2+3} = \mathfrak{S}_5\), as we’ve seen.
Likewise, each $D_{2p^k}$ embeds in \(\mathfrak{S}_{p^k}\), which agrees with
our earlier experiments. Finishing our concrete example from earlier shows
$D_{2 \cdot (2^3 \cdot 3^2 \cdot 5)}$ embeds in
\(\mathfrak{S}_{2^3 + 3^2 + 5} = \mathfrak{S}_{22}\),
which is <em>much</em> smaller than the obvious
\(\mathfrak{S}_{2^3 \cdot 3^2 \cdot 5} = \mathfrak{S}_{360}\)!</p>
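<p>As a quick sanity check, here’s a sketch in plain Python (not sage) that computes the principal divisors of $m$ by trial division, verifies their lcm is $m$ (so a product of disjoint cycles of these lengths really has order $m$, as the image of the rotation must), and reports $\sum P_i$:</p>

```python
from math import lcm, prod

def principal_divisors(m):
    """Maximal prime-power divisors p^k of m, found by trial division."""
    out, p = [], 2
    while p * p <= m:
        if m % p == 0:
            q = 1
            while m % p == 0:
                q *= p
                m //= p
            out.append(q)
        p += 1
    if m > 1:
        out.append(m)  # leftover prime factor
    return out

for m in [6, 360]:
    ps = principal_divisors(m)
    # coprime cycle lengths: their lcm recovers m, so the rotation has order m
    assert prod(ps) == m and lcm(*ps) == m
    print(m, ps, sum(ps))
# prints:
# 6 [2, 3] 5
# 360 [8, 9, 5] 22
```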
<p>As one last question: <em>how</em> much smaller is it? If \(m = \prod p_i^{k_i}\), then let’s call
each \(p_i^{k_i}\) a <span class="defn">Principal Divisor</span> of $m$. Moreover,
let’s write</p>
\[s^*(m) = \sum p_i^{k_i}.\]
<p>According to
<em>Upper Bounds on the Sum of Principal Divisors of an Integer</em>
by Eggleton and Galvin (up to change of variable name, to be consistent with
the other variables in this post):</p>
<blockquote>
<p>If $m$ is any positive integer with $\ell \geq 2$ principal divisors, and
each greater than $\ell / 2$, then</p>
\[s^*(m) \leq \frac{m}{\ell^{\ell - 2}}.\]
<p>Moreover, this holds with equality when $m = 30$.</p>
</blockquote>
<p>This tells us that we can embed $D_{2m}$ in a symmetric group with generators
that shrink rapidly as the number of principal divisors of $m$ increases
(provided each is not individually too small).</p>
<div class="boxed">
<p>We’ve shown that \(D_{2m} \hookrightarrow \mathfrak{S}_{s^*(m)}\),
but maybe we can do even better!</p>
<p>As a (fun?) exercise, can you show that that isn’t the case?
That is, if \(D_{2m}\) embeds in \(\mathfrak{S}_n\), show that
$n \geq s^*(m)$, so our construction here was best possible.</p>
<p>The proof is similar to how we showed \(D_{2p^k}\) can’t embed in
\(\mathfrak{S}_n\) for $n \lt p^k$.</p>
</div>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This is actually a special case of a very hard problem. In general
for a group $G$ we have very little idea what the minimal $n$ with
$G \hookrightarrow \mathfrak{S}_n$ is. It’s super cool that we can
solve this explicitly for dihedral groups! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Mon, 16 Aug 2021 00:00:00 +0000
https://grossack.site/2021/08/16/embedding-dihedral-groups-efficiently.html
Solving Solvable Polynomials with Galois Theory (Part 1)<p>I’m super excited to be writing this post up! It’s been haunting me for
almost exactly a month now, and it feels good to be close enough to done
that I can finally share my hard work with the world ^_^.</p>
<p>I’ve been spending a lot of time studying galois theory in the past little
while, since it’s easily my weakest area in the standard algebra curriculum.
I’ve been meaning to really sit down and learn it for a few years now, but there’s
always been something more pressing. I’m not a fan of the qual system overall,
but I guess nothing is entirely without merit, and if I’m being honest with
myself? I don’t know when I would have gotten around to learning galois theory
if it weren’t for the upcoming quals.</p>
<p>Now, one of the most famous theorems in galois theory is that
the roots of a polynomial $f$ (in $\mathbb{Q}[x]$, say) can be expressed
with nested radicals if and only if the galois group $G$ of $f$ is solvable.
<em>So</em>, if I were to give you a polynomial with solvable galois group, would you
know how to actually… solve it?</p>
<p>The proof that’s given in most books <em>is</em> actually constructive, but enough
details are left out to make it mostly unimplementable. I eventually found
Gaal’s <em>Classical Galois Theory With Examples</em>, which is very explicit about
how to do the solving process. In this blog post, we’re going to focus in
on the case where most of the work is done:
that of a cyclic group of prime order<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">1</a></sup>.
You’ll remember that every solvable group is an iterated extension of these
groups (in fact, this is basically the definition of solvable), so by tackling
this case, we’ll get the full solvability case by iteration. I’ll write up a
follow up blog post soon where we talk about that process.</p>
<p>For now, though, let’s see the algorithm:</p>
<div class="boxed">
<p>Let $f$ be a monic polynomial of degree $n$ with cyclic galois group, generated
by $\sigma$.</p>
<ol>
<li>
<p>Let $\alpha_0, \ldots, \alpha_{n-1}$ be the roots of $f$ – these are just
symbols for now. But without loss of generality order them so that
$\sigma \alpha_j = \alpha_{j+1}$.</p>
</li>
<li>
<p>Let $\omega$ be a primitive $n$th root of unity</p>
</li>
<li>
<p>Look at the equations $\theta_k = \sum_{0 \leq j \lt n} \omega^{j k} \alpha_j$.
Notice $\sigma \theta_k = \omega^k \theta_k$ (do you see why?). This means
<br /><br />
\(\sigma \theta_k^n = (\sigma \theta_k)^n = \omega^{nk} \theta_k^n = \theta_k^n\)
<br /><br />
so $\theta_k^n$ is fixed by $\sigma$ (and thus the whole galois group).</p>
</li>
<li>
<p>Since $\theta_k^n$ is fixed by $G$, it must lie in the base field. So
it’s just a number, $\psi_k$. That means $\theta_k = \sqrt[n]{\psi_k}$.</p>
</li>
<li>
<p>Now we recover $\alpha_0 = \frac{1}{n} \sum \sqrt[n]{\psi_k}$. In fact,
we can recover each of the $\alpha_j$ as a weighted average of the $\psi_k$.
Do you see how?<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup></p>
</li>
</ol>
</div>
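<p>As the footnote below hints, steps $3$ and $5$ are a discrete Fourier transform and its inverse in disguise. Here’s a numerical sanity check of that round trip, sketched in plain Python with made-up “roots” — this is just the DFT bookkeeping, not the actual solving procedure (which needs the exact $\psi_k$):</p>

```python
import cmath

n = 3
alphas = [1.0, 2.0, 4.0]          # stand-ins for the roots alpha_0, ..., alpha_{n-1}
w = cmath.exp(2j * cmath.pi / n)  # a primitive nth root of unity

# step 3: theta_k = sum_j w^(jk) alpha_j  (a DFT of the roots)
thetas = [sum(w**(j * k) * alphas[j] for j in range(n)) for k in range(n)]

# step 5: recover each alpha_j by the inverse DFT,
# a weighted average of the thetas (weights are powers of w)
recovered = [sum(w**(-j * k) * thetas[k] for k in range(n)) / n
             for j in range(n)]

# note recovered[0] is exactly (1/n) * sum(thetas), as in step 5
for a, r in zip(alphas, recovered):
    assert abs(a - r) < 1e-9
```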
<p>This is great and all, but there’s a bit of magic that happens in step
$4$… How do we actually figure out what the $\psi_k$s are in the base field?
Getting $\psi_0$ is pretty easy: It’s the first
<a href="https://en.wikipedia.org/wiki/Elementary_symmetric_polynomial">elementary symmetric polynomial</a>, and as such, it’s just (the negation of)
the $x^{n-1}$ coefficient of $f$!</p>
<p>This is actually a clue for how we might in theory get each of the $\psi_k$.
Even though the other $\psi_k$ <em>aren’t</em> symmetric in the $\alpha_j$, we can
still leverage the theorem of symmetric polynomials (with some work) to
figure out which numbers the $\psi_k$ really are.
This is where Gaal’s book was first helpful – it introduced me to the following
idea:</p>
<p>Consider the polynomial (which I now know is called the <a href="https://en.wikipedia.org/wiki/Resolvent_(Galois_theory)">galois resolvent</a>)</p>
\[\mathcal{L}(t) =
\prod \{ t - \tilde{\psi} \mid \tilde{\psi} \in \mathfrak{S}_n \cdot \psi_1 \}.\]
<p>This product ranges over the orbit of $\psi_1$ under the action of the symmetric
group (which permutes the $\alpha_j$ inside $\psi_1$). Notice that each of the $\psi_k$
for $k \neq 0$ are in this orbit (do you see why?) so each nonzero $\psi_k$
is a root of $\mathcal{L}$.</p>
<p>A priori, $\mathcal{L}$ lives in $\mathbb{Q}(\omega)[\alpha_0, \ldots, \alpha_{n-1}][t]$,
since the $\psi$s are all polynomials in the $\alpha_j$. But the coefficients
of $\mathcal{L}$ are all symmetric in the orbit of $\psi_1$, and thus,
symmetric in the $\alpha_j$ (again, do you see why?). So $\mathcal{L}$ is
actually an element of $\mathbb{Q}(\omega)[t]$! But we know how to find the roots
of a polynomial with constant coefficients (or rather, sage does), and these
roots are exactly the $\psi_k$!</p>
<p>I coded it up, and… it crashed my desktop. In lieu of downloading more ram,
I tried to optimize my code (see <a href="https://ask.sagemath.org/question/58035/polynomial-multiplication-is-unexpectedly-slow/">here</a>), which still didn’t work. I
also tried to find a more effective procedure (see <a href="https://math.stackexchange.com/questions/4204419/solving-a-solvable-polynomial-by-radicals-effectively">here</a>), but it seems
like this is really how it’s done. I know it’s been done before
(in the gap <a href="https://www.gap-system.org/Packages/radiroot.html">radiroot</a> package, for instance), but I didn’t want to have
to reverse engineer someone else’s code unless I absolutely had to<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup>.
So, I kept looking.</p>
<p>Eventually I learned that $\mathcal{L}$ is called the resolvent, and I spent
some time learning more about resolvents and why they’re interesting and
useful<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. So my next step was to find an efficient algorithm for evaluating
resolvents, and I found one! My code here is an implementation of the algorithm
described in Lehobey’s
<em>Resolvent Computations By Resultants Without Extraneous Powers</em>, which is
super readable if you’re interested!</p>
<p>It’s still kind of slow, but it’s less memory intensive, and it definitely
gets the job done!</p>
<div class="linked_auto">
<script type="text/x-sage">
def interpolating_functions(f):
"""
Build a list of interpolating functions for f
We require f : K[x1,...,xn][x]
"""
R = f.parent().base_ring()
xs = R.gens()
n = f.degree()
out = [f]
for k in range(f.degree()):
fk = (out[-1] - out[-1].subs(x=xs[n-k-1]))/(x - xs[n-k-1])
out += [f.parent()(fk.simplify_full())]
return out[::-1]
def stabilizer(G,p):
"""
Compute the stabilizer of p by G
"""
elems = []
for g in G:
if p * g == p:
elems += [g]
return G.subgroup(elems)
def truncated_root(p,r,d):
"""
Compute q so that q^r = p (working mod x^d)
Assumes p : A[t] has constant term 1
and that such a q : A[t] actually exists!
"""
r = int(r)
t = p.variables()[0]
n = p.degree()
p = p.truncate(d)
ps = p.coefficients(sparse=False)
# these will be the coefficients of q
qs = [1]
for k in range(n // r):
qs += [1/(k+1) * sum([(k+1 - (r+1)*j) * qs[j] * ps[k+1-j] / r for j in range(k+1)])]
return sum([qs[j] * t^j for j in range(len(qs))])
def resolvent(f,Theta):
"""
Compute mathcal{L}_{Theta,f} as per the paper
Assumes f : K[x1,...,xn][x] and Theta : K[x1,...,xn]
"""
R = Theta.parent()
K = R.base_ring()
xs = R.gens()
SIterated.<t> = PolynomialRing(R)
S = SIterated.flattening_morphism().codomain()
T = K[t]
Rj = (t - SIterated(Theta)).reverse()
Rj = S(Rj)
fs = interpolating_functions(f)
HprevOrder = 1
n = f.degree()
for j in range(1,n+1):
print(j, "/", n)
Sj = SymmetricGroup(j)
Hj = stabilizer(Sj,Theta)
dj = factorial(j) / Hj.order()
mj = Hj.order() / HprevOrder
# update the previous order for the next cycle
HprevOrder = Hj.order()
# there's an annoying off-by-one error with the variable names
# compared to everything else
fj = S(fs[j].subs(x=xs[j-1]))
res = fj.resultant(Rj, S(xs[j-1]))
Rj = truncated_root(SIterated(res),mj,dj+1)
Rj = S(Rj)
return T(Rj).reverse()
def solveByRadicals(f):
"""
Compute a root of f using radicals
f(x) is assumed to be symbolic
"""
n = int(f.degree(x))
K.<w> = CyclotomicField(n)
R = PolynomialRing(K,n,'x')
xs = R.gens()
R1 = R[x]
f = R1(f)
Theta = sum(xs[k] * w^(k) for k in range(n))
# Theta^n is preserved under the action of the galois group,
# while Theta itself is an eigenvector with eigenvalue w
L = resolvent(f,Theta^n)
psis = L.roots(multiplicities=False)
thetas = [psi^(1/n) for psi in psis]
# we need to choose the ~correct~ nth root for each psi.
# I don't actually know how you're supposed to know which
# one is right, so we just try them all...
#
# There must be a better way to do this, but I want to start
# working on other things.
from itertools import product
for es in product([w^k for k in range(n)], repeat=n-2):
r = (-list(f)[-2] + thetas[0] + sum(es[k-1] * thetas[k] for k in range(1,n-1)))/n
# there's definitely a better way to do this too...
if abs(f(r).n()) < 0.000000001:
return r
# if we never found a root
print("Uh oh!")
R = QQ[x]
deg3s = [x^3 - x^2 - a*x + b for (a,b) in [(26,-41), (32,79), (34,61), (36,4), (42,-80), (46,-103)]]
deg5s = [x^5 + x^4 - 4*x^3 - 3*x^2 + 3*x + 1,
x^5 + x^4 - 12*x^3 - 21*x^2 + 1*x + 5,
x^5 + x^4 - 16*x^3 + 5*x^2 + 21*x - 9,
x^5 + x^4 - 24*x^3 - 17*x^2 + 41*x - 13]
deg7s = [x^7 + x^6 - 12*x^5 - 7*x^4 + 28*x^3 + 14*x^2 - 9*x + 1]
fs = deg3s + deg5s + deg7s
@interact
def _ (f=selector(fs, label="$f$"), auto_update=False):
show(solveByRadicals(f))
</script>
</div>
<p>The sagemath online server times these out for polynomials of degree $5$ and $7$,
but you can run this code locally to see that it does give the right answer<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.
The degree $5$ case is slow, but manageable. The degree $7$ case is…
really impressively slow. You’ve been warned.</p>
<div class="boxed">
<p>As a slightly tricky exercise, we’re assuming throughout that we have access
to roots of unity… But how do we know that we can write roots of unity
in terms of radicals?</p>
<p>Show that roots of unity can always be written in terms of radicals<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>
</div>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:3" role="doc-endnote">
<p>I’m fairly confident this will work on cyclic groups of composite order too,
and I’ve even tested it on a few polynomials of degree $4$. But the prime
case is all we need for solvability, and I can say for sure that it
always works in that case, so that’s what we’re going with. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>As a little hint, you’ll only need to scale the $\psi_k$ by powers of
$\omega$. You should try writing these sums out, say in the $n=3$ case,
to try and get a handle for what’s going on. You might also want to watch
Richard E Borcherds’ (characteristically excellent) video <a href="https://youtu.be/UaeJNQ5x17g">here</a></p>
<p>As a <em>massive</em> hint (and a cute connection with the rest of mathematics),
the $\theta_k$ are actually the <a href="https://en.wikipedia.org/wiki/Discrete_Fourier_transform#The_unitary_DFT">Discrete Fourier Transform</a> of the
$\alpha_j$! So once we know the $\theta_k$s, we can recover the $\alpha_j$s
using the <em>inverse</em> DFT! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Especially since the package is based on Andreas Distler’s
<a href="http://www.icm.tu-bs.de/ag_algebra/software/distler/Diplom.pdf">thesis</a>, and my German is…. nicht so gut. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>There’s not much sense writing up a post about it, though, since there’s
actually a ton of great resources on this! Healey’s
<em>Resultants, Resolvents, and the Computation of Galois Groups</em>
(available <a href="http://www.alexhealy.net/papers/math250a.pdf">here</a>) certainly comes to mind. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>The answers <em>are</em> pretty unwieldy, though. So maybe it’s for the best you
need to run it locally.</p>
<p>For instance, if $\zeta$ is an $11$th root of
unity, then $\mathbb{Q}(\zeta)$ is a cyclic extension of degree $10$.
This means $\zeta + \zeta^{-1}$ (which is of degree $2$ <em>under</em> $\mathbb{Q}(\zeta)$)
generates an extension of degree $5$ over $\mathbb{Q}$.</p>
<p>Sage tells us the minimal polynomial of $\zeta + \zeta^{-1}$ is
$x^5 + x^4 - 4x^3 - 3x^2 + 3x + 1$, and so we can use this code to write
$\zeta + \zeta^{-1}$ as:</p>
\[\begin{aligned}
-\frac{\alpha^3 + 4 \alpha^2 + 16 \alpha + 64}{320} \,
{\left(
-\frac{165}{64} \,
{\alpha}^{3}
- \frac{385}{16} \, {\alpha}^{2}
- \frac{275}{4} \, \alpha - 451
\right)}^{\frac{1}{5}} \\
+ \frac{\alpha}{20} \,
{\left(-\frac{55}{32} \,
{\alpha}^{3}
+ \frac{55}{8} \, {\alpha}^{2}
+ \frac{275}{4} \, \alpha - 176
\right)}^{\frac{1}{5}} \\
+ \frac{1}{10} \,
{\left(
\frac{385 \, {\alpha}^{3} + 440 \, {\alpha}^{2} + 3520 \, \alpha - 4224}{2}
\right)}^{\frac{1}{5}}\\
+ \frac{1}{5} \,
{\left(-\frac{55}{32} \,
{\alpha}^{3}
+ \frac{165}{16} \, {\alpha }^{2}
- 55 \, \alpha - 286
\right)}^{\frac{1}{5}} \\
- \frac{1}{5}
\end{aligned}\]
<p>where $\alpha = \sqrt{5} + i \, \sqrt{2 \, \sqrt{5} + 10} - 1$.</p>
<p>NB: I made the substitution for $\alpha$ myself, so there might be a minor
arithmetic error in the above expression… Though I don’t think that’s
actually very important :P <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>As a hint, let $\zeta$ be an $n$th root of unity for $n$ odd.
First show that you can recover $\zeta$ from $\zeta + \zeta^{-1}$,
and that $\zeta + \zeta^{-1}$ satisfies a (cyclic) equation of degree $\frac{n-1}{2}$.
Then, inductively, we can solve this by radicals, and then get $\zeta$
with one more square root.</p>
<p>Do you see how to do the $n$ even case as well? <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Fri, 06 Aug 2021 00:00:00 +0000
https://grossack.site/2021/08/06/cyclic-extensions.html
Iteration Asymptotics<p>I really like recurrences, and the kind of asymptotic
analysis that shows up in combinatorics and computer science. I think I’m drawn
to it because it melds something I enjoy (combinatorics and computer science)
with something I historically struggle with (analysis).</p>
<p>My usual tool for handling recurrences (particularly for getting asymptotic
information about their solutions) is <a href="https://en.wikipedia.org/wiki/Generating_function">generating functions</a>. They slaughter
linear recurrences (which nowadays I just solve with <a href="https://sagemath.org">sage</a>), but through
functional equations, <a href="https://en.wikipedia.org/wiki/Lagrange_inversion_theorem">lagrange inversion</a>, and complex analysis, they form
an extremely sophisticated theory<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">1</a></sup>. Plus, I would be lying if I didn’t say
I was drawn to them because of how cool they are conceptually. I’m a sucker for
applying one branch of math to another, so the combination of complex analysis
and enumerative combinatorics is irresistible.</p>
<p>Unfortunately, you can’t solve all asymptotics problems with generating
functions, and it’s good to have some other tools around as well. Today
we’ll be working with the following question:</p>
<div class="boxed">
<p>If $f$ is some function, what are the asymptotics of</p>
\[x_{n+1} = f(x_n)\]
<p>where we allow $x_0$ and $n$ to vary?</p>
</div>
<p>If $f$ is continuous and $x_n \to r$, it’s clear that $r$ must be a fixed
point of $f$. If moreover, $f’(r)$ exists, and $|f’(r)| \lt 1$, then anything
which starts out near $r$ will get pulled into $r$. Also, we might as well
assume $r = 0$, since we can replace $f$ by $f(x+r) - r$ without loss of
generality.</p>
<p>These observations tell us we should restrict attention to
those systems where $f(x) = a_1 x + a_2 x^2 + \ldots$ is
analytic at $0$ with $f(0) = 0$ and $|a_1| \lt 1$. Indeed, we’ll focus
on exactly this case for the rest of the post<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">2</a></sup>.</p>
<p>As a case study, let’s take a simple problem I found on mse the other day.
I can’t actually find the question I saw anymore, or else I’d link it.
It looks like it’s been asked <a href="https://approach0.xyz/search/?q=%24a_%7Bn%2B1%7D%20%3D%20%5Cfrac%7Ba_n%20%2B%203%7D%7B3a_n%20%2B%201%7D%24&p=1">a few times now</a>, but none of the options
are recent enough to be the one I saw. Oh well.</p>
<div class="boxed">
<p>Define $x_0 = 2$, $x_{n+1} = \frac{x_n + 3}{3 x_n + 1}$. What is
$\lim_{n \to \infty} x_n$?</p>
</div>
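<p>Before doing anything clever, it’s worth just iterating. A minimal sketch with Python’s exact rationals shows the orbit closing in on $1$ from alternating sides:</p>

```python
from fractions import Fraction

def f(x):
    return (x + 3) / (3 * x + 1)

x = Fraction(2)   # exact rational arithmetic, so no rounding error
orbit = [x]
for _ in range(10):
    x = f(x)
    orbit.append(x)

print([str(t) for t in orbit])
# ['2', '5/7', '13/11', '23/25', '49/47', '95/97', '193/191',
#  '383/385', '769/767', '1535/1537', '3073/3071']
```

<p>The terms alternate above and below $1$, and the distance to $1$ roughly halves each step — exactly the behavior the $f’(1) = -\frac{1}{2}$ analysis below predicts.</p>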
<p>The original problem is fairly routine, and we can solve it using
<a href="https://en.wikipedia.org/wiki/Cobweb_plot">cobweb diagrams</a>. The asymptotics are more interesting, though.
It turns out we can read the asymptotics right off of $f$, which is
super cool! I guess I hadn’t seen any examples because people who are in the
know feel like it’s too obvious to talk about, but that makes it the perfect
topic for a blog post!</p>
<p>Notice $f(x) = \frac{x+3}{3x+1}$ has a fixed point at $1$, so we’ll need to
translate it to the origin. We’ll replace $f$ by
$g(x) = f(x+1) - 1 = \frac{-2x}{3x+4}$, remembering to replace $x_0 = 2$
by $y_0 = x_0 - 1 = 1$ as well.</p>
<p>Notice $g(x) = - \frac{1}{2} x + \frac{3}{8}x^2 - O(x^3)$ is analytic at $0$
with $| - \frac{1}{2} | \lt 1$.</p>
<p>Let’s get started!</p>
<hr />
<h2 id="simple-asymptotics">Simple Asymptotics</h2>
<p><br /></p>
<p>We’ll be following de Bruijn’s <em>Asymptotic Methods in Analysis</em> in both
this section and the next. In the interest of showcasing how to actually
<em>use</em> these tools, I’m going to gloss over a lot of details. You can find
everything made precise in chapter $8$.</p>
<p>First, if $x \approx 0$, then $f(x) \approx f’(0) \cdot x$. Since we are
assuming $|f’(0)| \lt 1$, if $x_n \approx 0$, then
$x_{n+1} \approx f’(0) x_n \approx 0$ too. An easy induction then shows that</p>
\[x_n \approx (f'(0))^n x_0.\]
<p>But now we have our asymptotics! If we formalize all of the $\approx$ signs
above, we find</p>
<div class="boxed">
<p>For each $|f’(0)| \lt b \lt 1$, there is a radius $\delta$ so that as long
as $x_0 \in (-\delta, \delta)$ we’re guaranteed</p>
<p>\(|x_n| \lt b^n |x_0|\)</p>
</div>
<p>Since $x_n \to 0$, we’re guaranteed that eventually our $x_n$s will be inside
however tight a radius we want! Since big-oh notation ignores the first
finitely many terms anyways, this tells us</p>
<div class="boxed">
<p>Life Pro Tip:</p>
<p>If $x_n \to r$ (a fixed point of $f$) with $\lvert f’(r) \rvert \lt 1$, then
$x_n \to r$ exponentially quickly. More precisely, for any $\epsilon$ you want<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">3</a></sup></p>
<p>\(x_n = r \pm O((\lvert a_1 \rvert + \epsilon)^n)\)</p>
</div>
<p>What about our concrete example? We know $f$ has $1$ as a fixed point,
and $f’(1) = - \frac{1}{2}$. Then for $x_0 = 2$, we get
(by choosing $\epsilon = 0.00001$) that $x_n = 1 \pm O ( 0.50001^n )$.
Which is fast enough convergence for most practical purposes.</p>
<hr />
<h2 id="asymptotic-expansion">Asymptotic Expansion</h2>
<p><br /></p>
<p>But what if you’re a real masochist, and the exponential convergence above
isn’t enough for you? Well don’t worry, because de Bruijn has more to say.</p>
<p>If $x_0$ is fixed, then $x_n a_1^{-n}$ converges to a limit, which we’ll call
$\omega(x_0)$. But since $x_{n+1} = f(x_n)$, we get an important restriction
on $\omega$ by considering $x_1$ as the start of its own iteration<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>:</p>
\[\omega(f(x)) = a_1 \omega(x) \quad \quad \quad \quad (\star)\]
<p>In the proof that $\omega$ exists (which you can find in de Bruijn’s book),
we also show that $\omega$ is analytic whenever $f$ is!
Then using <a href="https://en.wikipedia.org/wiki/Lagrange_inversion_theorem">lagrange inversion</a>, we can find an analytic $\Omega$ so that
$\Omega(\omega(x)) = x$.</p>
<p>But now we can get a great approximation for $x_n$! By repeatedly applying
$(\star)$ we find</p>
\[\omega(x_n) = a_1^n \omega(x_0)\]
<p>which we can then invert to find</p>
\[x_n = \Omega(a_1^n \omega(x_0)).\]
<p>If we use the first, say, $5$ terms of the expansion of $\Omega$, this will
give us accuracy up to $\tilde{O}(a_1^{5n})$. There are also some lower order terms
which come from how much of $\omega$ we use, but I’m sweeping under the $\tilde{O}$.</p>
<p>How do we actually do this in practice, though? The answer is “with sage”!</p>
<p>This code will take a function $f$ like we’ve been discussing, and will
recursively compute the first $N$ coefficients of $\omega$. It turns out
the $j$th coefficient of $\omega$ depends only on the first $j-1$ coefficients plus
equation $(\star)$. Then it will lagrange invert to get the first $N$
terms of $\Omega$, and it will use these to compute an asymptotic expansion
for $x_n$ in terms of $n$ and $x_0$ (which it’s writing as $x$).</p>
<p>Since I had to (rather brutally) convert back and forth between symbolic
and ordinary power series, sage wasn’t able to keep track of the error
term for us. Thankfully, it’s pretty easy to see that the error term is
always $O(a_1^n x_0^N)$, so I just manually wrote it in.</p>
<div class="linked_auto">
<script type="text/x-sage">
"""
This is super sloppy because sage actually has two
different kinds of power series:
- symbolic power series
- "ordinary" power series
the ordinary power series are better in almost every way, except they
don't allow variables inside them! Since we need variables to build the
recurrence that we solve for the next coefficient, this is a problem.
The only way I was able to get this working is by hacking back and
forth between the two types of power series. If anyone has a better way
to do this PLEASE let me know.
"""
def omega(f, x, N):
"""
Compute the first N terms of omega(x)
"""
# this is a symbolic power series
f = f.series(x,N)
a1 = f.coefficient(x,1)
# initialize omega (as a symbolic power series)
o = (0 + x).series(x,N)
d = var('d')
for j in range(2,N+1):
# set up the linear recurrence that defines the jth coefficient.
# if you didn't believe symbolic series are more cumbersome than
# "ordinary" series, hopefully you do now.
#
# this comes from looking at the jth coefficient of the equation
# omega(f(x)) == a1 * omega(x)
eqn = (o + d * x^j).subs(x=f).series(x,N).coefficient(x,j) == a1 * d
o = (o + solve(eqn, d)[0].rhs() * x^j).series(x,N)
# this is a symbolic power series
return o
def iterationAsymptotics(f,N=5):
"""
Compute the first N many terms of an asymptotic expansion for x_n
"""
n = var('n')
# for some reason extracting the coefficient gives us a constant function
# and it only breaks things here? Oh well, we'll evaluate it to make
# sage happy.
a1 = f.series(x,2).coefficient(x,1)()
# we convert o to an ordinary power series, since there's no way
# to do lagrange inversion to a symbolic power series
o = omega(f,x,N).power_series(QQbar)
O = o.reverse() # lagrange inversion
# dirty hack to convert back to a symbolic series
return O.truncate().subs(x=(a1^n * o.truncate())).expand().series(x,N)
def stats(f,n=10,N=5):
"""
Run 1000 tests to see how well the asymptotic expansion
agrees with the expected output.
"""
approx = iterationAsymptotics(f,N).truncate().subs(n=n)
tests = []
for _ in range(1000):
x0 = random()
# compute the exact value of xn
cur = x0
for _ in range(n):
cur = f(cur)
# compute the approximate value of xn
guess = approx.subs(x=x0)
tests += [cur - guess]
avg_diff = mean([abs(t) for t in tests])
max_diff = max([abs(t) for t in tests])
median_diff = median([abs(t) for t in tests])
show("maximum error: ", max_diff)
show("mean error: ", avg_diff)
show("median error: ", median_diff)
show(histogram(tests, bins=50, title="frequency of various signed errors (actual $-$ approximation)"))
show(html("Type in $f$ with fixed point $0$ and $0 < |f'| < 1$"))
@interact
def _(f=input_box(-2*x / (3*x + 4), width=20, label="$f$"),
n=input_box(100, width=20, label="$n$"),
N=input_box(3, width=20, label="$N$")):
f(x) = f
a1 = f.series(x,2).coefficient(x,1)()
# we have to show things in this weird way to get things on one line
# it's convenient, though, because it also lets us modify the latex
# to print the error bounds
show(html(f"$$f = {latex(f().series(x,N).power_series(QQbar))}$$"))
show(html(f"$$\\omega = {latex(omega(f,x,N).power_series(QQbar))}$$"))
series = f"x_n = {latex(iterationAsymptotics(f,N).truncate())}"
error = f"O \\left ( \\left ( {latex(abs(a1))} \\right )^n x^{N} \\right )"
show(html("$$" + series + " \\pm " + error + "$$"))
show("How good is this approximation?")
stats(f,n,N)
</script>
</div>
<div class="boxed">
<p>As a nice exercise, you might try to modify the above code to work with
functions with a fixed point at $r \neq 0$. You can do this either by
taylor expanding at $r$ directly, or by translating $r$ to $0$, then using
this code, then translating back.</p>
<p>Be careful, though! We get much more numerical precision near $0$, so if you
do things near $r$ you might want to work with <a href="https://doc.sagemath.org/html/en/reference/rings_numerical/sage/rings/real_mpfr.html">arbitrary precision reals</a>.</p>
</div>
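<p>If you go the second route, the conjugation itself is easy to sketch in plain python (a minimal sketch, not sage code — the helper names here are mine): if $f$ has fixed point $r$, then $g(x) = f(x + r) - r$ has fixed point $0$, and iterates of $f$ and $g$ track each other under the translation.</p>

```python
# plain python sketch of the "translate, iterate, translate back" approach;
# translate_to_zero and the example f below are hypothetical names
def translate_to_zero(f, r):
    """Conjugate f by x -> x + r, moving its fixed point r to 0."""
    return lambda x: f(x + r) - r

def iterate(f, x0, n):
    for _ in range(n):
        x0 = f(x0)
    return x0

f = lambda x: x / 2 + 1          # fixes r = 2, with |f'(2)| = 1/2
g = translate_to_zero(f, 2)

assert g(0) == 0                 # the fixed point moved to 0
# iterating g from x0 - r, then adding r back, recovers iterating f from x0
assert iterate(g, 5.0 - 2, 8) + 2 == iterate(f, 5.0, 8)
```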
<p>So, in our last moments together, let’s finish up that concrete example in
far more detail than anyone ever wanted. The default function that I put in
the code above is our function translated to $0$. If we take $N=5$ terms of the
asymptotic expansion and work with $x_0 - 1 = 1$
(since we translated everything left by $1$), we find</p>
\[x_n - 1 \approx
\frac{5}{8} \left ( \frac{-1}{2} \right )^{n} +
\frac{3}{8} \left ( \frac{-1}{2} \right )^{2n} -
\frac{1}{8} \left ( \frac{-1}{2} \right )^{3n} +
\frac{1}{8} \left ( \frac{-1}{2} \right )^{4n}\]
<p>For $n = 10$, say, we would expect</p>
\[x_n \approx 1.0006107\]
<p>the actual answer is</p>
\[x_n = 1.0006512\]
<p>which, seeing as $x_0 = 2$ is pretty far away from $1$ (the limit),
$5$ is a pretty small number of terms to use,
and $10$ really isn’t <em>that</em> many iterations, is good enough for me.</p>
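<p>You can reproduce both numbers in plain python (no sage needed): iterate the translated map $y \mapsto \frac{-2y}{3y + 4}$ exactly with rationals, and compare against the four term expansion above at $n = 10$.</p>

```python
from fractions import Fraction

# the translated map, with fixed point 0 and f'(0) = -1/2
f = lambda y: -2 * y / (3 * y + 4)

# exact iteration with rationals: y_0 = x_0 - 1 = 1, ten steps
y = Fraction(1)
for _ in range(10):
    y = f(y)
exact = 1 + y

# the four term expansion above, evaluated at n = 10
a = Fraction(-1, 2) ** 10
approx = 1 + Fraction(5, 8) * a + Fraction(3, 8) * a**2 \
           - Fraction(1, 8) * a**3 + Fraction(1, 8) * a**4

print(float(exact))   # 1.0006512...
print(float(approx))  # 1.0006107...
```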
<p>Of course, you should look at the statistics in the output of the code above
to see how close we get for $n=100$, or any other number you like ^_^.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:2" role="doc-endnote">
<p>I know the basics, but there’s some real black magic people are
able to do by considering what type of singularities your function has.
This seems to be outlined in Flajolet and Sedgewick’s <em>Analytic Combinatorics</em>,
but every time I’ve tried to read that book I’ve gotten quite lost quite
quickly. I want to find some other references at some point, ideally at a
slower pace, but if I never do I’ll just have to write up a post about it
once I finally muster up the energy to really understand that book. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>It turns out you can sometimes say things if $|a_1| = 1$. But convergence
is slow (if you get it at all) and the entire discussion is a bit more
delicate. You should see de Bruijn’s <em>Asymptotic Methods in Analysis</em>
(chapter $8$) for more details. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Notice these techniques can’t remove the $\epsilon$. For instance,
$n C^n = O((C+\epsilon)^n)$ for each $\epsilon$, but is <em>not</em> $O(C^n)$. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Which is apparently called <a href="https://en.wikipedia.org/wiki/Schr%C3%B6der%27s_equation">Schröder’s equation</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 17 Jun 2021 00:00:00 +0000
https://grossack.site/2021/06/17/iteration-asymptotics.html
How Many Group Structures on a Set?

<p>And so ends my first year of grad school. I’m pretty tired, and my mental
health has taken a turn for the worse, though it’s hard to piece together if
the last few weeks were tiring because my mental health was declining, or if
my mental health is in decline because the last few weeks were tiring. Probably
a little bit of both. Anyways, I have some free time again and a backlog of
ideas for blog posts. Speaking of, now that my life update is out of the way,
let’s see a kind of cute computation!</p>
<p>So the other day someone on mse <a href="https://math.stackexchange.com/q/4166508/655547">asked</a>:</p>
<div class="boxed">
<p>Given a random binary operation on a finite set $G$, what is the probability
that it makes $G$ into a group?</p>
</div>
<p>The answer is, of course, vanishingly small. But it’s interesting to see
<em>how</em> vanishingly small. The answer is actually quite memorable!</p>
<p>We can get a lower bound by assuming $|G| = n$ is prime. After all, in that
case there is only one group of order $n$ up to isomorphism, so this is the
lowest possible. The only freedom we have is renaming the elements, which gives
(up to an overcount by the automorphisms of the group, which won’t matter for
our asymptotics) $n!$ many group operations on $G$.</p>
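<p>As a sanity check at a tiny size (a brute force sketch, not from the mse thread): on a set of size $3$ we can enumerate all $3^9 = 19683$ binary operations and count the group operations directly. There are exactly $3$ of them — the $3!$ renamings of $\mathbb{Z}/3$, overcounted by its $2$ automorphisms.</p>

```python
from itertools import product

def is_group(table, n):
    # associativity: (a*b)*c == a*(b*c) for every triple
    if any(table[table[a][b]][c] != table[a][table[b][c]]
           for a in range(n) for b in range(n) for c in range(n)):
        return False
    # a two-sided identity element
    ids = [e for e in range(n)
           if all(table[e][y] == y == table[y][e] for y in range(n))]
    if not ids:
        return False
    e = ids[0]
    # every element has a two-sided inverse
    return all(any(table[a][b] == e == table[b][a] for b in range(n))
               for a in range(n))

n = 3
tables = (
    [list(flat[i * n:(i + 1) * n]) for i in range(n)]
    for flat in product(range(n), repeat=n * n)
)
count = sum(1 for t in tables if is_group(t, n))
print(count)  # 3
```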
<p>We can get an upper bound if we assume $|G| = 2^n$. We also have to assume the
(widely believed) conjecture that “most groups are $2$-groups” in order to know
that this is an upper bound. At the very least, it is an upper bound for groups
of size at most $2000$, since <a href="https://math.stackexchange.com/q/241369/655547">$99\%$ of these groups have order $1024$</a>.</p>
<p>The same mse question I just linked provides an asymptotic formula for the
number of group structures on a set of size $N = 2^n$. Multiplying by $N!$,
since we are interested in all group structures rather than just groups up to
isomorphism, we find an upper bound of (very roughly) $N! \ N^{\frac{2}{27} \log(N)^2}$.</p>
<p>Putting our upper and lower bounds together, we see</p>
\[N! \leq
\text{ \# group structures on a set of size $N$ } \leq
N! \ N^{\frac{2}{27} \log(N)^2}\]
<p>But by approximating $N! \approx \left ( \frac{N}{e} \right )^N$, we get</p>
\[e^{-N} N^N \leq
\text{ \# group structures on a set of size $N$ } \leq
e^{-N} N^{N + \frac{2}{27} \log(N)^2}\]
<p>logging everything in sight shows the log of the number of group structures
is $\Theta(N \log(N))$. We can write this as the (rather memorable)</p>
\[\text{\# group structures on a set of size $N$} = N^{\Theta(N)}\]
<p>and to finally answer the problem, there are $N^{N^2}$ many distinct binary
operations on a set of size $N$. So the probability that a random one is a
group operation decays like $N^{- \Theta(N^2)}$, which is vanishingly small,
as promised.</p>
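<p>To put a number on “vanishingly small”: treating the asymptotic upper bound above as if it were exact (a rough sketch, so the small $N$ values are only indicative), we can compute $\log_{10}$ of the resulting probability bound in plain python.</p>

```python
from math import lgamma, log

# log10 of (N! * N^{(2/27) log(N)^2}) / N^{N^2}, pretending the asymptotic
# upper bound on the number of group structures were exact
def log10_prob_bound(N):
    log_count = lgamma(N + 1) + (2 / 27) * log(N) ** 3  # log N! + (2/27) log(N)^2 log N
    log_total = N * N * log(N)                          # log N^{N^2}
    return (log_count - log_total) / log(10)

for N in [4, 8, 16, 1024]:
    print(N, log10_prob_bound(N))
```

<p>Already at $N = 4$ the bound is around $10^{-8}$, and by $N = 1024$ it is below $10^{-3000000}$.</p>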
Thu, 17 Jun 2021 00:00:00 +0000
https://grossack.site/2021/06/17/how-many-groups.html