Linearly Ordered Groups and CH

<p>Earlier today <a href="https://math.jonathanalcaraz.com/">Jonathan Alcaraz</a> gave a GSS talk about
Linearly Ordered (LO) Groups, which are a fun topic with connections to
dynamics, topology, geometric group theory, etc. This reminded me of a
problem I told myself to think about a while ago, and so I decided to
finally do that. After a bit of thought, a friend from CMU (Pedro Marun)
and I were able to figure it out. This post is going to be somewhat
more meandering than usual (if you can imagine such a thing), because I want
to showcase what the flow of thoughts was in solving the problem.
At the end I’ll clean things up and write them linearly.</p>
<p>I guess we should start with what a LO Group is, but it’s pretty much
what it says on the tin:</p>
<div class="boxed">
<p>A (Left) <span class="defn">Linearly Ordered Group</span> is a
group $G$ equipped with a total order $\leq$ which is compatible
with (left) multiplication in the following sense:</p>
<p>\(g_1 \leq g_2 \quad \implies \quad hg_1 \leq hg_2\)</p>
</div>
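<p>As a quick sanity check (my own toy example, not something from the talk), we can watch this axiom in action for $\mathbb{Z}^2$ under componentwise addition with the lexicographic order:</p>

```python
import itertools

# Z^2 under componentwise addition, ordered lexicographically.
def mul(g, h):
    return (g[0] + h[0], g[1] + h[1])

def leq(g1, g2):
    # python's tuple comparison is exactly the lex order
    return g1 <= g2

# check: g1 <= g2 implies h*g1 <= h*g2, for every triple in a small grid
grid = list(itertools.product(range(-2, 3), repeat=2))
assert all(
    leq(mul(h, g1), mul(h, g2))
    for g1 in grid for g2 in grid for h in grid
    if leq(g1, g2)
)
```

<p>Of course a finite check is no proof, but it's easy to see why this works: translating by $h$ adds the same amount to both coordinates, so it preserves the lex order.</p>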
<p>I first heard of LO Groups from an exercise in Marker’s
“Model Theory: An Introduction”, which has you
use compactness to show every torsion free abelian group admits
a compatible linear order. I heard about them again on mse,
to the surprise of nobody. Somebody <a href="https://math.stackexchange.com/q/3928388/655547">asked</a> for examples of
finitely generated left orderable groups. I knew about the abelian
example because of Marker, but I was curious about nonabelian examples.</p>
<p>This led me down a rabbit hole of papers to skim, including
Kathryn Mann’s “Left Orderable Groups that Don’t Act on the Line”
(see <a href="https://e.math.cornell.edu/people/mann/papers/germsatinfinity.pdf">here</a>). This paper mentions a classical result:</p>
<div class="boxed">
<p>A countable group is LO if and only if it embeds in $\text{Homeo}_+(\mathbb{R})$,
the group of orientation preserving homeomorphisms of $\mathbb{R}$.</p>
<p>The order on $\text{Homeo}_+(\mathbb{R})$ is as follows:</p>
<p>Enumerate \(\mathbb{Q} = \{q_n\}\). Then we say $f \lt g$ exactly when
$f q_i \lt g q_i$, where $i$ is least with $f q_i \neq g q_i$
(this is more or less the lex order on $\prod_{\omega} \mathbb{R}$).</p>
</div>
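<p>Just to make this order concrete, here’s a little Python sketch (mine). It’s only a heuristic, since it inspects finitely many rationals, and the order honestly depends on which enumeration you fix, but it shows the mechanics:</p>

```python
from fractions import Fraction

def rationals(n):
    """A truncated enumeration of Q (any fixed enumeration works)."""
    qs = []
    for denom in range(1, n + 1):
        for num in range(-n * denom, n * denom + 1):
            q = Fraction(num, denom)
            if q not in qs:
                qs.append(q)
    return qs

def lo_less(f, g, n=10):
    """f < g iff f(q_i) < g(q_i) at the least i with f(q_i) != g(q_i)."""
    for q in rationals(n):
        if f(q) != g(q):
            return f(q) < g(q)
    return False  # f and g agree on every sampled rational

# translation by 1 moves every point up, so the identity comes first
assert lo_less(lambda x: x, lambda x: x + 1)
assert not lo_less(lambda x: x + 1, lambda x: x)
```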
<p>As soon as I saw this, I wondered if anything was special about “countable” here.
If we assume the Continuum Hypothesis (CH) fails, what can we say about other
LO groups of size $\lt \mathfrak{c}$? Do they all have to embed in
$\text{Homeo}_+(\mathbb{R})$ as well?</p>
<p>I keep a list of “problems to think about”, so I added this and left some
brief thoughts before going back to answering mse questions.</p>
<hr />
<p>When Jonathan brought up this theorem in his talk, it reminded me to think
about that problem. I found a proof of the result to see if it immediately
worked for larger cardinalities, and much to my surprise it relies <em>heavily</em>
on the countability of $G$! This is a summary of a proof from Clay and Rolfsen’s
“Ordered Groups and Topology” (see <a href="https://arxiv.org/abs/1511.05088">here</a>), Theorem 2.23.</p>
<p>$\ulcorner$
Let $G$ be countable and LO. Then by looking at $G \times \mathbb{Q}$ with
the lex order, we can assume $G$’s ordering is dense. Moreover, it is easy
to see that $G$ is torsion free, so for any element $g \neq e$, there is always
some $g_L \lt g$ and some $g_R \gt g$ ($g^{-1}$ works for one, and $g^2$ for the other).</p>
<p>So $G$ is a countable dense linear order without endpoints! If you’re a
logician your heart should be leaping now. Cantor’s famed
<a href="https://en.wikipedia.org/wiki/Back-and-forth_method">back and forth argument</a> shows that <em>any</em> such ordering is isomorphic
(as an order) to $(\mathbb{Q}, \lt)$. It was really exciting to see this
familiar face pop up in this proof! But since $(G, \lt) \cong (\mathbb{Q}, \lt)$
embeds densely in $(\mathbb{R}, \lt)$, we can extend the left action of $G$ on
itself to an action on $\mathbb{R}$ by orientation preserving homeomorphisms.
<span style="float:right">$\lrcorner$</span></p>
<p>This theorem relies on a back and forth argument for most of the heavy lifting,
and that argument fails <em>spectacularly</em> for uncountable cardinalities.
In fact, for any uncountable $\kappa$ there are $2^\kappa$ nonisomorphic
dense linear orders without endpoints of cardinality $\kappa$
(see <a href="https://math.stackexchange.com/q/2580875/655547">here</a>, for instance). This made me start wondering if the theorem is
actually <em>false</em> for groups of size, say, $\aleph_1$.</p>
<p>I texted Pedro, a close friend and set theorist, with some ideas that he
pretty quickly found flaws in. He had a good idea, though, and reminded me
that $\mathbb{R}$ doesn’t contain any chains of length $\omega_1$. That is,
there’s no strictly increasing function $f : \omega_1 \to \mathbb{R}$.</p>
<p>I thought if we could find a LO group $G$ with some $\omega_1$ chain,
then we should be done. My thought process was basically:</p>
<ul>
<li>If \(G \hookrightarrow \text{Homeo}_+(\mathbb{R})\), then $G \curvearrowright \mathbb{R}$.</li>
<li>If \(\{ g_\alpha \}_{\omega_1}\) is a chain in $G$, then \(\{ g_\alpha x \}_{\omega_1}\) should be a chain in $\mathbb{R}$ once we pick
some initial value $x$.</li>
</ul>
<p>Of course, this turned out to be wrong too. It’s not hard to find homeomorphisms
$f \lt g$ where $fx \not \lt gx$. It was a good start, though, on the way to
the right answer.</p>
<p>If nothing else, we should just build such a group to show we know how, right?
This is a simple compactness argument:</p>
<ul>
<li>Look at the language of ordered groups, but add \(\omega_1\) many ~ bonus constants ~ \(x_\alpha\).</li>
<li>Look at the theory which includes the sentences
<ul>
<li>“I am an ordered group”</li>
<li>”\(x_\alpha \lt x_\beta\)” for each $\alpha \lt \beta$.</li>
</ul>
</li>
<li>Now each finite subtheory only refers to finitely many constants, so $\mathbb{Z}$ (interpreting those constants as increasing integers) is a model.</li>
<li>Then compactness buys us a model of the whole theory – an ordered group with a chain of length $\omega_1$.</li>
<li>Now by Löwenheim–Skolem, we pass to an elementary submodel containing this chain.
This is also an ordered group with a chain of length $\omega_1$, but it’s guaranteed to
have cardinality $\aleph_1$.</li>
</ul>
<p>So we’ve successfully found a LO group of size $\aleph_1$ which contains an
increasing chain of length $\omega_1$… But didn’t we say this doesn’t actually
solve our problem?</p>
<p>This is where I remembered a fact from Descriptive Set Theory: For a
compact metric space $X$, we actually know that \(\text{Homeo}(X)\) is
<a href="https://en.wikipedia.org/wiki/Polish_space">Polish</a> (see Kechris’s “Classical Descriptive Set Theory”, I.9B, example 8).
There’s a classic argument that $\mathbb{R}$ doesn’t contain any chains of
length $\omega_1$ which seems to only use separability<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>, and the
dream would be to show this continues to hold in a general Polish space.</p>
<p>Of course, we <em>also</em> need to check that \(\text{Homeo}_+(\mathbb{R})\) is
actually Polish. The above theorem only guarantees Polishness for <em>compact</em>
spaces $X$, and the reals are (among other things) not compact.</p>
<p>First, I searched for “borel ordering” in Kechris’s book, and found a
reference to Harrington, Marker, and Shelah’s “Borel Orderings”
(see <a href="https://www.ams.org/journals/tran/1988-310-01/S0002-9947-1988-0965754-3/S0002-9947-1988-0965754-3.pdf">here</a>). Corollary 3.2 gives exactly what we want<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup>, but it’s phrased
in terms of subsets of $\mathbb{R}$… But now we know what to search for,
and we quickly find a <a href="https://math.stackexchange.com/q/184200/655547">mse question</a> which cites the paper and makes me
feel confident that I’m not misinterpreting it.</p>
<p>All that’s left is to show $\text{Homeo}_+(\mathbb{R})$ is really polish,
but our journey ends like it began, on <a href="https://math.stackexchange.com/q/732380/655547">mse</a>.</p>
<div class="boxed">
<p>As a nice exercise, can you show that the order on $\text{Homeo}_+(\mathbb{R})$
really is borel? That is, can you show</p>
\[\{ (f,g) ~|~ f \leq g \}
\subseteq
\text{Homeo}_+(\mathbb{R}) \times \text{Homeo}_+(\mathbb{R})\]
<p>is a borel subset?</p>
</div>
<hr />
<p>Ok. Now that the exposition is out of the way, we’re holding a draft of a
proof in our heads. It was a wandering path, but look how deceptively simple
it looks once we organize our thoughts and write it down:</p>
<div class="boxed">
<p>Theorem ($\lnot \mathsf{CH}$):</p>
<p>There exists a LO group of size $\aleph_1$ which does not embed in
$\text{Homeo}_+(\mathbb{R})$</p>
</div>
<p>$\ulcorner$
Since $(\mathbb{Z}, \lt)$ is an infinite LO group, a standard
logical compactness argument furnishes an LO group of size
$\aleph_1$ which contains an increasing sequence \(\{g_\alpha\}\)
of length $\omega_1$. Call such a group $G$.</p>
<p>Then since \(\text{Homeo}_+(\mathbb{R})\) is a Polish space
(cf. <a href="https://math.stackexchange.com/q/732380/655547">here</a>) and its ordering is borel, a theorem of
Harrington, Marker, and Shelah (cf. Corollary 3.2 <a href="https://www.ams.org/journals/tran/1988-310-01/S0002-9947-1988-0965754-3/S0002-9947-1988-0965754-3.pdf">here</a>)
shows that no chain of length $\omega_1$ can exist in
$\text{Homeo}_+(\mathbb{R})$.</p>
<p>Since $G$ contains such a chain, it cannot embed into
$\text{Homeo}_+(\mathbb{R})$.
<span style="float:right">$\lrcorner$</span></p>
<p>Can you believe that teeny little proof took <em>hours</em> of reading and thinking
(times two people, no less!) to figure out? It really makes you appreciate
how much work goes into some of the longer and trickier theorems you come across.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Between any two successive elements
\(x_\alpha \lt x_{\alpha+1}\) of the chain there must be a rational. Since
there’s only countably many rationals, we can’t have a chain of uncountable
length. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>And after looking through the paper, I’m <em>extremely</em> grateful I didn’t
try to stubbornly prove it myself. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Fri, 26 Feb 2021 00:00:00 +0000
https://grossack.site/2021/02/26/lo-groups.html
Sage Sums

<p>Today I learned that <a href="https://www.sagemath.org">sage</a> can automatically
simplify lots of sums for us by interfacing with <a href="https://maxima.sourceforge.io/">maxima</a>.
I also learned recently that the <code class="language-plaintext highlighter-rouge">init.sage</code> file exists, which let me fix some
minor gripes with my sage. Notably, I was able to add commands <code class="language-plaintext highlighter-rouge">aa</code> and <code class="language-plaintext highlighter-rouge">nn</code>
which automatically get ascii art or a numeric answer for the most recent
expression! This is going to be a short post just to highlight how these things
work, since I had to figure them out for myself.</p>
<p>I was reading Concrete Mathematics the other night when I came across the
section on <a href="https://en.wikipedia.org/wiki/Gosper%27s_algorithm">Gosper’s Algorithm</a>.
This promises to solve a large class of sums
(the <a href="https://en.wikipedia.org/wiki/Hypergeometric_function">hypergeometric</a> ones)
algorithmically, which is extremely alluring. I periodically find myself trying to
simplify tricky sums (either for mse questions or for some problem I’m thinking about)
and it would be nice to offload that thinking to a computer.</p>
<p>Unfortunately, googling around for “gosper’s method sage” and similar didn’t
actually give anything useful (at least not quickly). In hindsight, it turns
out there’s actually a <code class="language-plaintext highlighter-rouge">gosper_sum</code> built in
(see the <a href="https://doc.sagemath.org/html/en/reference/calculus/sage/symbolic/expression.html#sage.symbolic.expression.Expression.gosper_sum">docs</a>),
but for some reason I didn’t find it at the time. After some searching I instead
found a <a href="https://github.com/benyoung/AeqB-sage">github repo</a> that coded
up gosper’s algorithm, as well as a bunch of other algorithms from a book.
This was how I found my way to Petkovsek, Wilf, and Zeilberger’s
<a href="https://www2.math.upenn.edu/~wilf/AeqB.html">A=B</a>, which has lots of similar
algorithms for algorithmically simplifying sums. It’s an extremely interesting
read, both mathematically and historically, and I’ve been enjoying it so far.</p>
<p>Before I learned it was built-in after all, I was planning to put up a blog post
with an implementation of gosper’s algorithm so that people could come here to
simplify their sums. Thankfully, I did eventually find the implementation, which
saved me a bunch of coding! Sage actually aliases over
python’s default <code class="language-plaintext highlighter-rouge">sum</code> function, and will pass off symbolic sums to maxima
where they’re evaluated using tons of powerful techniques (one of which is
gosper’s algorithm). The reason I was having trouble finding a function for this
was (in part) because it’s baked into the <code class="language-plaintext highlighter-rouge">sum</code> function already!</p>
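<p>Sage isn’t the only way to get at this from Python, either: sympy also ships an implementation of Gosper’s algorithm, as <code class="language-plaintext highlighter-rouge">sympy.concrete.gosper.gosper_sum</code>. A quick sketch (mine), on the classic Gosper-summable sum $\sum_{k=0}^n k \cdot k! = (n+1)! - 1$:</p>

```python
from sympy import symbols, factorial
from sympy.concrete.gosper import gosper_sum

n, k = symbols('n k', integer=True)

# Gosper's algorithm finds the telescoping antidifference k*k! = (k+1)! - k!
closed = gosper_sum(k * factorial(k), (k, 0, n))

# spot-check at n = 5:  1 + 4 + 18 + 96 + 600 = 719 = 6! - 1
assert closed.subs(n, 5) == 719
```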
<p>I’m still putting up this post, in part to share this realization with
anyone else who didn’t know about it (which probably isn’t many people…),
but in part to still provide a place where people can come to simplify sums.
In case you don’t have a sage installation on your own computer, you can
modify one of the examples below and evaluate your favorite summation here!</p>
<p>Let’s see some quick examples. The syntax <code class="language-plaintext highlighter-rouge">sum(f,k,a,b)</code> corresponds to
$\sum_{k=a}^b f$.</p>
<div class="auto">
<script type="text/x-sage">
n,k = var('n,k')
# I think we're legally obligated to make this our first sum.
# hold=True keeps it from evaluating
sum1 = sum(binomial(n,k),k,0,n, hold=True)
# so that we can display the original sum as the LHS here
# unhold then lets the expression evaluate as it would naturally
show(sum1 == sum1.unhold())
# You can also define a symbolic function, then use it in the sum
f = k * binomial(n,k)
sum2 = sum(f,k,0,n, hold=True)
show(sum2 == sum2.unhold())
</script>
</div>
<p>This can solve fairly complex sums. These come from an exercise in A=B
(exercise 1d and 1e from chapter 5). After seeing the solutions,
I’m definitely glad I didn’t have to work them out by hand!</p>
<div class="auto">
<script type="text/x-sage">
n,k = var('n,k')
soln_d = sum(k^4 * 4^k / binomial(2*k,k), k, 0, n, hold=True)
show(soln_d == soln_d.unhold())
f = factorial(3*k) / (factorial(k) * factorial(k+1) * factorial(k+2) * 27^k)
soln_e = sum(f,k,0,n, hold=True)
show(soln_e == soln_e.unhold())
</script>
</div>
<p>This has already helped me “in the wild”. There is an
<a href="https://math.stackexchange.com/q/4039066/655547">mse question</a>
which asked about the sum $\sum_{n=0}^\infty \frac{16n^2 + 20n + 7}{(4n+2)!}$.
A commenter asks whether OP wants a closed form, or merely a convergence result.
The sum certainly <em>looks</em> like it doesn’t admit a nice closed form, but I’ve
been deceived before. Instead of wasting a few minutes trying to find a
nice closed form (which is what I would have done even a few days ago),
we can simply ask sage:</p>
<div class="auto">
<script type="text/x-sage">
n = var('n')
f = (16*n^2 + 20*n + 7) / factorial(4*n + 2)
# I also just learned oo is an alias for Infinity!
soln = sum(f,n,0,oo, hold=True)
show(soln)
print("This is exactly: ")
show(soln.unhold())
print("This is approximately: ")
show(soln.unhold().n())
</script>
</div>
<p>Sage happily computed a closed form for this sum… It just happens to use
a bunch of hypergeometric functions! This pretty quickly answers the
“does OP want a closed form” question, assuming OP’s professor isn’t a sadist<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>.</p>
<hr />
<p>As an aside, in the above example we computed something exactly, but then
used <code class="language-plaintext highlighter-rouge">.n()</code> in order to get a numerical value
(which is often better for getting a sense of things). Since sage will let you
write <code class="language-plaintext highlighter-rouge">_</code> to get the output of the last command, <code class="language-plaintext highlighter-rouge">_.n()</code> is probably my most
typed command. Either that or <code class="language-plaintext highlighter-rouge">ascii_art(_)</code>, which draws an ascii version of
whatever your most recent output was. Since I use sage in a terminal, rather
than a jupyter notebook, this saves me endless parsing related headaches
when it comes to actually reading sage’s output.</p>
<p>If you also find yourself using these commands all the time, I can’t recommend
the following aliases enough. These are part of my <code class="language-plaintext highlighter-rouge">init.sage</code>, and have changed
my life. If you want to see my entire sage configuration, you can find it
(with the rest of my dotfiles)
<a href="https://github.com/HallaSurvivor/dotfiles/blob/master/init.sage">here</a>.</p>
<div class="no_out">
<script type="text/x-sage">
# get the ipython instance so we can
# do black magic with our repl
_ipy = get_ipython()
# add a macro so typing nn will
# automatically convert the most
# recent output to a numeric.
_ipy.define_macro('nn', '_.n()')
# add a macro so typing aa will
# automatically run ascii_art
# on the most recent output.
_ipy.define_macro('aa', 'ascii_art(_)')
</script>
</div>
<hr />
<p>Before I end this post, there are a few more parting observations
that I want to squeeze in.</p>
<p>First, sage can solve recurrences for you as well,
by using either maxima’s <code class="language-plaintext highlighter-rouge">solve_rec</code> or
<a href="https://www.sympy.org/en/index.html">sympy</a>’s <code class="language-plaintext highlighter-rouge">rsolve</code>. I
have a wrapper in my <code class="language-plaintext highlighter-rouge">init.sage</code> that makes using the latter
slightly more convenient.</p>
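<p>Even without a wrapper, calling sympy’s <code class="language-plaintext highlighter-rouge">rsolve</code> directly is already pretty painless. A small sketch (the recurrence here is just my example, not one from sage’s docs):</p>

```python
from sympy import Function, rsolve, symbols

n = symbols('n', integer=True)
y = Function('y')

# solve y(n+1) = 2*y(n) + 1 with y(0) = 0; closed form should be 2^n - 1
sol = rsolve(y(n + 1) - 2*y(n) - 1, y(n), {y(0): 0})

assert sol.subs(n, 4) == 15  # the orbit starts 0, 1, 3, 7, 15, ...
```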
<p>Second, if you’re faced with a particularly stubborn sum that sage won’t
simplify for you, you should try using maxima directly. You can actually
do this from inside sage by using <code class="language-plaintext highlighter-rouge">maxima.console()</code> and then loading the
<code class="language-plaintext highlighter-rouge">simplify_sum</code> package. You can see a worked out example
<a href="https://stackoverflow.com/a/28663533/3911897">here</a>, and you can see all the
high-power tools that <code class="language-plaintext highlighter-rouge">simplify_sum</code> buys you
<a href="https://github.com/andrejv/maxima/blob/master/share/solve_rec/simplify_sum.mac">here</a>.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Rather magically, though, this numerically agrees with $e + \sin(1)$ up
to all available digits, according to an
<a href="http://wayback.cecm.sfu.ca/cgi-bin/isc/lookup?number=3.55975281326694&lookup_type=simple">inverse symbolic calculator</a>.
Sage says this is a fluke (that is, they aren’t actually equal), but it’s an
extremely bizarre coincidence. Life is full of mysteries. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 25 Feb 2021 00:00:00 +0000
https://grossack.site/2021/02/25/sage-sums.html
Measure Theory and Differentiation (Part 1)

<p>So I had an analysis exam <s>yesterday</s> <s>last week</s> a while ago
(this post took a bit of time to finish writing). It roughly covered the material in
chapter 3 of Folland’s “Real Analysis: Modern Techniques and Their Applications”.
I’m decently comfortable with the material, but a lot of it has always felt
kind of unmotivated. For example, why is the <a href="https://en.wikipedia.org/wiki/Lebesgue_differentiation_theorem">Lebesgue Differentiation Theorem</a>
called that? It doesn’t <em>look</em> like a derivative… At least not at first glance.</p>
<p>A big part of my studying process is fitting together the various theorems
into a coherent narrative. It doesn’t have to be linear
(in fact, it typically isn’t!), but it should feel like the theorems share
some purpose, and fit together neatly.
I also struggle to care about theorems before I know what they do. This is
part of why I care so much about examples – it’s nice to know what problems
a given theorem solves.</p>
<p>After a fair amount of reading and thinking<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>, I think I’ve finally fit the
puzzle pieces together in a way that works for me. Since I wrote it all down
for myself as part of my studying, I figured I would post it here as well in
case other people find it useful. Keep in mind this is probably obvious to
anyone with an analytic mind, but it certainly wasn’t obvious to me!</p>
<p>Let’s get started!</p>
<hr />
<p>To start, we need to remember how to relate functions and measures. Everything
we say here will be in $\mathbb{R}$, and $m$ will be the ($1$-dimensional)
Lebesgue Measure.</p>
<div class="boxed">
<p>If $F$ is increasing and continuous from the right, then there is a
(unique!) regular borel measure $\mu_F$
(called the <a href="https://en.wikipedia.org/wiki/Lebesgue%E2%80%93Stieltjes_integration">Lebesgue-Stieltjes Measure</a> associated to $F$)
so that</p>
\[\mu_F((a,b]) = F(b) - F(a)\]
<p>Moreover, given any regular borel measure $\mu$ on $\mathbb{R}$, the function</p>
\[F_\mu \triangleq
\begin{cases}
\mu((0,x]) & x \gt 0 \\
0 & x = 0 \\
-\mu((x,0]) & x \lt 0
\end{cases}\]
<p>is increasing and right continuous.</p>
</div>
<p>This is more or less the content of the <a href="https://en.wikipedia.org/wiki/Carath%C3%A9odory%27s_extension_theorem">Carathéodory Extension Theorem</a>.
It’s worth taking a second to think where we use the assumptions on $F$.
The fact that $F$ is increasing means our measure is positive. Continuity
from the right is a bit more subtle, though. Since $F_\mu$ is always right
continuous, we need to assume our starting function is right continuous
in order to guarantee $F_{\mu_F} = F$.</p>
<p>This is not a big deal, though. A monotone function is automatically continuous
except at a countable set (see <a href="https://math.stackexchange.com/questions/84870/how-to-show-that-a-set-of-discontinuous-points-of-an-increasing-function-is-at-m">here</a> for a proof) and at its countably many
discontinuities, we can force right-continuity by defining</p>
\[\tilde{F}(x_0) \triangleq \lim_{x \to x_0^+} F(x)\]
<p>which agrees with $F$ wherever $F$ is continuous.
If we put our probabilist hat on, we say that $F_\mu$ is the
<span class="defn">Cumulative Distribution Function</span> of $\mu$.
Here $F_\mu(x)$ represents the total (cumulative) mass we’ve seen so far.</p>
<p>It turns out that Lebesgue-Stieltjes measures are extremely concrete, and
a lot of this post is going to talk about computing with them<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup>. After all,
it’s entirely unclear which (if any!) techniques from a calculus class carry
over when we try to actually integrate against some $\mu_F$. Before we can
talk about computation, though, we have to recall another (a priori unrelated)
way to relate functions to measures:</p>
<div class="boxed">
<p>Given a positive, locally $L^1$ function $f$, we can define the regular measure $m_f$ by</p>
\[m_f(E) \triangleq \int_E f dm\]
<p>Moreover, if $m_f = m_g$, then $f=g$ almost everywhere.</p>
</div>
<p>The locally $L^1$ condition says that $\int_E f dm$ is finite
whenever $E$ is bounded. It’s not hard to show that this is equivalent to
the regularity of $m_f$, which we’ll need shortly.</p>
<p>Something is missing from the above theorem, though.
We know sending $F \rightsquigarrow \mu_F$ is
faithful, in the sense that $F = F_{\mu_F}$ and $\mu_{F_\mu} = \mu$. We’ve
now introduced the measure $m_f$, but we didn’t say how to recover $f$
from $m_f$… Is it even possible? The answer is yes, as a corollary of a
much more powerful result:</p>
<div class="boxed">
<p><span class="defn">Lebesgue-Radon-Nikodym Theorem</span></p>
<p>Every measure $\mu$ decomposes (uniquely!) as</p>
\[\mu = \lambda + m_f\]
<p>for some measure $\lambda \perp m$ and some function $f$.</p>
<p>Moreover, we can recover $f$ from $\mu$ as<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup></p>
\[f(x) = \lim_{r \to 0} \frac{\mu(B_r(x))}{m(B_r(x))}\]
<p>for almost every $x$. Here, as usual $B_r(x) = (x-r,x+r)$ is the ball of
radius $r$ about $x$.</p>
<p>People often write $f = \frac{d \mu}{dm}$, and call it the
<span class="defn">Radon-Nikodym Derivative</span>. Let’s see why.</p>
</div>
<p>In the case $\mu = m_f$, then this shows us how to recover
$f$ (uniquely) from $m_f$, and life is good:</p>
\[\frac{d m_f}{dm} = f\]
<p>The converse needs a ~bonus condition~. In order to say
$\mu = m_{\frac{d\mu}{dm}}$, we need to know that $\mu$ is
<a href="https://en.wikipedia.org/wiki/Absolute_continuity#Absolute_continuity_of_measures">absolutely continuous</a> with respect to $m$, written $\mu \ll m$.</p>
<div class="boxed">
<p>As an exercise, do you see why this condition is necessary? If
$\mu \not \ll m$, why don’t we have a chance of writing $\mu = m_f$
for any $f$?</p>
</div>
<p>In the case of Lebesgue-Stieltjes measures, Lebesgue-Radon-Nikodym buys us
something almost magical. For almost every $x$, we see:</p>
\[\begin{aligned}
\frac{d\mu_F}{dm}
&= \lim_{r \to 0} \frac{\mu_F(B_r(x))}{m(B_r(x))} \\
&= \lim_{r \to 0} \frac{F(x+r) - F(x-r)}{x+r - (x-r)} \\
&= \lim_{r \to 0} \frac{F(x+r) - F(x-r)}{2r} \\
&= F'(x)
\end{aligned}\]
<p>Now we see why we might call this $f$ the Radon-Nikodym <em>derivative</em>. In
the special case of Lebesgue-Stieltjes measures, it literally <em>is</em> the
derivative. We saw earlier that $F = F_{m_f}$ acts like an antiderivative of $f$,
and now we see $f = \frac{d \mu_F}{dm}$ works as a derivative of $F$ as well!</p>
<p>In fact, yet more analogies are true! Let’s take a look at the
<span class="defn">Lebesgue Differentiation Theorem</span></p>
<div class="boxed">
<p>For almost every $x$, we have:</p>
<p>\(\lim_{r \to 0} \frac{1}{m B_r(x)} \int_{B_r(x)} f(t) dm = f(x)\)</p>
</div>
<p>Why is this called the <em>differentiation</em> theorem?
Let’s look at $F_{m_f}$, which you should remember is a kind of antiderivative
for $f$.</p>
<p>For $x > 0$ (for simplicity), we have $F_{m_f}(x) = m_f((0,x]) = \int_{(0,x]} f dm$.
If we rewrite the theorem in terms of $F_{m_f}$, what do we see?</p>
\[\begin{aligned}
f(x)
&= \lim_{r \to 0} \frac{1}{m B_r(x)} \int_{B_r(x)} f dm \\
&= \lim_{r \to 0} \frac{1}{(x+r) - (x-r)} \int_{x-r}^{x+r} f dm \\
&= \lim_{r \to 0} \frac{1}{2r} \left ( \int_{0}^{x+r} f dm - \int_{0}^{x-r} f dm \right )\\
&= \lim_{r \to 0} \frac{F_{m_f}(x+r) - F_{m_f}(x-r)}{2r} \\
&= F_{m_f}'(x)
\end{aligned}\]
<p>So this is giving us part of the fundamental theorem of calculus<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>! This theorem
(in the case of Lebesgue-Stieltjes measures) says exactly that (for almost every $x$)</p>
\[\left ( x \mapsto \int_0^x f dm \right )' = f(x)\]
<p>Let’s take a moment to summarize the relationships we’ve seen. Then we’ll
use these relationships to actually <em>compute</em> with Lebesgue-Stieltjes integrals.</p>
<div class="boxed">
\[\bigg \{ \text{increasing, right-continuous functions $F$} \bigg \}
\leftrightarrow
\bigg \{ \text{regular borel measures $\mu_F$} \bigg \}\]
\[\bigg \{ \text{positive locally $L^1$ functions $f$} \bigg \}
\leftrightarrow
\bigg \{ \text{regular borel measures $m_f \ll m$} \bigg \}\]
<p>Moreover:</p>
<ul>
<li>
<p>By considering $F_{m_f}$ we see functions of the first kind are antiderivatives
of functions of the second kind.</p>
</li>
<li>
<p>By considering $\frac{d \mu_F}{dm}$, we see functions of the second kind
are (almost everywhere) derivatives of functions of the first kind.</p>
</li>
<li>
<p>Indeed, $\frac{d \mu_F}{dm} = F'$ almost everywhere.</p>
</li>
<li>
<p>And $F_{m_f}' = f$ almost everywhere.</p>
</li>
</ul>
</div>
<hr />
<p>Why should we care about these theorems? Well, Lebesgue-Stieltjes integrals
arise fairly regularly in the wild, and these theorems let us actually
compute them! It’s easy to integrate against $m_f$, since monotone convergence
gives us $\int g dm_f = \int g f dm$.</p>
<p>Then this buys us the (very memorable) formula:</p>
\[\int g d \mu_F = \int g \frac{d \mu_F}{dm} dm = \int g F' dm\]
<p>and now we’re integrating against lebesgue measure, and all our years of
calculus experience is applicable!</p>
<p>Of course, I’ve left out an important detail: Whatever happened to that
measure $\lambda$?
The above formula is true exactly when $F$ is continuous everywhere. At points
where it is <em>discontinuous</em> we need to change it slightly by using
$\lambda$. These are called <a href="https://en.wikipedia.org/wiki/Singular_measure">singular measures</a>, and they can be
pretty <a href="https://en.wikipedia.org/wiki/Cantor_distribution">pathological</a>. A good first intuition, though, is to think of them
like <a href="https://en.wikipedia.org/wiki/Dirac_measure">dirac measures</a>, and that’s the case that we’ll focus on in this post<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">5</a></sup>.</p>
<p>Let’s write \(H = \begin{cases} 0 & x \lt 0 \\ 1 & 0 \leq x \end{cases}\).
This is usually called the <span class="defn">Heaviside function</span>.</p>
<p><img src="/assets/images/lebesgue-ftc-1/heaviside.png" /></p>
<p>Recall our interpretation of this function: $H(x)$ is supposed to
represent the mass of $(-\infty, x]$. So as we scan from left to right,
we see the mass is constantly $0$ until we hit the point $0$. Then suddenly
we jump up to mass $1$. But once we get there, our mass stays constant again.</p>
<p>So $H$ thinks that $0$ has mass $1$ all by itself, and thinks that there’s
no other mass at all!</p>
<p>Indeed, we see that</p>
\[\mu_H((a,b]) = H(b) - H(a) = \begin{cases} 1 & 0 \in (a,b] \\ 0 & 0 \not \in (a,b] \end{cases}\]
<p>So $\mu_H$ is just the dirac measure at $0$ (or $\delta_0$ to its friends)!
Notice this lets us say the “derivative” of $H$ is $\delta_0$, by analogy
with the Lebesgue-Stieltjes case. Or conversely, that $H$ is the
“antiderivative” of $\delta_0$. This shows us that recasting calculus in
this language actually buys us something new, since there’s no way to make
sense of $\delta_0$ as a traditional function.</p>
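<p>This is easy to see in code, too. A tiny sketch (mine) checking that $\mu_H((a,b]) = H(b) - H(a)$ really is the dirac measure at $0$:</p>

```python
def H(x):
    # the heaviside function: 0 for x < 0, 1 for x >= 0
    return 1 if x >= 0 else 0

def mu_H(a, b):
    # the Lebesgue-Stieltjes measure of the half-open interval (a, b]
    return H(b) - H(a)

# mass 1 exactly when the interval contains 0 -- note (0, 1] does NOT
cases = [(-1, 1), (-2, -1), (0, 1), (-1, 0), (1, 2)]
assert [mu_H(a, b) for a, b in cases] == [1, 0, 0, 1, 0]
```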
<p>It’s <em>finally</em> computation time! Since we know $\int g d\delta_0 = g(0)$,
and (discrete) singular measures look like (possibly infinite) linear
combinations of dirac measures, this lets us compute integrals against
any Lebesgue-Stieltjes measure that is likely
to arise in practice. Let’s see some examples! If you want to see more,
you really should look into Carter and van Brunt’s
“The Lebesgue-Stieltjes Integral: A Practical Introduction”. I mentioned
it in a footnote earlier, but it really deserves a spotlight. It’s full of
concrete examples, and is extremely readable!</p>
<hr />
<p>Let’s start with a continuous example. Say
\(F = \begin{cases} 0 & x \leq 0 \\ x^2 & x \geq 0 \end{cases}\).</p>
<p><img src="/assets/images/lebesgue-ftc-1/example1.png" /></p>
<p>So $\mu_F$ should think that everything is massless until we hit $0$.
From then on, we start gaining mass faster and faster as we move to the right.
If you like, larger points are “more dense” than smaller ones, and thus
contribute more mass in the same amount of space.</p>
<p>Say we want to compute</p>
\[\int_{-\pi}^\pi \sin(x) d \mu_F = \int_{-\pi}^\pi \sin(x) \cdot F' dm\]
<p>We can compute \(F' = \begin{cases} 0 & x \leq 0 \\ 2x & x \geq 0 \end{cases}\),
so we split up our integral as</p>
\[\int_{-\pi}^0 \sin(x) \cdot 0 dm + \int_0^\pi \sin(x) \cdot 2x dm\]
<p>But both of these are integrals against Lebesgue measure $m$! So these are
just “classical” integrals, and we can use all our favorite tools.
So the first integral is $0$, and the second integral is $2\pi$
(integrating by parts). This gives</p>
\[\int_{-\pi}^\pi \sin(x) d \mu_F = 2\pi\]
<p>That wasn’t so bad, right?</p>
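<p>If you’d like to double-check integrals like this numerically, a quick sketch in plain Python
does the job. Here I’m using a hand-rolled Simpson’s rule, so no particular library is assumed:</p>

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# d(mu_F) = F' dm, so the integral over [-pi, pi] reduces to
# the classical integral of sin(x) * 2x over [0, pi]
approx = simpson(lambda x: math.sin(x) * 2 * x, 0, math.pi)
print(approx)        # agrees with 2*pi to many decimal places
print(2 * math.pi)
```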
<hr />
<p>Let’s see another, slightly trickier one. Let’s look at
\(F = \begin{cases} x & x \lt 0 \\ e^x & x \geq 0 \end{cases}\)</p>
<p><img src="/assets/images/lebesgue-ftc-1/example2.png" /></p>
<p>You should think through the intuition for what $\mu_F$ looks like.
You can then test your intuition against a computation:</p>
\[\mu_F = \lambda + m_f\]
<p>In the previous example, $\lambda$ was the $0$ measure since our function
was differentiable everywhere. Now, though, we aren’t as lucky. Our
function $F$ is not differentiable at $0$, so we will have to work with
some nontrivial $\lambda$.</p>
<p>Let’s start with the places $F$ <em>is</em> differentiable. This gives us the
density function \(f = F' = \begin{cases} 1 & x \lt 0 \\ e^x & x \gt 0 \end{cases}\).</p>
<p>We can also see the point $0$ has mass $1$. In this case we can more or less
read this off the graph (since we have a discontinuity where we jump up by $1$),
but in more complex examples we would compute this by using
$\mu_F(\{ 0 \}) = \lim_{r \to 0^+} F(r) - F(-r)$. You can see that this
does give us $1$ in this case, as expected. So we see (for $f$ as before)</p>
\[\mu_F = \delta_0 + m_f\]
<p>So to compute</p>
\[\int_{-1}^1 4 - x^2 d\mu_F =
\int_{-1}^1 4 - x^2 d(\delta_0 + m_f) =
\int_{-1}^1 4 - x^2 d \delta_0 + \int_{-1}^1 (4 - x^2)f dm\]
<p>we can handle the $\delta_0$ part and the $f dm$ part separately!</p>
<p>We know how to handle Dirac measures:</p>
\[\int_{-1}^1 4 - x^2 d \delta_0 =
\left . (4 - x^2) \right |_{x = 0} = 4\]
<p>And we also know how to handle “classical” integrals:</p>
\[\int_{-1}^1 (4 - x^2) f dm =
\int_{-1}^0 (4 - x^2) dm + \int_0^1 (4 - x^2) e^x dm =
\frac{11}{3} + (3e-2)\]
<p>So all together, we get \(\int_{-1}^1 4 - x^2 d\mu_F = 4 + \frac{11}{3} + (3e-2)\).</p>
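<p>As a sanity check, we can redo this computation numerically too. Again, this is just a
sketch in plain Python: the Dirac part is handled by evaluation at $0$, and the density
part by Simpson’s rule:</p>

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def g(x):
    return 4 - x ** 2

# mu_F = delta_0 + m_f, so we integrate against each piece separately
dirac_part = g(0)                # the point mass at 0 contributes g(0) = 4
density_part = simpson(g, -1, 0) + simpson(lambda x: g(x) * math.exp(x), 0, 1)

print(dirac_part + density_part)        # matches 4 + 11/3 + (3e - 2)
print(4 + 11 / 3 + (3 * math.e - 2))
```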
<div class="boxed">
<p>As an exercise, say
\(F = \begin{cases} e^{3x} & x \lt 0 \\ 2 & 0 \leq x \lt 1 \\ 2x+1 & 1 \leq x \end{cases}\)</p>
<p>Can you intuitively see how $\mu_F$ distributes mass?</p>
<p>Can you compute</p>
<p>\(\int_{-\infty}^2 e^{-2x} d\mu_F\)</p>
</div>
<div class="boxed">
<p>As another exercise, can you intuit how $\mu_F$ distributes mass when
$F(x) = \lfloor x \rfloor$ is the floor function?</p>
<p>What is $\int_1^\infty \frac{1}{x^2} d\mu_F$? What about $\int_1^\infty \frac{1}{x} d\mu_F$?</p>
</div>
<hr />
<p>Ok, I hear you saying. There’s a really tight connection between
increasing (right-)continuous functions $F$ on $\mathbb{R}$ and
positive integrable functions $f$. This connection is at its tightest
wherever $F$ is actually continuous, as then the measures $\mu_F$ and $m_f$
have a derivative relationship, which is reflected in the same derivative
relationship of functions $F' = f$. Not only does this give us a way to
generalize the notion of derivative to functions that might not normally
have one (as in the case of the Heaviside function and the Dirac delta),
it gives us a concrete way of evaluating Lebesgue-Stieltjes integrals.</p>
<p>But doesn’t this feel restrictive? There are lots of functions $F$ which aren’t
(right-)continuous or increasing that we might be interested in differentiating.
There are <em>also</em> lots of nonpositive functions $f$ which we might be interested
in integrating. Since we got a kind of “fundamental theorem of calculus” from
these measure theoretic techniques, if we can show how to apply these techniques
to a broader class of functions, we might be able to get a more general
fundamental theorem of calculus.</p>
<p>Of course, to talk about more general functions $F$, we’ll need to allow
our measures to assign <em>negative</em> mass to certain sets. That’s ok, though,
and we can even go so far as to allow <em>complex</em> valued measures! In fact,
from what I can tell, this really is the raison d’être for signed and
complex measures. I was always a bit confused why we might care about these
objects, but it’s beginning to make more sense.</p>
<p>This post is getting pretty long, though, so we’ll talk about the signed
case in a (much shorter, hopefully) part 2!</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I was mainly reading Folland (Ch. 3), since it’s the book for the course.
I’ve also been spending time with Terry Tao’s lecture notes on the subject
(see <a href="https://terrytao.wordpress.com/2010/10/16/245a-notes-5-differentiation-theorems/">here</a>, and <a href="https://terrytao.wordpress.com/2009/01/04/245b-notes-1-signed-measures-and-the-radon-nikodym-lebesgue-theorem/">here</a>), as well as
<a href="http://web.stanford.edu/~eugeniam/math205a/L3.pdf">this</a> PDF from Eugenia Malinnikova’s measure theory course
at Stanford. I read parts of Axler’s new book, and while I meant to read
some of Royden too, I didn’t get around to it. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>As an aside, I really can’t recommend Carter and van Brunt’s
“The Lebesgue-Stieltjes Integral: A Practical Introduction” enough.
It spends a lot of time on concrete examples of computation, which is
exactly what many measure theory courses are regrettably missing.
Chapter 6 in particular is great for this, but the whole book is excellent. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>We can actually relax this from balls $B_r(x)$ to a family $\{E_r\}$
that “shrinks nicely” to $x$, though it’s still a bit unclear to me
what that means and what it buys us. It seems like one important feature
is that the $E_r$ don’t have to contain $x$ itself. It’s enough to take up
a (uniformly) positive fraction of space near $x$. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>There’s another way of viewing this theorem which is quite nice. I
think I saw it on Terry Tao’s blog, but now that I’m looking for it I
can’t find it… Regardless, once we put on our nullset goggles, we
can no longer evaluate functions. After all, for any particular point
of interest, I can change the value of my function there without changing
its equivalence class modulo nullsets. However, even with our nullset
goggles on, the integral $\frac{1}{m B_r(x)} \int_{B_r(x)} f dm$ is well
defined! So for almost every $x$, we can “evaluate” $f$ through this
(rather roundabout) approach. The benefit is that this notion of evaluation
does not depend on your choice of representative! <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>In no small part because I’m not sure how you would actually integrate
against a singular continuous measure in the wild… <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sun, 21 Feb 2021 00:00:00 +0000
https://grossack.site/2021/02/21/lebesgue-ftc-1.html
Talk - Why Think -- Letting Computers do Math for Us<p>Yesterday I gave my second talk at the Graduate Student Seminar at UCR.
I decided to talk about something near and dear to me, a topic which first
got me interested in logic: decidability.
The idea of decidability is to look for theories which are simple enough to
admit a computer program (called a <span class="defn">decider</span>)
which can tell you whether or not a given sentence is true.</p>
<p>I was first introduced to the decidability of certain logics in
Computational Discrete Math (<a href="https://cs.cmu.edu/~sutner/CDM/index.html">CDM</a>)
which was the class where I met my advisor Klaus Sutner. I’m lazy at times,
and so the idea of a decidable logic was really exciting. I wanted to
know as much as I could so that, one day, I would be able to answer a tricky
question by asking a computer!</p>
<p>Of course, now I realize that a lot of the time it’s faster to just
solve the problem yourself than ask a decider
(particularly if you have to write the decider yourself…) but
the interest has stuck with me. I think it’s a super cool topic,
and I was really pleased to get a chance to talk at length about it!</p>
<p>There are a <em>ton</em> of decidability results in logic, but only an hour
in a talk. Also, a lot of the techniques for proving decidability are
somewhat technical. This made drafting the talk a little bit tricky,
particularly since I was speaking to a room of non-logicians.
The best bet, I thought, was to try and survey a
few powerful techniques (completeness and automaticity)
and give a high level description of how the proofs go. Then I ended with a
discussion of some negative results, and a tangent on the
resolution of <a href="https://en.wikipedia.org/wiki/Hilbert%27s_tenth_problem">Hilbert’s 10th Problem</a>.
It’s not directly related, but Matiyasevich’s Theorem was my favorite
theorem for a while, and the material was so nearby that I couldn’t resist including it!</p>
<p>All in all, I think the talk went quite well ^_^. Klaus is (among other things)
an Automata Theorist, and working closely with him for a few years
made me love automata too. I was happy to get to share some of that
enthusiasm with people, even if I had to gloss over some details.
Plus, I’m really proud of the example computation done on an automaton.
I made it with Evan Wallace’s <a href="http://madebyevan.com/fsm">FSM Maker</a> and
then tweaked the tikz slide by slide to do the animation. Normally I would
draw the machine on a blackboard and point at the current state, but I think the colors served
as a nice substitute for a talk with slides.</p>
<p>I had someone ask me how the automatic structure is useful in
deciding a theory, and in hindsight I should have included a slide on that.
I’m glad they asked, though, because it gave me a chance to describe it
verbally. That’s a really obvious oversight, and I’m a bit upset that
people who only download the slides won’t get a chance to see an
overview of that proof. Maybe I’ll write it up in a blog post one day
and include a simple example…</p>
<p>As for the other material, it’s always harder to tell over zoom,
but I think people followed the talk fairly well.
The section on completeness was necessarily a bit technical, and I’m
almost certain I lost some people in the proof that $\mathsf{DLO}$ has
Quantifier Elimination. That said, I expected to lose some people there
(it’s not an obvious bit of combinatorics, even if it’s easy once you’re
familiar with it), so I’m not too upset about it. The point of that slide
was to show that Quantifier Elimination is a strategy for getting control
over your favorite theory. If anyone ever needs it for their research,
they’ll have plenty of time to look for it and learn it themselves
now that they know it exists.</p>
<p>As one last regret for the road, someone asked where people are studying
these questions. I threw out Cornell and UIUC as guesses, but in hindsight
I don’t think I made it clear that I don’t actually know for certain
that people there are working on this stuff. I said that in any good logic
department you’ll have people interested in these things, and that’s true,
but I don’t actually know if there’s anyone explicitly doing work in this area.
Also, I should have probably plugged CMU…</p>
<p>As always, the abstract and slides are below. Plus a link to a
youtube video<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup> with a recording of the talk.</p>
<hr />
<p>Why Think? Letting Computers Do Math For Us</p>
<p>There are a number of results in logic of the form “If your question can be
asked a certain way, then there is an algorithm which tells you the answer”.
These are called Decidability Theorems, because an algorithm can Decide
if a proposition is true or false. In this talk we will survey some results in this
area, discuss a few techniques for proving these results, and discuss some
open problems that are only a few computer cycles away from being solved.</p>
<p>The talk slides are <a href="/assets/docs/why-think/talk.pdf">here</a>.
There’s at least one minor typo
(I say Łoś-Tarski at one point when I mean Łoś-Vaught)</p>
<p>A recording of the talk is embedded below.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/ClmQ3OW11Qg" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Unrelated, but I just got my first subscriber and commenter!
It’s cool to see at least one person care about what I say
enough to ask a question in the comments. And even though
youtube subscriptions are free (which is definitely a good
thing given my youtube habit…) it’s still exciting that
someone (who I don’t know!) subbed ^_^ <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sat, 23 Jan 2021 00:00:00 +0000
https://grossack.site/2021/01/23/why-think.html
Talk - Syntax and Semantics (Trans Math Day 2020)<p>I’ve been busy with some assignments and grading, so it took me a while to
post this. We got there eventually, though! I gave a talk at an online
conference for Trans Math Day on December 5th. There were a lot of interested
speakers, so the organizers gave us 5 minute, 10 minute, and 20 minute spots.
I was given a 5 minute talk, which is a borderline impossible assignment –
Obviously I was still excited to give it; there was just a slew of challenges
to work out.</p>
<p>I’m not a complete stranger to this format. As an undergraduate, I was given a
15 minute slot to present my thesis work (on Abelian Automata Groups), and
I wanted to discuss as much as I could (after all, I was proud of that work!).
I ended up leaning into the manic energy and powered through, barely stopping
to breathe.</p>
<p>I didn’t want to give that kind of talk, though. This was an online conference,
and nobody present knew me. It’s fun to give a comically fast paced
presentation when you know everyone knows you, but in a room of strangers,
it’s better to be professional. So I had to find something which was simultaneously
interesting and explicable in 5 minutes… I settled on Syntax and Semantics.</p>
<p>There are <em>lots</em> of fun theorems that place restrictions on the complexity
of an object (its semantics) based on how simple that object is to describe
(its syntax). I decided to give a short outline of the idea, followed by one
example from algebra and one example from analysis. In the end, I used 7 minutes.
While I shudder to think of using almost 150% of my allotted time, I think
every talk goes around 2 minutes over. So I’ll happily think additively
instead of multiplicatively in this instance :P</p>
<p>I asked if the talk could be recorded, but in the interest of protecting the
privacy of some attendees, the organizers politely declined. I still have the
slides, though, and sometime soon I’ll put up that blog post I keep promising
outlining some more examples…</p>
<p>As a last aside, the conference was a lot of fun. It was great to be in a
(zoom) room with a bunch of other trans mathematicians, and many of the talks
were extremely interesting! One person gave the most lucid account of Riemann-Roch
I’ve ever seen (which is made more impressive by the size of the time slots),
and I even left with some fun problems to think about once I have some more
free time! All in all, I hope this becomes an annual event, and I hope to
continue being involved ^_^</p>
<hr />
<p>Syntax and Semantics as an Organizing Principle</p>
<p>The key observation in Mathematical Logic is that the syntax
of mathematics (the symbols we write on a page)
is different from the semantics of mathematics (the meaning
we as humans prescribe those symbols). By studying them
separately, we can prove theorems of the form “any object
which is easy to describe cannot be too complicated”.
In this talk we will survey a collection of results of this form
which the speaker has found useful in their own research and
education.</p>
<p>The slides are <a href="/assets/docs/tmd-syntax-and-semantics/talk.pdf">here</a></p>
Wed, 16 Dec 2020 00:00:00 +0000
https://grossack.site/2020/12/16/tmd-syntax-and-semantics.html
Automorphisms Don’t Extend<p>I was on mse last night (later than I should have been…) when I saw a really
<a href="https://math.stackexchange.com/q/3928573/655547">interesting question</a>. In the interest of keeping the blog post self
contained, I’ll transcribe the question here (with some notational edits):</p>
<div class="boxed">
<p>Let $\text{Aut}(G)$ denote the group of automorphisms of $G$, and let
$A$ be a subgroup of $B$.</p>
<p>Is $\text{Aut}(A)$ a subgroup of $\text{Aut}(B)$?</p>
<p>If not, is there a necessary and sufficient condition for this to hold?</p>
</div>
<p>I think <a href="https://groupprops.subwiki.org/wiki/Automorphism_group_of_a_group">automorphism groups</a> are really interesting. They’re a very natural
operation on groups, and yet they seem to be quite difficult to understand
in general. I haven’t done any extensive work with them, but the fact that
they’re so difficult to compute surprises and excites me. Moreover, if a group
$G$ measures the symmetries of an object $X$, then in some sense $\text{Aut}(G)$
measures the “symmetries among the symmetries”. This seems like an interesting
topic of study, analogous to the study of <a href="https://en.wikipedia.org/wiki/Linear_relation">Syzygies</a> in ring
theory<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>. I’ve actually asked <a href="https://math.stackexchange.com/q/3864726/655547">a question</a> about this myself, though
I didn’t phrase the question as clearly as I could have so I didn’t get any
answers. It’s also possible the problem is Hard™️, which would be another
reason there’s no answers. In general, automorphism groups are one of many
objects I’d like to learn more about, so I got excited to see this question
about them.</p>
<p>My instinct, upon seeing this problem, was to answer “no”. That is,
there’s no reason $\text{Aut}(A)$ should be a subgroup of $\text{Aut}(B)$.
After all, it’s possible for $A$ to be some very symmetric object living inside
some less symmetric object $B$. Then there’s no reason why a symmetry of $A$
should extend to a symmetry of the larger object $B$.</p>
<p>One picture you might have in mind is this<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup>:</p>
<p><img src="/assets/images/automorphisms-dont-extend/deathly-hallows.jpg" alt="deathly hallows symbol" width="200px" /></p>
<p>The circle has LOTS of symmetries, but the deathly hallows logo<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup> has
only two. The “do nothing” symmetry, and the “reflect horizontally” symmetry.
Most of the symmetries of the circle <em>don’t</em> extend to symmetries of the
entire object, and there’s no reason to expect symmetries of groups to behave
any differently. Mark Bennet phrases this well in <a href="https://math.stackexchange.com/questions/3928573/if-a-is-a-subgroup-of-b-just-it-hold-that-textauta-is-a-subgroup-of/3928582?noredirect=1#comment8102277_3928573">his comment</a> under the
original question.</p>
<p>Gerry Myerson asked if I could give a counterexample, rather than informally
arguing that such a theorem shouldn’t be true. It’s a good question, but
he asked it around 2am, so when I couldn’t immediately think of one
(again, I’m not very comfortable with computing automorphism groups myself)
I told him I’d think harder (and ask sage) in the morning.</p>
<hr />
<p>By the time I checked again, there were already some counterexamples floating
around. Ancientmathematician mentions $\mathfrak{S}_6 \leq \mathfrak{S}_7$,
and uses some “well known” facts about their orders
(which happened to be not-so-well-known to me :P). Moreover, in a comment
on my answer, Derek Holt mentions $C_2 \times C_2 \leq D_8$, where again
the automorphisms can’t work out: $\text{Aut}(C_2 \times C_2) \cong \mathfrak{S}_3$ has order $6$,
while $\text{Aut}(D_8) \cong D_8$ has order $8$, and since $6$ doesn’t divide $8$,
Lagrange rules out any embedding (though again this wasn’t immediately obvious to me).</p>
<p>I was still interested in verifying this by hand, and since sage doesn’t seem
to have a way to compute automorphism groups directly<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup> I decided making
calls to <code class="language-plaintext highlighter-rouge">gap.eval()</code> every other line was too much of a hassle
(also it kept weirdly segfaulting?)… If I wanted to do this, I was going
to have to do it directly in <a href="https://www.gap-system.org/">gap</a>. It took me some experimenting to get
the syntax right, but at the end of the day, it really wasn’t too hard to
get some (naive) code working:</p>
<div class="gap">
<script type="text/x-sage">
# Some code to find a counterexample to the mse question here:
# https://math.stackexchange.com/q/3928573/655547
#
# In short, can we find groups A < B with
# Aut(A) NOT a subgroup of Aut(B)?
Embeds := function(G,H)
  # Does G embed into H?
  # This was shockingly hard to get working. Eventually
  # I shamelessly stole a solution from here:
  # https://math.stackexchange.com/a/767953/655547
  local homs, kernelSizes;
  homs := AllHomomorphismClasses(G,H);
  kernelSizes := List(homs, h -> Order(Kernel(h)));
  # The smallest a kernel can be is 1, which happens when h is an embedding
  return Minimum(kernelSizes) = 1;
end;

TestGroup := function(B)
  # Return a witness A with Aut(A) not a subgroup of Aut(B) if one exists
  local A, AutA, AutB, flag;
  AutB := AutomorphismGroup(B);
  flag := false;
  for A in ConjugacyClassesSubgroups(B) do
    A := Representative(A); # get a representative from the conjugacy class
    AutA := AutomorphismGroup(A);
    if not Embeds(AutA, AutB) then
      Print(StructureDescription(A), " is a subgroup of ", StructureDescription(B));
      Print("\n");
      Print("But Aut(A) = ", StructureDescription(AutA), "\n");
      Print("Which is not a subgroup of Aut(B) = ", StructureDescription(AutB));
      Print("\n\n");
      flag := true;
    fi;
  od;
  return flag;
end;
# TestGroup(SymmetricGroup(7));
# TestGroup(DihedralGroup(8));
</script>
</div>
<p>You can uncomment either of the tests at the bottom to verify that the
counterexamples mentioned above are actually counterexamples. You can also
ask this same question of any group you’re interested in. It raises some
warnings along the way (something about my code being inefficient for large
subgroups), but it’s good enough for right now.</p>
<div class="boxed">
<p>You might try to add a loop to this code to run <code class="language-plaintext highlighter-rouge">TestGroup</code> on every group
of order $\leq 100$.</p>
</div>
<p>Being able to test a conjecture on all the “small” groups is obviously a
useful skill, so if you aren’t sure how you would go about doing this,
give it a try! It’s not too hard, and you can even test your code out
in the sage cell above.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I’m not sure how precise this analogy can be made, however. It’s
certainly an informally similar idea, but it clearly doesn’t align
with the notion of a $\mathbb{Z}$-syzygy for abelian groups. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Taken from <a href="https://www.reddit.com/r/Whatisthis/comments/5d23vg/i_keep_seeing_this_trianglecircle_symbol_recently/">here</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>I know, I know, she’s cancelled. The books were still impactful on me,
and this is a good and fairly fun example. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>I’m not crying, you’re crying. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Mon, 30 Nov 2020 00:00:00 +0000
https://grossack.site/2020/11/30/automorphisms-dont-extend.html
Talk - Programming and Category Theory<p>Yesterday I gave a talk at the UCR Category Theory Seminar. I ended up putting
off making the slides for longer than I should have, because I wasn’t entirely
sure what I wanted the talk to be. The connections between
Cartesian Closed Categories/Proof Theory and Constructive Logic/Programming Languages
run extremely deep, and a talk like this can be arbitrarily abstract. I wanted
to make sure this talk was easily approachable, though, and it was tricky to
find that balance.</p>
<p>I’ve given talks on this kind of topic before, mainly in
<a href="https://hypefortypes.github.io/">Hype4Types</a>, a class that I founded with some
friends a few years ago. We discussed various topics in type theory and
pl theory, and it’s really cool to see that the class has survived our graduating
for two years now! But in those classes, I knew the students were already
familiar with some basics of writing programming languages, and had seen
<a href="https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form">grammars</a>
before (for instance).</p>
<p>The point of the talk was to describe how certain category theoretic notions
arise automatically from the desire to write good maintainable code. The talk
was at its most abstract when describing Categorical Semantics for programming
languages inside Cartesian Closed Categories, which was at about the halfway
point. After that, we discussed polymorphism
(and how it corresponds to natural transformations) and data structures
(which correspond to monads on the category of your semantics).</p>
<p>I had an outline prepared, but actually writing up the slides was stressing me
out so I waited until the last minute to do it.
I’ve <a href="/2020/10/09/model-theory-and-you.html">mentioned before</a>
that I don’t like giving talks with slides, and this talk in particular would
have been better if I could read the room and adapt to what it seemed like
my audience was and wasn’t comfortable with. Excuses aside, I ended up
latexing these slides from 11pm - 6am the day of the talk… oops.</p>
<p>I knew after giving the talk that it was only OK. It was still a fine talk,
but it didn’t live up to my standards. It took me a little while to piece
together why, though, which is why I’m uploading this post on the following day.
Here are some critiques for myself, which you may or may not enjoy reading.
Either way, it’s cathartic for me to put this into words. Hopefully this will
also help me avoid similar mistakes when writing future talks ^_^.</p>
<p>The big thing that I think slipped through the cracks was my section on
Categorical Semantics. My sleep deprived brain was really worried about
people being familiar with the programming language theory, but because
I was giving this talk at a category theory seminar, I think I glossed over
some categorical notions that really should have been addressed. I billed
this as an introductory talk, so I think giving some more explicit examples of
Cartesian Closedness, as well as a concrete example of the internal logic of
some small category would have been good additions. Most annoyingly for my
own standards, I used “elements” to denote arrows from the terminal object
for a huge section of the talk. This is entirely standard, and is a harmless
convention, but I didn’t even <em>mention</em> it… I think this led to a mild
amount of confusion from some members of the audience. All in all, the talk
would have been improved by a bit more formality regarding the definitions
of categorical elements/cartesian closed categories/categorical semantics/etc.</p>
<p>Of course, I’m still putting the slides up. Once I get sent a link to the
recording, I’ll put that here as well. As before, the abstract is below:</p>
<hr />
<p>Programming for Category Theorists</p>
<p>Bartosz Milewski has an excellent (free!) book teaching category theory to programmers. The connections run deep,
and many programmers find themselves interested in category theory (that’s what happened to me). In this talk we will
attack the opposite problem: If the connections run deep, surely category theorists have something to learn from the
programmers! We will survey some ways a familiarity with programming can provide intuition for working with categorical
objects. Notably, we will show that arrows in a Cartesian Closed Category are really programs you can run. Moreover,
we will show how functors and monads arise naturally in a programming context. We will finish with the notion of a
polymorphic function, which encodes the notion of a natural transformation.</p>
<p>The talk slides are <a href="/assets/docs/programming-and-ct/talk.pdf">here</a>.
Again, there’s some typos (most notably $𝟙 \xrightarrow[x]{A}$ should be
$𝟙 \xrightarrow{x} A$).</p>
<hr />
<p><strong>Edit:</strong> I just got sent the recording. I uploaded it on youtube, and I’m
including an embedding below.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/PSiyBm4OdaQ" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
Tue, 24 Nov 2020 00:00:00 +0000
https://grossack.site/2020/11/24/programming-and-ct.html
Quick Analysis Trick 4<p>To the surprise of no one, I was on math stackexchange earlier and saw an
interesting analysis question. I have a weird fascination with tricky limit
questions because I feel like I’ve always been bad at them. I like working
on them for the same reason I like practicing the difficult parts of
pieces of music – it makes me feel like I’m improving
(in a “no pain no gain” kind of way).</p>
<p>Anyways, <a href="https://math.stackexchange.com/questions/3910478/limit-of-lim-n-to-infty-1-sqrt2-sqrt33-sqrtnn-l">the question</a> was as follows (paraphrased):</p>
<div class="boxed">
<p>Compute the following limit</p>
<p>\(\lim_{n \to \infty} (1 + \sqrt{2} + \sqrt[3]{3} + \ldots + \sqrt[n]{n}) \ln \left ( \frac{2n+1}{2n} \right )\)</p>
</div>
<p>The beginning of the solution makes sense: We want to approximate the $\ln$, so we rewrite it as
$\ln \left ( 1 + \frac{1}{2n} \right )$ and approximate that<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup> as $\frac{1}{2n} - O(\frac{1}{n^2})$.</p>
<p>Then (rewriting as a summation as well) we get</p>
\[\lim_{n \to \infty} \left ( \sum_{k \leq n} k^\frac{1}{k} \right ) \left ( \frac{1}{2n} - O \left ( \frac{1}{n^2} \right )\right )\]
<p>Now here’s the clever idea: We use <a href="https://en.wikipedia.org/wiki/Ces%C3%A0ro_summation">Cesàro Averages</a> backwards! This is mentioned in a comment on the original mse question, and
it’s a trick I’ll absolutely have to remember!</p>
<p>To explain what I mean, let’s take a quick detour into the world of Cesàro Averages:</p>
<hr />
<div class="boxed">
<p>If $(x_n)$ is a sequence, then the <span class="defn">Cesàro Average</span> $a_k$ is the average of the
first $k$ terms:</p>
<p>\(a_k = \displaystyle \frac{1}{k} \displaystyle \sum_{j \leq k} x_j\)</p>
</div>
<p>People care about the Cesàro averages because it’s possible for the Cesàro averages to converge even if the original series doesn’t.
Moreover, the notion of “averaging” in this way comes up very naturally in Fourier Analysis (see <a href="https://en.wikipedia.org/wiki/Fej%C3%A9r%27s_theorem">here</a>) and
Ergodic Theory (see <a href="https://en.wikipedia.org/wiki/Ergodic_theory#Ergodic_theorems">here</a>).</p>
<p>The fundamental theorem in this area is this:</p>
<div class="boxed">
<p>Whenever $x_n$ is already convergent, the sequence of Cesàro averages converges to the same limit.</p>
</div>
<p>So the notion of Cesàro convergence is a true generalization of the original notion of convergence.
It allows us to evaluate certain limits that used to be divergent, but it doesn’t mess up any
limits that already converged.</p>
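<p>To make this concrete, here’s a quick numeric sketch in Python (my own illustration, not from the linked discussion) using the classic example $x_n = (-1)^n$: the sequence itself oscillates forever, but its Cesàro averages converge to $0$.</p>

```python
# Cesàro averages of the divergent sequence x_n = (-1)^n.
# The sequence never converges, but its running averages tend to 0.

def cesaro_averages(xs):
    """Return the list of Cesàro averages a_k = (x_1 + ... + x_k) / k."""
    averages = []
    total = 0.0
    for k, x in enumerate(xs, start=1):
        total += x
        averages.append(total / k)
    return averages

xs = [(-1) ** n for n in range(1, 10001)]
avgs = cesaro_averages(xs)
print(avgs[-1])  # prints 0.0
```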
<hr />
<p>And now we get to the application here:</p>
<p>We recognize $\frac{1}{n} \sum_{k \leq n} k^\frac{1}{k}$ as a Cesàro average of the sequence $k^\frac{1}{k}$.
Since we know $k^\frac{1}{k} \to 1$, we conclude the same is true of the Cesàro averages. So</p>
\[\begin{align*}
\lim_{n \to \infty} \left ( \sum_{k \leq n} k^\frac{1}{k} \right ) \left ( \frac{1}{2n} - O \left ( \frac{1}{n^2} \right )\right )
&= \lim_{n \to \infty} \frac{\sum_{k \leq n} k^\frac{1}{k}}{n} \left ( \frac{1}{2} - O \left ( \frac{1}{n} \right ) \right ) \\
&= \lim_{n \to \infty} 1 \left ( \frac{1}{2} - O \left ( \frac{1}{n} \right ) \right )\\
&\to \frac{1}{2}
\end{align*}\]
<hr />
<p>So we did properly figure out what the limit should be… But I did a little bit of sleight of hand in the
above manipulation. Did you catch it?</p>
<p>As people interested in computer science, we have to be slightly pickier than a lot of mathematicians when
it comes to computing limits. It’s often important exactly how quickly we get convergence<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup>, and so
we kept track of the big-Oh for the whole problem.</p>
<p>But above we just substituted $n^\frac{1}{n} \to 1$. If we’re careful, it’s not too hard to get
error bounds for the convergence of this sequence… But who’s to say what the error bounds will
be for the Cesàro averages? These averages keep track of terms earlier in the sequence. It’s reasonable
to worry that the convergence might be slower, and this worry turns out to be legitimate:</p>
<p>Say $x_n = L \pm O(e(n))$. That is, $x_n$ converges to a limit $L$, and we can bound the <em>error</em> by
$O(e)$. Then</p>
\[\begin{align*}
\frac{ \sum_{k \leq n} x_k }{n}
&= \frac{ \sum_{k \leq n} L \pm O(e(k)) }{n} \\
&= \frac{ nL \pm O \left ( \sum_{k \leq n} e(k) \right ) }{n} \\
&= L \pm O \left ( \frac{1}{n} \sum_{k \leq n} e(k) \right )
\end{align*}\]
<p>So the error in the Cesàro averages is the average of the errors.
Let’s see how this comes up in our analysis of this particular problem.</p>
<hr />
<p>First, we need to know the error bounds on $n^\frac{1}{n}$. This isn’t too hard to figure out:</p>
\[n^\frac{1}{n} = e^{\frac{1}{n}\ln(n)} = 1 + O \left ( \frac{\ln(n)}{n} \right )\]
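<p>As a quick numeric sanity check (my own, in Python): since $e^x - 1 \sim x$ for small $x$, the ratio of the true error $n^{\frac{1}{n}} - 1$ to $\frac{\ln(n)}{n}$ should tend to $1$, confirming this bound.</p>

```python
import math

# Check that n^(1/n) - 1 is on the order of ln(n)/n:
# the ratio of error to bound should approach 1 as n grows,
# since e^x - 1 ~ x for small x.
for n in [10, 100, 10000, 1000000]:
    error = n ** (1 / n) - 1
    bound = math.log(n) / n
    print(n, error / bound)
```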
<p>Now we can find the error of the Cesàro averages:</p>
\[\frac{1}{n} \sum_{k \leq n} k^\frac{1}{k}
=
\frac{1}{n} \sum_{k \leq n} \left ( 1 + O \left ( \frac{\ln(k)}{k} \right ) \right )
=
1 + O \left ( \frac{1}{n} \sum_{k \leq n} \frac{\ln(k)}{k} \right )\]
<p>While this is technically <em>true</em>, I would argue it isn’t <em>useful</em> yet. We can clean it up.</p>
<p>The important observation is that, for $k \geq 4$, $\frac{\ln(k)}{k}$ is monotone decreasing.
Then we do the old “approximate by the nearest power of $2$” trick that we use to prove the
harmonic series diverges<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>:</p>
\[\begin{align*}
&\quad \frac{\ln(4)}{4} + \frac{\ln(5)}{5} + \frac{\ln(6)}{6} + \frac{\ln(7)}{7} + \frac{\ln(8)}{8} + \frac{\ln(9)}{9} + \ldots + \frac{\ln(15)}{15} + \frac{\ln(16)}{16} + \ldots \\
&\leq \frac{\ln(4)}{4} + \frac{\ln(4)}{4} + \frac{\ln(4)}{4} + \frac{\ln(4)}{4} + \frac{\ln(8)}{8} + \frac{\ln(8)}{8} + \ldots + \frac{\ln(8)}{8} + \frac{\ln(16)}{16} + \ldots \\
&= \ln(4) + \ln(8) + \ln(16) + \ldots
\end{align*}\]
<p>Notice we can safely assume $n$ is a power of $2$ by adding some extra terms.
Since we’re trying to upper bound the error anyways, this is no issue. Also,
since we’re not interested in additive constants, we can go ahead and approximate
$\frac{\ln(3)}{3}$ by $\frac{\ln(2)}{2}$ to make the sum more uniform.</p>
\[\begin{align*}
O \left ( \frac{1}{n} \sum_{k \leq n} \frac{\ln(k)}{k} \right )
&= O \left ( \frac{1}{n} \left ( \frac{\ln(2)}{2} + \frac{\ln(3)}{3} + \ln(4) + \ln(8) + \ldots + \ln(n) \right ) \right ) \\
&= O \left ( \frac{1}{n} \sum_{i \leq \log_2 n} \ln(2^i) \right ) \\
&= O \left ( \frac{1}{n} \sum_{i \leq \log_2 n} i \ln(2) \right ) \\
&= O \left ( \frac{1}{n} \sum_{i \leq \log_2 n} i \right ) \\
&= O \left ( \frac{ (\log n)^2 }{n} \right )
\end{align*}\]
<p>Importantly, this error bound is <em>different</em> from the error bound of our original series!
As expected, this converges slightly slower (by a log factor) because it’s being
weighed down by earlier terms in the series.</p>
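<p>We can also check this bound numerically (a quick Python sketch of my own): the measured Cesàro error should stay within a constant factor of $\frac{(\log n)^2}{n}$, and it should shrink as $n$ grows.</p>

```python
import math

# Compare the actual Cesàro error (1/n) * sum(k^(1/k)) - 1
# against the derived bound (log n)^2 / n. The ratio should
# stay bounded away from 0 and infinity.
for n in [100, 1000, 10000]:
    avg = sum(k ** (1 / k) for k in range(1, n + 1)) / n
    error = avg - 1
    bound = math.log(n) ** 2 / n
    print(n, error, bound, error / bound)
```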
<p>Why go through this pain, though? Because now we can finally put a nice bow on things:
How quickly does the limit converge?</p>
<!--
Note: We have the \pm here because I've glossed over the fact
that we need to multiply out (1 + O((log n)^2/n))(1/2 - O(1/n))
-->
\[\left ( 1 + \sqrt{2} + \sqrt[3]{3} + \ldots + \sqrt[n]{n} \right ) \ln \left ( \frac{2n+1}{2n} \right )
= \frac{1}{2} \left ( 1 \pm O \left ( \frac{( \log n )^2}{n} \right ) \right )\]
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>A notational quirk I picked up from <a href="http://www.cs.cmu.edu/~odonnell/">Ryan O’Donnell</a> that my brain likes: I will only ever
write $O(f)$ when $f$ is a positive function. So I meaningfully distinguish between
$\text{blah} + O \left ( \frac{1}{n} \right )$ and $\text{blah} - O \left ( \frac{1}{n} \right )$.</p>
<p>This is a pretty minor thing, but it’s led to notably more comfort on my part, since my brain tends to
implicitly make positivity assumptions when I’m not looking. Explicitly writing down when things can
be negative forces my brain to pay extra attention to it. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>In fact, since I have Ryan O’Donnell on the brain, I audited his Theorist’s Toolkit class during
my year off. In one of the lectures, he asserted the <a href="https://en.wikipedia.org/wiki/Central_limit_theorem">Central Limit Theorem</a> was the most
useless theorem in mathematics. Of course, he was being intentionally inflammatory. He prefaced
that statement with some history – the Central Limit Theorem got its name not (as I originally thought)
because summing random variables “tends to lie at the center” of a gaussian. It’s because Pólya thought
it was the most important result in probability. It was <em>central</em> to the field.</p>
<p>Ryan went on to tell us about the <a href="https://en.wikipedia.org/wiki/Berry%E2%80%93Esseen_theorem">Berry-Esseen Theorem</a>, which gives good error bounds on the
convergence guaranteed by the Central Limit Theorem. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>As a quick exercise, why do we get an upper bound here?
We prove the harmonic series diverges by <em>lower</em> bounding it… <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 18 Nov 2020 00:00:00 +0000
https://grossack.site/2020/11/18/quick-analysis-trick-4.html
Quick Analysis Trick 3<p>I’m in a measure theory class right now, and I think it’s important to be
properly comfortable with measure theory in a way that I’m currently not.
It has deep connections with things that I find very interesting
(Descriptive Set Theory, Ergodic Theory, their intersection in Amenable Groups, etc.)
and it’s one of my go-to examples of an “obviously useful” branch of math.
If you can apply your interests to measure theory somehow, I think that’s a
compelling argument to fend off questions of “why is this worthwhile”
(at least from other mathematicians).</p>
<p>All this to say I’ve started reading more measure theory books, though it makes
my pile of unread logic and algebra books sad. In particular I bought a copy of
Halmos from an online used bookstore for $8 (!!), and it’s a fantastic read so
far. I wanted to highlight one particular observation that, while obvious in
hindsight, was extremely helpful for me. Hopefully by posting about it here,
I’ll also find it more memorable. While this isn’t a “trick” in the same
sense as the other posts in this series, I still feel like it fits here.</p>
<div class="boxed">
<p>If $(E_n)$ is a sequence of sets, then</p>
<ul>
<li>$\limsup E_n$ is the set of $x$ that appear in <em>infinitely many</em> $E_n$</li>
<li>$\liminf E_n$ is the set of $x$ that appear in <em>all but finitely many</em> $E_n$</li>
</ul>
</div>
<p>I think I didn’t see this because I was too focused in on generalizing my
understanding of $\liminf$ and $\limsup$ of sequences $(x_n)$. In the case
of real sequences, $\liminf$ is the <em>smallest</em> thing some subsequence
converges to, while $\limsup$ is the <em>biggest</em> thing some subsequence
converges to. I think the following image does a good job showing what I mean:</p>
<p><img src="/assets/images/quick-analysis-trick-3/limsup-plot.png" alt="a plot showing limsup and liminf" /></p>
<p>If this is your sequence $x_n$, then the biggest thing you can possibly
converge to is $\frac{1}{2}$. Similarly, the smallest thing you can possibly
converge to is $-\frac{1}{2}$. So these are $\limsup x_n$ and $\liminf x_n$
respectively. From this point of view it is clear that
$\liminf x_n \leq \limsup x_n$, and whenever they are equal, $\lim x_n$
exists and agrees with them both.</p>
<p>Here $\liminf x_n = \lim_{n \to \infty} \inf_{m > n} x_m$, and dually,
$\limsup x_n = \lim_{n \to \infty} \sup_{m > n} x_m$. This, of course, is
where the notation comes from, but I think a better definition is</p>
<ul>
<li>$\limsup x_n = \inf_n \sup_{m > n} x_m$</li>
<li>$\liminf x_n = \sup_n \inf_{m > n} x_m$</li>
</ul>
<p>Not only does this make it much more obvious that these definitions are
dual, it more readily generalizes to the definition for sets:</p>
<ul>
<li>$\limsup E_n = \bigcap_n \bigcup_{m > n} E_m$</li>
<li>$\liminf E_n = \bigcup_n \bigcap_{m > n} E_m$</li>
</ul>
<p>(As a quick check in – why are the two definitions of $\liminf x_n$ equivalent?)</p>
<p>Because I liked my intuition for $\limsup$ and $\liminf$ of sequences of reals,
I’d been viewing $\liminf$ and $\limsup$ of sets as
“the smallest (resp. largest) set that $E_n$ could converge to”… Of course,
I have no intuition for what it means for a sequence of <em>sets</em> to converge!
Because of this, until today I’ve had little to no intuition for what these
sets actually are.</p>
<hr />
<p>As with all realizations, I should have seen this much sooner. It’s a common
trick in descriptive set theory to pass between logical constructors and
set theoretic operations. These translations are entirely natural, and they come from a
correspondence between the (syntactic) boolean algebra of propositions and the
(semantic) boolean algebra of sets. Yet again I’m talking about syntax and
semantics on this blog, and yet again I’m promising a post detailing a few
of my favorite examples. For now though, let’s write this one example out explicitly:
Say we have a family of properties $P_n$. Then</p>
<ul>
<li>$\{ x \mid P_0 \land P_1 \} = \{ x \mid P_0 \} \cap \{ x \mid P_1 \}$ (conjunction corresponds to intersection)</li>
<li>$\{ x \mid P_0 \lor P_1 \} = \{ x \mid P_0 \} \cup \{ x \mid P_1 \}$ (disjunction corresponds to union)</li>
<li>$\{ x \mid \lnot P_0 \} = \{ x \mid P_0 \}^c$ (negation corresponds to complementation)</li>
<li>$\{ x \mid T \} = X$ (“true” corresponds to the whole set)</li>
<li>$\{ x \mid F \} = \emptyset$ (“false” corresponds to the empty set)</li>
</ul>
<p>Quantifiers might seem tricky at first, but notice $\forall n . P_n$ is
really the same thing as $P_0 \land P_1 \land P_2 \land \ldots$ and so:</p>
<ul>
<li>$\{ x \mid \forall n . P_n \} = \bigcap_n \{ x \mid P_n \}$</li>
<li>$\{ x \mid \exists n . P_n \} = \bigcup_n \{ x \mid P_n \}$</li>
</ul>
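<p>Over a finite universe with finitely many properties, we can even watch this correspondence happen in code (a toy Python illustration of my own):</p>

```python
# "forall corresponds to intersection" and "exists corresponds to union"
# over a finite universe, with P_n(x) meaning "n divides x".
universe = set(range(30))
P = [lambda x, n=n: x % n == 0 for n in (2, 3, 5)]

forall_set = {x for x in universe if all(p(x) for p in P)}
inter = set.intersection(*({x for x in universe if p(x)} for p in P))
assert forall_set == inter  # forall <-> intersection

exists_set = {x for x in universe if any(p(x) for p in P)}
union = set.union(*({x for x in universe if p(x)} for p in P))
assert exists_set == union  # exists <-> union

print(sorted(forall_set))  # multiples of 2, 3, and 5 below 30: prints [0]
```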
<p>This trick works more broadly too<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>. For <em>any</em> index set $I$, we have</p>
<ul>
<li>$\{ x \mid \forall \alpha \in I . P_\alpha \} = \bigcap_I \{ x \mid P_\alpha \}$</li>
<li>$\{ x \mid \exists \alpha \in I . P_\alpha \} = \bigcup_I \{ x \mid P_\alpha \}$</li>
</ul>
<p>Since we only know that <em>countable</em> setwise operations are allowed in measure
theory, much of descriptive set theory amounts to showing that certain
quantifiers only need to range over countable sets.</p>
<p>Of course, through this lens, the description in Halmos is obvious:</p>
\[\limsup E_n = \bigcap_n \bigcup_{m > n} E_m = \{ x \mid \forall n . \exists m > n . x \in E_m \} = \{ x \mid \text{$x \in E_n$ for infinitely many $n$} \}\]
<p>Do you see why this also shows that $\liminf E_n = \{ x \mid \text{ $x$ is in all but finitely many $E_n$ } \}$?</p>
<p>This viewpoint is useful not only in understanding what $\limsup E_n$ and
$\liminf E_n$ are, it’s useful in proving things about them! Let $E_n$ be
a sequence that alternates between two sets $A$ and $B$. Is it obvious that
$\limsup E_n = A \cup B$ and $\liminf E_n = A \cap B$? Now look only at the
definitions – is it still obvious from those?</p>
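<p>If you’d like to check this by brute force, here’s a small Python sketch (my own) that computes truncations of the defining intersections and unions. Truncating at a finite $N$ is harmless here, since the sequence is periodic:</p>

```python
# limsup/liminf of the alternating sequence E_n = A, B, A, B, ...
# computed from truncations of the defining formulas:
#   limsup E_n = Intersection_n Union_{m > n} E_m
#   liminf E_n = Union_n Intersection_{m > n} E_m
A, B = {1, 2, 3}, {3, 4}
N = 20
E = [A if n % 2 == 0 else B for n in range(N)]

limsup = set.intersection(*(set.union(*E[n + 1:]) for n in range(N - 2)))
liminf = set.union(*(set.intersection(*E[n + 1:]) for n in range(N - 2)))

print(limsup, liminf)  # limsup is A ∪ B, liminf is A ∩ B
```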
<hr />
<p>I think this lens sheds some light on $\liminf$ and
$\limsup$ of real sequences too.
I’ll leave it to you to work through the quantifiers, but it turns out that
$\limsup x_n$ is the supremum of the numbers $x^*$ such that infinitely many $x_n$ are
bigger than $x^*$. This is the central observation in the proof of the
<a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Hadamard_theorem">Cauchy-Hadamard Theorem</a>,
and it’s nice to see that this observation is actually obvious
(in this interpretation of $\limsup$).</p>
<p>Dually, $\liminf x_n$ is the <em>infimum</em> of the numbers $x_*$ such
that infinitely many $x_n$ are smaller than $x_*$.
As a last puzzle – why doesn’t $\liminf x_n$ have a “all but finitely many”
flavor like $\liminf E_n$ does? Can you find a sense in which it does?</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>It turns out viewing quantifiers as generalized conjunctions/disjunctions
works <em>very</em> broadly! This is a useful viewpoint to take in many settings
throughout logic. In $\mathcal{L}_{\omega_1, \omega}$
for instance, this trick lets us use natural number quantifiers
(even though the language might not <em>technically</em> allow them).
This lets us express, say, that a group is finitely generated
by writing every element as a word in the generators
(and there’s only countably many such words!) <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sun, 08 Nov 2020 00:00:00 +0000
https://grossack.site/2020/11/08/quick-analysis-trick-3.html
Nilpotentizing Groups<p>I really like group theory, and I’ve spent a lot of time reading about groups
and their properties. Most of these properties seem like very natural things
to consider ($p$-groups, abelian groups, simple groups, etc.) and the ones
that don’t typically seem motivated by some external factor
(solvable groups come to mind). However, I have always been somewhat confused
by nilpotent groups. I know that they are “almost abelian”, and I can rattle
off some facts about them and sketches of proofs… But it was never made
clear to me how to work with them <em>in practice</em>. If I come across a nilpotent
group in the wild, how does that help me? Surely I should be able to leverage
the “almost abelian”-ness in a way that’s more general than “elements of
coprime order commute”.</p>
<p>I figured out how the nilpotency assumption is helpful in computations last
night, though I got on the topic in a rather roundabout way. I forget
exactly how my brain moved between these ideas, but the basic outline was
this:</p>
<ul>
<li>The nilpotent groups of class $c$ (really class $\leq c$) form a variety
by <a href="https://en.wikipedia.org/wiki/Variety_(universal_algebra)#Birkhoff's_theorem">the HSP theorem</a></li>
<li>But this means there are some <em>bonus axioms</em> that we can add to the standard
group axioms in order to carve out the class $c$ nilpotent groups.
(In fact, since the $1$ nilpotent groups are exactly the abelian groups,
we know the bonus axiom is $xy=yx$ in the case $c=1$)</li>
<li>So we should be able to “$c$-nilpotentize” a group in the same way we abelianize
it by quotienting by the relations which force these new axioms to hold.</li>
</ul>
<p>I spent some time thinking about what these new axioms might be, as well as
some categorical questions: Is the subcategory of class $c$ nilpotent groups
reflective in $\mathsf{Grp}$? After all, $\mathsf{Ab}$ is. But then I started
thinking about why somebody might care about this construction. We care about
the abelianization because it <em>simplifies</em> the group. It stands to reason that
a nilpotentization might simplify the group with a slightly softer touch.
By varying $c$, we can control how big a quotient we want to take – this
lets us trade simplicity of the quotient against fidelity to our original group!
Immediately, there are two very natural questions that arise:</p>
<ul>
<li>Can we actually $c$-nilpotentize a group in practice? What do we quotient by?</li>
<li>How exactly are nilpotent groups easier to work with than general groups?
What does this construction <em>really</em> buy us?</li>
</ul>
<p>It was in the process of understanding the second bullet that I felt like
I started understanding some practical benefits of nilpotency.</p>
<div class="boxed">
<p>As a fun game, can you show that $c\mathsf{Npt}$, the subcategory
of class $c$ nilpotent groups, is reflective in $\mathsf{Grp}$? You
can see this abstractly if you know that every variety has free objects,
so there is a “free class $c$ nilpotent group” on a given set $X$ <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>.</p>
<p>Remember, to show that $c\mathsf{Npt}$ is reflective in $\mathsf{Grp}$,
you want to show that there is a unique “$c$-nilpotentization” $G^{c\text{Nil}}$
so that every hom $G \to H$ with $H \in c\mathsf{Npt}$ factors through
$G^{c\text{Nil}}$. In the case $c=1$, this is exactly the abelianization!</p>
</div>
<hr />
<p>Let’s start with the second bullet and talk about what nilpotency buys us.
We understand the class-$1$ groups (alias: abelian groups) well, so let’s
take a look at class-$2$ groups.</p>
<p>Recall the <span class="defn">Lower Central Series</span> of a group $G$
is recursively defined by</p>
<ul>
<li>$\gamma_1(G) = G$</li>
<li>$\gamma_{n+1}(G) = [\gamma_n(G),G]$</li>
</ul>
<p>Then $G$ is called <span class="defn">Nilpotent</span> (of class $c$) whenever
$\gamma_{c+1}(G) = 1$. So, a nilpotent group of class $1$ is a group where
$\gamma_2(G) = [G,G] = 1$, and $G$ is abelian. This definition has always felt
kind of opaque to me, but last night I realized what I was missing
(and what was probably obvious to most other people):</p>
\[gh = hg[g,h]\]
<p>This wholly obvious fact says that we can commute any two elements provided
we pick up a factor of $[g,h]$. In this way, $[g,h]$ measures how $g$ and $h$
fail to commute.</p>
<p>Now, in an abelian group, $[g,h] = 1$ for all $g,h$. So we can commute with
impunity. In a group of class $2$ this can fail, but we know that
$\gamma_3(G) = 1$. That is, $[[G,G],G] = 1$. Said yet another way, $[[g,h],k] = 1$
for every $g,h,k$! So sure we might pick up a commutator fudge factor, but
this fudge factor will commute with everything!</p>
<p>Concretely, this means we can always push commutators to one side!</p>
\[ghk = hg[g,h]k = hgk[g,h]\]
<p>So in a group of class $2$, we can rearrange our product as much as we want
as long as we promise to multiply by a fudge factor from $[G,G]$ at the end.
For any permutation $\sigma$:</p>
\[g_1 g_2 \ldots g_n = g_{\sigma(1)} g_{\sigma(2)} \ldots g_{\sigma(n)} h\]
<p>for some $h \in [G,G]$ depending on $\sigma$.</p>
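<p>To see this in action, here’s a pure-Python sketch (my own toy example) using the mod $3$ Heisenberg group: upper unitriangular $3 \times 3$ matrices over $\mathbb{Z}/3$, a standard example of a class $2$ group. We encode an element by its three above-diagonal entries, and check exhaustively that commutators really can be pushed to one side:</p>

```python
from itertools import product

# The mod-3 Heisenberg group: (a, b, c) encodes the matrix
# [[1, a, c], [0, 1, b], [0, 0, 1]] over Z/3. This group is
# nilpotent of class 2, so every commutator [g, h] is central.

def mul(g, h):
    (a, b, c), (d, e, f) = g, h
    return ((a + d) % 3, (b + e) % 3, (c + f + a * e) % 3)

def inv(g):
    a, b, c = g
    return (-a % 3, -b % 3, (-c + a * b) % 3)

def comm(g, h):
    """[g, h] = g^{-1} h^{-1} g h"""
    return mul(mul(inv(g), inv(h)), mul(g, h))

elements = list(product(range(3), repeat=3))

for g, h, k in product(elements, repeat=3):
    # gh = hg[g,h] ...
    assert mul(g, h) == mul(mul(h, g), comm(g, h))
    # ... and [g,h] is central, so ghk = hgk[g,h]:
    assert mul(mul(g, h), k) == mul(mul(mul(h, g), k), comm(g, h))

print("checked all", len(elements) ** 3, "triples")
```

The group is finite (only $27$ elements), so we can afford to verify the identity for every single triple rather than trusting the algebra.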
<p>Similarly for groups of class $3$. Now $[G,G,G,G]$ is trivial<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup>, so
our commutators may not be central, but our “second order” commutators are:</p>
\[ghk = hg[g,h]k = hgk[g,h][[g,h],k]\]
<p>In this instance, the “second order” fudge factor $[[g,h],k]$
(often written as $[g,h,k]$ – see the earlier footnote) will commute
with everything.</p>
<p>It is clear that these get hairy fairly quickly, but it makes the entire
concept feel (at least to me) more concrete. It also makes clear how this
is a generalization of abelianness - when we commute things, the resulting
fudge factors are easy to control. Of course, the degree of “easiness”
decreases fairly quickly as the nilpotency class $c$ increases. For $c=2$,
though, this seems like a viable object to study if one is looking to
simplify a group!</p>
<hr />
<p>So how might we find the nilpotentization of $G$? Earlier on in the
post I alluded to some abstract nonsense which will give us the result
(It probably says something about me that this was the proof my sleep-deprived
brain first reached for). However, now that we’ve remembered the
lower central series definition of a nilpotent group, there is a much cleaner,
down-to-earth approach:</p>
<div class="boxed">
<p>$G^{c\text{Nil}} = G / \gamma_{c+1}(G)$</p>
</div>
<p>We simply <em>force</em> $\gamma_{c+1}(G) = 1$. This is directly analogous to the
abelianization, since $\gamma_2(G) = [G,G]$! It takes a tiny argument to
show that any group homomorphism $G \to H$ with $H$ of class $c$ factors
through $G^{c\text{Nil}}$, but I’ll leave this verification as a cute exercise.</p>
<p>One thing I will touch on, though – Why can we quotient by $\gamma_{c+1}(G)$?
Is it obvious that these subgroups are always normal? I certainly don’t think so!
But, it turns out that $\gamma_{c+1}(G)$ shares lots of nice properties with $[G,G]$.</p>
<p>Here is (for me) the easiest way to see what I mean: Just like
$[G,G] = \gamma_2(G)$ is generated by $\langle [g,h] \rangle$, it turns out
$[G,G,G] = \gamma_3(G)$ is generated by $\langle [[g,h],k] \rangle$!
In fact, $\gamma_{n}(G)$ is generated by $\langle [g_1, \ldots, g_n] \rangle$
(again, using the notation from the footnote).</p>
<p>But this is fantastic! We know that subgroups of this form are called
<a href="https://groupprops.subwiki.org/wiki/Verbal_subgroup">verbal</a> and they
satisfy <em>lots</em> of very nice properties <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>! In particular, all verbal
subgroups are
<a href="https://en.wikipedia.org/wiki/Characteristic_subgroup">characteristic</a>,
thus normal.</p>
<hr />
<p>As one last question, we might ask how easy it is to compute with these
nilpotentizations. Luckily, there are some efficient implementations of these results.
You can read more about these algorithms <a href="https://math.stackexchange.com/questions/3258639/nilpotent-quotient-algorithm">here</a>,
but the tldr is:</p>
<ul>
<li>
<p><a href="http://magma.maths.usyd.edu.au/magma/">Magma</a> has functions like
$\mathtt{NilpotentQuotient(G,c)}$ which computes the class $c$
nilpotentization of $G$.
(Documentation <a href="https://magma.maths.usyd.edu.au/magma/handbook/text/831#9446">here</a>)</p>
</li>
<li>
<p><a href="https://www.gap-system.org/">GAP</a> has the $\mathtt{NQ}$ package, which also
has a $\mathtt{NilpotentQuotient(G,c)}$ function.
(Documentation <a href="https://www.gap-system.org/Packages/nq.html">here</a>)</p>
</li>
</ul>
<p>Since GAP ships built-in with <a href="https://www.sagemath.org/">Sage</a>, we have access
to these algorithms in our favorite computational tool. Unfortunately,
the $\mathtt{NQ}$ package wasn’t included in my sage installation by default –
I had to install gap-packages via pacman in order to get it.</p>
<p>With that subtle point out of the way, let’s see it in action.
Since the sage cloud server I use doesn’t play nice with the GAP console,
I can only include a screenshot. You should definitely experiment with this
stuff yourself, though!</p>
<p><img src="/assets/images/nilpotentizing-groups/sage-out.png" alt="Some GAP code running inside of sage" /></p>
<p>Notice we asked for the class $1$ nilpotentization of $F_2$, and GAP
correctly gave us $\mathbb{Z}^2 = F_2^\text{ab}$!</p>
<p>In the case of finite groups, we can write dumber code in pure sage:</p>
<div class="linked_auto">
<script type="text/x-sage">
from itertools import product
def Nilpotentize(G, c):
    """
    Return the nilpotent quotient of class c.
    Only works for finite groups G!
    """
    def iterated_commutator(gs):
        """
        computes [g1, g2, ... gn] from a list gs
        (associating left, as in the footnote)
        """
        comm = gs[0]
        for g in gs[1:]:
            comm = comm.inverse() * g.inverse() * comm * g
        return comm
    # Iterated commutators aren't symmetric in their arguments, so we
    # range over all *ordered* (c+1)-tuples to generate gamma_{c+1}(G)
    toKill = G.subgroup([iterated_commutator(gs) for gs in product(G.list(), repeat=c+1)])
    return G.quotient(toKill)
</script>
</div>
<p>I only wrote this code today, so I haven’t had time to play around with it yet.
Here are some fun questions I have for myself, which you might also want to
think about!</p>
<ul>
<li>Where do the various $\gamma_n(G)$ show up in the lattice of subgroups of $G$?</li>
<li>For finite groups, the decreasing chain $\gamma_n(G)$ must eventually stabilize.
Given a group $G$, can we predict for which $n$ this will happen?</li>
<li>What can we say about a group $G$ if we know the chain $\gamma_n$ stabilizes quickly? Stabilizes slowly?</li>
</ul>
<p>There’s lots of interesting questions one gets by playing around with these groups!
Let me know in the comments if you think of any of your own ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Free nilpotent groups seem to be well studied, and fairly complicated!
This is one excellent example of abstract nonsense providing the existence
of a free object whose combinatorial description is… unpleasant.
You can read Terry Tao’s description of them
<a href="https://terrytao.wordpress.com/2009/12/21/the-free-nilpotent-group/">here</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Here we define $[G,G,G,G] = [[[G,G],G],G]$. This seems to be standard
in the literature, and we can do it at the element level too:
$[g,h,k] = [[g,h],k]$. Notice this is <em>not</em> associative!
$[[g,h],k] \neq [g,[h,k]]$ so we must remember to associate <em>left</em>! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>This is a kind of syntax-semantics relationship.
One day I want to make a post talking about syntax and semantics,
and in particular some real-world ways where this duality arises
(even if somewhat informally). In the mean time, trust that this
result is part of a larger pattern of spiritually related results. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 28 Oct 2020 00:00:00 +0000
https://grossack.site/2020/10/28/nilpotentizing-groups.html