https://grossack.site
Iteration Asymptotics

<p>I really like recurrences, and the kind of asymptotic
analysis that shows up in combinatorics and computer science. I think I’m drawn
to it because it melds something I enjoy (combinatorics and computer science)
with something I historically struggle with (analysis).</p>
<p>My usual tool for handling recurrences (particularly for getting asymptotic
information about their solutions) is <a href="https://en.wikipedia.org/wiki/Generating_function">generating functions</a>. They slaughter
linear recurrences (which nowadays I just solve with <a href="https://sagemath.org">sage</a>), but through
functional equations, <a href="https://en.wikipedia.org/wiki/Lagrange_inversion_theorem">lagrange inversion</a>, and complex analysis, they form
an extremely sophisticated theory<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">1</a></sup>. Plus, I would be lying if I didn’t say
I was drawn to them because of how cool they are conceptually. I’m a sucker for
applying one branch of math to another, so the combination of complex analysis
and enumerative combinatorics is irresistible.</p>
<p>Unfortunately, you can’t solve all asymptotics problems with generating
functions, and it’s good to have some other tools around as well. Today
we’ll be working with the following question:</p>
<div class="boxed">
<p>If $f$ is some function, what are the asymptotics of</p>
\[x_{n+1} = f(x_n)\]
<p>where we allow $x_0$ and $n$ to vary?</p>
</div>
<p>If $f$ is continuous and $x_n \to r$, it’s clear that $r$ must be a fixed
point of $f$. If moreover, $f’(r)$ exists, and $|f’(r)| \lt 1$, then anything
which starts out near $r$ will get pulled into $r$. Also, we might as well
assume $r = 0$, since we can replace $f$ by $f(x+r) - r$ without loss of
generality.</p>
<p>These observations tell us we should restrict attention to
those systems where $f(x) = a_1 x + a_2 x^2 + \ldots$ is
analytic at $0$ with $f(0) = 0$ and $|a_1| \lt 1$. Indeed, we’ll focus
on exactly this case for the rest of the post<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">2</a></sup>.</p>
<p>As a case study, let’s take a simple problem I found on mse the other day.
I can’t actually find the question I saw anymore, or else I’d link it.
It looks like it’s been asked <a href="https://approach0.xyz/search/?q=%24a_%7Bn%2B1%7D%20%3D%20%5Cfrac%7Ba_n%20%2B%203%7D%7B3a_n%20%2B%201%7D%24&p=1">a few times now</a>, but none of the options
are recent enough to be the one I saw. Oh well.</p>
<div class="boxed">
<p>Define $x_0 = 2$, $x_{n+1} = \frac{x_n + 3}{3 x_n + 1}$. What is
$\lim_{n \to \infty} x_n$?</p>
</div>
<p>The original problem is fairly routine, and we can solve it using
<a href="https://en.wikipedia.org/wiki/Cobweb_plot">cobweb diagrams</a>. The asymptotics are more interesting, though.
It turns out we can read the asymptotics right off of $f$, which is
super cool! I guess I hadn’t seen any examples because people who are in the
know feel like it’s too obvious to talk about, but that makes it the perfect
topic for a blog post!</p>
<p>Notice $f(x) = \frac{x+3}{3x+1}$ has a fixed point at $1$, so we’ll need to
translate it to the origin. We’ll replace $f$ by
$g(x) = f(x+1) - 1 = \frac{-2x}{3x+4}$, remembering to replace $x_0 = 2$
by $y_0 = x_0 - 1 = 1$ as well.</p>
<p>Notice $g(x) = - \frac{1}{2} x + \frac{3}{8}x^2 - O(x^3)$ is analytic at $0$
with $| - \frac{1}{2} | \lt 1$.</p>
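As a quick sanity check (plain Python rather than sage, and my own addition, not part of the original solution), we can confirm numerically that $g$ really is the translate of $f$, fixes $0$, and has slope $-\frac{1}{2}$ there:

```python
# Sanity check (my own code): g(x) = f(x+1) - 1 should agree with
# -2x/(3x+4), fix 0, and have derivative -1/2 at 0.

def f(x):
    return (x + 3) / (3 * x + 1)

def g(x):
    return -2 * x / (3 * x + 4)

# g is the translate of f
for x in [0.1, -0.2, 0.5]:
    assert abs(g(x) - (f(x + 1) - 1)) < 1e-12

# g fixes 0, with slope -1/2 (central finite-difference estimate)
assert g(0) == 0
h = 1e-6
assert abs((g(h) - g(-h)) / (2 * h) + 0.5) < 1e-6
```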
<p>Let’s get started!</p>
<hr />
<h2 id="simple-asymptotics">Simple Asymptotics</h2>
<p><br /></p>
<p>We’ll be following de Bruijn’s <em>Asymptotic Methods in Analysis</em> in both
this section and the next. In the interest of showcasing how to actually
<em>use</em> these tools, I’m going to gloss over a lot of details. You can find
everything made precise in chapter $8$.</p>
<p>First, if $x \approx 0$, then $f(x) \approx f’(0) \cdot x$. Since we are
assuming $|f’(0)| \lt 1$, if $x_n \approx 0$, then
$x_{n+1} \approx f’(0) x_n \approx 0$ too. An easy induction then shows that</p>
\[x_n \approx (f'(0))^n x_0.\]
<p>But now we have our asymptotics! If we formalize all of the $\approx$ signs
above, we find</p>
<div class="boxed">
<p>For each $|f’(0)| \lt b \lt 1$, there is a radius $\delta$ so that as long
as $x_0 \in (-\delta, \delta)$ we’re guaranteed</p>
<p>\(|x_n| \lt b^n |x_0|\)</p>
</div>
<p>Since $x_n \to 0$, we’re guaranteed that eventually our $x_n$s will be inside
however tight a radius we want! Since big-oh notation ignores the first
finitely many terms anyways, this tells us</p>
<div class="boxed">
<p>Life Pro Tip:</p>
<p>If $x_n \to r$ (a fixed point of $f$) with $\lvert f’(r) \rvert \lt 1$, then
$x_n \to r$ exponentially quickly. More precisely, for any $\epsilon$ you want<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">3</a></sup></p>
<p>\(x_n = r \pm O((\lvert a_1 \rvert + \epsilon)^n)\)</p>
</div>
<p>What about our concrete example? We know $f$ has $1$ as a fixed point,
and $f’(1) = - \frac{1}{2}$. Then for $x_0 = 2$, we get
(by choosing $\epsilon = 0.00001$) that $x_n = 1 \pm O ( 0.50001^n )$.
Which is fast enough convergence for most practical purposes.</p>
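To see this bound in action (a numerical check of mine, not from de Bruijn), we can iterate the example map and watch the error shrink by roughly a factor of $\frac{1}{2}$ each step:

```python
# Iterate f(x) = (x+3)/(3x+1) from x0 = 2 and watch |x_n - 1| decay
# geometrically (my own check of the boxed bound above).

def f(x):
    return (x + 3) / (3 * x + 1)

x = 2.0
errors = []
for n in range(21):
    errors.append(abs(x - 1))
    x = f(x)

# successive error ratios approach |f'(1)| = 1/2 ...
assert abs(errors[20] / errors[19] - 0.5) < 0.01

# ... so after 20 steps we are well inside the O(0.50001^n) envelope
assert errors[20] < 0.51 ** 20
```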
<hr />
<h2 id="asymptotic-expansion">Asymptotic Expansion</h2>
<p><br /></p>
<p>But what if you’re a real masochist, and the exponential convergence above
isn’t enough for you? Well don’t worry, because de Bruijn has more to say.</p>
<p>If $x_0$ is fixed, then $x_n a_1^{-n}$ converges to a limit, which we’ll call
$\omega(x_0)$. But since $x_{n+1} = f(x_n)$, we get an important restriction
on $\omega$ by considering $x_1$ as the start of its own iteration<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>:</p>
\[\omega(f(x)) = a_1 \omega(x) \quad \quad \quad \quad (\star)\]
<p>In the proof that $\omega$ exists (which you can find in de Bruijn’s book),
we also show that $\omega$ is analytic whenever $f$ is!
Then using <a href="https://en.wikipedia.org/wiki/Lagrange_inversion_theorem">lagrange inversion</a>, we can find an analytic $\Omega$ so that
$\Omega(\omega(x)) = x$.</p>
<p>But now we can get a great approximation for $x_n$! By repeatedly applying
$(\star)$ we find</p>
\[\omega(x_n) = a_1^n \omega(x_0)\]
<p>which we can then invert to find</p>
\[x_n = \Omega(a_1^n \omega(x_0)).\]
<p>If we use the first, say, $5$ terms of the expansion of $\Omega$, this will
give us accuracy up to $\tilde{O}(a_1^{5n})$. There are also some lower order terms
which come from how much of $\omega$ we use, but I’m sweeping under the $\tilde{O}$.</p>
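For Möbius maps like our running example, Schröder’s equation can actually be solved in closed form (this is my own side computation, not something from de Bruijn): conjugating $g(x) = \frac{-2x}{3x+4}$ by the map sending its fixed points $0, -2$ to $0, \infty$ gives $\omega(x) = \frac{x}{x+2}$, with inverse $\Omega(y) = \frac{2y}{1-y}$. A quick check in plain Python:

```python
# For g(x) = -2x/(3x+4) (fixed points 0 and -2, a1 = -1/2), Schroeder's
# equation has the closed-form solution omega(x) = x/(x+2), with inverse
# Omega(y) = 2y/(1-y).  (My own computation, not from de Bruijn's book.)

def g(x):
    return -2 * x / (3 * x + 4)

def omega(x):
    return x / (x + 2)

def Omega(y):
    return 2 * y / (1 - y)

# Schroeder's equation (star): omega(g(x)) == a1 * omega(x)
for x in [0.1, 0.5, 1.0]:
    assert abs(omega(g(x)) - (-0.5) * omega(x)) < 1e-12

# and so x_n = Omega(a1^n * omega(x0)) reproduces the iteration exactly
x0 = 1.0
cur = x0
for n in range(1, 11):
    cur = g(cur)
    assert abs(cur - Omega((-0.5) ** n * omega(x0))) < 1e-9
```

Note this $\omega$ is normalized with $\omega'(0) = \frac{1}{2}$ rather than $1$; rescaling $\omega$ by any nonzero constant still satisfies $(\star)$.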
<p>How do we actually do this in practice, though? The answer is “with sage”!</p>
<p>This code will take a function $f$ like we’ve been discussing, and will
recursively compute the first $N$ coefficients of $\omega$. It turns out
the $j$th coefficient of $\omega$ depends only on the first $j-1$ coefficients plus
equation $(\star)$. Then it will lagrange invert to get the first $N$
terms of $\Omega$, and it will use these to compute an asymptotic expansion
for $x_n$ in terms of $n$ and $x_0$ (which it’s writing as $x$).</p>
<p>Since I had to (rather brutally) convert back and forth between symbolic
and ordinary power series, sage wasn’t able to keep track of the error
term for us. Thankfully, it’s pretty easy to see that the error term is
always $O(a_1^n x_0^N)$, so I just manually wrote it in.</p>
<div class="linked_auto">
<script type="text/x-sage">
"""
This is super sloppy because sage actually has two
different kinds of power series:
- symbolic power series
- "ordinary" power series
the ordinary power series are better in almost every way, except they
don't allow variables inside them! Since we need variables to build the
recurrence that we solve for the next coefficient, this is a problem.
The only way I was able to get this working is by hacking back and
forth between the two types of power series. If anyone has a better way
to do this PLEASE let me know.
"""
def omega(f, x, N):
"""
Compute the first N terms of omega(x)
"""
# this is a symbolic power series
f = f.series(x,N)
a1 = f.coefficient(x,1)
# initialize omega (as a symbolic power series)
o = (0 + x).series(x,N)
d = var('d')
for j in range(2,N+1):
# set up the linear recurrence that defines the jth coefficient.
# if you didn't believe symbolic series are more cumbersome than
# "ordinary" series, hopefully you do now.
#
# this comes from looking at the jth coefficient of the equation
# omega(f(x)) == a1 * omega(x)
eqn = (o + d * x^j).subs(x=f).series(x,N).coefficient(x,j) == a1 * d
o = (o + solve(eqn, d)[0].rhs() * x^j).series(x,N)
# this is a symbolic power series
return o
def iterationAsymptotics(f,N=5):
"""
Compute the first N many terms of an asymptotic expansion for x_n
"""
n = var('n')
# for some reason extracting the coefficient gives us a constant function
# and it only breaks things here? Oh well, we'll evaluate it to make
# sage happy.
a1 = f.series(x,2).coefficient(x,1)()
# we convert o to an ordinary power series, since there's no way
# to do lagrange inversion to a symbolic power series
o = omega(f,x,N).power_series(QQbar)
O = o.reverse() # lagrange inversion
# dirty hack to convert back to a symbolic series
return O.truncate().subs(x=(a1^n * o.truncate())).expand().series(x,N)
def stats(f,n=10,N=5):
"""
Run 1000 tests to see how well the asymptotic expansion
agrees with the expected output.
"""
approx = iterationAsymptotics(f,N).truncate().subs(n=n)
tests = []
for _ in range(1000):
x0 = random()
# compute the exact value of xn
cur = x0
for _ in range(n):
cur = f(cur)
# compute the approximate value of xn
guess = approx.subs(x=x0)
tests += [cur - guess]
avg_diff = mean([abs(t) for t in tests])
max_diff = max([abs(t) for t in tests])
median_diff = median([abs(t) for t in tests])
show("maximum error: ", max_diff)
show("mean error: ", avg_diff)
show("median error: ", median_diff)
show(histogram(tests, bins=50, title="frequency of various signed errors (actual $-$ approximation)"))
show("Type in $f$ with fixed point $0$ and $0 < |f'| < 1")
@interact
def _(f=input_box(-2*x / (3*x + 4), width=20, label="$f$"),
n=input_box(100, width=20, label="$n$"),
N=input_box(3, width=20, label="$N$")):
f(x) = f
a1 = f.series(x,2).coefficient(x,1)()
# we have to show things in this weird way to get things on one line
# it's convenient, though, because it also lets us modify the latex
# to print the error bounds
show(f"$$f = {latex(f().series(x,N).power_series(QQbar))}")
show(f"$$\\omega = {latex(omega(f,x,N).power_series(QQbar))}$$")
series = f"x_n = {latex(iterationAsymptotics(f,N).truncate())}"
error = f"O \\left ( \\left ( {latex(abs(a1))} \\right )^n x^{N} \\right )"
show("$$" + series + " \\pm " + error + "$$")
show("How good is this approximation?")
stats(f,n,N)
</script>
</div>
<div class="boxed">
<p>As a nice exercise, you might try to modify the above code to work with
functions with a fixed point at $r \neq 0$. You can do this either by
taylor expanding at $r$ directly, or by translating $r$ to $0$, then using
this code, then translating back.</p>
<p>Be careful, though! We get much more numerical precision near $0$, so if you
do things near $r$ you might want to work with <a href="https://doc.sagemath.org/html/en/reference/rings_numerical/sage/rings/real_mpfr.html">arbitrary precision reals</a>.</p>
</div>
<p>So, in our last moments together, let’s finish up that concrete example in
far more detail than anyone ever wanted. The default function that I put in
the code above is our function translated to $0$. If you look at the first $5$
terms of the expansion (that is, set $N=5$) and work with $x_0 - 1 = 1$
(since we translated everything left by $1$) we find</p>
\[x_n - 1 \approx
\frac{5}{8} \left ( \frac{-1}{2} \right )^{n} +
\frac{3}{8} \left ( \frac{-1}{2} \right )^{2n} -
\frac{1}{8} \left ( \frac{-1}{2} \right )^{3n} +
\frac{1}{8} \left ( \frac{-1}{2} \right )^{4n}\]
<p>For $n = 10$, say, we would expect</p>
\[x_n \approx 1.0006107\]
<p>the actual answer is</p>
\[x_n = 1.0006512\]
<p>which, seeing as $x_0 = 2$ is pretty far away from $1$ (the limit),
$5$ is a pretty small number of terms to use,
and $10$ really isn’t <em>that</em> many iterations, is good enough for me.</p>
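Both numbers are easy to double-check in plain Python (my check, using the same $f$ and the truncated expansion above):

```python
# Double-check the worked example: x_10 from direct iteration vs the
# truncated asymptotic expansion above (with x0 - 1 = 1).

def f(x):
    return (x + 3) / (3 * x + 1)

x = 2.0
for _ in range(10):
    x = f(x)

n = 10
approx = 1 + (5/8) * (-1/2) ** n + (3/8) * (-1/2) ** (2 * n) \
           - (1/8) * (-1/2) ** (3 * n) + (1/8) * (-1/2) ** (4 * n)

assert abs(x - 1.0006512) < 5e-7       # the "actual answer"
assert abs(approx - 1.0006107) < 5e-7  # the expansion's prediction
```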
<p>Of course, you should look at the statistics in the output of the code above
to see how close we get for $n=100$, or any other number you like ^_^.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:2" role="doc-endnote">
<p>I know the basics, but there’s some real black magic people are
able to do by considering what type of singularities your function has.
This seems to be outlined in Flajolet and Sedgewick’s <em>Analytic Combinatorics</em>,
but every time I’ve tried to read that book I’ve gotten quite lost quite
quickly. I want to find some other references at some point, ideally at a
slower pace, but if I never do I’ll just have to write up a post about it
once I finally muster up the energy to really understand that book. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>It turns out you can sometimes say things if $|a_1| = 1$. But convergence
is slow (if you get it at all) and the entire discussion is a bit more
delicate. You should see de Bruijn’s <em>Asymptotic Methods in Analysis</em>
(chapter $8$) for more details. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Notice these techniques can’t remove the $\epsilon$. For instance,
$n C^n = O((C+\epsilon)^n)$ for each $\epsilon$, but is <em>not</em> $O(C^n)$. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Which is apparently called <a href="https://en.wikipedia.org/wiki/Schr%C3%B6der%27s_equation">Schröder’s equation</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 17 Jun 2021 00:00:00 +0000
https://grossack.site/2021/06/17/iteration-asymptotics.html
https://grossack.site/2021/06/17/iteration-asymptotics.html

How Many Group Structures on a Set?

<p>And so ends my first year of grad school. I’m pretty tired, and my mental
health has taken a turn for the worse, though it’s hard to piece together if
the last few weeks were tiring because my mental health was declining, or if
my mental health is in decline because the last few weeks were tiring. Probably
a little bit of both. Anyways, I have some free time again and a backlog of
ideas for blog posts. Speaking of, now that my life update is out of the way,
let’s see a kind of cute computation!</p>
<p>So the other day someone on mse <a href="https://math.stackexchange.com/q/4166508/655547">asked</a>:</p>
<div class="boxed">
<p>Given a random binary operation on a finite set $G$, what is the probability
that it makes $G$ into a group?</p>
</div>
<p>The answer is, of course, vanishingly small. But it’s interesting to see
<em>how</em> vanishingly small. The answer is actually quite memorable!</p>
<p>We can get a lower bound by assuming $|G| = n$ is prime. After all, there is only
one group structure that will work in this case, so this is the lowest possible.
The only thing we can do is rename the elements, so there are $n!$ many group
operations on $G$.</p>
<p>We can get an upper bound if we assume $|G| = 2^n$. We also have to assume the
(widely believed) conjecture that “most groups are $2$-groups” in order to know
that this is an upper bound. At the very least, it is an upper bound for groups
of size at most $2000$, since <a href="https://math.stackexchange.com/q/241369/655547">$99\%$ of these groups have order $1024$</a>.</p>
<p>The same mse question I just linked provides an asymptotic formula for the
number of group structures on a set of size $N = 2^n$. Multiplying by $N!$
because we are interested in all group structures, not just groups up to
isomorphism, we find an upper bound of (very roughly) $N! \ N^{\frac{2}{27} \log(N)^2}$</p>
<p>Putting our upper and lower bounds together, we see</p>
\[N! \leq
\text{ \# group structures on a set of size $N$ } \leq
N! \ N^{\frac{2}{27} \log(N)^2}\]
<p>But by approximating $N! \approx \left ( \frac{N}{e} \right )^N$, we get</p>
\[e^{-N} N^N \leq
\text{ \# group structures on a set of size $N$ } \leq
e^{-N} N^{N + \frac{2}{27} \log(N)^2}\]
<p>logging everything in sight shows the <em>logarithm</em> of the number of group
structures is $\Theta(N \log(N))$. We can write this as the (rather memorable)
\[\text{\# group structures on a set of size $N$} = N^{\Theta(N)}\]
<p>and to finally answer the problem, there are $N^{N^2}$ many distinct binary
operations on a set of size $N$. So the probability that a random one is a
group operation decays like $N^{- \Theta(N^2)}$, which is vanishingly small,
as promised.</p>
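For what it’s worth, the decay is already visible at tiny sizes. Here’s a brute-force count for $N = 3$ (my own code, not from the mse thread): of all $3^9 = 19683$ binary operations on a $3$ element set, exactly $3!/\lvert \mathrm{Aut}(\mathbb{Z}/3) \rvert = 3$ are group operations.

```python
from itertools import product

# Brute-force every binary operation on a 3 element set and count the
# ones satisfying the group axioms (my own check, feasible only for tiny N).
N = 3
elements = range(N)

group_count = 0
for flat in product(elements, repeat=N * N):
    op = [flat[i * N:(i + 1) * N] for i in range(N)]

    # associativity
    if any(op[op[a][b]][c] != op[a][op[b][c]]
           for a in elements for b in elements for c in elements):
        continue

    # a two-sided identity
    ids = [e for e in elements
           if all(op[e][a] == a == op[a][e] for a in elements)]
    if not ids:
        continue
    e = ids[0]

    # two-sided inverses
    if all(any(op[a][b] == e == op[b][a] for b in elements) for a in elements):
        group_count += 1

print(group_count, N ** (N * N))  # 3 of 19683
```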
Thu, 17 Jun 2021 00:00:00 +0000
https://grossack.site/2021/06/17/how-many-groups.html
https://grossack.site/2021/06/17/how-many-groups.html

A Wild Arctan Formula

<p>Yesterday a good friend of mine sent me the following bizarre formula:</p>
\[4^{1/\pi} = \lim_{n \to \infty} \frac{\pi}{2 \arctan(n)} \frac{\pi}{2 \arctan(n+1)} \frac{\pi}{2 \arctan(n+2)} \cdots \frac{\pi}{2 \arctan(2n)}\]
<p>This is listed as formula $(130)$ at the bottom of the wolfram mathworld page on
<a href="https://mathworld.wolfram.com/PiFormulas.html">formulas for $\pi$</a>,
where it is called “A fascinating result due to Gosper”.
There are $3$ citations for Gosper on that page, but I can’t actually figure
out how to <em>see</em> any of them. That’s fine, though – We’ll just have to prove
it ourselves. I ended up heavily using computer tools
(by which, of course, I mean <a href="https://sagemath.org">sage</a>) to crack this,
and it almost felt like cheating (since I knew my friend was doing it by hand).
But I hope that, in addition to showing off a cool formula, this blog post
can showcase how one might use sage to solve problems in the wild
(even though in this case we used sage in a fairly mundane way).</p>
<hr />
<p>First let’s rewrite this product a little bit more compactly.
We want to show</p>
\[\lim_{n \to \infty} \prod_{k=n}^{2n} \frac{\pi}{2 \arctan(k)} = 4^{1/\pi}.\]
<p>I only know one way to handle limits of products, so let’s hit everything
in sight with $\log$ and work with a sum instead. We now want to show</p>
\[\lim_{n \to \infty} \sum_{k = n}^{2n} \log \left ( \frac{\pi}{2 \arctan(k)} \right ) = \log(4^{1 / \pi}) = \frac{\log(4)}{\pi}.\]
<p>We know that as $k \to \infty$, $\arctan(k) \approx \frac{\pi}{2}$, so the thing
we’re logging is getting close to $1$, and so our summands are getting close to $0$.
That’s a good sign, but if we want to really compute these things we need to
understand quantitatively how close to $0$ we really are.</p>
<p>My first instinct is to taylor expand at infinity. That is, let’s write</p>
\[f(x) = \log \left ( \frac{\pi}{2 \arctan(1/x)} \right )\]
<p>and expand <em>this</em> at $x = 0$. This corresponds to taylor expanding our
actual summand “near $k = \infty$” since $x = 1/k \approx 0$ when $k \approx \infty$.
I have a custom function<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">1</a></sup> in my <code class="language-plaintext highlighter-rouge">init.sage</code> to automatically compute a
series expansion at $0$, but for some reason it doesn’t work with this<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup>:</p>
<p><img src="/assets/images/wild-arctan-formula/error-message.png" /></p>
<p>So we do some googling for $\arctan(1/x)$, and we land on the following
really fun formula:</p>
\[\arctan(x) + \arctan(1/x) = \pi / 2 \quad \quad (\text{for $x > 0$}).\]
<p>This is really neat (and has a <a href="https://math.stackexchange.com/a/2147689/655547">nice geometric proof</a>),
and lets us rewrite our function as</p>
\[f(x) = \log \left ( \frac{\pi}{2 (\pi /2 - \arctan(x))} \right ).\]
<p>Sage will happily give us a series expansion of <em>this</em> and we learn</p>
\[f(x) = \frac{2x}{\pi} + O(x^2),\]
<p>so our sum becomes</p>
\[\lim_{n \to \infty} \sum_{k=n}^{2n} f(1/k) = \lim_{n \to \infty} \sum_{k=n}^{2n} \left ( \frac{2}{\pi k} + O(1/k^2) \right ).\]
<p>But now we’re basically done! Our error term vanishes in the limit, since
$\sum_{k=n}^{2n} 1/k^2$ is bounded by the tail of a convergent series.
Then we pull out the factor of $2 / \pi$, and we find our sum is</p>
\[\frac{2}{\pi} \lim_{n \to \infty} \sum_{k=n}^{2n} \frac{1}{k} = \frac{2}{\pi} \lim_{n \to \infty} \left ( H_{2n} - H_n \right )\]
<p>where $H_n$ is the $n$th <a href="https://en.wikipedia.org/wiki/Harmonic_number">harmonic number</a>.</p>
<p>We know that $H_n \sim \log(n)$, and so $H_{2n} - H_n \sim \log(2n) - \log(n) = \log(2)$ in the limit.</p>
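The harmonic-number step is easy to check numerically (plain Python, my own check): $H_{2n} - H_n = \log 2 + O(1/n)$.

```python
import math

# H_{2n} - H_n approaches log(2), with error on the order of 1/n.
def H(n):
    return sum(1.0 / k for k in range(1, n + 1))

for n in [10, 100, 1000, 10000]:
    assert abs((H(2 * n) - H(n)) - math.log(2)) < 1.0 / n
```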
<p>This, at last, tells us that our sum is</p>
\[\frac{2}{\pi} \log(2) = \log(4^{1/\pi})\]
<p>and so our product is</p>
\[4^{1/\pi}\]
<p>as desired.</p>
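And as a final sanity check (mine, in plain Python), the partial products really do creep toward $4^{1/\pi} \approx 1.5547$:

```python
import math

# partial products prod_{k=n}^{2n} pi / (2 arctan k) vs the limit 4^(1/pi)
def partial_product(n):
    p = 1.0
    for k in range(n, 2 * n + 1):
        p *= math.pi / (2 * math.atan(k))
    return p

target = 4.0 ** (1.0 / math.pi)

# the error in the log is O(1/n), so the product converges at rate 1/n
assert abs(partial_product(10 ** 4) - target) < 1e-3
assert abs(partial_product(10 ** 5) - target) < 1e-4
```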
<hr />
<p>Notice nothing we did was <em>hard</em>. Not only does this show the power of
calculus (I feel like the more math I learn the more I respect basic calculus),
it also shows the power of sage to quickly and easily do things like taylor
expansions for us. Of course, with some extra time to think about the problem,
I’ve come up with a way you <em>could</em> see this without needing to taylor expand
anything too tricky<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>, but before you polish the edges of a proof, it’s
nice to have <em>something</em>, and sage really makes that “first draft”
version of a proof easier to find.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:2" role="doc-endnote">
<p>Which you can find in my <a href="https://github.com/HallaSurvivor/dotfiles/blob/master/init.sage">dotfiles</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>While writing this blog post, I tried the taylor series implementation,
and it actually <em>did</em> work!</p>
<p><img src="/assets/images/wild-arctan-formula/it-works.png" /></p>
<p>This would have made the whole process go a bit more smoothly, but we
wouldn’t have learned the fun formula, so I’m weirdly glad that the
power series implementation has slightly different behavior from the
taylor series implementation… Presumably because one is formal, while
one is analytic?</p>
<p>It is nice that sage could have directly told us that our summand looks
like $\frac{2x}{\pi} + O(x^2)$, though, and I guess this means I’ll have
to modify my series code to call the taylor command if calling series
throws an error. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>It’s pretty clear that near $0$ we should have $\arctan(x) \sim x$.
You can basically read this off a graph, or you could remember
$\frac{d}{dx} \arctan(x) = \frac{1}{1+x^2}$ and evaluate at $0$.</p>
<p>Next we’ll need the fun fact from above, that
$\arctan(x) + \arctan(1/x) = \pi/2$ for positive $x$, as well as a
famous generating function that in hindsight I actually have memorized:</p>
\[\log \left ( \frac{1}{1-x} \right ) = 0 + \frac{1}{1} x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \ldots\]
<p>Putting these facts together, we can pretty quickly get</p>
\[\begin{align}
f(x)
&= \log \left ( \frac{\pi}{2 (\pi/2 - \arctan(x))} \right ) \\
&= \log \left ( \frac{\pi}{\pi - 2 \arctan(x)} \right ) \\
&= \log \left ( \frac{1}{1 - \frac{2}{\pi} \arctan(x)} \right ) \\
&\sim \log \left ( \frac{1}{1 - \frac{2}{\pi} x} \right ) \\
&= \frac{2x}{\pi} \pm O(x^2)
\end{align}\]
<p>and we could finish off the proof from here.</p>
<p>Note, though, that even though
we avoided sage (or otherwise computing an awful taylor expansion by hand),
the price we paid was more background knowledge. I happened to know this
series, as well as this trick for swapping $\arctan(x)$ with $\arctan(1/x)$,
but there are lots of places where I definitely would <em>not</em> know the
relevant tricks. Obviously it’s still good to know lots of these little
tricks, and computer algebra systems are no excuse for a familiarity with
the objects you’re studying. But it can be really helpful for mindless
computation all the same.</p>
<p>Alright, I’ll get off my soapbox now :P <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sun, 23 May 2021 00:00:00 +0000
https://grossack.site/2021/05/23/wild-arctan-formula.html
https://grossack.site/2021/05/23/wild-arctan-formula.html

Making Pretty Pictures for Galois Theory

<p>So in my algebra class we’re doing <a href="https://en.wikipedia.org/wiki/Galois_theory">galois theory</a>, a subject which
never seems to really click with me. I know a lot of the theorems, and
I can even solve a lot of the problems, but I always feel uneasy about it.
The computational problems often feel like guesswork, and the theoretical
problems feel either trivial or impossible with little in between. I used to
feel this way about analysis, but it stings more to be struggling so much with
a subject so near to my heart.</p>
<p>All this to say I’ve been spending a lot of time reading about galois theory
and thinking about galois theory. I found <a href="https://cs.uwaterloo.ca/~cbright/reports/computing-galois-group.pdf">a paper</a><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> which includes some
pretty pictures of roots of polynomials and how they permute. I know that the
“symmetries” we study in galois theory are more abstract than symmetries we
can obviously see by plotting roots in the complex plane, but I thought it
could be fun to make some pretty pictures of my own anyways. My undergraduate
advisor (<a href="https://www.cs.cmu.edu/~sutner/">Klaus Sutner</a>) <em>loves</em> making pictures of everything he studies, and
a lot of that has rubbed off on me. You never know when some picture will
exhibit some pattern which you can make mathematically precise.</p>
<p>Of course, once you’re making one or two sets of pictures, you might as well
make a framework for making them. I also like showing off sage on this blog,
so it only makes sense to put my code here for future students to play around with.
Some of the arrows are a little messed up, but my basic algorithm actually
makes a lot of them look quite nice ^_^.</p>
<p>As one quick word of caution: keep in mind that <a href="https://mathoverflow.net/questions/58397/the-galois-group-of-a-random-polynomial">most polynomials</a> have the
whole symmetric group as their group of symmetries! So when you plot those
pictures you’re likely to get factorially many in the degree! Be careful
plotting any polynomials of degree $> 4$ if you want to keep the computation
somewhat quick.</p>
<div class="linked_auto">
<script type="text/x-sage">
R.<x> = QQ[x]

def draw_arrow(p1,p2):
    """
    Draw a curved arrow connecting two points

    @param p1 the tail of the arrow
    @param p2 the head of the arrow
    @return: a plot of the arrow
    """
    path = [p1, (p1+p2)/2 + (p2 - p1) * I / 3, p2]
    path = [(p.real(),p.imag()) for p in path]
    return arrow2d(path=[path], aspect_ratio=1)

def draw_all_actions(f):
    """
    Draw the action of each sigma in gal(f) on the roots of f

    @param f a polynomial whose galois group we want to study
    """
    f = R(f)

    # get the galois group of f
    K.<a> = f.splitting_field()
    G = K.galois_group()

    # get the roots of f in K
    roots = f.roots(multiplicities=False, ring=K)

    # fix an embedding K --> CC
    toCC = K.embeddings(CC)[0]

    # the basic plot of all the points we're drawing on
    pts = list_plot([(toCC(r).real(), toCC(r).imag()) for r in roots], size=30, aspect_ratio=1)

    # actually draw all of the pretty pictures
    for g in G:
        # turn g into a field homomorphism
        g = g.as_hom()

        out = pts
        for r in roots:
            out += draw_arrow(toCC(r), toCC(g(r)))
        show(out)

@interact
def _(f=input_box(x^5-2, width=20, label="$f$"), auto_update=False):
    f = R(f)
    draw_all_actions(f)
</script>
</div>
<div class="boxed">
<p>As a (fun?) exercise, can you modify the above code to pick a generating
set for the galois group and plot the generators in red rather than blue?
Or what about <em>only</em> plotting the generators? That might let you visualize
galois groups of larger polynomials.</p>
</div>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Curtis Bright’s <em>Computing the Galois Group of a Polynomial</em>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 19 May 2021 00:00:00 +0000
https://grossack.site/2021/05/19/galois-pretty-pictures.html
https://grossack.site/2021/05/19/galois-pretty-pictures.html

Finite Calculus, Stirling Numbers, and Cleverly Changing Basis

<p>I’m TAing a linear algebra class right now, and the other day a student came
to my office hours asking about the homework. Somehow during this discussion
I had a flash of inspiration that, if I ever teach a linear algebra class of
my own, I would want to use as an example of changing basis “in the wild”.
When I took linear algebra, all the example applications were to diagonalization
and differential equations – but I’m mainly a discrete mathematician, and I
would have appreciated something a bit closer to my own wheelhouse.</p>
<p>The observation in this post was first pointed out in a combinatorics class
I took with <a href="https://www.math.cmu.edu/~clintonc/">Clinton Conley</a>. I was aware of the theorem, but I hadn’t
thought of it as a change of basis theorem until that point. I remember feeling
like this was incredibly obvious, and simultaneously quite enlightening. I hope
you all feel the same way about it ^_^. At the very least, this will be a nice
change of pace from all the thinking I’ve been doing about power series
(which should be a follow up to <a href="/2021/05/05/initial-polynomial-proofs">my post</a> the other day) as well as a few
other tricky things I’m working on. It’s nice to talk about something
(comparatively) easy for a change!</p>
<hr />
<p>Let’s take a second to talk about <a href="https://en.wikipedia.org/wiki/Finite_difference">finite calculus</a>. That wikipedia link
is only so-so (at least at the time of writing), but there’s a great intro
by David Gleich <a href="https://www.cs.purdue.edu/homes/dgleich/publications/Gleich%202005%20-%20finite%20calculus.pdf">here</a> and you can read more in
Graham, Knuth, and Patashnik’s <em>Concrete Mathematics</em> (Ch 2.6) as well as
the (encyclopedic) <em>Calculus of Finite Differences</em> by Jordan<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>There’s a lot to say, but the tl;dr is this:
Finite Calculus’s raison d’être is to compute sums with the same facility
we compute integrals (and indeed, with analogous tools). If you’ve ever
been mystified by <a href="https://en.wikipedia.org/wiki/Summation_by_parts">Summation by Parts</a><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, you’ve already encountered
part of this machinery. I won’t go into much detail in this post, because
I want to keep this short. But I highly encourage you to look into it if
you spend a lot of time computing sums. Nowadays I mainly use
<a href="https://sagemath.org">sage</a>, but it’s nice to know how to do some of these
things by hand.</p>
<p>We start with discrete differentiation:</p>
<div class="boxed">
<p>For a function $f$, we define $\Delta f$
(the <span class="defn">Forward Difference</span> of $f$) to be</p>
\[\Delta f = \frac{f(x+1) - f(x)}{1} = f(x+1) - f(x).\]
<p>Obviously most people write it the second way, but I like to show the
first to emphasize the parallel with the classical derivative.</p>
</div>
<p>This satisfies variants of the nice rules you might want a “derivative” to satisfy:</p>
<div class="boxed">
<p>As an exercise, show the following<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">3</a></sup>:</p>
\[\begin{align}
\text{(Linearity)} && \Delta(\alpha f + \beta g) &= \alpha \Delta f + \beta \Delta g \\
\text{(Leibniz)} && \Delta(f \cdot g) &= (\Delta f) \cdot g + f \cdot (\Delta g) + (\Delta f) \cdot (\Delta g) \\
\end{align}\]
<p>As a tricky challenge, can you find a quotient rule?
As a <em>very</em> tricky challenge, can you find a chain rule<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>?</p>
</div>
<p>We also get a fundamental theorem of calculus (with a <em>much</em> easier proof!):</p>
<div class="boxed">
<p>Theorem (Fundamental Theorem of Finite Calculus):</p>
<p>\(\sum_a^b \Delta f = f(b+1) - f(a)\)</p>
</div>
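<p>As a quick sanity check (in plain Python rather than sage — the names <code>delta</code> and <code>f</code> are mine), here's the forward difference and the telescoping that proves the theorem, where $\sum_a^b$ runs over the integers $x = a, \ldots, b$:</p>

```python
def delta(f):
    """Forward difference: (Delta f)(x) = f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def f(x):
    return x**3 - 2*x  # any function of an integer argument works

a, b = 2, 10
lhs = sum(delta(f)(x) for x in range(a, b + 1))  # sum_{x=a}^{b} (Delta f)(x)
rhs = f(b + 1) - f(a)
assert lhs == rhs
```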
<hr />
<p>Of course, these give us ways of <em>combining</em> facts we already know. In a
calculus class we have a toolbox of “basic” functions that we know how to
differentiate. Are there any such functions here? The answer is <em>yes</em>, and
that leads us to the linear algebraic point of this post!</p>
<div class="boxed">
<p>Define the <span class="defn">$n$th falling power</span> to be</p>
\[x^{\underline{n}} = (x-0) (x-1) (x-2) \cdots (x-(n-1))\]
<p>(at least when $n \gt 0$).</p>
</div>
<p>Then we have the following “power rule” for forward differences<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>:</p>
<div class="boxed">
<p>\(\Delta x^{\underline{n}} = n x^{\underline{n-1}}\)</p>
</div>
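<p>This power rule is easy to verify numerically. Here's a small check in plain Python (the helper name <code>falling</code> is mine):</p>

```python
def falling(x, n):
    """Falling power x^(n underline) = x (x-1) ... (x-(n-1)), for n >= 0."""
    out = 1
    for i in range(n):
        out *= (x - i)
    return out

# check Delta x^(n underline) = n x^(n-1 underline) at many integer points
for n in range(1, 7):
    for x in range(-5, 10):
        assert falling(x + 1, n) - falling(x, n) == n * falling(x, n - 1)
```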
<p>This plus the fundamental theorem lets us quickly compute
sums of “falling polynomials”. As an example:</p>
\[\begin{align}
\sum_a^b 4 x^\underline{3} - 2 x^\underline{2} + 4
&= \sum_a^b 4 x^\underline{3} - \sum_a^b 2 x^\underline{2} + \sum_a^b 4 \\
&= \sum_a^b \Delta x^\underline{4}
- \frac{2}{3} \sum_a^b \Delta x^\underline{3}
+ 4 \sum_a^b \Delta x^\underline{1} \\
&= \left . x^\underline{4} - \frac{2}{3} x^\underline{3} + 4 x^\underline{1} \right |_a^{b+1} \\
&= \left ( (b+1)^\underline{4} - a^\underline{4} \right )
- \frac{2}{3} \left ((b+1)^\underline{3} - a^\underline{3} \right )
+ 4 \left ( (b+1) - a \right )
\end{align}\]
<p>This is great, but we don’t often see $x^{\underline{k}}$ in the wild.
Most of the time we want to sum “classical” polynomials with terms like $x^k$.
If only we had a way to easily convert back and forth between “classical”
polynomials and “falling” polynomials…</p>
<p>Of course, that’s the punchline! We know the space of polynomials
has a standard basis \(\{x^0, x^1, x^2, x^3, \ldots \}\). But notice the
polynomials \(\{x^\underline{0}, x^\underline{1}, x^\underline{2}, x^\underline{3}, \ldots \}\)
<em>also</em> form a basis!</p>
<div class="boxed">
<p>If this isn’t obvious, you should do it as an easy exercise. As a hint,
what is the degree of each $x^\underline{n}$?</p>
</div>
<p>And now we have a very obvious reason to care about change of basis, which
I think a lot of young mathematicians would appreciate. There’s a lot
of good pedagogy that one can do with this, since the new basis isn’t contrived
(it comes naturally out of a desire to compute sums), and it’s an easy-to-understand
example. Plus it’s obvious that we’re representing the
<em>same polynomial</em> in multiple ways. In my experience a lot of students struggle
with the idea that changing bases doesn’t actually change the vectors themselves,
only the names we give them (i.e., their coordinates). This gives us an
understandable example of that.</p>
<p>As a sample exercise, we might ask our students to compute
$\sum_{x=1}^n x^2$. Once they know $x^2 = x^\underline{2} + x^\underline{1}$
(which can be worked out by hand without much effort), they can compute</p>
\[\sum_1^n x^2
= \sum_1^n x^\underline{2} + x^\underline{1}
= \left . \frac{x^\underline{3}}{3} + \frac{x^\underline{2}}{2} \right |_1^{n+1}
= \frac{(n+1)^\underline{3} - 1^\underline{3}}{3} + \frac{(n+1)^\underline{2} - 1^\underline{2}}{2}\]
<p>They can then check (with sage, say) that this agrees with the <a href="https://math.stackexchange.com/questions/48080/sum-of-first-n-squares-equals-fracnn12n16">usual formula</a>.</p>
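<p>In fact, the check itself is a one-liner in plain Python (no sage needed; <code>falling</code> is my helper name):</p>

```python
from fractions import Fraction

def falling(x, n):
    """Falling power x^(n underline) = x (x-1) ... (x-(n-1))."""
    out = 1
    for i in range(n):
        out *= (x - i)
    return out

for n in range(1, 30):
    direct = sum(x**2 for x in range(1, n + 1))
    # ((n+1)^3 - 1^3)/3 + ((n+1)^2 - 1^2)/2, falling powers throughout
    via_falling = (Fraction(falling(n + 1, 3) - falling(1, 3), 3)
                   + Fraction(falling(n + 1, 2) - falling(1, 2), 2))
    usual = Fraction(n * (n + 1) * (2 * n + 1), 6)
    assert direct == via_falling == usual
```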
<hr />
<p>At this point, we’re probably sold on the idea that this alternate basis is
useful for computing these sums. But it’s not yet clear how effective this is.
If I ask you to compute, say, $\sum_a^b x^5$, how would you go about doing it?
We need to know how to actually <em>compute</em> this change of basis<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>
<p>Enter the <a href="https://en.wikipedia.org/wiki/Stirling_number">stirling numbers</a>. There’s a lot of very pretty combinatorics
here, but let’s focus on what’s relevant for our linear algebra.
We write ${n \brace k}$ for the “stirling numbers of the second kind”, and
it turns out that</p>
\[x^n = \sum_k {n \brace k} x^\underline{k}\]
<p>which is almost usable! All we need now is a way to quickly compute
${n \brace k}$. Thankfully, there’s an analogue of Pascal’s Triangle
that works for these coefficients!</p>
<p>Just like pascal’s triangle, we have $1$s down the outside, and we build
each entry of the $(n+1)$st row by combining the two entries of the previous
row that it sits between.
The only difference is the stirling numbers keep track of what <em>column</em> you’re
in as well. Concretely, the recurrence is</p>
\[{n+1 \brace k} = {n \brace k-1} + k {n \brace k}\]
<p>So you add the number above you and to your left to $k$ times the number
above you and to your right. You increase $k$ with every step. Let’s do some
sample rows together:</p>
<p>Say our previous row was</p>
\[1 \quad 7 \quad 6 \quad 1\]
<p>Then our next row will be</p>
\[{\color{blue}1}
\quad
1 + {\color{blue}2} \times 7
\quad
7 + {\color{blue}3} \times 6
\quad
6 + {\color{blue}4} \times 1
\quad
1\]
<p>which is, of course</p>
\[1 \quad 15 \quad 25 \quad 10 \quad 1.\]
<p>Then the next row will be</p>
\[{\color{blue}1}
\quad
1 + {\color{blue}2} \times 15
\quad
15 + {\color{blue}3} \times 25
\quad
25 + {\color{blue}4} \times 10
\quad
10 + {\color{blue}5} \times 1
\quad
1\]
<p>In the above example you can see the blue multiplier is just increasing by $1$
each time. We’re always combining the two entries above the current one, just
like in pascal’s version.</p>
<p><br /></p>
<p>Finally, to be super clear, if we know the $4$th row of our triangle is</p>
\[1 \quad 7 \quad 6 \quad 1\]
<p>that tells us that</p>
\[x^4 = x^\underline{4} + 6 x^\underline{3} + 7 x^\underline{2} + x^\underline{1}.\]
<div class="boxed">
<p>There’s no substitute for doing: As an exercise, you should write out the
first $10$ or so rows of the triangle. Use this to compute $\sum_a^b x^5$.</p>
</div>
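<p>If you want to check your by-hand triangle (or your answer for $\sum_a^b x^5$), here's one way to automate it in plain Python. The function names are mine, and <code>sum_power</code> just strings together the triangle, the power rule, and the fundamental theorem:</p>

```python
from fractions import Fraction

def stirling_rows(n_max):
    """rows[n][k-1] = {n brace k}, built with the Pascal-like recurrence
    {n+1 brace k} = {n brace k-1} + k {n brace k}."""
    rows = {1: [1]}
    for n in range(1, n_max):
        prev = rows[n]
        row = [1]
        for k in range(2, n + 1):
            row.append(prev[k - 2] + k * prev[k - 1])
        row.append(1)
        rows[n + 1] = row
    return rows

def falling(x, n):
    out = 1
    for i in range(n):
        out *= (x - i)
    return out

def sum_power(p, a, b):
    """sum_{x=a}^{b} x^p, via x^p = sum_k {p brace k} x^(k underline)."""
    total = Fraction(0)
    for k, s in enumerate(stirling_rows(p)[p], start=1):
        total += Fraction(s, k + 1) * (falling(b + 1, k + 1) - falling(a, k + 1))
    return total

assert stirling_rows(6)[6] == [1, 31, 90, 65, 15, 1]  # the row computed above
assert sum_power(5, 3, 12) == sum(x**5 for x in range(3, 13))
```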
<hr />
<p>Another good exercise I might give students one day is to
explicitly write down change of basis matrices for, say, polynomials of degree
$4$. This more or less amounts to writing the above triangle as a matrix,
but hopefully it will give students something to play with to better understand
how the change of basis matrices interact with the vectors.</p>
<p>I really think this example has staying power throughout the course as well.
Once we know $\Delta$ is linear, we know it must have a representation as a
matrix. Computing that representation in the falling power basis and in the
standard basis would be another good exercise. One could also introduce
<a href="https://en.wikipedia.org/wiki/Indefinite_sum">indefinite summation</a> (say by picking constant term $0$).
Again, we know what its matrix looks like in the falling powers basis,
but it’s not at all clear what it looks like in the standard basis.
After conjugating by a change of basis matrix, though, we can figure this
out! And the cool thing? Next time you want to compute a sum, you can just
multiply by (a big enough finite part) of this matrix and evaluate at the
endpoints!</p>
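<p>To make this concrete without writing the matrices by hand, here's a sketch of the intertwining relation in plain Python (degree $\lt 5$, names mine). Rather than inverting anything, it checks the equivalent equation $P M_\text{std} = M_\text{fall} P$, where $P$ converts standard coordinates to falling-power coordinates:</p>

```python
from math import comb

N = 5  # work with polynomials of degree < N

# Stirling numbers of the second kind, S[n][k] = {n brace k}
S = [[0] * N for _ in range(N)]
S[0][0] = 1
for n in range(1, N):
    for k in range(1, n + 1):
        S[n][k] = S[n - 1][k - 1] + k * S[n - 1][k]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(N)) for j in range(N)]
            for i in range(N)]

# P sends standard coordinates to falling coordinates:
# x^n = sum_k {n brace k} x^(k underline), so P[k][n] = S[n][k]
P = [[S[n][k] for n in range(N)] for k in range(N)]

# Delta in the falling basis: Delta x^(k underline) = k x^(k-1 underline)
M_fall = [[k if j == k - 1 else 0 for k in range(N)] for j in range(N)]

# Delta in the standard basis: Delta x^n = sum_{j < n} C(n, j) x^j
M_std = [[comb(n, j) if j < n else 0 for n in range(N)] for j in range(N)]

# the change of basis intertwines the two representations of Delta
assert matmul(P, M_std) == matmul(M_fall, P)
```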
<p>If you’re a teacher and end up using this, or maybe you already <em>were</em> using
this, definitely let me know! I’d be excited to hear about this and other
ways that you try to make linear algebra feel more concrete to people
learning it for the first time.</p>
<p>If you’re a student, hopefully this was exciting! I know I get
geekier about this kind of stuff than a lot of people, but I think
finite calculus is a really cool idea. Hopefully this post encourages you
to go looking for other information about this technique, and maybe
shows that linear algebra is never very far away ^_^.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This book is actually <em>super</em> cool. It’s fairly old, and that shows in the
language (which can be kind of hard to read sometimes). What’s really cool,
though, is that it’s written for working statisticians in a pre-computer
era. So there’s a ton of pages with detailed tables, and a ton <em>more</em>
pages about how to go about making your own tables should you need some
family of constants that isn’t included. Obviously I’ll never have use
for those particular skills, so I haven’t read those parts too closely,
but I find it <em>so</em> interesting to see how things like that used to be done! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>And who among us <em>wasn’t</em> when we first heard about it? I remember seeing it
in Baby Rudin, at which point I got really excited. Then really confused.
Then (after some deep thinking) really excited again. It took me a long time
to understand some quirks of the formula, though. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>This actually isn’t how you often see the leibniz rule written. Even though
it’s objectively better than the alternative. Almost every reference I’ve
seen writes the leibniz rule as</p>
\[\Delta(f \cdot g) = (\Delta f) \cdot (Eg) + f \cdot (\Delta g)\]
<p>where $(Eg)(x) = g(x+1)$ is the “shift operator”.</p>
<p>I assume this is because summing both sides of this equation gives
the summation by parts formula, but the fact that the left hand side
is symmetric in $f$ and $g$ while the right hand side isn’t is… off-putting. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>I’m not sure if there’s a good answer to this one, actually. There’s an
mse question about it <a href="https://math.stackexchange.com/questions/235680/chain-rule-for-discrete-finite-calculus">here</a>, but it’s pretty unsatisfying.</p>
<p>If you’ll indulge me in some philosophical waxing: The
classical chain rule witnesses the functoriality of the derivative
(really functoriality of the <a href="https://en.wikipedia.org/wiki/Tangent_bundle">tangent bundle</a>, but the induced map on
tangent bundles is exactly the derivative). I’m curious if the nonexistence
of a nice chain rule for us comes down to the fact that this isn’t actually
a functorial thing to do… I would think about it more, but I’m trying to
keep this post somewhat low-effort. I would love to hear someone else’s
thoughts on this, though. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>There are other “fundamental” forward differences worth knowing as well.
Here’s a few to have in your pocket:</p>
<ul>
<li>$\Delta 2^x = 2^x$</li>
<li>More generally, $\Delta r^x = (r-1) r^x$</li>
<li>$\Delta \binom{x}{n} = \binom{x}{n-1}$</li>
<li>If we define $x^{\underline{0}} = 1$ and $x^{\underline{-n}} = \frac{1}{(x+1)(x+2)\cdots(x+n)}$, then the power rule continues to work.</li>
<li>$\Delta H_x = x^{\underline{-1}}$, where $H_x$ are the <a href="https://en.wikipedia.org/wiki/Harmonic_number">harmonic numbers</a></li>
</ul>
<p><a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>This is the kind of thing that I would probably just tell my
hypothetical students, but I might post a video or send them a blog
post where I go through it in detail as extra material for anyone
who’s interested. Introducing stirling numbers and proving properties
about them is really the regime of a combinatorics class, but I think
it doesn’t take too much time to show them the analogue of pascal’s
triangle so that they can actually <em>use</em> this technique should the need
arise. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 13 May 2021 00:00:00 +0000
https://grossack.site/2021/05/13/stirling-basis-change.html
https://grossack.site/2021/05/13/stirling-basis-change.html

Reducing to $\mathbb{Z}$ -- Permanence and Concrete Proofs

<p>There are lots of ways in which good notation can make results seem obvious.
There are also lots of ways in which “illegally” manipulating expressions can
give a meaningful answer at the end of the day.
It turns out that in many cases our illegal manipulations are actually justified,
and this is codified in the principle of
<span class="defn">Permanence of Identities</span>!
This is one place where category theory and model theory conspire in a
particularly beautiful (and powerful) way.</p>
<p>In this post we’ll talk about how to prove statements in general rings by
proving analogous statements for polynomials with integer coefficients.
This is nice because we often have access to ~bonus tools~ when working in
$\mathbb{Z}$, and it doesn’t matter if we <em>use</em> these bonus tools to prove
the general result!</p>
<p>I think this technique is best shown by example, so I’ll give a smattering
of proofs using this idea. Hopefully by the end you’ll be convinced of its
flexibility ^_^.</p>
<hr />
<h2 id="example--the-binomial-theorem">Example – The Binomial Theorem</h2>
<p>Let’s start simple, and build from here. In any of my favorite rings $R$
(though I should mention all my favorite rings are commutative with $1$)
we can take any two elements $a$ and $b$ and know that</p>
\[(a+b)^n = \sum_k \binom{n}{k} a^k b^{n-k}\]
<p>where we (as usual) interpret scaling by an integer as repeated addition.</p>
<p>Many authors prove this by saying something like
“the usual proof goes through unchanged”, but we can actually be a bit cleverer.</p>
<p>$\ulcorner$
First we prove this identity in $\mathbb{Z}[a,b]$. Then we notice there is
a (unique) ring hom $\varphi : \mathbb{Z}[a,b] \to R$ for each choice of $a$ and $b$
in $R$. This is the category theory at work, since $\mathbb{Z}[a,b]$ is the
free (commutative) ring on two generators. Next, we use model theory:
Homomorphisms preserve truth, and so the true equation $p(a,b) = q(a,b)$ in
$\mathbb{Z}[a,b]$ must stay true after we hit it with $\varphi$!</p>
<p>So since we proved $(a+b)^n = \sum_k \binom{n}{k} a^k b^{n-k}$ in $\mathbb{Z}[a,b]$,
we actually get for <em>free</em> that the equation holds for every pair of elements
in every ring!
<span style="float:right">$\lrcorner$</span></p>
<p>Notice that it doesn’t matter what “the usual proof” is! Maybe you like to
prove the binomial theorem by looking at the taylor series of $(1+x)^n$ and
remembering $\mathbb{Z} \subseteq \mathbb{R}$.
This method is entirely unavailable in a general ring, but because we end up
with a polynomial equality, we can conclude that the theorem is
true in general rings anyways!</p>
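<p>For a quick empirical taste of "permanence" (a plain-Python sketch, not part of the proof): the theorem must hold in $\mathbb{Z}/6$, which isn't even a domain, and we can exhaustively confirm that:</p>

```python
from math import comb

# The binomial theorem is a polynomial identity over Z[a,b], so it survives
# into every commutative ring -- e.g. Z/6, which is not even a domain.
m, n = 6, 5
for a in range(m):
    for b in range(m):
        lhs = pow(a + b, n, m)
        rhs = sum(comb(n, k) * a**k * b**(n - k) for k in range(n + 1)) % m
        assert lhs == rhs
```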
<p>There are lots of situations where we run this same kind of argument.</p>
<hr />
<h2 id="example--sylvesters-identity">Example – Sylvester’s Identity</h2>
<p>This one is a favorite example of <a href="https://math.stackexchange.com/users/242/bill-dubuque">Bill Dubuque</a> on mse. I’ve seen him
evangelize it so often I felt obligated to include it. It helps that it’s
such a great example! He actually has a <em>fantastic</em> explanation of this same
princple <a href="https://math.stackexchange.com/a/98365/655547">here</a>, which I definitely recommend reading if you’re interested!</p>
<p>Sylvester’s identity says that
(for $n \times n$ matrices over any ring $R$)</p>
\[\text{det}(1 + AB) = \text{det}(1 + BA)\]
<p>$\ulcorner$
How can we prove this? Well, let’s work in
$\mathbb{Z}[a_{ij}, b_{ij}]$, where we have one variable for each of the $2n^2$
matrix entries. Now we have</p>
\[\text{det}(1 + AB) \text{det}(A) = \text{det}(A + ABA) = \text{det}(A) \text{det}(1+BA)\]
<p>Since the determinant is a polynomial in the entries of a matrix<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">1</a></sup>, this is
really expressing the equality of polynomials with integer coefficients!</p>
<p>So we have a polynomial equation $fh = hg$, and we can happily cancel the
nonzero polynomial $h$ from both sides, since $\mathbb{Z}[a_{ij}, b_{ij}]$
is a domain! Here $h$ is the polynomial $\text{det}(A)$, and we get the claim.
<span style="float:right">$\lrcorner$</span></p>
<p>Notice that we’ve, again, used a special property of integer polynomials in
this proof! We can cancel polynomials with reckless abandon because we’re working
in a domain. Once we prove this polynomial identity, though, the result remains
true after we evaluate! In particular, even if
the specific $\text{det}(A)$ of interest is $0$, or the particular $R$ of
interest is <em>not</em> a domain!</p>
<p>Whatever tools we want to use inside $\mathbb{Z}[a_{ij}, b_{ij}]$ is totally ok,
as long as we end our proof with a polynomial identity.</p>
<div class="boxed">
<p>As a simple exercise, can you extend this result to the case where
$A$ is $n \times m$ and $B$ is $m \times n$?</p>
</div>
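<p>Here's a small numerical illustration (mine, in plain Python) using the Leibniz permutation formula for the determinant from the footnote. Note that $A$ is chosen singular on purpose — the cancellation step in the proof couldn't be run at this particular $A$, yet the identity still holds there:</p>

```python
from itertools import permutations

def det(M):
    """Determinant via the Leibniz permutation expansion (a polynomial in
    the entries, so it makes sense over any commutative ring)."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):           # count inversions to get the sign
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = sign
        for i in range(n):
            prod *= M[i][perm[i]]
        total += prod
    return total

def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(m)) for j in range(p)]
            for i in range(n)]

def plus_identity(M):
    return [[M[i][j] + (1 if i == j else 0) for j in range(len(M))]
            for i in range(len(M))]

A = [[1, 2, 0],
     [2, 4, 0],
     [0, 2, 2]]   # det(A) = 0: we couldn't cancel det(A) at this instance
B = [[0, 1, 1],
     [2, 0, -3],
     [1, 1, 0]]

assert det(A) == 0
assert det(plus_identity(matmul(A, B))) == det(plus_identity(matmul(B, A)))
```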
<hr />
<h2 id="example--computing-inverses">Example – Computing Inverses</h2>
<p>I said we would be focusing on commutative rings in this post, and that’s true.
But there’s a really cool noncommutative example that follows the same
principle and is worth showing.
I learned about this on mse (where else?) in a different <a href="https://math.stackexchange.com/a/675128/655547">excellent post</a> by Bill Dubuque.</p>
<div class="boxed">
<p>Even in noncommutative rings<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">2</a></sup>, if $1 - ab$ has an inverse, then
$1 - ba$ does too.</p>
</div>
<p>$\ulcorner$
We want to work in the ring of noncommutative polynomials
$\mathbb{Z} \langle a,b \rangle$, but it’s not quite big enough.
We’re making an assumption that $(1-ab)^{-1}$ exists, but it actually
<em>doesn’t</em> in $\mathbb{Z} \langle a,b \rangle$. That said, we can freely add
such an inverse – let’s work in</p>
\[R = \mathbb{Z} \langle a, b, c \rangle \Bigg / (1 - ab)c = 1 = c(1-ab).\]
<p>Now for the clever trick:
we can embed this into the ring of noncommutative power series
\(\mathbb{Z} \langle \! \langle a,b \rangle \! \rangle\).
We send $a \mapsto a$, $b \mapsto b$, and $c \mapsto (1-ab)^{-1}$.</p>
<p>In \(\mathbb{Z} \langle \! \langle a,b \rangle \! \rangle\), we
can run the following argument
(which, of course, would be nonsensical in other settings):</p>
\[\begin{aligned}
(1-ba)^{-1}
&= 1 + ba + (ba)^2 + (ba)^3 + \ldots \\
&= 1 + ba + baba + bababa + \ldots \\
&= 1 + b(1 + ab + abab + \ldots)a \\
&= 1 + b(1-ab)^{-1}a \\
\end{aligned}\]
<p>But this means in \(\mathbb{Z} \langle \! \langle a, b \rangle \! \rangle\)
we have the identity</p>
\[(1-ba) (1 + b(1-ab)^{-1}a) = 1 = (1 + b(1-ab)^{-1}a) (1-ba)\]
<p>which, under our embedding, gives us the following identity in
$R$:</p>
\[(1-ba) (1+bca) = 1 = (1+bca) (1-ba)\]
<p>But then since this ring is initial
among all noncommutative rings with $2$ free variables and an inverse
for $(1-ab)$, we find the identity holds in <em>every</em> (noncommutative) ring $R$!
<span style="float:right">$\lrcorner$</span></p>
<p>Notice the extra power we got, both by using quotient rings to model
some hypotheses in our theorem and by passing to formal power series.
This is part of what’s so nice about embeddings! They let us prove statements
in some smaller setting by using techniques from a bigger setting<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">3</a></sup>. We’ve
been implicitly using this idea throughout the post, but I wanted to make it
explicit at least once. After all, once we’re aware of it, we can intentionally
use it in other settings as well<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">4</a></sup>.</p>
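<p>Since square matrices form a noncommutative ring, the identity $(1-ba)^{-1} = 1 + b(1-ab)^{-1}a$ must in particular hold for matrices. Here's a quick $2 \times 2$ sanity check over $\mathbb{Q}$ (plain Python with exact <code>Fraction</code> arithmetic; all helper names are mine):</p>

```python
from fractions import Fraction

# 2x2 matrix helpers over the rationals
def mul(X, Y):
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def inv(X):
    d = X[0][0]*X[1][1] - X[0][1]*X[1][0]   # 2x2 inverse via the adjugate
    return [[ X[1][1]/d, -X[0][1]/d],
            [-X[1][0]/d,  X[0][0]/d]]

I = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
a = [[Fraction(1), Fraction(2)], [Fraction(0), Fraction(1)]]
b = [[Fraction(3), Fraction(1)], [Fraction(1), Fraction(1)]]

# candidate inverse of (1 - ba):  1 + b (1 - ab)^{-1} a
cand = add(I, mul(mul(b, inv(sub(I, mul(a, b)))), a))
assert mul(sub(I, mul(b, a)), cand) == I
assert mul(cand, sub(I, mul(b, a))) == I
```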
<hr />
<h2 id="a-sentimental-interlude--seven-trees-in-one">A Sentimental Interlude – Seven Trees in One</h2>
<p>The first paper I ever read<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">5</a></sup> opens with the following beautiful passage:</p>
<blockquote>
<p>Consider the following absurd argument concerning planar, binary, rooted,
unlabelled trees. Every such tree is either the trivial tree or consists of
a pair of trees joined together at the root, so the set $T$ of trees is
isomorphic to $1+T^2$. Pretend that $T$ is a complex number and solve the
quadratic $T = 1+T^2$ to find that $T$ is a primitive sixth root of unity
and so $T^6 = 1$. Deduce that $T^6$ is a one-element set; realize immediately
that this is wrong. Notice that $T^7 \cong T$ is, however, not obviously
wrong, and conclude that it is therefore right. In other words, conclude
that there is a bijection $T^7 \cong T$ built up out of copies of the
original bijection $T \cong 1 + T^2$: a tree is the same as seven trees.
The point of this paper is to show that ‘nonsense proofs’ of this kind are,
actually, valid.</p>
</blockquote>
<p>You can see that we’ve “proven” a claim about trees by proving a polynomial
implication in $\mathbb{C}$. That is, the authors show</p>
\[T = 1+T^2 \implies T^7 = T.\]
<p>In the paper, the authors show that homomorphisms of certain polynomial
<em>implications</em> are also preserved<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup> for rigs (that is, rings without negatives).
Here $\mathbb{N}[T]$ plays the role of the initial rig, which embeds in
$\mathbb{C}[T]$. Then we use complex analysis to show the above implication
holds in $\mathbb{C}[T]$ and thus in $\mathbb{N}[T]$.</p>
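<p>The complex-number half of the "absurd argument" is easy to watch happen numerically (a plain-Python sketch, obviously not a substitute for the paper's actual argument):</p>

```python
import cmath

# Pretend T is a complex number with T = 1 + T^2. The roots of
# T^2 - T + 1 are primitive sixth roots of unity, so T^6 = 1 and T^7 = T.
t = (1 + cmath.sqrt(-3)) / 2          # a root of T^2 - T + 1 = 0
assert abs(t - (1 + t**2)) < 1e-12    # T = 1 + T^2
assert abs(t**6 - 1) < 1e-12          # T^6 = 1  (wrong for trees!)
assert abs(t**7 - t) < 1e-12          # T^7 = T  (right for trees!)
```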
<p>Since the objects of a category with products and coproducts (taken up to
isomorphism) form a rig, this tells us there is a rig homomorphism from
$\mathbb{N}[T]$ to the category of, say, algebraic datatypes (up to isomorphism).</p>
<p>Since this polynomial implication is of the variety that’s preserved, and in
the category of datatypes we have $T \cong 1 + T^2$, we are allowed to conclude
$T \cong T^7$!</p>
<p>This follows the <em>spirit</em> of what we’re doing in this post, but is a bit
more detailed because general homomorphisms <em>don’t</em> preserve all implications.
A model theorist might jump straight to elementary embeddings, but that’s
far too restrictive for our purposes here. The authors of the above paper do a
great job finding (only slightly technical) conditions which make this
argument go through. I’ve included it both to show what’s possible when you
extend the ideas in this post, and also because it was my first paper and I feel
a certain amount of love for it.</p>
<hr />
<h2 id="example--mutliplicative-determinants">Example – Multiplicative Determinants</h2>
<p>Say we want to prove that $\text{det}(AB) = \text{det}(A) \text{det}(B)$.
When we’re working over a field, there are slick basis-dependent arguments
to show this. See, for instance, Knapp’s “Basic Algebra” (ch. II.7)<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">7</a></sup>.</p>
<p>These arguments don’t go through for a general ring $R$, though, so you might
think we should drink some caffeine, look up the definition of the determinant
(for the second time this blog post… nobody <em>really</em> remembers it, right?),
and just hit this problem with some honest computation.</p>
<p>Of course, once we remember $\mathbb{Z} \subseteq \mathbb{Q}$, we
might think of a better way (especially since we’ve seen the rest of the
post).</p>
<p>$\ulcorner$
We again look at $\mathbb{Z}[a_{ij}, b_{ij}]$.</p>
<p>We first note that the entries of $AB$ are polynomials in the
entries of $A$ and $B$ (if this isn’t clear, you should check it).</p>
<p>But since the determinant is a polynomial in the entries, we see the equation</p>
\[\text{det} \left ( (a_{ij}) (b_{ij}) \right )
= \text{det} \left ( (a_{ij}) \right ) \text{det} \left ( (b_{ij}) \right )\]
<p>is actually a polynomial equation (albeit a complicated one) in the
$a_{ij}$ and $b_{ij}$. So it suffices to show that it’s true in our
polynomial ring.</p>
<p>But $\mathbb{Z}[a_{ij},b_{ij}] \subseteq \mathbb{Q}(a_{ij},b_{ij})$, and we
<em>know</em> the formula is true for matrices over a field! So the formula is true
for us, and the claim follows for all rings.
<span style="float:right">$\lrcorner$</span></p>
<hr />
<p>This is a fun and powerful technique, and it’s really useful in a lot of
situations! Here are some quick exercises for you to play around with,
but I encourage you to look for your own as well!</p>
<div class="boxed">
<p>Pick your favorite two facts about matrix algebra and see if they’re
true over arbitrary rings. If you stick to facts about determinants,
matrix multiplication, row operations, etc. you should be able to choose
pretty much anything!</p>
</div>
<div class="boxed">
<p>Show that the quadratic formula always works, unless it obviously doesn’t.</p>
<p>As a hint, you’ll want to work in the ring</p>
<p>\(\mathbb{Z}\left [ a,b,c,d,a^{-1}, \frac{1}{2} \right ] \Bigg / d^2 = b^2 - 4ac\)</p>
</div>
<div class="boxed">
<p>Show that $\mathbb{Z}[x_1, \ldots, x_n]$ embeds in</p>
<ul>
<li>$C(\mathbb{R})$ (the ring of continuous functions on $\mathbb{R}$)</li>
<li>$C(\mathbb{C})$ (the ring of continuous functions on $\mathbb{C}$)</li>
<li>$C^\infty(\mathbb{R})$ (the ring of smooth functions on $\mathbb{R}$)</li>
<li>$C^\infty(\mathbb{C})$ (the ring of entire functions on $\mathbb{C}$)</li>
</ul>
<p>so if we prove a polynomial identity using real or complex analysis it
is true in $\mathbb{Z}[x_1, \ldots, x_n]$ (and thus in all rings).</p>
</div>
<div class="boxed">
<p>As another powerful tool in your arsenal, say you want to prove some
polynomial identity in one variable: $p(x) = q(x)$.</p>
<ol>
<li>
<p>Show that there’s some finite number $N$ (depending on $p$ and $q$)
so that $p(x) = q(x)$ if and only if it’s true for $N$ many choices of $x$.</p>
</li>
<li>
<p>Show that \(\binom{x}{3} \binom{3}{2} = \binom{x}{2} \binom{x-2}{1}\)
is a polynomial identity in $x$. Verify it by hand for $4$ choices of $x$.
Argue that this verification proves this identity holds in all rings
where you can divide by $2$.</p>
</li>
</ol>
<p>It’s wild to me that some finite verification like this is enough to prove
an identity (even a simple one like this) for all rings. If you want to see
more of this proof technique you should check out Petkovšek, Wilf, and
Zeilberger’s book <a href="https://www2.math.upenn.edu/~wilf/AeqB.pdf">A = B</a> (section 1.4, to start).</p>
<p>Does this technique work for polynomials with more than one variable?</p>
</div>
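<p>If you want to see this finite-verification trick in action, here's a plain-Python sketch of the exercise (the helper <code>binom_poly</code> is my name for $\binom{x}{n}$ viewed as a polynomial in $x$). Both sides have degree $3$, so agreement at $4$ points forces them to be equal as polynomials:</p>

```python
from fractions import Fraction
from math import factorial

def binom_poly(x, n):
    """binom(x, n) = x (x-1) ... (x-n+1) / n!, as a polynomial in x."""
    out = Fraction(1)
    for i in range(n):
        out *= (x - i)
    return out / factorial(n)

# both sides are degree-3 polynomials, so 4 agreements suffice
for x in [0, 1, 2, 3]:
    lhs = binom_poly(Fraction(x), 3) * 3          # binom(x,3) binom(3,2)
    rhs = binom_poly(Fraction(x), 2) * (x - 2)    # binom(x,2) binom(x-2,1)
    assert lhs == rhs
```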
<hr />
<p>As one last aside, I’m really interested in figuring out when we can do this
with power series. Over a year ago now I asked <a href="https://math.stackexchange.com/q/3500045/655547">a question</a> about this,
though I accepted the answer too quickly
(and in hindsight it isn’t really the kind of answer I was looking for).</p>
<p>With the extra year to think about it, though, I think I have a better idea
how to make it work. I meant this to be a kind of prequel where we work in
the simpler setting of polynomials to get practice before we jump into proving
identities with power series.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:7" role="doc-endnote">
<p>This is the one place where it’s helpful to remember that
<a href="https://en.wikipedia.org/wiki/Leibniz_formula_for_determinants">horrible definition</a> of the determinant in terms of a sum over
permutations of products of the entries. It’s obviously a polynomial
(albeit a gross one). <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>They still have $1$, though. We aren’t animals. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>If you like the model theoretic language, embeddings <em>reflect</em> truth. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>We really do use this kind of machinery ALL the time, though.
Whenever we use complex numbers to prove things about
$\mathbb{R}$, $\mathbb{N}$, etc. for instance.</p>
<p>More excitingly, this is part of the power of the
yoneda lemma – We embed any (small) category into a <a href="https://en.wikipedia.org/wiki/Topos">topos</a>
of presheaves. Then if we can prove some fact
(which doesn’t refer to any topos-y things) using this high powered
machinery, it reflects down to our original category!</p>
<p>This is also why model theorists care so much about
<a href="https://en.wikipedia.org/wiki/Elementary_equivalence#Elementary_embeddings">elementary embeddings</a>, which I’ve given a quick introduction
to <a href="/2020/10/01/elementary-vs-submodel">here</a>. The tl;dr is that embeddings <em>don’t</em> need to preserve
or reflect the truth of formulas involving quantifiers. Elementary
embeddings, on the other hand, do both. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>“Objects of Categories as Complex Numbers” by Fiore and Leinster. See
<a href="https://arxiv.org/pdf/math/0212377.pdf">here</a> for an arxiv link. It’s
<em>really</em> excellent, and readable too! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>In fact, we do this by quotienting by the assumptions of our implication,
just like we did with the $(1-ba)^{-1}$ example. So the relevant rig
for binary trees is $\mathbb{N}[T] \big / (T = T^2 + 1)$ (we quotient by a
congruence, since rigs don’t have subtraction).</p>
<p>The issue is that for some equations
$\mathbb{Z}[x] \big / p=q$ (and also $\mathbb{N}[x] \big / p=q$) might
not <em>embed</em> into $\mathbb{C}$!</p>
<p>For instance, when we quotient by some non-monic identity, we get
something that <a href="https://math.stackexchange.com/q/2230921/655547">isn’t finitely generated</a> as a $\mathbb{Z}$-module.
In particular, the powers of some root we added won’t satisfy any integer
linear combinations.
This is a problem since in $\mathbb{C}$ the subfield generated by the
roots of some integer polynomial will be finite dimensional over
$\mathbb{Q}$, and thus powers of any root <em>will</em> satisfy some integer linear combination!</p>
<p>Since we no longer have an embedding, truth is no longer reflected!
The core of Fiore and Leinster’s paper is giving conditions where this
doesn’t happen. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>Knapp also happens to spend some time talking about Permanence of Identities
(in ch. V.2), which is why I chose this book in particular to mention.
It turns out the exact example we’re about to work out is <em>also</em> in Artin’s
“Algebra” (ch. 12.3) alongside a discussion of Permanence of Identities!
So if you want to see some other perspectives on this topic, you can read
about it there too. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 05 May 2021 00:00:00 +0000
https://grossack.site/2021/05/05/initial-polynomial-proofs.html
https://grossack.site/2021/05/05/initial-polynomial-proofs.html

Two Sage Visuals

<p>I’m in a reading group with Elliott Vest and <a href="https://sites.google.com/view/jacobgarcia/jacob-garcia">Jacob Garcia</a>
(supervised by <a href="https://sites.google.com/view/mgdurham/">Matt Durham</a>) where we’re talking about
CAT(0) Cube Complexes. We’re reading a set of lecture notes
by Sageev (pdf <a href="http://www.math.utah.edu/pcmi12/lecture_notes/sageev.pdf">here</a>, for the interested) and we came across
a fairly simple problem that we wanted to draw. In a completely
different vein, <a href="https://github.com/russphelan">Russell Phelan</a> asked a fun topological question
in the UCR math discord, and to solve it I ended up needing to draw
something else. I figured I would write up a quick post about both
visualizations, since these things can be a bit tricky to get right.</p>
<hr />
<h2 id="hyperbolic-circles">Hyperbolic Circles</h2>
<p>Let’s start with the cube complexes. One of the exercises in Sageev’s
notes<sup id="fnref:sageev-ex" role="doc-noteref"><a href="#fn:sageev-ex" class="footnote" rel="footnote">1</a></sup> asks a question which uses circles in the hyperbolic
plane $H^2$. We had an intuitive idea why this should be true, but
to really visualize it, I asked a question (which in hindsight was silly):</p>
<p>We know that in the <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_disk_model">disk model</a> hyperbolic circles and euclidean
circles agree (though the centers might not be where they appear).
By this I mean a hyperbolic circle</p>
\[C(r,x_0) = \{x \mid d_\text{hyperbolic}(x_0,x) = r \}\]
<p><em>looks</em> like a euclidean circle when you draw it. I didn’t know of any
such fact for the <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_half-plane_model">upper half plane model</a>, though, and I asked if
anyone knew what circles look like there. We didn’t, so I went on a
quest to just… draw a bunch of hyperbolic circles in <a href="https://www.sagemath.org/">sage</a>.</p>
<p>According to <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_half-plane_model#Distance_calculation">wikipedia</a>, the hyperbolic distance in the upper half plane
is given by</p>
\[d \big ( (x_1,y_1),(x_2,y_2) \big ) =
2 \text{arcsinh}
\left (
\frac{1}{2}
\sqrt{\frac{(x_2 - x_1)^2 + (y_2 - y_1)^2}{y_1 y_2}}
\right )\]
<p>So let’s go ahead and plot a circle in this metric! The relevant function
for this is <code class="language-plaintext highlighter-rouge">implicit_plot</code>, which does what it says on the tin.</p>
<div class="auto">
<script type="text/x-sage">
x,y = var('x,y')
d(x1,y1,x2,y2) = 2 * arcsinh(1/2 * sqrt(((x2-x1)^2 + (y2-y1)^2) / (y1*y2)))
# plot a circle of radius 1/2 centered at (0,1)
implicit_plot(d(x,y,0,1) - 1/2, (-5,5), (0,5))
</script>
</div>
<p>You can see this <em>looks</em> like a regular circle. Of course, we know that
distances should be distorted – just look at the notion of distance!
The distortion, it turns out, is in the apparent location of the center
and the apparent <em>size</em> of the circle.</p>
<p>To see exactly what I mean by this, let’s plot a bunch of circles
(with their centers marked) each of radius $1$.</p>
<div class="auto">
<script type="text/x-sage">
x,y = var('x,y')
d(x1,y1,x2,y2) = 2 * arcsinh(1/2 * sqrt(((x2-x1)^2 + (y2-y1)^2) / (y1*y2)))
colors = ["blue", "red", "green", "maroon", "olive", "pink", "silver", "navy"]
def draw_circle(x0,y0,r, c):
    """
    Draw a circle with center (x0,y0), radius r, and color c
    """
    # draw the circle
    p1 = implicit_plot(d(x,y,x0,y0) - r, (-5,5), (0,5), color=c)
    # draw the center
    p2 = point((x0,y0), color=c)
    return p1 + p2

out = Graphics()
for i in range(1,8):
    # draw a sequence of circles, all of radius 1,
    # but with centers moving closer to the x axis
    # (which we think of as a line at infinity)
    out += draw_circle(-3 + i, 1/i, 1, colors[i])
out.show()
</script>
</div>
<p>You can tell that the true center of the circle is not where one might
think. This is because distances near the $x$-axis are longer than they appear,
and so our center must be closer to points near the $x$-axis to compensate.</p>
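<p>In fact, we can say exactly where the center and size end up: the hyperbolic circle of radius $r$ centered at $(a,b)$ in the upper half plane is the euclidean circle with center $(a, b \cosh r)$ and radius $b \sinh r$. This is a standard fact, but it’s worth a numerical sanity check. Here’s a quick sketch in plain python (not sage), using the distance formula from above:</p>

```python
import math

def d_uhp(x1, y1, x2, y2):
    """Hyperbolic distance in the upper half plane (wikipedia's formula)."""
    return 2 * math.asinh(0.5 * math.sqrt(((x2 - x1)**2 + (y2 - y1)**2) / (y1 * y2)))

# claim: the hyperbolic circle of radius r about (a,b) is the
# euclidean circle with center (a, b*cosh(r)) and radius b*sinh(r)
a, b, r = 0.5, 2.0, 1.3
euclidean_center_y = b * math.cosh(r)
euclidean_radius = b * math.sinh(r)

# sample points on that euclidean circle, and check that their
# hyperbolic distance to (a,b) really is r
for k in range(12):
    t = 2 * math.pi * k / 12
    px = a + euclidean_radius * math.cos(t)
    py = euclidean_center_y + euclidean_radius * math.sin(t)
    assert abs(d_uhp(a, b, px, py) - r) < 1e-9
print("ok")
```

<p>Notice the euclidean center $(a, b \cosh r)$ sits strictly <em>above</em> the hyperbolic center $(a,b)$, which is exactly the distortion visible in the plots.</p>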
<div class="boxed">
<p>As an aside, it might seem magical that this wild distance formula makes
circles look like euclidean circles. But like a good magic trick, there’s
a shockingly simple explanation under the surface.</p>
<p>If we take for granted that circles in the disk model are euclidean circles,
can you show that this must be true for the upper half plane model as well?</p>
<p>As a (possibly too helpful) hint, you might consider <a href="https://en.wikipedia.org/wiki/M%C3%B6bius_transformation">mobius transformations</a>.</p>
</div>
<p>To really have fun experimenting, let’s make the above graphic interactive,
and let’s throw in an interactive disk model graphic as well, just for fun.</p>
<p>Be a bit careful with the disk model – because I’m using the same sliders
as the upper half plane model, you need to make sure that your center
stays in the unit circle.</p>
<div class="sage">
<script type="text/x-sage">
x,y = var('x,y')
dUHP(x1,y1,x2,y2) = 2 * arcsinh(1/2 * sqrt(((x2-x1)^2 + (y2-y1)^2) / (y1*y2)))
dPD(x1,y1,x2,y2) = arccosh(1 + (2 * ((x2-x1)^2 + (y2-y1)^2))/((1 - (x1^2 + y1^2))*(1 - (x2^2 + y2^2))))
@interact
def _(model=selector(['upper half plane', 'poincare disk'], buttons=True),
      x0=slider(-5,5,step_size=0.1, default=0),
      y0=slider(0.1,5,step_size=0.1, default=1/2),
      r=slider(0,3,step_size=0.1, default=1)):
    if model == "upper half plane":
        show("Upper Half Plane Circles")
        # draw the circle
        p1 = implicit_plot(dUHP(x,y,x0,y0) - r, (-5,5), (0,5))
        # draw the center
        p2 = point((x0,y0))
        show(p1+p2)
    else:
        show("Poincare Disk Circles")
        # draw the boundary circle of the poincare disk
        p1 = implicit_plot(x^2 + y^2 - 1, (-1.5,1.5), (-1.5,1.5), color="black")
        # draw the circle
        p2 = implicit_plot(dPD(x,y,x0,y0) - r, (-1.5,1.5), (-1.5,1.5))
        # draw the center
        p3 = point((x0,y0))
        show(p1+p2+p3)
</script>
</div>
<hr />
<h2 id="a-topological-problem">A Topological Problem</h2>
<p><br /></p>
<p><img src="/assets/images/two-sage-visuals/completely-different.gif" /></p>
<p>In the second half of this post we’ll go over a fun problem that Russell
put in the UCR discord. It’s Question 1 from Example 3.1.10 in Burago, Burago,
and Ivanov’s <a href="https://bookstore.ams.org/gsm-33">A Course in Metric Geometry</a>:</p>
<div class="boxed">
<p>What topological space do you get when you quotient $\mathbb{R}^2$ by
$(x,y) \sim (-y,2x)$?</p>
</div>
<p>I encourage you to give this a go by yourself before reading ahead! It took
me a few days to be really confident in my answer, and it was a
<em>lot</em> of fun to work through ^_^.</p>
<p>After a bit of looking for low hanging fruit (which, as far as I can tell,
wasn’t there), I decided to just hunker down and look for a fundamental domain.
It’s easy to see that this quotient space is the orbit space of the action
of $\mathbb{Z}$ on $\mathbb{R}^2$ where the generator acts by
\(T = \begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix}\).</p>
<p>A little bit of experimentation lets you find a fundamental domain.
Since $T^2 = -2I$, applying $T$ twice scales everything by a factor of $2$,
so that clues us in to look around the annulus between $1/2$ and $1$.</p>
<div class="auto">
<script type="text/x-sage">
x,y = var('x,y')
xr = (x,-2,2)
yr = (y,-2,2)
out = Graphics()
# the first annulus
region = [1/4 < x^2 + y^2, x^2 + y^2 < 1, x > 0, y > 0]
out += region_plot(region, xr, yr, incol="purple")
# the second annulus
region = [1/16 < x^2 + y^2, x^2 + y^2 < 1/4, x > 0, y > 0]
out += region_plot(region, xr, yr, incol="cyan")
out.show()
</script>
</div>
<p>We can convince ourselves that this really is a fundamental domain by
plotting the orbit of all of these regions and checking that they cover the
whole plane
(barring the origin, of course. Do you see why we have to treat it specially?).</p>
<div class="auto">
<script type="text/x-sage">
x,y = var('x,y')
N = 5
T = matrix([[0,-1],[2,0]])
xr = (x,-2,2)
yr = (y,-2,2)
def drawRegion(n):
    [v1,v2] = T^n * matrix([x,y]).transpose()
    # janky hack: coerce the symbolic matrix entries back into
    # expressions that region_plot is willing to accept
    v1 = eval(str(v1))
    v2 = eval(str(v2))
    out = Graphics()
    # the first annulus
    region = [1/4 < v1^2 + v2^2, v1^2 + v2^2 < 1, v2 > 0, v1 > 0]
    out += region_plot(region, xr, yr, incol="purple")
    # the second annulus
    region = [1/16 < v1^2 + v2^2, v1^2 + v2^2 < 1/4, v2 > 0, v1 > 0]
    out += region_plot(region, xr, yr, incol="cyan")
    return out

out = Graphics()
for n in range(-N,N+1):
    out += drawRegion(n)
out.show()
</script>
</div>
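<p>Instead of just eyeballing the picture, we can also spot-check numerically that (generic) orbits meet the proposed fundamental domain exactly once. The following is a quick sketch in plain python (not sage); it uses the open quarter-annulus $\{1/4 \lt \lvert v \rvert \lt 1,\ v_1, v_2 \gt 0\}$ and ignores the measure-zero boundary issues:</p>

```python
import random

# T(x, y) = (-y, 2x) and its inverse
T = lambda v: (-v[1], 2 * v[0])
T_inv = lambda v: (v[1] / 2, -v[0])

def in_domain(v):
    """The (open) proposed fundamental domain: a quarter annulus."""
    x, y = v
    r2 = x * x + y * y
    return x > 0 and y > 0 and 1 / 16 < r2 < 1

random.seed(0)
hits = []
for _ in range(1000):
    v = (random.uniform(-10, 10), random.uniform(-10, 10))
    orbit, w = [], v
    for _ in range(60):        # forward orbit: v, Tv, T^2 v, ...
        orbit.append(w)
        w = T(w)
    w = v
    for _ in range(60):        # backward orbit: T^-1 v, T^-2 v, ...
        w = T_inv(w)
        orbit.append(w)
    hits.append(sum(in_domain(u) for u in orbit))

# every sampled orbit hits the domain exactly once
print(all(h == 1 for h in hits))
```

<p>The check works because $T$ cycles through the four quadrants, and $T^4 = 4I$, so each orbit meets the open first quadrant in points of exactly one radius per power of $4$.</p>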
<p>From this information we can piece together the quotient space!
The pretty picture tells us that we have a (topological) hexagon
from the two cyan sides, the two purple sides, the purple top and the
cyan bottom.</p>
<div class="boxed">
<p>As a quick exercise, write down the hexagon above and figure out
which sides are identified. Why is this a torus?</p>
</div>
<p>So we understand \(\mathbb{R}^2 \setminus \{ 0 \} \bigg / \langle T \rangle\),
and the day of reckoning is upon us. We need to handle the origin.
It’s easy to see that our equivalence relation is not closed in
$\mathbb{R}^2 \times \mathbb{R}^2$, so our quotient space <em>cannot</em> be
hausdorff.
Notice also that any neighborhood of the origin contains a tail of fundamental
domains, and hence points from <em>every</em> orbit. That tells us that every
neighborhood of the origin’s image in the quotient contains every other point
of the quotient.</p>
<p>So our picture is of a torus, plus one <em>really big</em> “generic” point.
Any neighborhood of this point contains the entire torus.</p>
<p>What a bizarre space, right? I had a <em>lot</em> of fun working this out!
Before I typed this up I was feeling a bit insecure about the solution
(I also had a much grosser fundamental domain at first), and asked about it
on <a href="https://math.stackexchange.com/q/4117907/655547">mse</a>. It doesn’t have an answer yet, but I’m feeling more confident
in my computation now, so I’m posting this anyways. I can always edit this post
if someone leaves an answer that totally changes how I think about the problem,
and you can always follow that link if you (in the future) want to see what
people had to say.</p>
<p>Finally, if you want to think about some similar things, I have a fun problem
for you ^_^</p>
<div class="boxed">
<p>Let’s look at the action of $T^2$ and $T^4$ on $\mathbb{R}^2$ instead.
These matrices are much easier to understand (they’re diagonal, for instance).</p>
<p>What are the quotient spaces of these actions? They generate subgroups of
$\langle T \rangle$, so (at least morally) we expect them to correspond to
covering spaces of $\mathbb{R}^2 \big / \langle T \rangle$. Do they?
If so, what does the covering action look like? If not, what goes wrong?</p>
</div>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:sageev-ex" role="doc-endnote">
<p>Exercise 2.15, for the curious <a href="#fnref:sageev-ex" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 27 Apr 2021 00:00:00 +0000
https://grossack.site/2021/04/27/two-sage-visuals.html
https://grossack.site/2021/04/27/two-sage-visuals.htmlRemembering the Reverse Triangle Inequality<p>The quarter is over, and now that I’m vaccinated (twice!) I feel comfortable
seeing people again. So I flew to the east coast to see my family and a bunch of friends.
Before I left, I had a few ideas for blog posts, and figured I would get around
to writing one now.</p>
<p>I’ve made it known that I struggle with analysis<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, and one manifestation of
this is a complete inability to remember elementary facts about inequalities.
It took me a long time to feel comfortable with things as basic as
“which way does the triangle inequality go?”, and until fairly recently things
like Cauchy-Schwarz were almost entirely beyond me. Over the past year or two,
I’ve been trying to answer lots of analysis questions on mse, as well as read
lots of books on analysis and solve lots of problems, and (thankfully) some of
it is starting to stick. But one inequality that I <em>always</em> seem to forget is
the <a href="https://en.wikipedia.org/wiki/Triangle_inequality#Reverse_triangle_inequality">reverse triangle inequality</a>:</p>
<div class="boxed">
<p>\(\Bigg | |x| - |y| \Bigg | \leq |x-y|\)</p>
</div>
<p>I don’t know many ways for showing a lower bound on absolute values,
but almost every time I need one, I go through the following process:</p>
<ol>
<li>“Doesn’t the reverse triangle inequality give a lower bound?”</li>
<li>“I wonder if I should use that. Let me google it!”</li>
<li>“Oh right, <em>that’s</em> what it says. How do I always forget this?”</li>
<li>“This is actually not as useful as I would have liked. Oh well.”</li>
</ol>
<p>The most recent time I went through this, something on the wikipedia page
really clicked with me, and I’m not sure why it never clicked before.
The geometric intuition<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> never really stays in my head, but for some reason
this did:</p>
<div class="boxed">
<p>The reverse triangle inequality says that the norm $\lVert \cdot \rVert$
of some vector space $X$ is 1-<a href="https://en.wikipedia.org/wiki/Lipschitz_continuity">lipschitz</a> as a function from $X \to \mathbb{R}$:</p>
\[\Bigg | \lVert x \rVert - \lVert y \rVert \Bigg | \leq \lVert x-y \rVert\]
<p>Or, even more suggestively:</p>
<p>\(d_\mathbb{R}(\lVert x \rVert, \lVert y \rVert) \leq d_X(x,y)\)</p>
</div>
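<p>The lipschitz version is also trivial to sanity-check by machine. Here’s a quick numerical spot-check (nothing deep, just a sketch) of the 1-lipschitz statement for the euclidean norm on $\mathbb{R}^2$:</p>

```python
import math
import random

def norm(v):
    """The euclidean norm on R^2."""
    return math.hypot(v[0], v[1])

random.seed(1)
for _ in range(10_000):
    x = (random.uniform(-50, 50), random.uniform(-50, 50))
    y = (random.uniform(-50, 50), random.uniform(-50, 50))
    # d_R(||x||, ||y||) <= d_X(x, y): the norm is 1-lipschitz
    lhs = abs(norm(x) - norm(y))
    rhs = norm((x[0] - y[0], x[1] - y[1]))
    assert lhs <= rhs + 1e-9   # small slack for floating point
print("ok")
```

<p>Note the bound is tight: for $x = (3,4)$ and $y = (6,8)$ both sides are exactly $5$, since $y$ lies on the ray through $x$.</p>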
<p>I’m trying to see why this is more memorable for me, and moreover why it’s
<em>suddenly</em> more memorable. Because I know that I’ve seen this before.</p>
<p>I think part of it is the visual and semantic distinction that we get by
writing $\lVert \cdot \rVert$ instead of $|\cdot|$. When everything in sight
was a real number, there were too many combinations of what we should and
shouldn’t be absolute-value-ing. As with many things in math and computer
science, taking some time to recognize the <a href="https://en.wikipedia.org/wiki/Type_system">types</a> involved in an equation
or proof, and then making sure to distinguish these types in your mind,
helps a lot for keeping the structures straight.</p>
<p>I think another reason this is memorable is because the notion of lipschitz
maps has become something I feel familiar with. When I was taking my first
undergraduate analysis class, I really didn’t know why we should care about
the various strengthenings of continuity. Over time I’ve learned to better
appreciate their differences, and I feel like lipschitz-ness is one of the
regularity conditions that I understand best<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. It also makes intuitive
sense that taking the norm of a vector should be a very regular operation.</p>
<p>Anyways, this isn’t so much a “trick” as a “mnemonic”, but I wanted to say
something about it anyways because I think it would have helped younger me.
At the very least, it was nice to write up a really short post with a
somewhat obvious observation. To make it somewhat more worth your time,
here’s a picture of my old cat Oreo. I got to visit her while I was visiting
<a href="https://remydavison.com">Remy</a> in New York!</p>
<p><img src="/assets/images/quick-analysis-trick-5/oreo.jpg" alt="My daughter, a gremlin" style="width: 400px; height: auto; display: block; margin-left: auto; margin-right: auto" /></p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Though I’ve done fairly well the past two quarters, which has been a
real confidence boost… It’s feeling better, but I still don’t feel
like I understand things as well as I should, and while it’s coming
faster, I still don’t feel like it’s coming naturally…
Maybe it’s imposter syndrome? Who’s to say ¯\_(ツ)_/¯ <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>The difference between two legs of a triangle must be less than the
length of the third leg. Otherwise, by adding the length of the shorter
leg to both sides you would violate the triangle inequality. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Not that that’s saying much. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sun, 21 Mar 2021 00:00:00 +0000
https://grossack.site/2021/03/21/quick-analysis-trick-5.html
https://grossack.site/2021/03/21/quick-analysis-trick-5.htmlChecking Concavity with Sage<p>I haven’t been on MSE lately, because I’ve been fairly burned out from a
few weeks of more work than I’d have liked. I’m actually still catching up,
with a commutative algebra assignment that should have been done last week.
I was (very kindly) given an extension, and I’ll be finishing it soon, though.</p>
<p>I meant to do it today, but I got my second covid vaccination earlier and it
really took a lot out of me. I’m feverish and have a pretty bad migraine, so
I didn’t want to work on “real things”, but I still wanted to feel productive…
MSE it is.</p>
<p>Today someone asked <a href="https://math.stackexchange.com/q/4055724/655547">a question</a>, which again I’ll paraphrase here:</p>
<div class="boxed">
<p>Why is \(\left ( 1+\frac{1}{x} \right )^x\) <a href="https://en.wikipedia.org/wiki/Concave_function">concave</a> (on $x > 0$)?</p>
</div>
<p>It clearly <em>is</em> concave. Here’s a picture of it:</p>
<p><img src="/assets/images/sage-concave/desmos.png" /></p>
<p>Obviously it has a horizontal asymptote at $y = e$, and should always be $\lt e$, so
it really should be concave… showing that is a bit of a hassle, though.</p>
<p>Thankfully, we can use <a href="https://sagemath.org">sage</a> to automate away most of the difficulties. I’ll
more or less be rehashing my answer here. The idea is to put this example of
using sage “in the wild” somewhere a bit easier to find than a stray mse post.</p>
<p>Showing that a function is convex (resp. concave) is routine but tedious,
at least when that function is twice differentiable. Then we can just check
$\frac{d^2f}{dx^2} \geq 0$ (resp. $\leq 0$) on the region of interest.
The issue here, of course, is that
$\frac{d^2}{dx^2} \left ( 1 + \frac{1}{x} \right )^x$ is… unpleasant.
Thankfully, sage doesn’t care in the least! Let’s see if we can bash out
the second derivative and show it’s $\leq 0$ (whenever $x > 0$, of course).</p>
<p>We start by defining $f$ and its second derivative</p>
<div class="linked_auto">
<script type="text/x-sage">
f(x) = (1+1/x)^x
secondDerivative = diff(f,x,2)
show(secondDerivative)
</script>
</div>
<p>In a perfect world, we could just… ask sage if this is $\leq 0$.
Unfortunately, the expression is a bit too complicated, and we don’t get
a clean answer<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>
<div class="linked_auto">
<script type="text/x-sage">
solve(secondDerivative < 0, x)
</script>
</div>
<p>This gives a list of lists of domains, and if you intersect all the domains
in some fixed list, you get a region where the second derivative is $\lt 0$.
Of course, these domains are far too complicated to be useful. We’ll need to
try something else.</p>
<p>Let’s look at the second derivative again.
We as humans can see how to clean it up a little, so let’s do that first:</p>
\[\left ( 1 + \frac{1}{x} \right )^x
\left [
\left ( \frac{1}{1+x} - \log \left ( 1 + \frac{1}{x} \right ) \right )^2 -
\frac{1}{x^3 + 2x^2 + x}
\right ]\]
<p>Since $(1 + 1/x)^x$ is always positive when $x$ is, the sign of this expression
is controlled by the second factor. We might try to ask sage about the second
factor again, but you can check that it’s still too complicated for sage to
handle it automatically. We’ll need to simplify the expression if we want
to proceed.</p>
<p>One obvious way we might try to simplify things is by turning our expression
into a rational function. After all, polynomials are more combinatorial
in nature than things like $\log$, and so sage is better equipped to handle
them. Your first instinct might be to kill the $\log$s with taylor series,
since $x - \frac{x^2}{2} \leq \log(1+x)$. This will work, but we can be a bit
more efficient. It’s <a href="https://math.stackexchange.com/q/324345/655547">well known</a> that</p>
\[\frac{x}{1+x} \leq \log(1+x) \leq x\]
<p>and in fact the upper bound can be sharpened to
$\log(1+x) \leq \frac{x}{\sqrt{1+x}}$. We’ll want the sharper version,
since the cruder bound turns out to be too lossy for small $x$.
Plugging in $1/x$, we see</p>
\[\frac{1}{1+x} \leq \log \left ( 1 + \frac{1}{x} \right ) \leq \frac{1}{\sqrt{x(1+x)}}\]
<p>Since $\log \left ( 1 + \frac{1}{x} \right )$ lies between these two bounds,
the term we’re squaring is at most the gap between them in absolute value,
and so its square is upper bounded by</p>
\[\left ( \frac{1}{\sqrt{x(1+x)}} - \frac{1}{1+x} \right )^2
= \frac{\left ( \sqrt{1+x} - \sqrt{x} \right )^2}{x^3 + 2x^2 + x}\]
<p>and (remembering that $x^3 + 2x^2 + x = x(1+x)^2$) we’ve reduced the
problem to showing</p>
\[\left ( \sqrt{1+x} - \sqrt{x} \right )^2 \lt 1 \quad \quad (\text{when } x \gt 0)\]
<p>Equivalently $\sqrt{1+x} \lt 1 + \sqrt{x}$, or (squaring both sides)
$1 + x \lt (1 + \sqrt{x})^2$,</p>
<p>and in the interest of offloading as much thinking as possible to sage,
we see</p>
<div class="linked_auto">
<script type="text/x-sage">
assume(x > 0)
bool(1 + x < (1 + sqrt(x))^2)
</script>
</div>
<p>and so $f$ is, in fact, concave.</p>
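<p>As one last sanity check, independent of the symbolic argument, we can numerically estimate the second derivative at a few sample points and confirm it’s negative. This is just a finite-difference sketch in plain python, not a proof:</p>

```python
import math

def f(x):
    return (1 + 1 / x) ** x

def second_derivative(g, x, h=1e-5):
    """Central finite-difference approximation to g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / h ** 2

# the estimates should all be negative if f is concave on x > 0
for x in [0.1, 0.5, 1.0, 2.0, 5.0]:
    assert second_derivative(f, x) < 0
print("ok")
```

<p>(For very large $x$ the true value of $f''$ is so close to $0$ that floating point noise in the finite difference swamps it, which is why the sample points stay moderate.)</p>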
<hr />
<p>This was fairly painless, but we got pretty lucky with that estimate for
$\log$. I’m curious if there’s a way to completely automate this process,
and to remove all need for creativity. If anyone knows a simpler way to do
this, where we can just directly ask if the second derivative is negative,
I would love to hear about it!</p>
<p>We’re at least a little bit out of luck, since <a href="https://en.wikipedia.org/wiki/Richardson%27s_theorem">Richardson’s Theorem</a>
shows that it’s undecidable whether certain (very nice!) functions are
nonnegative. As an easy exercise, can you turn this into a proof that
checking convexity is undecidable on some similarly nice class of functions?</p>
<p>Even though logicians came to ruin the fun
(as we have a reputation for doing, unfortunately…),
I’m curious if any kind of result is possible. Maybe there’s some
hacky solution that works fairly well in practice?
Approximating every nonpolynomial by the first, say, 50 terms of its
taylor series comes to mind, but I’m currently struggling to get sage
to expand and simplify expressions in way that makes me happy, so
manipulating huge expressions like that is, at least for now, a bit beyond me<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
<p>Again, all ideas are welcome ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Another thing I tried was to get sage to do <em>everything</em> for us. But for
some reason <code class="language-plaintext highlighter-rouge">bool(secondDerivative < 0)</code> returns false, even when we
<code class="language-plaintext highlighter-rouge">assume(x > 0)</code>… I suspect this is (again) because our expression is too
complicated. After all, it seems like there are <a href="https://ask.sagemath.org/question/42825/assumptions-and-inequalities/">issues</a> with <em>much</em>
simpler expressions than this one. If anyone knows how to make this kind
of direct query work, I would love to hear about it! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Wow. I know I speak (and write) in run-on sentences, but this one’s on
another level. I feel like I need a swear jar but for commas. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 09 Mar 2021 00:00:00 +0000
https://grossack.site/2021/03/09/sage-concave.html
https://grossack.site/2021/03/09/sage-concave.htmlTalk - Problem Solving Without Ansibles -- An Introduction to Communication Complexity<p>Wow, two talk posts in one day! Thankfully the actual talks were a week apart!</p>
<p>Earlier today I gave <em>another</em> talk in the grad student seminar here at UCR.
I wanted to break out of my logician mold a <em>little</em> bit, and so I decided to
talk about a result which absolutely blew my mind when I first saw it:
the (public coin) randomized communication complexity of equality is $O(1)$
for any fixed error tolerance $\epsilon$!</p>
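<p>For concreteness, the protocol behind that result is remarkably simple: using the public coins, both parties sample a shared random bit vector $r$, Alice announces the parity $\langle x, r \rangle \bmod 2$, and Bob compares it against $\langle y, r \rangle \bmod 2$. If $x \neq y$ the parities disagree with probability exactly $1/2$, so $k$ rounds drive the error below $2^{-k}$, independent of the input length. Here’s a sketch in python (the names and parameters are my own, not from the talk):</p>

```python
import random

def equality_protocol(x, y, k=32, seed=42):
    """Public-coin randomized protocol for EQUALITY.

    x, y : equal-length bit tuples held by Alice and Bob.
    Only k bits ever cross the channel, no matter how long x is.
    """
    coins = random.Random(seed)   # the public coins, visible to both planets
    n = len(x)
    for _ in range(k):
        r = [coins.randrange(2) for _ in range(n)]
        alice_bit = sum(xi * ri for xi, ri in zip(x, r)) % 2  # Alice sends this
        bob_bit = sum(yi * ri for yi, ri in zip(y, r)) % 2
        if alice_bit != bob_bit:
            return False          # provably unequal
    return True                   # equal, except with probability <= 2**-k

x = tuple(random.Random(0).randrange(2) for _ in range(1000))
y = x[:500] + (1 - x[500],) + x[501:]   # flip a single bit
print(equality_protocol(x, x))          # always True
print(equality_protocol(x, y))          # False, except with probability 2**-32
```

<p>The punchline is that <code class="language-plaintext highlighter-rouge">k</code> depends only on the error tolerance, not on $n$ — which is exactly the $O(1)$ claim above.</p>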
<p>Since communication complexity is all about measuring the number of messages
you send, I thought a fun framing device would be to imagine Alyss and Bob on
separate planets. If they’re very far from each other, but they each have the
computational power of an entire planet at their disposal, then it makes sense
to measure communication as the limiting factor in their computation. This was
in part inspired by <a href="http://www.cs.cmu.edu/~odonnell/">Ryan O’Donnell</a>’s excellent videogame themed talk
(<a href="https://www.youtube.com/watch?v=4B0jwIu9fPs">here</a>), and indeed I tried to make my slides in google slides instead of
beamer as an homage to him.</p>
<p>It definitely had advantages and disadvantages, but I liked a lot of the
flexibility it offers. I think he does his in powerpoint, which might solve
some of my bigger gripes. Notably drawing on the slides directly was impossible
(because any time you release your pen, the scribble tool closes itself…
that really needs a rethink on google’s end), so I had to do all the handwriting
in gimp, then insert the writing as an image in google slides. This was annoying
at first, but I gradually got into the flow of it. The most damning problem was
how annoying it is to write mathematical symbols. Every single $\epsilon$ gave
me a headache, and my entire browser lagged anytime the symbols menu was open.
I know there are some add-ons which might make this easier, but nothing can
beat a raw latex engine. In a more technical talk, I don’t think I could have
possibly made the slides in this way.</p>
<p>All in all, I was really pleased with how the talk went, though! I think it’s
an interesting enough topic to stand on its own, and it was fun getting to
evangelize computer science to a crowd of mathematicians. CMU’s math department
obviously worked quite closely with its CS department, and I forget sometimes
that that isn’t the norm. I knew going into the talk that I wanted to spend
some time talking about an interpretation of this using error-correcting codes
(<a href="https://en.wikipedia.org/wiki/Hamming_code">Hamming Codes</a> in particular), but I ended up scrapping it and not writing
the slides for it. In hindsight, I should have just made the slides, because
as soon as someone asked a question that even <em>hinted</em> at this idea, I pounced
on it and went on a mild tangent. I suspect I lost a lot of people during that,
and it would have been a lot easier to retain them if I’d just organized the
big ideas into slides. Oh well, I’m not going to fault past me too much for
their laziness.</p>
<p>All in all, I really enjoyed giving this talk, and it seemed like the
audience really enjoyed watching it. This was almost certainly due to the
influence of Ryan’s lecturing style, and anyone familiar with his (excellent)
“Theorist’s Toolkit” lectures (which I reference in the talk, and which you
can find <a href="https://www.youtube.com/playlist?list=PLm3J0oaFux3ZYpFLwwrlv_EHH9wtH6pnX">here</a>) will recognize his impact.</p>
<p>With that out of the way, here are the things:</p>
<hr />
<p>Problem Solving Without Ansibles: An Introduction to Communication Complexity</p>
<p>In the world of Science Fiction, an “ansible” is a device that allows for
faster-than-light communication. Without ansibles, interstellar travel
puts an interesting constraint on computation. If two planets want to
collaborate on solving a problem, the obstruction will likely not be the
computation that either planet does individually. Instead, what matters
is the <em>Communication Complexity</em> which tracks the amount of messages
the planets have to send each other to solve their problem. In this talk
we will solve a prototypical problem in communication complexity. But be
warned: the answer may surprise you!</p>
<p>You can find the slides <a href="/assets/docs/problem-solving-without-ansibles/handout.pdf">here</a></p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/ImCFucEag3I" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
Fri, 05 Mar 2021 00:00:00 +0000
https://grossack.site/2021/03/05/problem-solving-without-ansibles.html
https://grossack.site/2021/03/05/problem-solving-without-ansibles.html