How many symbols can $f'(x)$ have if $f$ has $n$ symbols?

<p>The other day <a href="https://www.smbc-comics.com/">SMBC</a> put up a lovely comic which did a great job
<a href="https://xkcd.com/356/">nerdsniping</a> me. I knew that I wanted to try to solve it as soon
as I saw it, but I didn’t have the time for a little while
(it’s midterm season and I had grading to do). It’s a cute problem, and
I want to share my solution with all of you! First, here’s the comic
that started it all:</p>
<p style="text-align:center;">
<a href="https://www.smbc-comics.com/comic/derivative">
<img src="/assets/images/diff-growth/smbc-derivative.png" width="50%" />
</a>
</p>
<p>Now, my old advisor (Klaus Sutner) used to say that whenever you’re faced with a
problem, you can hack or you can think, but you can’t do both. <del>Today</del>
Multiple weeks ago I was in more
of a hacking mood, so I wrote up some haskell code to just <em>try</em> all the
“reasonable” functions I could think of. By this, of course, I mean the
<a href="https://en.wikipedia.org/wiki/Elementary_function">elementary functions</a>.</p>
<p>There’s an obvious recursive way to build up the elementary functions
(which you should think of as those functions which might show up in a calculus class):</p>
<ul>
<li>$f(x) = x$ should probably be a function, as should the constants</li>
<li>If $f(x)$ has previously been defined, $\sin(f(x))$, etc. should be functions</li>
<li>If $f(x)$ and $g(x)$ have previously been defined, $f + g$, etc. should be functions</li>
</ul>
<p>We can formalize this with a datatype<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
<div class="no_eval">
<script type="text/x-sage">
data Expr = Const Rational
| X
| Square Expr
| Sqrt Expr
| Sin Expr
| Cos Expr
| Tan Expr
| ASin Expr
| ACos Expr
| ATan Expr
| Sinh Expr
| Cosh Expr
| Tanh Expr
| ASinh Expr
| ACosh Expr
| ATanh Expr
| Exp Expr
| Log Expr
| Add Expr Expr
| Sub Expr Expr
| Mult Expr Expr
| Div Expr Expr
| Pow Expr Expr
deriving (Show, Eq)
</script>
</div>
<p>Obviously this list, while exhausting the elementary functions, is
still somewhat arbitrary.
For instance, $\sec$ is nowhere on this list, but we can build it using
the functions that <em>are</em> on this list<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">2</a></sup>. Conversely, we added a builtin
function for $\tan$, even though we can express it using $\sin$ and $\cos$.
The decision to add squaring and square roots as primitive, while relegating
cubes and cube roots, etc. to a definition using <code class="language-plaintext highlighter-rouge">Const</code> and <code class="language-plaintext highlighter-rouge">Pow</code> was
similarly arbitrary.</p>
<p>I went for this list basically because it’s what the <a href="https://en.wikipedia.org/wiki/Elementary_function">wikipedia article</a>
names explicitly. Later on we’ll show that our solution doesn’t
depend on the exact list chosen, so we don’t need to worry about this.</p>
<p>Next up, we need to tell haskell how to compute the derivative of a function.
Thankfully, derivatives can be computed recursively, so this is quite easy
to code up:</p>
<div class="no_eval">
<script type="text/x-sage">
diff :: Expr -> Expr
diff (Const n) = Const 0
diff X = Const 1
diff (Square e) = Mult (Mult (Const 2) e) (diff e)
diff (Sqrt e) = Div (diff e) (Mult (Const 2) (Sqrt e))
diff (Sin e) = Mult (Cos e) (diff e)
diff (Cos e) = Mult (Const (-1)) (Mult (Sin e) (diff e))
diff (Tan e) = Mult (Add (Const 1) (Square (Tan e))) (diff e)
diff (ASin e) = Div (diff e) (Sqrt (Sub (Const 1) (Square e)))
diff (ACos e) = Div (Mult (Const (-1)) (diff e)) (Sqrt (Sub (Const 1) (Square e)))
diff (ATan e) = Div (diff e) (Add (Const 1) (Square e))
diff (Sinh e) = Mult (Cosh e) (diff e)
diff (Cosh e) = Mult (Sinh e) (diff e)
diff (Tanh e) = Mult (Sub (Const 1) (Square (Tanh e))) (diff e)
diff (ASinh e) = Div (diff e) (Sqrt (Add (Const 1) (Square e)))
diff (ACosh e) = Div (diff e) (Sqrt (Sub (Square e) (Const 1)))
diff (ATanh e) = Div (diff e) (Sub (Const 1) (Square e))
diff (Exp e) = Mult (Exp e) (diff e)
diff (Log e) = Div (diff e) e
diff (Add e1 e2) = Add (diff e1) (diff e2)
diff (Sub e1 e2) = Sub (diff e1) (diff e2)
diff (Mult e1 e2) = Add (Mult (diff e1) e2) (Mult e1 (diff e2))
diff (Div e1 e2) = Div (Sub (Mult e2 (diff e1)) (Mult e1 (diff e2))) (Square e2)
diff (Pow e1 e2) = Mult (Pow e1 e2) (Add (Mult (Log e1) (diff e2)) (Div (Mult e2 (diff e1)) (e1)))
</script>
</div>
<p>This isn’t perfect. For instance, it doesn’t simplify multiplication by $1$, etc.
But I wanted a quick and dirty approximation, and importantly, I didn’t want to
spend too long on this project<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">3</a></sup>.</p>
<div class="boxed">
<p>As a (fun?) exercise, write a <code class="language-plaintext highlighter-rouge">prune</code> function which makes some easy
simplifications after differentiating. Does that change which functions
grow the most in complexity?</p>
</div>
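<p>If you want a nudge on that exercise, here's a minimal sketch of the kind of rewrites
<code class="language-plaintext highlighter-rouge">prune</code> might perform. To keep the snippet self-contained
I've restricted it to a small fragment of <code class="language-plaintext highlighter-rouge">Expr</code>;
extending it to the full datatype is routine.</p>

```haskell
-- A trimmed-down fragment of the Expr type from above,
-- just big enough to demonstrate pruning.
data Expr = Const Rational
          | X
          | Add Expr Expr
          | Mult Expr Expr
          deriving (Show, Eq)

-- One bottom-up pass of easy simplifications:
-- constant folding, plus the unit and absorbing laws for + and *.
prune :: Expr -> Expr
prune (Add e1 e2)  = simp (Add (prune e1) (prune e2))
prune (Mult e1 e2) = simp (Mult (prune e1) (prune e2))
prune e            = e

simp :: Expr -> Expr
simp (Add (Const 0) e)          = e
simp (Add e (Const 0))          = e
simp (Add (Const a) (Const b))  = Const (a + b)
simp (Mult (Const 0) _)         = Const 0
simp (Mult _ (Const 0))         = Const 0
simp (Mult (Const 1) e)         = e
simp (Mult e (Const 1))         = e
simp (Mult (Const a) (Const b)) = Const (a * b)
simp e                          = e
```

<p>For instance, <code class="language-plaintext highlighter-rouge">prune (Mult (Const 1) (Add X (Const 0)))</code>
collapses all the way down to <code class="language-plaintext highlighter-rouge">X</code>.</p>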
<p>Next, we need a way to figure out how many symbols are in a given expression.
This is also easy to implement:</p>
<div class="no_eval">
<script type="text/x-sage">
size :: Expr -> Int
size (Const n) = 1
size X = 1
size (Square e) = 1 + size e
size (Sqrt e) = 1 + size e
size (Sin e) = 1 + size e
size (Cos e) = 1 + size e
size (Tan e) = 1 + size e
size (ASin e) = 1 + size e
size (ACos e) = 1 + size e
size (ATan e) = 1 + size e
size (Sinh e) = 1 + size e
size (Cosh e) = 1 + size e
size (Tanh e) = 1 + size e
size (ASinh e) = 1 + size e
size (ACosh e) = 1 + size e
size (ATanh e) = 1 + size e
size (Exp e) = 1 + size e
size (Log e) = 1 + size e
size (Add e1 e2) = 1 + size e1 + size e2
size (Sub e1 e2) = 1 + size e1 + size e2
size (Mult e1 e2) = 1 + size e1 + size e2
size (Div e1 e2) = 1 + size e1 + size e2
size (Pow e1 e2) = 1 + size e1 + size e2
</script>
</div>
<p>Lastly, we need a way to build up every expression with $n$ symbols. This way
we can differentiate them all, and see which gives us the largest output!</p>
<div class="no_eval">
<script type="text/x-sage">
build :: Int -> [Expr]
build 1 = [X]
build n =
[comb e | comb <- unary, e <- (build (n-1))] ++
[comb e1 e2 | comb <- binary, e1 <- (build (n-1)), e2 <- build (n-1)] ++
(build (n-1))
where
unary = [Square, Sqrt,
Sin, Cos, Tan, ASin, ACos, ATan,
Sinh, Cosh, Tanh, ASinh, ACosh, ATanh, Exp, Log]
binary = [Add, Sub, Mult, Div, Pow]
-- compute the largest size of diff e as e ranges over exprs of size n
-- (this uses maximumBy, so we need: import Data.List (maximumBy))
b :: Int -> [Expr] -> (Int, Expr)
b n = maximumBy cmp . fmap (\e -> (size (diff e), e)) . filter (\e -> size e == n)
where
cmp (s1,_) (s2,_) = compare s1 s2
</script>
</div>
<hr />
<p>Now, my laptop can fully exhaust every function with $\leq 4$ symbols,
and we see that our best bets are</p>
<ul>
<li>$x$, whose derivative has $1$ symbol</li>
<li>$\arccos(x)$, whose derivative has $9$ symbols</li>
<li>$\arccos(\arccos(x))$, whose derivative has $18$ symbols</li>
<li>$\arccos(\arccos(\arccos(x)))$, whose derivative has $28$ symbols</li>
</ul>
<p>(note that the innermost $x$ counts as a symbol).</p>
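<p>These counts are easy to reproduce. As a sanity check, here's a self-contained
miniature of the code above, trimmed to just the constructors that the derivative of
iterated $\arccos$ ever touches:</p>

```haskell
-- A fragment of the Expr type from above, with the diff and size
-- clauses copied verbatim for the constructors we keep.
data Expr = Const Rational | X
          | Square Expr | Sqrt Expr | ACos Expr
          | Add Expr Expr | Sub Expr Expr | Mult Expr Expr | Div Expr Expr
          deriving (Show, Eq)

diff :: Expr -> Expr
diff (Const _)    = Const 0
diff X            = Const 1
diff (Square e)   = Mult (Mult (Const 2) e) (diff e)
diff (Sqrt e)     = Div (diff e) (Mult (Const 2) (Sqrt e))
diff (ACos e)     = Div (Mult (Const (-1)) (diff e)) (Sqrt (Sub (Const 1) (Square e)))
diff (Add e1 e2)  = Add (diff e1) (diff e2)
diff (Sub e1 e2)  = Sub (diff e1) (diff e2)
diff (Mult e1 e2) = Add (Mult (diff e1) e2) (Mult e1 (diff e2))
diff (Div e1 e2)  = Div (Sub (Mult e2 (diff e1)) (Mult e1 (diff e2))) (Square e2)

size :: Expr -> Int
size (Const _)    = 1
size X            = 1
size (Square e)   = 1 + size e
size (Sqrt e)     = 1 + size e
size (ACos e)     = 1 + size e
size (Add e1 e2)  = 1 + size e1 + size e2
size (Sub e1 e2)  = 1 + size e1 + size e2
size (Mult e1 e2) = 1 + size e1 + size e2
size (Div e1 e2)  = 1 + size e1 + size e2

-- the n-fold composition of arccos with itself (which has n+1 symbols)
acosIter :: Int -> Expr
acosIter 0 = X
acosIter n = ACos (acosIter (n - 1))
```

<p>Indeed, <code class="language-plaintext highlighter-rouge">map (size . diff . acosIter) [0..3]</code>
evaluates to <code class="language-plaintext highlighter-rouge">[1,9,18,28]</code>, matching the list above.</p>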
<p>Moreover, it’s pretty easy to see that we’ll never use a unary function
other than $\arccos$. Indeed, if we write $\lvert f \rvert$ for <code class="language-plaintext highlighter-rouge">size f</code>,
it’s easy to see that</p>
\[\lvert \arccos(f)' \rvert = \lvert f \rvert + \lvert f' \rvert + 7.\]
<p>More generally, $\lvert \text{blah}(f)’ \rvert = \lvert f \rvert + \lvert f’ \rvert + k$
where $k$ is the number of symbols in the derivative of $\text{blah}(x)$<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">4</a></sup>.
Since this is biggest for $\arccos$, and we’re trying to maximize the size of
$f’$, there’s no reason to use a unary constructor other than $\arccos$.</p>
<p>This is fairly good evidence that repeatedly composing $\arccos(x)$ with
itself is the winner, and even though we can’t test <em>every</em> function with
$\geq 5$ symbols, we can test a lot of them, and after letting the code run
for just over $24$ hours, iterating $\arccos$ was still the winner.</p>
<p>So, in light of our computational evidence, we might <em>conjecture</em> that
$\lvert f’ \rvert$ is maximized (among functions with $\lvert f \rvert = n$)
for $f = \arccos(\arccos(\ldots(x)\ldots))$.</p>
<p>At this point, it’s time to stop hacking, and start thinking! Let’s try to
<em>prove</em> that this is the best option. Notice we can easily compute
$\lvert \arccos(\arccos(\ldots(x)\ldots))’ \rvert = \frac{n^2}{2} + \frac{13n}{2} - 6$
(either by solving some recurrence, or by checking <a href="https://oeis.org/search?q=1%2C9%2C18%2C28%2C39%2C51%2C64&language=english&go=Search">oeis</a>), so we should
have some simple proof by induction ahead of us.</p>
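<p>Before we start, it's worth sanity-checking that closed form against the sizes
we've already computed:</p>

```haskell
-- The conjectured closed form n^2/2 + 13n/2 - 6 for the size of the
-- derivative of the n-symbol iterated arccos (n >= 1).
-- n^2 + 13n = n(n + 13) is always even, so the division is exact.
bound :: Int -> Int
bound n = (n * n + 13 * n) `div` 2 - 6
```

<p>Indeed, <code class="language-plaintext highlighter-rouge">map bound [1..4]</code> gives back
$[1,9,18,28]$.</p>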
<hr />
<div class="boxed">
<p>If $f$ has $n$ symbols in its definition, then $f’$ has at most
$\frac{n^2}{2} + \frac{13n}{2} - 6$ symbols, and moreover this maximum is
attained for $f = \arccos(\arccos(\ldots(x)\ldots))$.</p>
</div>
<p>$\ulcorner$</p>
<p>We’ll induct on $\lvert f \rvert$.</p>
<p>If $\lvert f \rvert = 1,2$ then we’ve already seen that $\lvert f’ \rvert$
satisfies the desired inequality.</p>
<p>If $\lvert f \rvert \geq 3$, then we case on the outermost constructor.</p>
<p>If it’s unary, say $f = g(h)$, where $\lvert h \rvert = n-1$, then we compute</p>
\[\lvert f' \rvert = \lvert g'(h) \cdot h' \rvert = \lvert h' \rvert + (n-1) + k\]
<p>where $k$ is a constant depending on $g$, which is maximized as $k = 7$ when
$g = \arccos(-)$. Then</p>
\[\lvert f' \rvert
\leq \lvert h' \rvert + n + 6
\overset{IH}{\leq} \frac{(n-1)^2}{2} + \frac{13(n-1)}{2} - 6 + n + 6
= \frac{n^2}{2} + \frac{13n}{2} - 6.\]
<p>If instead the outermost constructor of $f$ is binary, then we have</p>
<ul>
<li>$f = g + h$</li>
<li>$f = g - h$</li>
<li>$f = g \cdot h$</li>
<li>$f = g \div h$</li>
<li>$f = g^h$</li>
</ul>
<p>where $\lvert f \rvert = n = \lvert g \rvert + \lvert h \rvert + 1$.</p>
<p>In each of these cases we compute $\lvert f’ \rvert$, and find</p>
<ul>
<li>$\lvert (g+h)’ \rvert = \lvert g’ \rvert + \lvert h’ \rvert + 1$</li>
<li>$\lvert (g-h)’ \rvert = \lvert g’ \rvert + \lvert h’ \rvert + 1$</li>
<li>$\lvert (g \cdot h)’ \rvert = \lvert g \rvert + \lvert h \rvert + \lvert g’ \rvert + \lvert h’ \rvert + 3$</li>
<li>$\lvert (g \div h)’ \rvert = \lvert g \rvert + 2 \lvert h \rvert + \lvert g’ \rvert + \lvert h’ \rvert + 5$</li>
<li>$\lvert (g^h)’ \rvert = 3 \lvert g \rvert + 2 \lvert h \rvert + \lvert g’ \rvert + \lvert h’ \rvert + 7$</li>
</ul>
<p>Clearly these are maximized for $g^h$, so let’s put $\lvert g \rvert = k$
and $\lvert h \rvert = n-1-k$. Then we see</p>
\[\begin{align}
\lvert (g^h)' \rvert
&=
3 \lvert g \rvert + 2 \lvert h \rvert + \lvert g' \rvert + \lvert h' \rvert + 7 \\
&\overset{IH}{\leq}
3k + 2(n-1-k) + \frac{k^2}{2} + \frac{13k}{2} - 6 +
\frac{(n-1-k)^2}{2} + \frac{13(n-1-k)}{2} - 6 + 7 \\
&=
\frac{n^2}{2} + \frac{(15-2k)n}{2} + k^2 + 2k - 13
\end{align}\]
<p>So we want this to be $\leq \frac{n^2}{2} + \frac{13n}{2} - 6$ for
every choice of $1 \leq k \leq n-2$.</p>
<p>Aaaaaand…. ruh roh!</p>
<p>You can see from <a href="https://www.desmos.com/calculator/0eyfyqovj2">this</a> desmos graph that this fails in general.
Indeed, the earliest failure happens when $n=8$ and $k=6$. Of course,
this is <em>outside</em> of the $n \leq 4$ range that I was able to exhaustively test,
and even the $n \leq 6$ range that I had tested a lot of<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">5</a></sup>.</p>
<p><span style="float:right">$\lrcorner$</span></p>
<hr />
<p>This is a perfect example of Klaus’s “Magic Spiral”, which he shows in the
first CDM lecture every year.</p>
<p style="text-align:center;">
<img src="/assets/images/diff-growth/magic-spiral.png" width="50%" />
</p>
<p>In this particular situation, there wasn’t a ton to visualize, so we jumped
straight from “compute/experiment” to “conjecture”. Indeed, our computations
seemed to suggest that iterating $\arccos$ was the right approach, but when
we tried to prove it we failed.</p>
<p>This is ok, though! Good, even, because our failure is instructive!
We know where our proof failed, and this tells us where we should focus our
computational effort on our next trip around the spiral.</p>
<p>Indeed, knowing that
we want $k = \lvert g \rvert = 6$ and $n = 8$ says we should try something like</p>
\[f = g^h = \arccos(\arccos(\arccos(\arccos(\arccos(x)))))^x\]
<p>and indeed, haskell tells us that $\lvert f’ \rvert = 79$, which is
bigger than the $78$ we would get by iterating $\arccos$.</p>
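<p>We can corroborate that $79$ by hand, using the $\lvert (g^h)’ \rvert$ formula from
the failed proof: here $\lvert g \rvert = 6$, $\lvert h \rvert = \lvert h’ \rvert = 1$,
and $\lvert g’ \rvert = 51$ by the closed form. In code:</p>

```haskell
-- |(g^h)'| = 3|g| + 2|h| + |g'| + |h'| + 7, from the case analysis above.
powSize :: Int -> Int -> Int -> Int -> Int
powSize g h g' h' = 3 * g + 2 * h + g' + h' + 7

-- g = arccos applied 5 times to x has 6 symbols, and its derivative has
-- (36 + 78) / 2 - 6 = 51 symbols by the closed form; h = x has 1 symbol,
-- with a derivative of size 1.
check :: Int
check = powSize 6 1 51 1
```

<p>and <code class="language-plaintext highlighter-rouge">check</code> is indeed $79$.</p>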
<hr />
<p>Now with <code class="language-plaintext highlighter-rouge">Pow</code> <em>and</em> <code class="language-plaintext highlighter-rouge">ACos</code> to worry about, it’s much less clear what the
optimal function will be. After all, we’ll need to balance the two, and I don’t
have the processing power to do an exhaustive search of $n=8,9,10$ (say)
to try and guess at a pattern<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">6</a></sup>.</p>
<p>Thankfully, this problem still admits an <em>asymptotic</em> solution, and our
earlier proof attempt is easily adapted to this setting.</p>
<p>Now, the most important skill a mathematician should learn is how to cover
their tracks<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">7</a></sup>. So when presenting a result like this to journals, we should
never say that we’re presenting an asymptotic solution because we didn’t
have the time to get a closed form.</p>
<p>Instead, we should argue that the choice of constructors for the elementary
functions was arbitrary, and any closed form for the maximal size of
$\lvert f’ \rvert$ <em>necessarily</em> depends on the choice of constructors!
Indeed, there are other conventions one could make, such as deciding to
not count multiplication towards the symbol count, since we often denote
multiplication by juxtaposition, which doesn’t require a “symbol” at all.</p>
<p>Of course, one can show that the <em>asymptotics</em> of $\lvert f’ \rvert$ are
independent of these choices, which makes the asymptotics a better
object of study.</p>
<p>… sounds good, doesn’t it<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">8</a></sup>?</p>
<p>Now let’s prove it!</p>
<div class="boxed">
<p>If $f$ has $n$ symbols in its definition, then $f’$ has $O(n^2)$
symbols in its definition, and this is optimal.</p>
<p>Moreover, our proof shows that this is independent of the choice of
presentation of the elementary functions.</p>
</div>
<p>$\ulcorner$</p>
<p>Again, we induct on $\lvert f \rvert$, the number of symbols in $f$.</p>
<p>Since we’re only interested in asymptotics, there’s nothing interesting to
prove about the base case.</p>
<p>For the inductive case, we case on the outermost constructor of $f$.</p>
<p>If it’s unary, say $f = c(g)$, then we see that</p>
\[\lvert f' \rvert =
\lvert c'(g) \cdot g' \rvert =
\lvert c'(x) \rvert + O \left ( \lvert g \rvert \right ) +
O \left ( \lvert g' \rvert \right ) + O(1)\]
<p>where the $O(1)$ term is independent of $c$ and keeps track of the symbols
involved in representing the multiplication, etc. The big-ohs
around $\lvert g \rvert$ and $\lvert g’ \rvert$ account for the fact that
we might use each of these a constant number of times<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">9</a></sup>.</p>
<p>Next we see that $\lvert c’(x) \rvert = O(1)$,
since we can uniformly bound these by the size of the largest one,
as we did with $\arccos$ earlier in this post<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">10</a></sup>. So we see</p>
\[\begin{align}
\lvert f' \rvert
&= \lvert c(g)' \rvert \\
&\leq O \left ( \lvert g \rvert \right ) + O \left ( \lvert g' \rvert \right ) + O(1) \\
&\overset{IH}{\leq} O \left ( n-1 \right ) + O \left ((n-1)^2 \right ) + O(1) \\
&\leq O(n^2)
\end{align}\]
<p>If instead the outermost constructor is binary, say $f = c(g,h)$,
where $c(g,h)$ might be $g+h$, $gh$, $g^h$, etc. then we similarly compute</p>
\[\lvert f' \rvert =
\lvert c(g,h)' \rvert =
O \left ( \lvert g \rvert \right ) + O \left ( \lvert h \rvert \right ) +
O \left ( \lvert g' \rvert \right ) + O \left ( \lvert h' \rvert \right ) + O(1)\]
<p>and since $\lvert g \rvert + \lvert h \rvert = n-1$, we see that this is
bounded by</p>
\[O(n-1) + O \left ( (n-1)^2 \right ) + O(1) = O(n^2)\]
<p>and the claim follows.</p>
<p>As for the tightness of this bound, any presentation of the elementary
functions must have at least one trig function (since we cannot build
the trig functions from the others), say $\sin$. Then the $n$-fold
composition $\sin(\sin(\cdots(\sin(x) \cdots)))$ is easily seen to have
a derivative with quadratically many symbols.</p>
<p><span style="float:right">$\lrcorner$</span></p>
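<p>To spell out that last step: write $s(n)$ for the size of the derivative of the
$n$-fold composition of $\sin$. Since $\lvert \sin(f)’ \rvert = \lvert f \rvert + \lvert f’ \rvert + 2$,
we get the recurrence $s(n) = s(n-1) + n + 1$, whose solution
$s(n) = \frac{n^2 + 3n - 2}{2}$ is visibly quadratic. A quick sketch:</p>

```haskell
-- Size of the derivative of sin(sin(...sin(x)...)) with n sins,
-- via the recurrence s(n) = s(n-1) + n + 1 coming from the chain rule:
-- |sin(f)'| = |f| + |f'| + 2, where |f| = n - 1 here.
s :: Int -> Int
s 1 = 1
s n = s (n - 1) + n + 1
```

<p>and one can check that $2 s(n) = n^2 + 3n - 2$ on as many values as one likes.</p>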
<hr />
<p>So we see that the precise question posed in the comic has no answer!
It asks for the maximal ratio of $\frac{\lvert f’ \rvert}{\lvert f \rvert}$,
but we’ve just shown that this ratio is unbounded. Of course, it’s still a
fun problem, and a natural variant <em>does</em> admit a nice solution
(which we found).</p>
<p>Moreover, this was a good way to showcase the back and forth between
computational experimentation and proof. Sometimes you get things wrong,
and that’s ok! We learn, and we form new conjectures that are more likely
to be correct with every trip around the spiral.</p>
<div class="boxed">
<p>As a cute project idea, while I was writing this one of my friends
(<a href="https://rahulrajkumar.github.io/">Rahul</a>) sent me <a href="https://iagoleal.com/posts/calculus-symbolic/">a blog post</a> where Iago Leal de Freitas built
a calculus evaluator in haskell that does simplification properly!</p>
<p>A better hacker than me can probably modify this code to push things a bit
further (especially with some parallel computation) to try and find
a family of functions $(f_n)$ attaining the maximum ratios
$\frac{\lvert f_n’ \rvert}{\lvert f_n \rvert}$.</p>
<p>This should be a pretty approachable problem for an enthusiastic
combinatorics student, and I would love to see somebody do it ^_^</p>
</div>
<hr />
<p>This was a lot of fun! It’s been in the works for a while now
(since April 28, apparently), but I really only worked on it
for a few days. I’m busy working on a lot of other stuff<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">11</a></sup>, and I’ll
hopefully share some of it soon.</p>
<p>One of the biggest things I’ve been spending time on
(which probably also qualifies as an announcement)
has been the <a href="https://www.uwo.ca/math/faculty/kapulkin/seminars/hottest_summer_school_2022.html">HoTTEST Summer 2022</a>,
where I’ll be TAing this summer. I’m already pretty active answering questions
in the discord, and I’ve been brushing up on my HoTT to get
ready<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">12</a></sup>. I can <em>not</em> express how excited I am to be working on this, and
if anybody wants to show up, you’re more than welcome! We’re quickly coming
up on 1000 participants (of all experience levels),
and it’s sure to be a great time!</p>
<p>For now, though, I’m off to bed. Goodnight all, and I’ll see you in the next one ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Incidentally, this is why I chose haskell instead of sage. Python really
doesn’t handle algebraic datatypes with any sort of alacrity, and I wanted
to exploit the recursive structure of the problem.</p>
<p>Plus, it’s been a hot second since I got to use haskell, and it’s one of
my favorite languages to work in, so I didn’t spend very long on the decision :P. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>Namely as <code class="language-plaintext highlighter-rouge">Div (Const 1) (Cos X)</code> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>… and regrettably I failed in that regard. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>Up to an additive constant, at least. If you want to be super precise,
then <code class="language-plaintext highlighter-rouge">size $ diff $ C e = size e + size (diff e) + size (diff (C X)) - 2</code> is
true for every unary constructor <code class="language-plaintext highlighter-rouge">C</code>. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>Of course, we could simply <em>remove</em> <code class="language-plaintext highlighter-rouge">Pow</code> as a constructor, since we
can simulate it using <code class="language-plaintext highlighter-rouge">Exp</code> and <code class="language-plaintext highlighter-rouge">Log</code>. It’s not hard to show that the other
binary operations <em>will</em> let this proof go through, so we could have
“covered our tracks” by acting like we never even considered <code class="language-plaintext highlighter-rouge">Pow</code>!</p>
<p>I thought it would make for a better narrative (and it might be more
instructive) to go the asymptotic approach instead. Plus, it really is
more hygienic to prove a result that doesn’t depend on a particular
choice of “basic” constructors. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Looking at the formulas, we can tell that eventually <code class="language-plaintext highlighter-rouge">Pow</code> will win out
over <code class="language-plaintext highlighter-rouge">ACos</code>, and it probably wouldn’t take <em>too</em> much work to sort this out…</p>
<p>Maybe some reader with some free time wants to take this on as a project? <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>I’m only half joking <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>It helps that this is actually a perfectly reasonable thing to do, and
jokes aside my original plan was to get asymptotics for exactly this reason
(also because I anticipated that an exact solution might be hard to get).</p>
<p>I thought we had gotten lucky with the iterated $\arccos$ construction,
and if you <em>can</em> get a closed form, you might as well. But with those
dreams dashed, it’s back to the asymptotics at the end of the day. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>For example, we might choose to represent the derivative of $g^2$ by
$(g + g) g’$, in which case $\lvert (g^2)’ \rvert$ would refer to $\lvert g \rvert$
twice.</p>
<p>I haven’t actually thought much about how badly things break if you do
something silly like this, or if you take it to an extreme (can we find a way to
make it so that there’s <em>no</em> uniform bound on this constant?), but I’m
ok to leave that particular avenue unexplored.</p>
<p>Officially I should probably add some hypotheses explicitly forbidding this –
for instance, it should be enough to ask that we allow at most finitely many
constructors. That said, I think it’s ok to leave this a bit imprecise for
the purposes of a blog post. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>You might worry that there <em>is</em> no largest unary constructor. But the
only infinite families of constructors
(at least that are listed on <a href="https://en.wikipedia.org/wiki/Elementary_function">wikipedia</a>)
are the rational powers and the bases for $\exp$ and $\log$.</p>
<p>It’s clear, though, that the contributions of each of these derivatives
can be uniformly bounded as long as we’re counting a constant as a single
symbol. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I’m reading a series of papers on <a href="https://en.wikipedia.org/wiki/Model_category">model categories</a> with
<a href="https://sites.google.com/view/syeakel/">Sarah Yeakel</a> (who recently got a permanent position at UCR!),
as well as continuing my own readings on topos theory (which have filtered
into a reading course on locale theory that I’m teaching some undergrads).
I’m also in a class on riemann surfaces which has been really enlightening
for me. I have a few ideas for blog posts of the “I wish someone had shown
me this example sooner” variety, and hopefully I can get to them soon!</p>
<p>On top of all this, I’ve been talking with <a href="https://sites.google.com/site/patriciogallardomath/">Patricio Gallardo</a> about becoming an
algebraic geometer, and he wants me to start spending a serious amount of
time working through Hartshorne and Vakil’s notes. This makes sense,
of course, and I’m having a ton of fun doing it, but it means I have less
time to work on silly projects like this. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>Plus trying to gain some serious familiarity with model categories and
$\infty$-categories before we start. This lined up quite nicely with my
conversations with Sarah about model categories. Sometimes you just get
lucky! <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 10 May 2022 00:00:00 +0000
https://grossack.site/2022/05/10/diff-growth.html

Using Geometry in Logic

<p>One thing that I talk a lot about is the (surprisingly tight) connection
between geometry and logic. I feel like this is something that one usually
gains an appreciation for by seeing lots of examples, and I found a particularly
simple example today <a href="https://math.stackexchange.com/q/4430107/655547">on mse</a>.</p>
<p>For completeness, OP wanted to know how to formally derive</p>
<div class="boxed">
<p>\(B \leftrightarrow A \land B, \ A \leftrightarrow \lnot B \vdash A\)</p>
</div>
<p>and when I first saw this, I thought it looked vaguely <a href="https://en.wikipedia.org/wiki/Law_of_excluded_middle">LEM</a>-y, so my first
question was whether it was true intuitionistically. If it <em>isn’t</em> true
intuitionistically, I would also want to find an intuitionistic model which
invalidates it in order to give a complete answer (since I like to justify my
uses of LEM for problems like this).</p>
<p>But how, you might ask, does geometry come into the picture? Well, <a href="https://en.wikipedia.org/wiki/Intuitionistic_logic#Heyting_algebra_semantics">we know</a> that
a sequent is provable intuitionistically if and only if it’s valid on all
topological spaces with the following semantics<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>
<ul>
<li>primitive propositions $A,B,C,\ldots$ are open sets</li>
<li>$\varphi \land \psi$ is the intersection of $\varphi$ and $\psi$</li>
<li>$\varphi \lor \psi$ is the union of $\varphi$ and $\psi$</li>
<li>$\lnot \varphi$ is the interior of the complement of $\varphi$</li>
<li>$\varphi$ is “true” exactly when it’s the whole space</li>
<li>$\varphi$ is “false” exactly when it’s the empty set</li>
</ul>
<p>In fact, we can say more: it’s enough<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">2</a></sup> to check when the primitive propositions
are open subsets of $\mathbb{R}$. To summarize this situation, cool kids will
say that the topological semantics of $\mathbb{R}$ are
<span class="defn">complete</span> for intuitionistic logic.</p>
<p>By this completeness theorem,</p>
\[B \leftrightarrow A \land B, \ A \leftrightarrow \lnot B \vdash A\]
<p>is provable intuitionistically if and only if<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">3</a></sup></p>
<ul>
<li>for any two open subsets $A$ and $B$ of $\mathbb{R}$</li>
<li>if $B = A \cap B$ and $A$ is the interior of $B^c$</li>
<li>then $A = \mathbb{R}$</li>
</ul>
<p>But now we see that we can start applying our geometric intuition to this
problem! After all, we know what open subsets of $\mathbb{R}$ look like, and
(at least for me), it’s much faster to show that $A$ must be all of $\mathbb{R}$
in the above example than to look for a formal derivation.</p>
<p>Of course, to really answer OP’s question, we <em>should</em> provide a derivation.
It’s not enough to argue abstractly that one should exist<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">4</a></sup>, and thankfully
this is also not hard. Since we now know that the claim is true
intuitionistically, we can switch over to a programming interpretation by
<a href="https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence">curry-howard</a>! Since I have a lot of experience with functional programming,
this is <em>also</em> easier for me than working with the logic directly. The idea is
that writing programs is the same thing as writing proofs, and there’s a
totally algorithmic way to convert some code (which I’ll outline below) into
the desired proof tree:</p>
<p>Say we have programs</p>
<ul>
<li>$f_1 : B \to A \times B$</li>
<li>$f_2 : A \times B \to B$</li>
<li>$g_1 : A \to B \to 0$</li>
<li>$g_2 : (B \to 0) \to A$</li>
</ul>
<p>You’ll recognize these as
our assumptions (where I’ve unpacked the $\leftrightarrow$s). We want to build
a program of type $A$.</p>
<p>By $g_2$, if we can build a program of type $B \to 0$, we’ll be done! But
if we’re given a $b:B$, then it’s almost immediate to build the desired term
as follows</p>
\[B
\overset{f_1}{\longrightarrow} A \times B
\overset{g_1 \times \text{id}_B}{\longrightarrow} \lnot B \times B
\longrightarrow 0\]
<div class="boxed">
<p>As a quick exercise, you might try to write down the actual code of type $A$
in your favorite functional programming language, assuming the existence of
these functions $f_1$, $f_2$, $g_1$, and $g_2$.</p>
<p>If you don’t have anything better to do (or if you’ve never done it before)
you might then convert this program into the proof tree that OP asked for.</p>
</div>
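<p>If you try that exercise and want to compare notes, here's one possible haskell
rendering, with the empty type $0$ played by <code class="language-plaintext highlighter-rouge">Void</code>.
Note that, just like the argument above, it never needs $f_2$:</p>

```haskell
import Data.Void (Void)

-- A term of type A, built from the assumptions f1, g1, g2.
-- (f2 : A x B -> B turns out to be unnecessary.)
termA :: (b -> (a, b))        -- f1 : B -> A x B
      -> (a -> (b -> Void))   -- g1 : A -> not B
      -> ((b -> Void) -> a)   -- g2 : not B -> A
      -> a
termA f1 g1 g2 = g2 notB
  where
    -- the composite B -> A x B -> not B x B -> 0 from the display above
    notB b = let (a, b') = f1 b in g1 a b'
```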
<hr />
<p>This all worked out quite smoothly, since it turned out that the claim
was actually true intuitionistically. If we got a claim that <em>isn’t</em>
intuitionistically valid, can we use geometry in order to find a model
where it’s false?</p>
<p>The answer, of course, is “yes”!</p>
<p>As an easy example, let’s take double negation elimination</p>
\[\vdash \lnot \lnot A \leftrightarrow A\]
<p>under our topological interpretation, this says that
“the interior of the complement of the interior of the complement of $A$ is $A$”.
A moment’s thought shows that this is the same thing as
“the interior of the closure of $A$ equals $A$”, and there are well known
open sets which don’t have this property<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">5</a></sup>.</p>
<div class="boxed">
<p>As another quick exercise, find an open set $A$ which is <em>not</em> the
interior of its closure!</p>
<p>Then the heyting algebra of open subsets of $\mathbb{R}$ equipped with this
open set $A$ provides a countermodel for double negation elimination.</p>
</div>
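<p>By the way, while the exercise asks for a countermodel inside $\mathbb{R}$, the
semantics above makes sense over <em>any</em> topological space, and nothing stops us
from machine-checking a countermodel on a tiny finite space instead. Here's a sketch
of mine, using the three point space whose open sets are $\emptyset$, $\{1\}$,
$\{3\}$, $\{1,3\}$, and the whole space:</p>

```haskell
-- The points of our three point space, and its open sets.
points :: [Int]
points = [1, 2, 3]

opens :: [[Int]]
opens = [[], [1], [3], [1, 3], [1, 2, 3]]

-- The interior of a set: the union of all the opens contained in it.
interior :: [Int] -> [Int]
interior s = foldr union' [] [u | u <- opens, all (`elem` s) u]
  where union' a' b' = [x | x <- points, x `elem` a' || x `elem` b']

-- Heyting negation: the interior of the complement.
neg :: [Int] -> [Int]
neg s = interior [x | x <- points, x `notElem` s]

-- Our candidate proposition A = {1,3}.
a :: [Int]
a = [1, 3]
```

<p>Here <code class="language-plaintext highlighter-rouge">neg (neg a)</code> is the whole
space (so $\lnot \lnot A$ is “true”) even though $A$ isn’t, so double negation
elimination fails in this model too.</p>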
<hr />
<p>Another quick one tonight! I know I talk a lot about how my interests lie in
the intersection of geometry and logic, but I think it can be tricky to really
understand what that means. When I answered this mse question, I realized it
would make a good expository post to give the general flavor of my interests.
The fact that these things are <em>also</em> related to functional programming and
PL theory is not an accident, and I’m also interested in those fields for
similar reasons!</p>
<p>Obviously the rabbit hole goes much deeper than this. First via locales,
which are geometric objects that “classify” propositional theories, and later
via toposes, which are geometric objects that classify predicate (and higher order)
theories in an analogous way.
For more details about this, see Vickers’ excellent paper
<em>Locales and Toposes as Spaces</em>, available <a href="https://www.cs.bham.ac.uk/~sjv/LocTopSpaces.pdf">here</a>, for instance.</p>
<p>Stay warm, and I’ll see you all soon ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Here, for simplicity, I’m identifying a formula $\varphi$ with its
interpretation. If you want to be less sloppy than me, you should
write \([ \! [ \varphi ] \! ]\), but this is too annoying for me to
type in mathjax – there aren’t enough hours in the day to write</p>
<p><code class="language-plaintext highlighter-rouge">[ \! [ \varphi ] \! ]</code></p>
<p>the number of times that would be required of me. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>It’s also enough to check
$\mathbb{R}^n$ for any fixed $n$ (I often have $\mathbb{R}^2$ in mind),
or $2^\omega$ cantor space, or many other concrete spaces. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Again we may take, for example, $\mathbb{R}^2$, $2^\omega$, etc. instead
of $\mathbb{R}$ if we prefer. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I read a great (albeit somewhat aggressive) <a href="https://www.hedonisticlearning.com/posts/the-pedagogy-of-logic-a-rant.html">blog post</a> a while ago which
gave an analogy that now lives in my head rent free. If there are any readers
confused by the (admittedly subtle!) distinction between giving a derivation
and checking semantically that a sequent must be valid,
hopefully this analogy helps!</p>
<div class="boxed">
<p>When asked to derive a sequent $\Gamma \vdash \varphi$, it’s not enough
to just check that it’s valid semantically.</p>
<p>This would be like being asked to compute the inverse of a matrix, and
instead checking that the determinant is nonzero. Yes, this is equivalent
to the <em>existence</em> of an inverse, but finding the inverse itself carries
more information!</p>
</div>
<p><a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Indeed, those open sets which satisfy this property are called
<a href="https://en.wikipedia.org/wiki/Regular_open_set">regular</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Sun, 17 Apr 2022 00:00:00 +0000
https://grossack.site/2022/04/17/geometry-and-logic.html
https://grossack.site/2022/04/17/geometry-and-logic.htmlHow Holomorphic Functions are Just Like Polynomials<p>I took a complex analysis exam <del>last week</del> a while ago, and while I was studying I
realized that a lot of the theorems were saying that holomorphic functions
behave like polynomials. This makes sense, since a holomorphic function,
which locally has a power series, looks like a polynomial of infinite degree,
but there’s actually quite a bit to say here! With that in mind, I decided to
write up some quick thoughts about this, in line with my post from a while ago
talking about banach space theorems generalizing finite dimensional linear
algebra (see <a href="/2021/09/09/banach-spaces.html">here</a>). Now, on with the show!</p>
<hr />
<p>I mentioned this in the introduction, but it’s worth stating the obvious.
A holomorphic function locally looks like a power series</p>
\[a_0 + a_1 (z - \xi) + a_2 (z - \xi)^2 + \ldots\]
<p>that is, a “polynomial of infinite degree”. With this in mind, there are lots
of well-known formulas for polynomials that continue to hold in the
holomorphic setting. For instance, we can differentiate and integrate
term by term (provided we stay inside the radius of convergence), and if we
have two series, we can add and multiply them exactly as we would polynomials
(term by term for addition, and via the <a href="https://en.wikipedia.org/wiki/Cauchy_product">cauchy product</a> formula for products)
provided we stay inside the radius of convergence for <em>both</em> series.</p>
<p>In fact, there are deeper ways in which holomorphic functions act like polynomials.
For a start, polynomials over $\mathbb{C}$ always factor as some constant times
a product of roots. That is, we always have</p>
\[p(z) = c \prod_{\rho} (z - \rho)\]
<p>(where the roots $\rho$ are counted with multiplicity, of course).</p>
<p>This says that, up to <a href="https://en.wikipedia.org/wiki/Unit_(ring_theory)">units</a>, polynomials are in bijection with
finite (multi)sets of points in the complex plane.</p>
<p>The situation for holomorphic functions is more delicate, but only
slightly. The <span class="defn">Weierstrass Factorization Theorem</span>
tells us that every holomorphic function factors as</p>
\[f(z) = e^g z^m \prod_{\rho} E_{n_\rho} \left ( \frac{z}{\rho} \right )\]
<p>Here $e^g$ is nowhere zero, thus is a unit in the ring of holomorphic functions,
and the function $E_{n_\rho} \left ( \frac{z}{\rho} \right )$ is zero only at a
nonzero root $\rho$, so is analogous to the factor $(z - \rho)$, and we also
have $m$ factors of $z$ which correspond to the order of vanishing of $f$ at $0$.</p>
<p>These functions $E_k \left ( \frac{z}{\rho} \right )$, which differ from
$(z - \rho)$ only by units, are cleverly chosen to force the infinite product
to converge, which is an issue we don’t have in the case of polynomials<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">1</a></sup>.</p>
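<p>We can even watch the convergence happen numerically. Here’s a quick sketch (the formula
for $E_k$ appears in the footnote): placing simple zeroes at every nonzero integer
$\rho = \pm n$, the paired factors $E_1(z/n) E_1(-z/n)$ multiply out to $1 - z^2/n^2$,
and the product converges to the familiar $\sin(\pi z) / (\pi z)$.</p>

```python
import math

def E(k, z):
    # Weierstrass elementary factor: E_k(z) = (1 - z) exp(z + z^2/2 + ... + z^k/k)
    return (1 - z) * math.exp(sum(z ** j / j for j in range(1, k + 1)))

# zeroes at every nonzero integer: the product recovers sin(pi z) / (pi z)
z = 0.3
partial = math.prod(E(1, z / n) * E(1, -z / n) for n in range(1, 20000))
print(partial, math.sin(math.pi * z) / (math.pi * z))  # the two agree closely
```

<p>Note that the bare product $\prod (1 - z/n)$ would diverge; the exponential
corrections in $E_1$ are exactly what tame it.</p>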
<p>Conversely, we would like to know that to any family of points in the plane,
we can associate a holomorphic function with precisely those points as roots.
This is possible, but the key insight is a hidden assumption in the case of
polynomials: a finite set of points is always <em>discrete</em>! If we want to allow
for infinitely many zeroes, we have to explicitly demand discreteness of the
set of zeroes<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup>.</p>
<p>But this is the only obstruction! Say $(a_n)$ is a discrete set of points
in the plane, and $(r_n)$ is a sequence of natural numbers.
Then there exists a holomorphic function, unique up to units,
which vanishes to order $r_n$ at $a_n$, and nowhere else<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup>.</p>
<div class="boxed">
<p>As a quick exercise, this data is often repackaged by saying that
$(a_n)$ is a <em>sequence</em> of points in the plane with $|a_n| \to \infty$.</p>
<p>Show that this condition is equivalent to ours.</p>
</div>
<p>Next up is the argument principle, which lets us count the number of zeroes
$f$ has in some region.</p>
<p>For polynomials, the key insight is that the <a href="https://en.wikipedia.org/wiki/Logarithmic_derivative">logarithmic derivative</a>
turns products into sums. That is,</p>
\[\frac{(uv)'}{uv} = \frac{u'}{u} + \frac{v'}{v}\]
<p>So if we factor our polynomial as $c \prod (z - \rho)$, we can use this
formula to compute the logarithmic derivative:</p>
\[\frac{p'}{p} =
\frac{c \left ( \prod (z - \rho) \right )'}{c \prod (z-\rho)} =
\sum \frac{(z - \rho)'}{z-\rho} =
\sum \frac{1}{z-\rho}\]
<p>Of course, it’s easy to compute the integral of this sum along a (simple) closed
contour! We pick up a $2\pi i$ if $\rho$ is inside the contour, and a $0$ otherwise.</p>
<p>So integrating both sides, we see that</p>
\[\oint_\gamma \frac{p'}{p}\ dz = 2 \pi i \left ( \# \text{roots inside $\gamma$} \right )\]
<p>The remarkable thing is that this formula goes through entirely unchanged when
we pass to holomorphic functions<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>!</p>
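<p>It’s easy to sanity-check the argument principle numerically. Here’s a sketch with numpy,
where the polynomial and the contour are arbitrary choices of mine: we integrate
$\frac{1}{2 \pi i} \oint p'/p \ dz$ by a Riemann sum around the circle $|z| = 2$.</p>

```python
import numpy as np

# p(z) = (z - 1)(z + 1)(z - 3): exactly two roots inside the circle |z| = 2
p  = lambda z: (z - 1) * (z + 1) * (z - 3)
dp = lambda z: (z + 1) * (z - 3) + (z - 1) * (z - 3) + (z - 1) * (z + 1)

# Riemann sum for (1 / 2 pi i) * the contour integral of p'/p over |z| = 2
n = 4000
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
z = 2 * np.exp(1j * t)
dz = 2j * np.exp(1j * t) * (2 * np.pi / n)
count = np.sum(dp(z) / p(z) * dz) / (2j * np.pi)

print(count.real)  # very nearly 2, the number of roots inside the contour
```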
<hr />
<p>Lastly, let’s give an important property that <em>does</em> change when we move from
the polynomial setting to the holomorphic one: growth rates!</p>
<div class="boxed">
<p>As a preemptive exercise, you should show that any holomorphic function $f$
with $|f| = O(|z|^n)$ for some $n$ must itself be a polynomial.</p>
<p>You’ll want to use <a href="https://en.wikipedia.org/wiki/Liouville%27s_theorem_(complex_analysis)">Liouville’s theorem</a>!</p>
</div>
<p>So a holomorphic function which grows polynomially quickly is itself a polynomial.
Taking contrapositives, we see that any nonpolynomial holomorphic function must
grow faster than any polynomial! This gives rise to an obvious question:</p>
<p>How quickly <em>can</em> holomorphic functions grow?</p>
<p>Well, in the last section we said that for any discrete sequence $(a_n)$, and for
any values $A_n$ we like, we can find a holomorphic function $f$ so that
$f(a_n) = A_n$.</p>
<p>For simplicity, let’s take $a_n = n$ to be integers. Now let’s take
$A_n \triangleq n!$. Or better yet, $A_n = (n!)!$. What the hell, let’s let
$A_n \triangleq \mathtt{Ack}(n,n)$ the diagonal of the <a href="https://en.wikipedia.org/wiki/Ackermann_function">ackermann function</a>!</p>
<p>The theorem from the last section says that there’s a holomorphic function which
grows at least as quickly as $\mathtt{Ack}(n,n)$, but it’s easy to see that
we can make functions which grow as quickly as we like by modifying this argument<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>!</p>
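<p>If you’ve never computed with the ackermann function before, here’s a direct
transcription of the standard recursion, just to get a feel for the growth.
Only the tiniest inputs are feasible, and that’s exactly the point!</p>

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ack(m, n):
    # the standard two-argument ackermann recursion
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

print([ack(n, n) for n in range(4)])  # [1, 3, 7, 61]
```

<p>Already $\mathtt{Ack}(4,4)$ is an exponential tower far too large to write down.</p>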
<p>This should be some mild warning that, despite many other similarities, there
is still a marked difference between the behavior of holomorphic (even entire!)
functions and polynomials.</p>
<hr />
<p>Finally a truly quick one! Can you believe a post this short has been in my
drafts for almost a month now? I still have plans for some more exciting posts
coming up, but I’ve been really overwhelmed with work lately.</p>
<p>One fun thing is that I’m running a reading course for some undergraduates on
<a href="https://en.wikipedia.org/wiki/Pointless_topology">locale theory</a>, and I might try to keep a running series where I summarize
what we do on any given week.</p>
<p>So far we’ve mainly been reviewing the definitions of categories, which I don’t
think I need to go into here, but once we start doing more interesting things
I’d like to post my thoughts here as we go. No promises, though!</p>
<p>If nothing else, I have another post which is <em>almost</em> done, and I should
hopefully post it soon! I’ll see you all there ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:3" role="doc-endnote">
<p>More details can be found in Conway’s <em>Functions of One Complex Variable</em>
section $VII.5$, but basically</p>
\[E_k(z) \triangleq
(1 - z)
\exp \left ( z + \frac{z^2}{2} + \frac{z^3}{3} + \ldots + \frac{z^k}{k} \right )\]
<p>so in particular, $E_k \left ( \frac{z}{\rho} \right )$ is zero precisely when
$z = \rho$.</p>
<p>Notice we could just as easily factor a polynomial $p$ as</p>
\[p(z) = c z^m \prod \left ( 1 - \frac{z}{\rho} \right )\]
<p>where the $\rho$ are the nonzero roots of $p$, and this differs from
the usual factorization only by a unit.</p>
<p>Writing $p$ in this way makes the analogy with the weierstrass factorization
theorem much more obvious. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>After all, by uniqueness of analytic continuation, if the zeroes of a
holomorphic function contain a limit point, then that function <em>must</em> be
identically zero!</p>
<p>So if we want to be able to say the zeroes are on our specified points
and <em>nowhere else</em>, then the desired set of zeroes cannot contain a limit
point. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Remarkably <em>much</em> more is true!</p>
<p>We can specify nonzero values at some discrete family of points,
and we can even specify the first finitely many derivatives at those
points!</p>
<p>Formally, if $a_n$ is some discrete set of points, and $k$ is some fixed integer,
then for any values $A_n^0, A_n^1, A_n^2, \ldots, A_n^k$,
there’s a holomorphic function (unique up to units) so that for every
$0 \leq j \leq k$, and for every $n$, we have</p>
\[\left . \frac{d^j f}{dz^j} \right |_{a_n} = A_n^j\]
<p>This is <em>incredible</em>, since it seems to fly in the face of the rigidity
of holomorphic functions. It’s wild to me that there should be <em>such</em> a
wealth of holomorphic functions which we can create to our specifications.
I (unsurprisingly) first heard about this result on <a href="https://math.stackexchange.com/questions/1627388/is-there-an-upper-bound-on-the-growth-rate-of-analytic-functions">mse</a>, and I don’t
actually have a reference besides that… If someone happens to know one,
I would love to hear about it! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>This is basically because the region bounded by $\gamma$ is compact. Thus
it can contain only finitely many roots (do you see why?) so we can write
$f$ as a <em>finite</em> product of roots inside $\gamma$ (just like our polynomial!)
times some function $g$ which is nonzero inside $\gamma$. Then we apply our
formula for the logarithmic derivative exactly as we did for polynomials,
but we’ll get some final term of the form $\frac{g’}{g}$, whose integral
is also $0$. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>This doesn’t really fit in with the rest of the post, but I want to mention it,
so… footnote :P</p>
<p>It turns out that we can <em>lower</em> bound the growth rate of a holomorphic function
by understanding the density of its zeroes. This amounts to
<a href="https://en.wikipedia.org/wiki/Jensen%27s_formula">jensen’s formula</a>, which you can read about on Terry Tao’s blog
<a href="https://terrytao.wordpress.com/2020/12/23/246b-notes-1-zeroes-poles-and-factorisation-of-meromorphic-functions/">here</a>.</p>
<p>As a cute problem of this format, here’s a homework question from UCR’s
complex analysis class:</p>
<div class="boxed">
<p>Let $t > 0$ be fixed, and let</p>
\[f(z) \triangleq \prod_{n=1}^\infty
\left (
1 - \exp(-2\pi n t) \exp (2 \pi i z)
\right )\]
<p>In particular, $f$ has zeroes at exactly $m - int$ for $m \in \mathbb{Z}$ and $n \geq 1$.</p>
<p>Show that</p>
<p>\(\max_{|z| < R} |f(z)| =
\Omega \left ( \exp \left ( \frac{\pi R^2}{4t} \right )\right )\)</p>
</div>
<p>In proving this (using jensen’s formula), you’ll want to estimate the
number of zeroes in a circle of radius $R$, which you can (and should)
do geometrically. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 13 Apr 2022 00:00:00 +0000
https://grossack.site/2022/04/13/hol-poly.html
https://grossack.site/2022/04/13/hol-poly.htmlWhy is the Completion of a Local Ring "More Local" than just Localizing?<p>An oft-repeated piece of intuition I’ve seen while trying to learn algebraic
geometry is that localizing a ring at a prime is like “zooming in” on that
point. But if you want to zoom in “really close” then you have to take the
<a href="https://en.wikipedia.org/wiki/Completion_of_a_ring">completion</a> of this ring… Why is that?</p>
<p>This is definitely an observation well known to experts
(and even many nonexperts, probably), but I know I would have liked to
have seen it spelled out explicitly, so here we are. I also would have
realized it sooner if I’d read Hartshorne sooner, since he gives this exact
observation in section $I.5$, on completion<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>The idea, in hindsight, is obvious:</p>
<div class="boxed">
<p>Open subsets in the Zariski topology are dense!</p>
<p>So knowing what happens in some neighborhood means knowing what happens
almost everywhere!</p>
</div>
<p>Let $R$ be a ring, and consider $X = \text{Spec}(R)$.</p>
<p>Then the ring of regular functions on $X$ is exactly $R$, where we think of
$f(\mathfrak{p}) \triangleq f / \mathfrak{p}$. In particular,
$f(\mathfrak{p}) = 0$ if and only if $f \in \mathfrak{p}$.
Notice the codomain of $f$ might vary with its input
(so $f$ is like an element of a <a href="https://en.wikipedia.org/wiki/Dependent_type#%CE%A0_type">dependent product</a>).</p>
<p>Now, what does it mean to “zoom in” on a point $\mathfrak{p}$ in $X$?</p>
<p>Well, one obvious idea is to look at functions defined on smaller and smaller
open sets containing $\mathfrak{p}$. This is formalized by the idea of
<a href="https://en.wikipedia.org/wiki/Germ_(mathematics)">germs</a>, and a fairly easy computation shows that the ring of germs at
$\mathfrak{p}$ is exactly the localization $R_\mathfrak{p}$. Indeed, that’s
where the name comes from. This also makes some intuitive sense, since
(for integral domains)</p>
\[R_\mathfrak{p} = \left \{ \frac{f}{g} \ \middle | \ g(\mathfrak{p}) \neq 0 \right \}\]
<p>but if $g(\mathfrak{p}) \neq 0$, then (by continuity) there’s a
<em>neighborhood</em> of $\mathfrak{p}$ where $g$ doesn’t vanish. So near
$\mathfrak{p}$, $g$ looks invertible!</p>
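<p>We can make this “looks invertible” concrete with a computation. If $g$ doesn’t vanish
at our point, then $1/g$ exists as a power series there, and we can compute it mechanically.
Here’s a sketch with sympy, where the choice $g = 2 - x$ and the point $0$ are arbitrary:</p>

```python
from sympy import series, symbols

x = symbols("x")
g = 2 - x  # g(0) = 2 is nonzero, so g should be invertible "near 0"

# 1/g is not a polynomial, but it *is* a power series at 0
inv = series(1 / g, x, 0, 5)
print(inv)  # 1/2 + x/4 + x**2/8 + x**3/16 + x**4/32 + O(x**5)
```

<p>This is just the geometric series in disguise, and it’s exactly the kind of element
that lives in the localization (and, keeping all infinitely many terms, in the completion)
but not in the original ring.</p>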
<p>This sounds like a great definition of “local”, so what’s the problem?
Well, remember the boxed statement! Zariski open sets are <em>dense</em>,
and thus remember information from “most of” $X$!</p>
<p>Here is one striking example:</p>
<p>Let $X$ and $Y$ be varieties over an (algebraically closed) field $k$.</p>
<p>Say the local rings of $\mathfrak{p} \in X$ and $\mathfrak{q} \in Y$ are
isomorphic (as $k$-algebras). Then (exercise $I.4.7$ in Hartshorne)
$\mathfrak{p}$ and $\mathfrak{q}$ have neighborhoods $U \subseteq X$ and
$V \subseteq Y$ so that $U$ and $V$ are isomorphic as varieties.</p>
<p>This doesn’t sound too bad until you remember that $U$ is dense in $X$
and $V$ is dense in $Y$. So already $X$ and $Y$ have to be
<a href="https://en.wikipedia.org/wiki/Birational_geometry">birationally equivalent</a>!</p>
<p>So even though we only knew that $\mathfrak{p}$ and $\mathfrak{q}$ had
isomorphic neighborhoods, we were able to lift this to a (coarse)
equivalence of the whole of $X$ and $Y$!</p>
<hr />
<p>Now, most of my geometric intuition comes from manifolds, and in that setting
this would be extremely strange! After all, manifolds are locally euclidean,
so if $X$ and $Y$ have the same dimension, then <em>every</em> point of $X$ and
<em>every</em> point of $Y$ have isomorphic neighborhoods!</p>
<p>See, for instance, this picture I stole from the nlab<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>
<p style="text-align:center;">
<img src="/assets/images/locality-in-alg-geom/Chart.png" width="50%" />
</p>
<p>Every point on this surface has a neighborhood homeomorphic to an open disk,
and the same is true of every other point of every other $2D$ manifold!</p>
<p>Intuitively, then, we want to say that locally, every point of every
variety “looks the same” as every other point, as long as the two varieties
have the same dimension. We can’t get this behavior with just open sets in
the case of varieties because there simply aren’t enough open sets to
“get close enough” to a point.</p>
<p>But now let’s look at the completion.</p>
<p>The <a href="https://en.wikipedia.org/wiki/Cohen_structure_theorem">Cohen Structure Theorem</a> says that any two complete regular local
rings containing the same field $k$ of the same dimension are isomorphic.
In fact, every such ring is isomorphic to $k[\![ x_1, \ldots, x_n ] \!]$.</p>
<p>So <em>now</em> if $X$ and $Y$ are varieties of the same dimension, and
$\mathfrak{p}$ and $\mathfrak{q}$ are (regular) points, then even if
the <em>local</em> rings \(k[X]_\mathfrak{p}\) and \(k[Y]_\mathfrak{q}\) might not
be isomorphic, the <em>complete</em> local rings \(\widehat{k[X]_\mathfrak{p}}\)
and \(\widehat{k[Y]_\mathfrak{q}}\) <em>will</em> be!</p>
<p>This is the analogue in algebraic geometry of the fact that any two points
of manifolds of the same dimension must have homeomorphic neighborhoods!</p>
<p>As a last aside, notice we had to squeeze in the word
<span class="defn">regular</span> earlier. What does that mean?</p>
<p>Well manifolds have to be smooth <em>everywhere</em>, whereas varieties are allowed
to have a small set of <a href="https://en.wikipedia.org/wiki/Singular_point_of_an_algebraic_variety">singular points</a>. Broadly speaking, these are points
where the tangent space has the wrong dimension. When a point <em>isn’t</em> singular,
we call it regular, and the set of regular points is open and dense, so it’s
most of the variety.</p>
<p>Since we expect the tangent space of a point to “look like” some small
neighborhood of that point, it makes sense that “zooming in” too close to
a singular point might make it look unlike the rest of the regular points
on the variety, and indeed that’s the case.</p>
<p>At this point I encourage everyone to go look at example $I.5.6.3$ in
Hartshorne. I <em>could</em> transcribe it here for convenience, but I wanted this
to be a lower effort post, so… I won’t :P.</p>
<p>I have some higher effort posts in the pipeline, but for now, it’s time for bed.</p>
<p>Stay warm, all! See you soon ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I was scared of Hartshorne for a long time, and in hindsight I really
didn’t need to be. I haven’t read it cover to cover, but I’m at a place
now where chapter $I$ on varieties all feels quite natural, and I’m
reading chapter $II$ on schemes right now. I’m skimming quite lightly,
because I only have so much time, and I have other things I’m reading
about too, but I’m really enjoying it so far.</p>
<p>I’ve actually had this post idea in my todo list for a little while now,
and when I saw the exact observation spelled out in Hartshorne, it
was the last push I needed to actually get to it. Of course that push was
a few weeks ago now, but it takes time to actually get around to writing
these posts, haha. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This image actually has some ~bonus information~ about transition maps,
which muddles the picture somewhat, but that’s what I get for using
someone else’s diagram. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Fri, 04 Mar 2022 00:00:00 +0000
https://grossack.site/2022/03/04/locality-in-alg-geom.html
https://grossack.site/2022/03/04/locality-in-alg-geom.htmlTalk (?) -- Why Care about Lie Algebras?<p>I gave my second (and last) presentation/talk in my lie algebras class today,
and it was on a topic that was and is really important to me. When learning
something new, I think it’s worth asking yourself what it does for you.
What problems does it solve? How do the structures we’re learning about arise?
There are a lot of people who enjoy abstraction for its own sake, but I ultimately
care about <em>solving problems</em>, and while I’m not going to shy away from high
abstraction mathematics to do so (I’m still a category theorist after all)
I think it’s important to be aware of concrete examples that can ground your
theory in things that obviously matter.</p>
<p>Again, this talk was just under a half hour, and I don’t have an abstract for
it (since it was informal), but I want it included with my other talks because
I’ll be giving a debrief at the end.</p>
<hr />
<p>First, I gave (what I think is) the original motivation for lie algebras:</p>
<p>A <a href="https://en.wikipedia.org/wiki/Lie_group">lie group</a> is a group which is also a manifold, and the group operations
should be smooth. Important examples are $(\mathbb{R}, +)$ and $(S^1, \times)$
(where we view $S^1$ as the unit circle in $\mathbb{C}$, say).</p>
<p>These are (in my mind) obviously interesting, with wide application. Anytime
you have a continuous family of symmetries, as you do in many situations in
geometry, physics, and differential equations<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, you have a lie group describing
those symmetries. Then we can <em>exploit</em> this symmetry in order to simplify
problems. As a great example of using lie groups to simplify calculations
and solve problems, see <a href="https://www.youtube.com/watch?v=ltLUadnCyi0">this</a> 3Blue1Brown video.</p>
<p>Unsurprisingly, lie groups are <em>complicated</em>! But since they have smooth
structure we can hit our problems over the head with calculus, thus turning
our problems into linear algebra. This is exactly how lie algebras arise<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>!</p>
<p>Now, since we’re taking the first-order approximation of our lie group structure,
it’s reasonable to worry that we’re going to lose lots of information about
our lie group when we differentiate. Thankfully, this isn’t the case! There’s
a slew of correspondence theorems which let you translate back and forth between
the lie algebra and the lie group. Most notably:</p>
<div class="boxed">
<ol>
<li>
<p>If $G$ is connected and simply connected, then lie group homs $G \to H$
are in natural bijection with lie algebra homs $\mathfrak{g} \to \mathfrak{h}$</p>
</li>
<li>
<p>If $\mathfrak{g}$ is a (real or complex) lie algebra, then there is a
unique connected and simply connected lie group $\tilde{G}$ whose lie algebra is $\mathfrak{g}$.
Moreover, every connected lie group $G$ whose lie algebra is $\mathfrak{g}$
is a quotient of $\tilde{G}$ by a discrete central subgroup.</p>
</li>
<li>
<p>The (connected) subgroups (resp. normal subgroups, etc.) of a (connected)
lie group $G$ are in natural correspondence with subalgebras (resp. ideals, etc.)
of the lie algebra $\mathfrak{g}$.</p>
</li>
</ol>
</div>
<p>So as long as we’re interested in connected lie groups, we lose <em>remarkably</em>
little information in passage to the lie algebra<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p>Next I gave a concrete example of a computation that is simplified by the
use of lie algebras. Physicists care about finite dimensional irreducible representations
of $SU(2)$ because they tell us what kinds of “spin” a particle can have.
Thankfully $SU(2)$ is simply connected, so irreducible representations of
$SU(2)$ are in natural bijection with (real) irreducible representations of
$\mathfrak{su}(2)$. Then by <a href="https://en.wikipedia.org/wiki/Complexification">complexifying</a>, these are in natural bijection
with (complex) irreducible representations of $\mathfrak{sl}(2, \mathbb{C})$.
But in this class we completely classified the irreducible
$\mathfrak{sl}(2,\mathbb{C})$ representations<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>! Unraveling this procedure
gives a complete classification of irreducible representations of $SU(2)$,
which seem (to me) almost impossible to get our hands on directly!</p>
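<p>As a tiny supplement, the bracket relations for the standard basis $e, f, h$ of
$\mathfrak{sl}(2, \mathbb{C})$ (the basis the classification is usually phrased in)
can be checked directly on matrices. Here’s a quick sketch with numpy:</p>

```python
import numpy as np

# standard basis of sl(2): trace-zero 2x2 matrices
e = np.array([[0, 1], [0, 0]])
f = np.array([[0, 0], [1, 0]])
h = np.array([[1, 0], [0, -1]])

bracket = lambda a, b: a @ b - b @ a  # the commutator bracket

# the defining relations: [h, e] = 2e, [h, f] = -2f, [e, f] = h
print(np.array_equal(bracket(h, e), 2 * e),
      np.array_equal(bracket(h, f), -2 * f),
      np.array_equal(bracket(e, f), h))  # True True True
```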
<hr />
<p>With this computation out of the way, the talk became much more of a survey.
I ended with three other examples of lie algebras arising “in the wild” as
useful tools for solving (seemingly unrelated) problems.</p>
<p>First, the obvious one. Instead of working with lie groups, we can work with
<a href="https://en.wikipedia.org/wiki/Algebraic_group">algebraic groups</a> (their algebro-geometric analogue). These too give
lie algebras when we look at the tangent space of the identity, and again
there’s a correspondence between properties of the group and its lie algebra
(see <a href="https://www.jmilne.org/math/CourseNotes/LAG.pdf">here</a> for more). These are important (so I’m told) to the
<a href="https://en.wikipedia.org/wiki/Langlands_program">langlands program</a>, which I think provides suitable motivation.</p>
<p>Next a more surprising application. Recall the <a href="https://en.wikipedia.org/wiki/Burnside_problem">Burnside Problem</a>, which asks</p>
<div class="boxed">
<p>If $G$ is finitely generated, and every element is finite order, must $G$ be finite?</p>
</div>
<p>The answer (famously) is “no”<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. But it was an extremely hard problem, and in
the process of studying it people came up with the
<span class="defn">Restricted Burnside Problem</span>:</p>
<div class="boxed">
<p>If $G$ is finitely generated (by $m$ elements) and every element is finite order
(at most $n$), <em>and</em> we moreover assume $G$ is finite, can we bound the size
of $G$ in terms of $m$ and $n$?</p>
</div>
<p>If $n = p^k$ is a prime power, then $G$ is nilpotent (it’s a finite $p$-group),
and bounding its nilpotency class will also bound its order. Now for the magic.</p>
<p>From a “Zassenhaus Filtration”</p>
\[G = G_0 \vartriangleright G_1 \vartriangleright \cdots \vartriangleright G_\ell = 1\]
<p>we build a vector space</p>
\[\tilde{L}(G) \triangleq \bigoplus G_i / G_{i+1}\]
<p>(recall each $G_i / G_{i+1}$ is elementary abelian, thus a vector space over
$\mathbb{F}_p$)</p>
<p>equipped with the bracket
$[a G_{i+1}, b G_{j+1}] \triangleq a^{-1}b^{-1}ab G_{i+j+1}$,
this becomes a lie algebra, and the question of bounding the nilpotency class
of $G$ is translated into a purely lie algebra theoretic question about
a subalgebra of $\tilde{L}(G)$. See the (excellent!) survey
<em>On the Restricted Burnside Problem</em> by Zelmanov for more details.</p>
<p>Lastly, a construction from <a href="https://en.wikipedia.org/wiki/Homotopy_theory">homotopy theory</a>. Here I spent another good
chunk of time talking about homotopy groups (which are hard to understand),
and the (comparatively simple) <em>rational</em> homotopy groups. It’s a (very)
famous theorem of Quillen that the rational homotopy groups of a space
assemble into a (differential, graded) lie algebra
(with the <a href="https://en.wikipedia.org/wiki/Whitehead_product">whitehead bracket</a>) which is effective
(in the sense that we can actually compute with this lie algebra) and also
a perfect invariant (in the sense that two simply connected spaces $X$ and $Y$
have the same rational homotopy if and only if their associated lie algebras
are <a href="https://en.wikipedia.org/wiki/Quasi-isomorphism">quasi-isomorphic</a>).</p>
<p>Much of this was parroted from Jacob Lurie’s talk
<em>Lie Algebras and Homotopy Theory</em>, which you can find <a href="https://www.youtube.com/watch?v=LeaiPHAh0X0">here</a>.
One day I would like to spend more time with this material, because I think
homotopy theory is super interesting, but for now I have to be quite sketchy
when talking about it, because I don’t know anything at all.</p>
<hr />
<p>And that was the talk!</p>
<p>I think it was alright? Definitely not my best talk, but I also wanted to give
as many examples as possible in a fairly short time span. I liked the
survey-esque nature of it, but wish I knew more about each of the topics I was
surveying :P.</p>
<p>If nothing else, I learned a <em>ton</em> writing this talk, and am now entirely
convinced that lie algebras are something I should be familiar with, which was
really the point of me looking into this at all. I also gained some ammunition
for caring about categories, since we actually get an <em>equivalence of categories</em>
between the category of $G$ reps and the category of $\mathfrak{g}$ reps
(when $G$ is simply connected, see <a href="https://math.stackexchange.com/q/641082/655547">here</a>). This tells you, basically for
free, that irreducible $G$ representations are the same thing as irreducible
$\mathfrak{g}$ representations, since reducibility is expressible categorically.</p>
<p>Another thing I want to think more about is the relationship between the last
two examples… In both cases we had a family of related objects
(the $G_i / G_{i+1}$ and the rational homotopy groups) plus some kind of
“derivation” (either group-theoretic commutators, or the whitehead bracket)
which gives us lie algebra structure.</p>
<p>This might be superficial, but I would like to think more about it, and see
if I can’t find other examples of this phenomenon. I remember reading somewhere
that we can study noncommutative algebras by using techniques from
commutative algebra as long as the noncommutativity is “bounded” in some sense,
and this <em>also</em> reminds me of that.</p>
<p>For instance, we can study the <a href="https://en.wikipedia.org/wiki/Weyl_algebra">Weyl Algebra</a></p>
\[k\langle x, y \rangle \big / xy - yx = 1\]
<p>which is “barely noncommutative<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>” in the sense that while $x$ and $y$
don’t commute, we’re only off by a constant (that is, a term of smaller degree
than the terms we started with). Then we can run a similar construction, where
we look at the direct sum of the degree $n$ elements modulo the degree $n-1$
elements, and this forms a ring that <em>is</em> commutative.</p>
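<p>To make “barely noncommutative” concrete, here’s a small computation (my own aside, using only the relation $xy = yx + 1$, and easily checked by induction):</p>

\[x y^n = (yx + 1) y^{n-1} = y(x y^{n-1}) + y^{n-1} = \cdots = y^n x + n y^{n-1}\]

<p>So the commutator $[x, y^n] = n y^{n-1}$ of elements of degrees $1$ and $n$ lands in degree $n-1$, which is exactly why the images of $x$ and $y$ in the associated graded ring commute.</p>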
<p>Anyways, it’s quite late now and I feel like I’m starting to ramble.
All in all, the talk was <em>fine</em>, but I definitely learned a lot, and have
a lot of follow-up thinking to do, which is what I really wanted to get out
of it.</p>
<p>I’ll see you all soon ^_^</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://en.wikipedia.org/wiki/Sophus_Lie">Sophus Lie</a> was actually initially interested in lie groups because he
wanted a smooth analogue of <a href="https://en.wikipedia.org/wiki/Galois_theory">galois theory</a> that would let him understand
when differential equations have “simple” solutions, in a way analogous to
classical galois theory classifying when polynomial equations have “simple”
(read: radical) solutions.</p>
<p>The resulting machinery, which can also be used to solve differential
equations (again, in a way analogous to using galois theory to solve polynomial
equations, a topic I’ve talked about <a href="/2021/08/06/cyclic-extensions.html">before</a>) turned out to be a bit
unwieldy. It turns one “large” differential equation into many “small”
differential equations, and by “many” I mean possibly hundreds.</p>
<p>For a human, this is difficult to manage. But for a computer? This is ideal!</p>
<p>You can read more about the idea in <a href="http://www.physics.drexel.edu/~bob/LieGroups/LG_16.pdf">this</a> set of notes, and about
possible computer implementations in <a href="https://www.heldermann-verlag.de/jlt/jlt01/CZICHPL.PDF">this</a> paper. There’s also apparently
a whole book on the subject, Schwarz’s <em>Algorithmic Lie Theory for
Solving Ordinary Differential Equations</em>, though I’ve not read it. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I didn’t mention this in the talk, because I didn’t have time, but I feel
like I should say a quick word here for completeness.</p>
<p>Given a lie group $G$, we define its lie algebra $\mathfrak{g}$ to be the
<a href="https://en.wikipedia.org/wiki/Tangent_space">tangent space</a> at the identity element $e$. Then the bracket
$[-,-]$ on $\mathfrak{g}$ is defined in a somewhat roundabout way.</p>
<p>First, let $\text{AD}_g : G \to G$ be $\text{AD}_g(h) = ghg^{-1}$. Then
if we differentiate $\text{AD}_g(h)$ with respect to $h$ at the identity,
we get a map $\text{Ad}_g : \mathfrak{g} \to \mathfrak{g}$.</p>
<p>Next, we differentiate <em>this</em> with respect to $g$! This gives us a map
$\text{ad}_v : \mathfrak{g} \to \mathfrak{g}$ for each $v \in \mathfrak{g}$.</p>
<p>Lastly, we define $[v,w] = \text{ad}_v(w)$.</p>
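<p>For matrix groups this roundabout recipe lands somewhere familiar (a standard computation, though not one from the talk): writing $g = e^{sv}$ and differentiating at $s = 0$,</p>

\[\text{ad}_v(w) = \frac{d}{ds}\Big|_{s=0} e^{sv} w e^{-sv} = vw - wv\]

<p>so for $\mathfrak{gl}_n$ the bracket coming out of this construction is just the commutator of matrices.</p>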
<p>If you’re interested in this, you can read more at John Baez’s lecture
notes <a href="https://math.ucr.edu/home/baez/lie_groups/">here</a> (which is where I learned most of this). In particular
lectures $11$ to $16$. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>There’s a categorical remark here as well, which I didn’t make in the
talk. But the functor sending a lie group to its lie algebra and the
functor sending a lie algebra to its connected, simply connected lie group
form a pair of adjoint functors!</p>
<p>In this sense, sending a lie group to its lie algebra is “forgetful”,
and there’s a unique “free” way to assign a lie group to a lie algebra!
Moreover, we have $RL\mathfrak{g} \cong \mathfrak{g}$. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>They all look like the action of $\mathfrak{sl}(2)$ on $S^n\mathbb{C}^2$,
the space of homogeneous degree $n$ polynomials in $2$ variables. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>It’s at this point that I’m obligated to mention one solution to this
problem (though not the first) comes via <a href="https://en.wikipedia.org/wiki/Mealy_machine">automata groups</a>, which I
did research in as an undergraduate.</p>
<p>For more, you might look into “Gupta-Sidki Groups”. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Apparently “<a href="https://en.wikipedia.org/wiki/Almost_commutative_ring">almost commutative</a>” is a technical term! <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 02 Mar 2022 00:00:00 +0000
https://grossack.site/2022/03/02/why-care-about-lie-algebras.html
Talk -- Let's Solve a Simple Analysis Problem. Together.<p>Last Friday I gave a talk in GSS where I tried to give a super concrete
application of topos theory to “mainstream” mathematics. The idea was to solve
a simple analysis problem, streamlining the argument by using the internal
logic of a sheaf topos. The <em>good</em> news is that I think I was quite successful
in making my point that “ordinary mathematicians” should care about topos
theory and constructive logic. The <em>bad</em> news is that the last 10 minutes of
my talk were false… It didn’t end up mattering, but I was still pretty torn
up about it. Anyways, in this post I’ll give an overview of the talk, which
should double as a nice description of how to actually <em>use</em> topos theory
to solve problems.
I’m planning to start a new series soon where I go over this
proof in more detail, explaining the relevant aspects of topos theory along
the way!</p>
<p>So then, what was the talk about?</p>
<p>Recall the <a href="https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem#Weierstrass_approximation_theorem">Weierstrass Approximation Theorem</a>, which says that</p>
<div class="boxed">
<p>For every continuous $f : [0,1] \to \mathbb{R}$, for every $\epsilon > 0$,
there is a polynomial $p$ so that $\lVert f - p \rVert_\infty < \epsilon$.</p>
</div>
<p>Let’s look into a natural generalization of this. Say
\((f_\omega : [0,1] \to \mathbb{R})_{\omega \in X}\) is a
continuous family of continuous functions. That is,
$f : X \times [0,1] \to \mathbb{R}$ is continuous, and we think of
$f(\omega,-)$ as a function $f_\omega$.</p>
<p>If we fix an $\epsilon > 0$, then we know that each $f_\omega$ can be
approximated by some polynomial $p_\omega$. It’s natural to ask if the $p_\omega$
also vary continuously in $\omega$. That is, are the coefficients continuous
functions of $\omega$?</p>
<p>In case $f$ is “nice”, the answer is obviously yes. After all, if we look
at $\sin(\omega t) : [0,2] \times [0,1] \to \mathbb{R}$, then if
$\epsilon = 0.025$ we see that the family</p>
\[p_\omega(t) = \omega t - \frac{\omega^3}{6} t^3 + \frac{\omega^5}{120} t^5\]
<p>has $\lVert f_\omega - p_\omega \rVert_\infty \lt \epsilon$ for every $\omega$,
and the coefficients are continuous in $\omega$.</p>
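<p>Since this claim about $\sin(\omega t)$ is concrete, it’s easy to sanity-check numerically. Here’s a quick sketch (my own, not part of the talk) that grid-searches the sup norm of $f_\omega - p_\omega$ over $\omega \in [0,2]$ and $t \in [0,1]$:</p>

```python
import math

def f(omega, t):
    return math.sin(omega * t)

def p(omega, t):
    # p_omega(t) = omega t - (omega^3/6) t^3 + (omega^5/120) t^5,
    # the degree-5 Taylor polynomial of sin(omega t) in t
    return omega * t - (omega**3 / 6) * t**3 + (omega**5 / 120) * t**5

# crude grid search for the sup norm over omega in [0,2], t in [0,1]
sup = max(abs(f(0.01 * i, 0.005 * j) - p(0.01 * i, 0.005 * j))
          for i in range(201) for j in range(201))
print(round(sup, 4))  # ~0.024, just under eps = 0.025
```

<p>The worst error occurs at $\omega = 2$, $t = 1$, where it comes out to about $0.024$, comfortably below the $\epsilon = 0.025$ from above.</p>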
<p>In case $f$ is “not nice”, the situation is much murkier. For instance, say<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
\[f_\omega(t) = \sum_{n=0}^\infty \omega^n \cos(12^n \pi t)\]
<p>each of these is continuous, nowhere differentiable, and <em>highly</em> oscillatory.
With this in mind, a reasonable person might wonder if these $f_\omega$ are
too sensitive to initial conditions for us to have continuity of the polynomial
approximators<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
<p>So, what to do? Notice that naively applying the theorem to each $f_\omega$
in no way guarantees continuity. We need a cleverer argument in order to treat
the $f_\omega$ uniformly (at least locally) in $\omega$.</p>
<p>This brings us to the, probably quite surprising, idea motivating the talk:</p>
<div class="boxed">
<p>Let’s solve this analysis problem with high powered category theory and constructive logic!</p>
</div>
<hr />
<p>I spent the next little while “defining” a <a href="https://en.wikipedia.org/wiki/Topos">topos</a> as a category which
“looks like $\mathsf{Set}$”. Since we know that we can encode all of mathematics
in set theory (formally, $\mathsf{ZFC}$) then we should be able to translate
that encoding to a topos too. I then drew an evocative picture on the board,
which I completely ripped off from Ingo Blechschmidt’s <a href="https://www.ingo-blechschmidt.eu/">website</a>. In this
post I’ll spare you my art, and just rip off the photo directly:</p>
<p style="text-align:center;">
<img src="/assets/images/talk-practical-topos-theory/external-internal.jpeg" width="50%" />
</p>
<p>(Except when I drew this on the board, I used “weierstrass approximating polynomial”
instead of “finitely generated module”)</p>
<p>Later in the talk I went into more detail about what exactly the
“complicated external statement” is. But before we could go into that,
we need to get an important caveat out of the way:</p>
<div class="boxed">
<p>If we want to do mathematics inside of a topos, it has to be
<span class="defn">Constructive</span></p>
</div>
<p>What do I mean by “constructive”? Well to start, it shouldn’t use the axiom of choice.
Indeed, even <em>very</em> weak principles like <a href="https://en.wikipedia.org/wiki/Axiom_of_countable_choice">countable choice</a> can fail<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.
It really doesn’t take too long (imo) to get used to working without choice<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.
But more importantly, we have to work without the <a href="https://en.wikipedia.org/wiki/Law_of_excluded_middle">Law of Excluded Middle</a>
(LEM), and this can take quite a bit of getting used to.</p>
<p>At this point in the talk I introduced the notion of a
<a href="https://en.wikipedia.org/wiki/Sheaf_(mathematics)">sheaf</a> on a topological space, as well as sheaf maps, and thus
the category $\mathsf{Sh}(X)$ of sheaves on $X$<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">5</a></sup>. Of course, the canonical
example of a sheaf on $X$ is the sheaf of continuous real valued functions
on $X$, and I went over this example in some detail.</p>
<p>Then I said
that the truth values in $\mathsf{Sh}(X)$ are exactly the opens of $X$. I used
this to explain how LEM fails<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">6</a></sup> in this topos.
As you might expect, this is relevant because it means that we’ll need to know our proof of the
weierstrass approximation theorem is constructive if we want to interpret it
inside $\mathsf{Sh}(X)$.</p>
<p>At this point I talked about the <a href="https://en.wikipedia.org/wiki/Natural_numbers_object">natural numbers object</a> in $\mathsf{Sh}(X)$
(which assigns to each open set $U$ the set $\mathbb{N}$, with the
identity as the restriction maps). Then I showed how we can use this to build
$\mathbb{Z}$ and $\mathbb{Q}$ inside $\mathsf{Sh}(X)$ just like we do in
$\mathsf{Set}$. This was a multipurpose discussion, because it showed how we
can do mathematics in a topos just as we do with sets. It also let me introduce
the real numbers object $\mathbb{R}$ as the object you get by doing dedekind
cuts inside $\mathsf{Sh}(X)$<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">7</a></sup>, which I then externalized (without proof)
as the sheaf of continuous functions on $X$ from before<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>!</p>
<p>Next I introduced the notation for forcing, where we say that
$U \Vdash \varphi$ (read as “$U$ forces $\varphi$”) if $\varphi$ is true on
$U$. I told the audience I would formally define this later, but the important
“theorem” is called <span class="defn">Soundness</span>:</p>
<div class="boxed">
<p>If we can prove $\varphi$ without LEM or choice, then it is true inside
the topos $\mathsf{Sh}(X)$ in the sense that $X \Vdash \varphi$.</p>
</div>
<p>Here “theorem” is in scare quotes because I haven’t given anywhere near enough
details to make this precise. But the purpose of the talk was to get the point
across that we should care about constructive mathematics, and this idea of a
theorem was good enough for that purpose.</p>
<p>Now that we have the soundness “theorem” in hand, let’s see if the
weierstrass approximation theorem is even provable constructively! The
next step of the talk was formally writing down the theorem we’re proving:</p>
<div class="boxed">
<p>Thm (Weierstrass):</p>
<p>\(\forall f : C \big ( [0,1], \mathbb{R} \big ) . \
\forall \epsilon : \mathbb{R}_{> 0} . \
\exists p : \mathbb{R}[t] . \
\forall t : [0,1] . \
|f(t) - p(t)| \lt \epsilon\)</p>
</div>
<p>Now, it looks like there <em>is</em> a completely constructive proof of this theorem
(see chapter $4$ of Bridges’ <em>Constructive Functional Analysis</em><sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>), but I
wanted to show how easy it is to miss a usage of LEM, so I gave the proof
via <a href="https://en.wikipedia.org/wiki/Bernstein_polynomial">bernstein polynomials</a>, which you can find in the “elementary proof”
section of the wikipedia page.</p>
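<p>For the curious, the Bernstein construction itself is easy to play with numerically. Here’s a small sketch (mine, not from the talk) that evaluates $B_n(f)(t) = \sum_k f(k/n) \binom{n}{k} t^k (1-t)^{n-k}$ and watches the sup-norm error shrink as $n$ grows:</p>

```python
from math import comb

def bernstein(f, n, t):
    # the n-th Bernstein polynomial of f, evaluated at t
    return sum(f(k / n) * comb(n, k) * t**k * (1 - t)**(n - k)
               for k in range(n + 1))

f = lambda t: abs(t - 0.5)  # continuous on [0,1] but not smooth
grid = [i / 200 for i in range(201)]
err = {n: max(abs(f(t) - bernstein(f, n, t)) for t in grid)
       for n in (10, 40, 160)}
print(err)  # sup-norm error shrinks as n grows
```

<p>The convergence is slow near the kink at $t = 1/2$ (roughly $O(1/\sqrt{n})$), but it is uniform, which is all the theorem asks for.</p>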
<p>In this proof we separate a sum into “good” and “bad” parts, which we
approximate separately. But knowing that each summand is either “good” or “bad”
requires LEM<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</p>
<p>At this point, I said that this proof <em>does</em> go through in
\(\mathsf{Sh}_{\lnot \lnot}(X)\), the topos of “double negation sheaves”,
which is equivalent to \(\mathsf{Sh}(X_{\lnot \lnot})\), the topos of sheaves
on the (typically pointfree) <a href="https://ncatlab.org/nlab/show/locale">locale</a> of regular opens in $X$.
I talked about how we actually go about externalizing statements internal to
a sheaf topos to get statements about sheaves, but I got muddled up in how
truth in \(\mathsf{Sh}(X_{\lnot \lnot})\) relates to truth in $\mathsf{Sh}(X)$.</p>
<p>Normally I would use this as a learning experience, and show you all what my
mistake was. But I was <em>so</em> muddled up I don’t think it would be worth it<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">11</a></sup>.</p>
<p>Next up was forcing semantics. If we want to know how to externalize
theorems proved by mathematicians inside the topos, we need to know how to
interpret logic inside as statements we can see from outside. In the talk
I spent some time going over these, but they’re available in lots of places
already (for instance, on page $316$ of Mac Lane and Moerdijk’s
<em>Sheaves in Geometry and Logic</em>), so I won’t go into it in this post.
I <em>will</em> show how to translate the weierstrass approximation theorem
step by step, though:</p>
\[X \Vdash
\forall f : C \big ( [0,1], \mathbb{R} \big ) . \
\forall \epsilon : \mathbb{R}_{> 0} . \
\exists p : \mathbb{R}[t] . \
\forall t : [0,1] . \
|f(t) - p(t)| \lt \epsilon\]
<p>we unravel the outer universal quantifier, and find<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">12</a></sup>:</p>
<p>For every $U$ open in $X$, for every $f : U \times [0,1] \to \mathbb{R}$,</p>
\[U \Vdash
\forall \epsilon : \mathbb{R}_{> 0} . \
\exists p : \mathbb{R}[t] . \
\forall t : [0,1] . \
|f(t) - p(t)| \lt \epsilon\]
<p>Unraveling this, we see:</p>
<p>For every $U_1$ open in $U$, for every positive $\epsilon : U_1 \to \mathbb{R}$,</p>
\[U_1 \Vdash
\exists p : \mathbb{R}[t] . \
\forall t : [0,1] . \
|f(t) - p(t)| \lt \epsilon\]
<p>Then we see:</p>
<p>There is an open cover $V_\alpha$ of $U_1$, and polynomials $p_\alpha$ with
coefficients continuous on $V_\alpha$ so that, for each $V_\alpha$ we have</p>
\[V_\alpha \Vdash
\forall t : [0,1] . \
|f(t) - p_\alpha(t)| \lt \epsilon\]
<p>That is, for every $V_{\alpha, 1}$ open in $V_\alpha$, and
every $t : V_{\alpha, 1} \to [0,1]$ we have</p>
\[V_{\alpha, 1} \Vdash
|f(t) - p_\alpha(t)| \lt \epsilon\]
<p>That is, \(\lvert f(x,t(x)) - p_{\alpha}(x,t(x)) \rvert \lt \epsilon(x)\) for each $x \in V_{\alpha, 1}$.</p>
<p>Putting these pieces together, and choosing $U_1 = U = X$, as well as $V_{\alpha, 1} = V_\alpha$,
then letting $\epsilon$ and $t$ be constant functions, we find</p>
<div class="boxed">
<p>There is an open cover $V_\alpha$ of $X$ and polynomials $p_\alpha$ with
coefficients continuous on $V_\alpha$ so that, on each $V_\alpha$ we have</p>
\[| f(x,t) - p_\alpha(x,t) | \lt \epsilon\]
<p>for every $t \in [0,1]$.</p>
</div>
<p>which is exactly what we wanted!</p>
<hr />
<p>Now, this is true because we have Bridges’ genuinely constructive proof of
the weierstrass approximation theorem. The proof that I gave in the talk,
involving LEM, doesn’t quite provide this. Instead, it shows that
there’s a dense open set $U$ of $X$, and an open cover $V_\alpha$ of $U$,
so that the rest holds.</p>
<p>I’m still thinking about the double negation modality, and a while ago I asked
<a href="https://math.stackexchange.com/questions/4378270/externalizing-a-concrete-application-of-double-negation-toposes">a question about this</a> on mse.</p>
<p>I’m planning to start a series on topos theory and how we can actually solve
problems with it, and this double negation stuff will definitely get its own
post.</p>
<p>In the meantime, I’ll leave you with the abstract for the talk. Take care
everyone, see you soon ^_^.</p>
<hr />
<p>Solving An Easy Analysis Problem: Together.</p>
<p>*asmr voice* Let’s all take a break. Unwind. And solve a nice friendly problem in elementary analysis.
Now that I’ve lulled you into a false sense of security, you should know that we’ll be solving this
(very concrete!) problem by using heavy duty machinery from category theory and logic. In particular,
we’ll be using the language of topos theory. In the process, we’ll see why people care about constructive
mathematics, how category theory can solve real problems, and whether topos theory really is as
scary as its reputation makes it out to be.
We assume no background in topos theory, or indeed anything but basic category theory.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Don’t worry about the $12$. It’s just there to make sure this family is
continuous and nowhere differentiable for $\omega \in (1/2, 1)$. See
<a href="https://en.wikipedia.org/wiki/Weierstrass_function">here</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>As an aside, <a href="https://www.desmos.com/calculator/jvax2h3tui">desmos</a> shows this doesn’t happen, but it’s super nonobvious
from just the definition of $f$. This is yet another reason that computers
can be invaluable for guessing whether a theorem should be true or false! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>In fact, this is probably the biggest reason I care about AC at all! <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Maybe this is because I did my undergrad in a very logic-heavy department,
so lots of my classes were somewhat sensitive to choice anyways? <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>I gave this definition without saying the word “functor” or “natural transformation”
since I think it’s important for this talk to be accessible to people with
as little category theoretic background as possible. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Since truth values are opens, if the truth value associated to $\varphi$
(denoted $[ \! [ \varphi ] \! ]$) is, say, $(0, \infty) \subseteq \mathbb{R}$
then $[ \! [ \lnot \varphi ] \! ]$ is <em>not</em> the complement of $[ \! [ \varphi ] \! ]$,
as you might expect. It’s the <em>interior</em> of that set.</p>
<p>So then</p>
\[[ \! [ \varphi \lor \lnot \varphi ] \! ] =
[ \! [ \varphi ] \! ] \cup [ \! [ \lnot \varphi ] \! ] =
(0, \infty) \cup (-\infty, 0) \neq \mathbb{R}\]
<p>and we see that $\varphi \lor \lnot \varphi$ is not true in $\mathsf{Sh}(X)$!</p>
<p>I know I’m glossing over a lot of details here, but only because I want to
write up a more detailed post in the nearish future. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>I elected to omit the fact that dedekind and cauchy reals are different in
the absence of LEM or CC. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>As an aside, I was quite pleased with this section of the talk! I thought
it flowed really well, since I gave this sheaf as my main example earlier,
and then built up to it as the real numbers object. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>Candidly, though, I haven’t read the whole book. I looked through that
chapter, but apparently Bridges uses some nonstandard definitions that
might be misleading… I haven’t actually checked to see how this squares
with the definitions we usually use in a topos. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>For a while I thought we could get around this since $\mathbb{Q}$ has
decidable equality, but we really can’t.</p>
<p>For instance, let $a$ be the “real number” which is given by the continuous
function $x \mapsto -|x|$. Then $a \geq 0$ is true only at $0$
(thus its truth value is $\emptyset$, the interior of $\{0\}$)
and $a \lt 0$ is true on $(-\infty,0) \cup (0,\infty)$.</p>
<p>So even comparing a real $x$ to a <em>natural</em> number $n$ doesn’t guarantee
$x \geq n \lor x \lt n$ is valid. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>If you really want to have an idea, I thought that semantics in
$\mathsf{Sh}(X_{\lnot \lnot})$ was “basically the same” as truth in
$\mathsf{Sh}(X)$, except we have to restrict attention to <a href="https://en.wikipedia.org/wiki/Regular_open_set">regular opens</a>.</p>
<p>Since $\mathbb{R}$ has a basis of regular opens, this really doesn’t change
anything at all, and it wasn’t sitting right with me, because it felt like
we could really just use double negation for free. I thought it was too
good to be true, and it was ¯\_(ツ)_/¯.</p>
<p>I’ll spend more time talking about double negation in a future post. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>using the fact that continuous maps $[0,1] \to \mathbb{R}$ inside
$\mathsf{Sh}(X)$ correspond to continuous maps $[0,1] \times X \to \mathbb{R}$
externally. See, for instance, Fourman’s <em>Sheaf Models for Analysis</em>,
on written page $286$. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 16 Feb 2022 00:00:00 +0000
https://grossack.site/2022/02/16/talk-practical-topos-theory.html
Talk (?) -- Universal Enveloping Algebras<p>This barely counts as a talk, but I want it catalogued with the rest of the
talks I’ve given, because this is going to have a retrospective aspect to it
like all of my post-talk posts do. A while ago now, I gave a 30 minute
presentation in my Lie Algebras class where I answered two questions that I’d
brought up over the course of the class, with the unifying thread being this:
both questions are naturally answered by the <a href="https://en.wikipedia.org/wiki/Universal_enveloping_algebra">Universal Enveloping Algebra</a>.</p>
<p>As a brief aside, I gave a talk last week about a concrete application of
topos theory. I want to write up a post about it, but unfortunately I
screwed up the last ten minutes… I’ll say more about it when I get around
to posting it, but the moral of the story is that I don’t want to write that
post until I really understand what the last ten minutes <em>should</em> have been if
I’d done it correctly.</p>
<hr />
<p>So: what did I talk about in the presentation? Well, a lie algebra is a vector
space with bonus structure, and we talk about short exact sequences of
lie algebras. So it’s reasonable to wonder if the category of lie algebras
is <a href="https://en.wikipedia.org/wiki/Abelian_category">abelian</a>.</p>
<p>In hindsight the answer is obviously “no”, and I briefly mentioned why. It
comes down to the existence of monos which aren’t <a href="http://nlab-pages.s3.us-east-2.amazonaws.com/nlab/show/normal+monomorphism">normal</a>. In the category
of abelian groups (the prototypical abelian category) every subgroup is normal.
Rephrased categorically, this says that every monomorphism is the kernel of
some morphism. In an abelian category this condition (and its dual) must be
satisfied, but we know there is a distinction between ideals
(which we can quotient by) and subalgebras (which, in general, we can’t) of
a given lie algebra. So not every mono is normal, and the category of lie algebras
cannot be abelian.</p>
<p>This leads us to a related question: Is the category of $\mathfrak{g}$-reps
abelian for a lie algebra $\mathfrak{g}$? I was fairly sure the answer would be “yes”
when I asked this, but I wanted to bring it up in class because the class has been
entirely devoid of category theory, despite representation theory having a
fairly categorical reputation in my mind.</p>
<p>The answer, of course <em>is</em> yes, and the easiest way to see this (imo) is via
the universal enveloping algebra $U(\mathfrak{g})$. Before we go into that,
though, let’s briefly say what the other question I wanted to answer was.</p>
<p>We know that $\mathfrak{g}$ naturally acts on itself by “left multiplication”,
by which I mean $[g,-] : \mathfrak{g} \to \mathfrak{g}$. However this action
need not be faithful!</p>
<div class="boxed">
<p>If it’s not obvious, it’s a cute exercise to work out why this action need
not be faithful!</p>
</div>
<p>With this in mind, it’s natural to ask if there <em>is</em> a space which admits
a natural, faithful $\mathfrak{g}$ action. The answer, again, is yes, and the
vector space in question is the universal enveloping algebra<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>!</p>
<p>So then, at this point you should be sold on $U\mathfrak{g}$ being an interesting
object… but what exactly <em>is</em> it? I’ll start with a motivating categorical
approach, then (for people who aren’t as well versed in the way of adjoint
functors), we’ll go over how one might build it by hand.
This follows the talk fairly closely<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, but it should go better here since
I expect a bit more categorical maturity from readers of my blog<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<hr />
<p>So then, what’s the goal of the universal enveloping algebra? We know that
matrix algebras are automatically lie algebras, when we interpret $[A,B]$
as $AB - BA$. More generally we can do this with the ring of endomorphisms
of your favorite vector space $\text{End}(V)$. More generally <em>still</em>, we
can do this for your favorite algebra over a field.</p>
<p>This is obviously functorial, and sends the ring $\text{End}(V)$ to the lie
algebra $\mathfrak{gl}(V)$, but I’m going to call this
$\mathsf{Lie} \left ( \text{End}(V) \right )$ instead, to emphasize the fact
that $\mathsf{Lie}$ is a functor from $\mathbb{F}\text{-alg}$ to $\mathbb{F}\text{-lie-alg}$.</p>
<p>Here, as usual, $\mathbb{F}$ is either $\mathbb{R}$ or $\mathbb{C}$. I’m not
sure what happens in positive characteristic (it seems to be subtle. See <a href="https://mathoverflow.net/questions/7112/which-is-the-correct-universal-enveloping-algebra-in-positive-characteristic">here</a>)
so I’m restricting attention to particularly nice fields, haha.</p>
<p>Now, recall a <span class="defn">Representation</span> of a lie algebra
$\mathfrak{g}$ is a homomorphism
\(\alpha : \mathfrak{g} \to \mathsf{Lie} \left ( \text{End}(V) \right )\).
These obviously assemble into a category whose objects are pairs $(V,\alpha)$
and whose morphisms are linear maps $V \to W$ which commute with the
$\mathfrak{g}$-action.</p>
<p>If this sounds like a category of modules, you’re right! That’s some good
justification for this being an abelian category, and it’s not super hard to
verify the axioms by hand… But wouldn’t it be nice if we didn’t have to?</p>
<p>Let’s say we had a left adjoint $U \dashv \mathsf{Lie}$. Then we would have</p>
\[\begin{prooftree}
\AxiomC{$\mathfrak{g}\text{-reps}$}
\UnaryInfC{$\alpha : \mathfrak{g} \to \mathsf{Lie} \left ( \text{End}(V) \right )$}
\UnaryInfC{$\tilde{\alpha} : U \mathfrak{g} \to \text{End}(V)$}
\UnaryInfC{$U \mathfrak{g}\text{-modules}$}
\end{prooftree}\]
<p>so, if we can find a left adjoint $U$ for $\mathsf{Lie}$,
the category of $\mathfrak{g}$-representations will be equivalent to the
category of $U\mathfrak{g}$-modules.</p>
<p>Now, let’s get all the abstract nonsense out of our system right now. Since
it’s clear that $\mathsf{Lie}$ preserves limits (indeed, it’s basically not
<em>doing</em> anything at all), we can apply the
<a href="https://ncatlab.org/nlab/show/adjoint+functor+theorem#statement">Special Adjoint Functor Theorem</a> to quickly see that $\mathsf{Lie}$
has a left adjoint. Unfortunately this tells us <em>nothing</em> about what $U$
looks like, so while <em>true</em>, this proof is not as useful as it could be.</p>
<p>There’s an art to finding left adjoints, where we try to figure out what
the “freest” way to build an object should be. Usually to introduce free
structure, you just add the syntax, and declare it <em>has</em> to satisfy all the
rules you want.</p>
<p>So for us, we want to add multiplication, to turn our lie algebra into a
traditional algebra. Following the hint I gave in the last paragraph, we might
look at the space of all sums of formal strings $x_1 x_2 \cdots x_n$, where
we multiply by concatenation. This construction already has a name, if we
recognize $x_1 x_2 \cdots x_n$ as an element of the $n$th tensor power of $\mathfrak{g}$.</p>
<p>That is, we look at</p>
\[\mathfrak{g}^{\otimes n} \triangleq
\underbrace{\mathfrak{g} \otimes \mathfrak{g} \otimes \cdots \otimes \mathfrak{g}}_{n \text{ times}}\]
<p>and we think of the element $x_1 \otimes x_2 \otimes x_3 \cdots \otimes x_n$ as
the product of the $x_i$.</p>
<p>So $\mathfrak{g}^{\otimes n}$ consists of the free $n$-fold products of elements
of $\mathfrak{g}$, and we want to allow <em>arbitrary</em> products. What’s the obvious
thing to do? We look at the <a href="https://en.wikipedia.org/wiki/Tensor_algebra">tensor algebra</a></p>
\[\bigoplus_{n \geq 0} \mathfrak{g}^{\otimes n}\]
<p>Now elements of this are exactly sums of products of elements of $\mathfrak{g}$!</p>
<p>There’s one piece of data we’re missing, though. We want $\mathsf{Lie}$ of this
algebra to be related to $\mathfrak{g}$ somehow.
We know $\mathfrak{g} \hookrightarrow \bigoplus_{n \geq 0} \mathfrak{g}^{\otimes n}$
since $\mathfrak{g} = \mathfrak{g}^{\otimes 1}$, so it’s reasonable to want
$[x_1, x_2]$ as computed in the tensor algebra to be the same as $[x_1, x_2]$
as computed in $\mathfrak{g}$.</p>
<p>How do we do that? Well, we just <em>force</em> it to be true! For every $x_1, x_2 \in \mathfrak{g}$
we add a rule telling us how to evaluate the syntax
$x_1 \otimes x_2 - x_2 \otimes x_1$ (the commutator as computed in the tensor algebra): it should be $[x_1, x_2]_\mathfrak{g}$,
and so we define</p>
\[U \mathfrak{g} \triangleq
\frac{\bigoplus_{n \geq 0} \mathfrak{g}^{\otimes n}}{ x_1 \otimes x_2 - x_2 \otimes x_1 = [x_1, x_2]_\mathfrak{g}}\]
<div class="boxed">
<p>Check (using existing universal properties of quotients, sums, and tensor products)
that algebra homs out of $U \mathfrak{g}$ are in bijection with lie algebra homs
out of $\mathfrak{g}$.</p>
<p>That is, show that $U \dashv \mathsf{Lie}$.</p>
</div>
<p>Remember at the start of all this, we had two questions that $U \mathfrak{g}$
is supposed to help us answer.</p>
<p>We’ve already spent some time showing that the category of
$\mathfrak{g}$-reps is abelian. This is because the category of
$\mathfrak{g}$-reps is equivalent to the category of $U \mathfrak{g}$-modules,
and we know that module categories are always abelian.</p>
<p>But how does this provide us with a canonical faithful $\mathfrak{g}$-rep?
Well, I’ll leave it to you to check that the obvious map
$\mathfrak{g} \hookrightarrow \mathsf{Lie} U \mathfrak{g}$
(if you like, the <a href="https://ncatlab.org/nlab/show/unit+of+an+adjunction">unit</a> of the adjunction) really is an embedding. So
$\mathfrak{g}$ acts faithfully on $U \mathfrak{g}$.</p>
<p>For extra concreteness, what <em>is</em> this action? It’s exactly the left-multiplication
action! So</p>
\[g \cdot (x_1 \otimes x_2 \otimes \cdots \otimes x_n) \triangleq
g \otimes x_1 \otimes x_2 \otimes \cdots \otimes x_n\]
<hr />
<p>Let’s work this out for the only example in lie theory I know, $\mathfrak{sl}_2(\mathbb{C})$<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
<p>This lie algebra is three dimensional, with a basis</p>
\[x = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\quad
y = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\quad
h = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\]
<p>Moreover, the lie structure is generated by the equations</p>
<ul>
<li>$[x,y] = h$</li>
<li>$[h,x] = 2x$</li>
<li>$[h,y] = -2y$</li>
</ul>
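<p>These relations are easy to sanity-check numerically. Here’s a quick sketch in plain python (hand-rolled $2 \times 2$ matrix arithmetic, so no libraries are assumed):</p>

```python
def matmul(a, b):
    # multiply two 2x2 matrices given as nested lists
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bracket(a, b):
    # the lie bracket [a, b] = ab - ba
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

x = [[0, 1], [0, 0]]
y = [[0, 0], [1, 0]]
h = [[1, 0], [0, -1]]

assert bracket(x, y) == h            # [x, y] = h
assert bracket(h, x) == scale(2, x)  # [h, x] = 2x
assert bracket(h, y) == scale(-2, y) # [h, y] = -2y
```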
<p>Now the tensor algebra is going to be all the formal products of $x$, $y$,
and $h$. That is, we get $\mathbb{C} \langle x,y,h \rangle$, the space of
(noncommutative!) polynomials in $x$, $y$, and $h$. The subspace of degree $1$
polynomials can be identified with $\mathfrak{sl}_2$, since the degree $1$
polynomials are linear combinations of $x$, $y$, and $h$.</p>
<p>Now we have to <em>quotient</em> the tensor algebra by the generating equations for
the lie bracket. So we end up with</p>
\[U \mathfrak{sl}_2(\mathbb{C}) \triangleq
\frac{\mathbb{C}\langle x, y, h \rangle}{xy-yx = h \quad \quad hx-xh = 2x \quad \quad hy-yh = -2y}\]
<p>A fairly concrete object!</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>NB: $U\mathfrak{g}$ is <em>huge</em> in general, and if $\mathfrak{g}$ is finite
dimensional, one can also ask if there’s a natural, finite dimensional
faithful $\mathfrak{g}$-representation. It seems like the answer is
“no”.</p>
<p>A finite dimensional faithful $\mathfrak{g}$-representation always exists,
<em>but</em> it’s not natural (in the sense of category theory). This is called
<a href="https://en.wikipedia.org/wiki/Ado%27s_theorem">Ado’s Theorem</a>, and Terry Tao has a great blog post about it
<a href="https://terrytao.wordpress.com/2011/05/10/ados-theorem/">here</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>In hindsight, this was a
mistake. I knew my audience wasn’t particularly experienced with categorical
lingo, but I gave a talk that would have been good for me, were I an audience
member, rather than a talk that would be good for the audience I knew I had.</p>
<p>The particularly disappointing thing is that, if I’m being completely
honest with myself, I knew as I was writing the talk that I was probably
not writing the best talk for my audience, and I did it anyways. I know I
have high standards for my talks, and am liable to think a quite average
talk was a trainwreck, but at the very least I should have done the
concrete construction first… Oh well. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>There’s a part of me that likes to think I’m writing these posts for
“myself, roughly 2 years ago”, and by that metric the mention of
adjoint functors will be more clarifying than confusing. Unfortunately I
think the opposite was true in my presentation… <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>I’m joking, obviously… But not by much. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Mon, 14 Feb 2022 00:00:00 +0000
https://grossack.site/2022/02/14/uea-presentation.html
https://grossack.site/2022/02/14/uea-presentation.htmlA New (to me) Perspective on Jordan Canonical Form<p>Lie Algebras have been on my to-learn list for a fairly long time now, and
I’m finally taking a class focusing on them (and their representation theory).
<a href="https://en.wikipedia.org/wiki/Vyjayanthi_Chari">Our professor</a> is very concrete in how she presents things, and so this
class is doubling as a higher level review of some matrix algebra, which I’ve
been enjoying. In particular, last week we talked about the
<a href="https://en.wikipedia.org/wiki/Jordan_normal_form">Jordan Canonical Form</a> in a way that I quite liked, and which I’d never
seen before. I’m sure this will be familiar to plenty of people, but I want
to write up a thing about it anyways just in case!</p>
<p>Now then, what <em>is</em> JCF?</p>
<p>Not every matrix can be diagonalized, so
it’s natural to ask if there’s a “next best thing”. The answer is “yes”,
and it’s the JCF<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">1</a></sup>. When a matrix is diagonal it means that it acts just by
rescaling along some axes, given by the basis we use to diagonalize it.
The JCF lets us write <em>any</em> matrix<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup> as a diagonal matrix, plus a few $1$s
right above the diagonal. Stealing wikipedia’s example, we see that</p>
\[\begin{pmatrix}
5 & 4 & 2 & 1 \\
0 & 1 & -1 & -1 \\
-1 & -1 & 3 & 0 \\
1 & 1 & -1 & 2
\end{pmatrix}\]
<p>can be “almost diagonalized” as</p>
\[\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 4 & 1 \\
0 & 0 & 0 & 4
\end{pmatrix}\]
<p>Notice this is diagonal except for $1$s immediately above the diagonal.
Moreover, the $1$ lies “between” the repeated $4$s. This is typical, in the
sense that any matrix $M$ can be brought into the following form<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup>:</p>
<p style="text-align:center;">
<img src="/assets/images/jcf-lie-alg/jordan-blocks.png" width="50%" />
</p>
<p>where the $\lambda_k$ are eigenvalues of $M$, possibly with repetition.</p>
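<p>If you want to experiment with examples yourself, sympy (which ships with sage) can compute jordan forms. A quick sketch, run on wikipedia’s matrix from above:</p>

```python
from sympy import Matrix

M = Matrix([[5, 4, 2, 1],
            [0, 1, -1, -1],
            [-1, -1, 3, 0],
            [1, 1, -1, 2]])

# jordan_form returns a change of basis P and the JCF J, with M = P J P^{-1}
P, J = M.jordan_form()

assert M == P * J * P.inv()
assert sorted(J[i, i] for i in range(4)) == [1, 2, 4, 4]  # eigenvalues 1, 2, 4, 4
assert sum(1 for i in range(3) if J[i, i + 1] == 1) == 1  # a single 2x2 block
```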
<p>This says, roughly, that we can decompose our space into disjoint subspaces.
Then $M$ acts on each subspace by a “scale and shift” action, where
we send \(\vec{v}_{k+1}\) to \(\lambda \vec{v}_{k+1} + \vec{v}_k\). The
top basis vector, \(\vec{v}_0\), just gets rescaled: \(M \vec{v}_0 = \lambda \vec{v}_0\).</p>
<p>Then a matrix is diagonalizable if and only if each of these subspaces is
one dimensional (do you see why?).</p>
<p>One reason to care about this is classification. If we pick some reasonable
convention for ordering the eigenvalues, then every matrix has a unique JCF,
and two matrices are similar if and only if they have the same JCF. This lets
us solve the similarity problem in polynomial time<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>, which is quite nice.</p>
<p>Another reason is for computation. When working with matrices
(by hand or by computer) it’s nice to have a lot of $0$s hanging around,
because it makes computation easier. Moreover, we can make special use of
the form of a matrix in JCF in order to do computations we wouldn’t be able
to do <em>at all</em> in general. For instance, can you see what $M^2$ and $M^3$ must
look like for a matrix in JCF? Obviously this is hopeless for an arbitrary matrix.</p>
<p>These computational benefits help us solve real problems, for instance it’s
comparatively easy to compute <a href="https://en.wikipedia.org/wiki/Matrix_exponential">matrix exponentials</a> when $M$ is in JCF,
and this lets us solve coupled differential equations. See <a href="https://math.stackexchange.com/questions/2309707/whats-the-relationship-between-the-jordan-theory-and-odes">here</a>, say.</p>
<hr />
<p>Now for the new perspective:</p>
<p>The entire point of JCF is to find a basis which renders our favorite
linear transformation particularly easy to study.
So it never crossed my mind to look into a basis-agnostic formulation of the
theorem. Of course, it’s almost <em>always</em> natural to ask if there’s a
version of a theorem that doesn’t rely on a choice of basis, and
(surprisingly, imo) there <em>is</em> actually such a formulation for the JCF!</p>
<p>So what’s the idea?</p>
<p>If we allow ourselves to use addition as well as multiplication, then the JCF
says that there’s a basis which makes $T$ look like the sum of a
diagonal matrix and a (particularly sparse) strictly upper triangular matrix.</p>
<p>In the example from before, we write</p>
\[\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 4 & 1 \\
0 & 0 & 0 & 4
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 4 & 0 \\
0 & 0 & 0 & 4
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{pmatrix}\]
<p>Another way to say this is that every linear map on $\mathbb{C}^n$
decomposes as the sum of two commuting maps<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">5</a></sup>, one of which is <a href="https://en.wikipedia.org/wiki/Semisimple_operator">semisimple</a>
and one of which is nilpotent:</p>
\[T = T_s + T_n\]
<p>It’s clear that the latter matrix is nilpotent
(after all, it’s strictly upper triangular)
and that nilpotent-ness is a condition which doesn’t rely on a choice of basis.</p>
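<p>For the running example we can check both of these claims directly. A small sketch using sympy:</p>

```python
from sympy import Matrix, zeros

D = Matrix([[1, 0, 0, 0],
            [0, 2, 0, 0],
            [0, 0, 4, 0],
            [0, 0, 0, 4]])
N = Matrix([[0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 1],
            [0, 0, 0, 0]])

assert D * N == N * D        # the two pieces commute
assert N**2 == zeros(4, 4)   # N is nilpotent
```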
<p>Some meditation shows that semisimplicity corresponds exactly to diagonalizability.
The following famous example should help guide your meditation. In this context
we should view it as a $2 \times 2$ Jordan block:</p>
\[\begin{pmatrix}
1 & 1 \\ 0 & 1
\end{pmatrix}\]
<p>Notice there is an invariant subspace which is not complemented. Namely
the space \(\left \{ \begin{pmatrix} a \\ 0 \end{pmatrix} \ \middle | \ a \in \mathbb{C} \right \}\).</p>
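<p>If meditation isn’t your thing, sympy will happily confirm that this block fails to be diagonalizable:</p>

```python
from sympy import Matrix

J = Matrix([[1, 1],
            [0, 1]])

assert J.eigenvals() == {1: 2}    # eigenvalue 1, with algebraic multiplicity 2
assert not J.is_diagonalizable()  # but the eigenspace is only 1-dimensional
```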
<p>Of course, semisimplicity is a much trickier notion to come up with compared
to nilpotency. But once you’ve seen it it’s clear that it’s <em>also</em>
basis-agnostic.</p>
<p>So the whole decomposition doesn’t depend on a choice of basis<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">6</a></sup>! This makes
it much more amenable to generalization, and in fact it <em>does</em> go through
in a wide variety of settings where we don’t directly have access to JCF<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">7</a></sup>.
See the <a href="https://en.wikipedia.org/wiki/Jordan%E2%80%93Chevalley_decomposition">Jordan-Chevalley Decomposition</a>.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:6" role="doc-endnote">
<p>When I was taking my first ever linear algebra class
(which, incidentally, was also my first proof based class) as
a freshman, our professor (<a href="https://www.cmu.edu/math/people/faculty/pisztora.html">Pisztora</a>) spent the last few
weeks giving a completely algorithmic way to get our hands on the JCF
of a matrix.</p>
<p>Later in life, I learned about the
<a href="https://en.wikipedia.org/wiki/Structure_theorem_for_finitely_generated_modules_over_a_principal_ideal_domain">fundamental theorem of finitely generated modules over PIDs</a>
(which <em>really</em> needs a snappier name). It gives us two canonical ways
to decompose a fg PID-module: the <span class="defn">Primary Decomposition</span>
and the <span class="defn">Invariant Factor Decomposition</span>.</p>
<p>Now if we view $\mathbb{C}^n$ as a $\mathbb{C}[x]$ module where
$x$ acts by some linear transformation $T$, this theorem tells us we can decompose
$\mathbb{C}^n$ into subspaces where $T$ acts in a particularly simple way.</p>
<p>The primary decomposition of $\mathbb{C}^n$ provides a basis on which $T$
attains its Jordan Canonical Form, and the invariant factor decomposition
provides a basis on which $T$ is in <a href="https://en.wikipedia.org/wiki/Frobenius_normal_form">Rational Canonical Form</a>
(which, incidentally, I like more. But that might be because it was a big
part of my undergraduate research).</p>
<p>I won’t say more about the existence proofs here, because that’s not really
the point of the post. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>Over an algebraically closed field <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This image, like the previous example, was stolen from wikipedia <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Where we gloss over some subtle details of the complexity of working
with exact complex numbers. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>It wasn’t immediately clear to me that these maps commute, but
(as is often the case) it’s pretty obvious in hindsight. If it’s
not clear to you, you should take some time and work through it!</p>
<p>As a hint, it suffices to check commutativity in each Jordan block.
But of course, within each block, the diagonal matrix is <em>particularly</em>
simple… <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>In fact, there’s a ~ bonus property ~ as well,
which says that $T_s$ and $T_n$ are actually <em>polynomials</em> in $T$!</p>
<p>I’m still too new to this to have a concrete application in mind, but
it’s an extremely cool fact which seems obviously useful on an intuitive
level. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Though, as a quick exercise, you might wonder if this goes through for
linear maps on an <em>infinite dimensional</em> vector space.</p>
<p>I have a counterexample in mind, though I don’t actually have a formal
proof that it has no such decomposition.</p>
<!--
I'm thinking about the shift map $v_k \mapsto v_{k+1}$ on a space
with basis $\{ v_k \mid k \in \mathbb{N} \}$.
-->
<p><a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 01 Feb 2022 00:00:00 +0000
https://grossack.site/2022/02/01/jcf-lie-alg.html
https://grossack.site/2022/02/01/jcf-lie-alg.htmlAutomatic Asymptotics with Sage<p>Recurrence relations show up <em>all</em> the time in combinatorics and computer
science, and even simple objects give recurrences that are difficult or
impossible to solve. Thankfully, we <em>can</em> often find a <a href="https://en.wikipedia.org/wiki/Generating_function">generating function</a>
for our objects, and through the power of complex analysis and algebraic geometry,
we can use the singularities of the generating function in order to get good
<em>asymptotic estimates</em>. Software like <a href="https://en.wikipedia.org/wiki/Maple_(software)">Maple</a> can automatically compute
asymptotic expansions for a lot of generating functions using the
<a href="https://www.maplesoft.com/support/help/maple/view.aspx?path=asympt">asympt</a> function… Can <a href="https://sagemath.org">sage</a>?</p>
<p>The answer is “yes”, which is why I’m writing this post, but the longer answer
is “with some work”. Which is the <em>real</em> reason I’m writing this post.
I had to slog through a lot of kind of crummy documentation to get this working,
and I want to make sure that the next people looking into this have an easier
time.</p>
<p>There are two modules (that I can find) which provide features for computing
asymptotics:</p>
<ul>
<li><a href="https://doc.sagemath.org/html/en/reference/asymptotic/sage/rings/asymptotic/asymptotics_multivariate_generating_functions.html">multivariate generating functions</a></li>
<li><a href="https://doc.sagemath.org/html/en/reference/asymptotic/sage/rings/asymptotic/asymptotic_ring.html#introductory-examples">asymptotic rings</a></li>
</ul>
<p>These both have pros and cons:</p>
<p>The former is slightly easier to work with, and allows generating functions with
multiple variables. The downside is that it
only works for functions with polynomial denominators<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>The latter has nicer documentation, works for more general functions, and
gives more detailed asymptotic expansions. The downside is that it only works
for functions of a single variable, and it’s kind of annoying to use because
it doesn’t accept symbolic inputs natively. You also have to manually provide
a list of singularities<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
<hr />
<p>Let’s start with the multivariate case, since it’s more complicated. I’ll give
a code dump, then explain what’s going on.</p>
<p>The tl;dr is that it works for functions of the form $h / p$ where $h$ is
a holomorphic function and $p$ is a polynomial. The number of variables is
irrelevant.</p>
<div class="no_eval">
<script type="text/x-sage">
from sage.rings.asymptotic.asymptotics_multivariate_generating_functions \
    import FractionWithFactoredDenominatorRing

def multivar_asy(f, N=5, alpha=None, numeric=0):
    """
    compute the first N terms of the asymptotic expansion of f

    setting numeric = n will give floats with n digits of precision.

    if

        f = sum_{ns} F_{ns} x^{ns}

    for ns = (n1,...,nk) a multi-index, then we compute an
    expansion for F_{r alpha} for alpha = (a1 ... ak) a given
    "direction" and r --> oo

    By default, we assume alpha = [1,...,1], which reduces to what we
    almost certainly want in the one variable case
    """
    fn = f.numerator()
    fd = f.denominator()

    R_internal = PolynomialRing(QQ, fd.variables())

    # FFPD is the ring of quotients p/q
    # where p is in SR and q is in R_internal
    # rather confusingly we put the denominator ring before the numerator ring
    FFPD = FractionWithFactoredDenominatorRing(R_internal, SR)

    # but when we make new element of the ring, we put things in the right order
    # for some reason units in the denominator get clobbered, so we manually
    # add it in the numerator (this is consistent with the examples in the docs)
    fdFactored = R_internal(fd).factor()
    f = FFPD(fn / fdFactored.unit(), fdFactored)

    # now we choose a "direction" alpha
    if alpha == None:
        alpha = [1] * f.dimension()

    decomp = f.asymptotic_decomposition(alpha)

    result = 0
    n = 0
    for part in decomp:
        n += 1

        # this is brittle, but makes things work
        if part == FFPD(0, []):
            continue

        # p is supposed to be a minimal critical point for the denominator of part.
        # let's first find the critical points

        # first we find the smooth points
        I = part.smooth_critical_ideal(alpha)
        smoothSols = solve([SR(v) for v in I.gens()],
                           [SR(v) for v in R_internal.gens()],
                           solution_dict=True)

        # next we find the singular points
        J = part.singular_ideal()
        singSols = solve([SR(v) for v in J.gens()],
                         [SR(v) for v in R_internal.gens()],
                         solution_dict=True)

        s = smoothSols + singSols

        # remove any varieties of dimension > 0 from the space of solutions
        # I don't know if this will break things or not, but in my (limited!)
        # testing it seems fine.
        # If I were less lazy I would probably make this take the minimum value
        # across the whole variety? But doing it this way makes things agree with
        # the examples.
        sFiltered = []
        for soln in s:
            keep = True
            for v in soln.values():
                if not v.is_constant():  # remove any solutions involving a parameter
                    keep = False
            if keep:
                sFiltered += [soln]

        # if we didn't find any solutions at all, give up.
        if len(sFiltered) == 0:
            if len(s) != 0:
                print("We finally found something where removing the varieties caused problems")
                return None
            else:
                print("no critical points were found. Giving up.")
                return None

        # otherwise we get the _minimal_ singularity
        pMin = sFiltered[0]
        for p in sFiltered:
            if sum([xi^2 for xi in p.values()]) < sum([yi^2 for yi in pMin.values()]):
                pMin = p

        # and finally get the asymptotics
        (a,_,_) = part.asymptotics(pMin, alpha, N, numerical=numeric)
        result += a

    return result
</script>
</div>
<p>This is pretty obviously cribbed from the examples <a href="https://doc.sagemath.org/html/en/reference/asymptotic/sage/rings/asymptotic/asymptotics_multivariate_generating_functions.html">here</a>, but the basic idea
is this:</p>
<p>We need the denominator of our generating function to be a factored polynomial,
so we build a polynomial ring with the variables in the denominator and factor
over that ring.</p>
<p>Then, we put this into <code class="language-plaintext highlighter-rouge">FFPD</code> to get an object that the module
knows how to deal with.</p>
<p>We choose a multi-index $\alpha$ which controls the “direction” in which
we take our asymptotics<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. For instance, if our generating function is
$f(x,y) = \sum_{m,n} F_{m,n} x^m y^n$, then</p>
<ul>
<li>$\alpha = (1,1)$ will give us the asymptotics of $F_{n \alpha} = F_{n,n}$ as $n \to \infty$</li>
<li>$\alpha = (2,1)$ will give us the asymptotics of $F_{n \alpha} = F_{2n,n}$ as $n \to \infty$</li>
<li>etc.</li>
</ul>
<p>If we view the $F_{m,n}$ as being located at the lattice points $(m,n)$ in the plane,
then $\alpha$ picks out the direction in which we move.</p>
<p>Next we look at the critical points of the denominator. That is, the points
where its gradient is either undefined or vanishing. The former is the
<em>singular</em> case, and the latter is the <em>smooth</em> case.</p>
<p>In the one variable case, we know the asymptotics are controlled by the
singularity closest to the origin. See chapter $5$ of
<a href="https://www2.math.upenn.edu/~wilf/gfology2.pdf"><em>Generatingfunctionology</em></a>, for instance.</p>
<p>From skimming these articles, we now have a whole <a href="https://en.wikipedia.org/wiki/Algebraic_variety">algebraic variety</a> of
singularities, and for <em>reasons</em> the asymptotics are now governed by the
critical points on this variety. In particular, we’re on the hunt for the
critical point which is closest to the origin (which we’ll call the
<span class="defn">Dominant Singularity</span>).</p>
<p>So now, what does the code do? Well we look for the location of the minimal
smooth point and the minimal singular point. Then take the smaller one and
run the asymptotic function provided by the module.</p>
<p>It seems to work quite well, and is surprisingly fast too. For a simple
example, we can try</p>
<div class="no_eval">
<script type="text/x-sage">
multivar_asy(z/(1-z-z^2))
</script>
</div>
<p>which outputs<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></p>
\[\frac{1}{5} \, \sqrt{5} \left(\frac{2}{\sqrt{5} - 1}\right)^{r}\]
<p>Of course, \(\frac{z}{1-z-z^2}\) is the generating function for the fibonacci
numbers, and we’ve successfully recovered the asymptotic formula
(though it requires some massaging to make it look like its usual presentation).</p>
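<p>Indeed, since $2 / (\sqrt{5} - 1) = (1 + \sqrt{5})/2 = \varphi$, the output is exactly $\varphi^r / \sqrt{5}$, the leading term of Binet’s formula. A quick numeric check in plain python:</p>

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2
asy = (1 / 5) * sqrt(5) * (2 / (sqrt(5) - 1))**20  # the formula at r = 20

fibs = [0, 1]
while len(fibs) <= 20:
    fibs.append(fibs[-1] + fibs[-2])

assert abs(asy - phi**20 / sqrt(5)) < 1e-6  # the two expressions agree
assert round(asy) == fibs[20]               # F_20 = 6765
```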
<p>This really shines when it comes to multivariate series, though. For instance,
we know that</p>
\[\binom{n}{k} = [x^n y^k] (1 - x(1+y))^{-1}\]
<p>So if we’re interested in the asymptotics of $\binom{3n}{n}$ we can compute</p>
<div class="no_eval">
<script type="text/x-sage">
multivar_asy((1-x*(1+y))^(-1), alpha=[3,1], N=3)
</script>
</div>
<p>which quickly gives</p>
\[\frac{1}{41472} \, \left(\frac{27}{4}\right)^{r} {\left(\frac{10368 \, \sqrt{6} \sqrt{2}}{\sqrt{\pi} \sqrt{r}} - \frac{1008 \, \sqrt{6} \sqrt{2}}{\sqrt{\pi} r^{\frac{3}{2}}} + \frac{49 \, \sqrt{6} \sqrt{2}}{\sqrt{\pi} r^{\frac{5}{2}}}\right)}\]
<p>Now $\binom{15}{5} = 3003$ and evaluating the above at $r=5$ gives $3002.931$.</p>
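<p>That last claim is easy to verify in plain python (using <code>math.comb</code>, so python 3.8+ is assumed):</p>

```python
from math import comb, pi, sqrt

r = 5
c = sqrt(6) * sqrt(2) / sqrt(pi)  # the common factor in each term
approx = (1 / 41472) * (27 / 4)**r * (
    10368 * c / sqrt(r) - 1008 * c / r**1.5 + 49 * c / r**2.5)

assert comb(3 * r, r) == 3003    # the exact value of binom(15, 5)
assert abs(approx - 3003) < 0.1  # the expansion gives ~3002.931
```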
<hr />
<p>But what if we’re interested in generating functions whose numerator isn’t
holomorphic, or whose denominator isn’t a polynomial? It’s clear that we
should be interested in such things, since the famous <a href="https://en.wikipedia.org/wiki/Catalan_number">catalan numbers</a>
already have the generating function</p>
\[\frac{1 - \sqrt{1 - 4z}}{2z}\]
<p>For cases like this (which must be of a single variable), we work with
the second asymptotics package <a href="https://doc.sagemath.org/html/en/reference/asymptotic/sage/rings/asymptotic/asymptotic_ring.html#introductory-examples">here</a>.</p>
<p>This is nice because it returns an asymptotic expansion, which is quite
intuitive to work with. The major downside, though, might be surprising:</p>
<p>It only works with (callable) python functions.</p>
<p>This is annoying<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, but it really isn’t <em>that</em> much of a hassle to rewrite
your function using <code class="language-plaintext highlighter-rouge">def</code>. We also need to provide the dominant singularities
by hand, which is also a bit of work<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>
<p>If and when I get around to automatically finding singularities, I’ll update
this code, but for now there’s <em>much</em> less boilerplate here than in the
previous case:</p>
<div class="no_eval">
<script type="text/x-sage">
# interestingly, it seems like it doesn't matter at all
# what asymptotic ring you use.
AsyRing = AsymptoticRing('QQ^n * n^QQ * log(n)^QQ', QQ)
def singlevar_asy(f, N=5, sings=None):
    """
    compute the first N terms of the asymptotic expansion of f

    sings is a list of dominant singularities

    TODO: get these automatically somehow?
    """
    return AsyRing.coefficients_of_generating_function(f, sings, precision=N)
</script>
</div>
<p>The documentation for this function is also quite good, so I’ll stick to
one example. For the catalan numbers, we’ll have:</p>
<div class="no_eval">
<script type="text/x-sage">
def cat(z):
    return (1 - sqrt(1-4*z))/(2*z)
</script>
</div>
<p>This looks like it has singularities at $0$ (which makes the denominator vanish)
and $1/4$ (a branch point, where the argument of the $\sqrt{\cdot}$ vanishes), but actually it only has the
singularity at $1/4$. The singularity at $0$ is removable. So we call</p>
<div class="no_eval">
<script type="text/x-sage">
singlevar_asy(cat, sings=[1/4])
</script>
</div>
<p>which outputs</p>
\[\frac{1}{\sqrt{\pi}} 4^{n} n^{-\frac{3}{2}} - \frac{9}{8 \, \sqrt{\pi}} 4^{n} n^{-\frac{5}{2}} + \frac{145}{128 \, \sqrt{\pi}} 4^{n} n^{-\frac{7}{2}} - \frac{1155}{1024 \, \sqrt{\pi}} 4^{n} n^{-\frac{9}{2}} + \frac{36939}{32768 \, \sqrt{\pi}} 4^{n} n^{-\frac{11}{2}} + O\!\left(4^{n} n^{-6}\right)\]
<p>A quick massage turns this into the (maybe not so) familiar</p>
\[\frac{4^n}{\sqrt{\pi}}
\left (
n^{-3/2} -
\frac{9}{8} n^{-5/2} +
\frac{145}{128} n^{-7/2} -
\frac{1155}{1024} n^{-9/2} +
\frac{36939}{32768} n^{-11/2}
\pm O \left ( n^{-13/2} \right )
\right )\]
<p>which you can find as formula (20) <a href="https://mathworld.wolfram.com/CatalanNumber.html">here</a>.</p>
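<p>As a final sanity check, we can evaluate this expansion at $n = 10$ and compare against the exact catalan number $C_{10} = 16796$ (plain python again):</p>

```python
from math import comb, pi, sqrt

n = 10
expansion = (4**n / sqrt(pi)) * (
    n**-1.5
    - (9 / 8) * n**-2.5
    + (145 / 128) * n**-3.5
    - (1155 / 1024) * n**-4.5
    + (36939 / 32768) * n**-5.5)

catalan = comb(2 * n, n) // (n + 1)  # C_n = binom(2n, n) / (n + 1)

assert catalan == 16796
assert abs(expansion - catalan) / catalan < 1e-3  # already within 0.1%
```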
<hr />
<p>Lastly, we can combine these different asymptotic approximations into one
function. In my <code class="language-plaintext highlighter-rouge">init.sage</code> file, which you can find <a href="https://github.com/HallaSurvivor/dotfiles/blob/master/init.sage">here</a>, I’ve defined
the function</p>
<div class="no_eval">
<script type="text/x-sage">
def asy(f, *args, **kwargs):
    try:
        return multivar_asy(f, *args, **kwargs)
    except:
        return singlevar_asy(f, *args, **kwargs)
</script>
</div>
<p>This is definitely too brittle to give to other people, but it works
fine for my purposes. One day I’d like to make it a bit more robust
(this is not how <code class="language-plaintext highlighter-rouge">try</code>/<code class="language-plaintext highlighter-rouge">except</code> statements are supposed to be used),
and ideally make it a bit more automatic (in particular following
the discussion of <code class="language-plaintext highlighter-rouge">singlevar_asy</code> above). Another option might be to try
and get it working with the <a href="https://doc.sagemath.org/html/en/reference/asymptotic/sage/rings/asymptotic/asymptotic_expansion_generators.html#sage.rings.asymptotic.asymptotic_expansion_generators.AsymptoticExpansionGenerators.SingularityAnalysis"><code class="language-plaintext highlighter-rouge">singularity_analysis</code></a> function, which
would require a certain amount of parsing…
Anyways, as it stands it’s a plenty serviceable substitute for maple’s
<code class="language-plaintext highlighter-rouge">asympt</code> function, and I’m glad I took the time to code it up.</p>
<p>If anyone has any other ideas for making this more usable, or encounters
any problems, definitely let me know!</p>
<p>If not, take care and stay warm!</p>
<p>I’ll see you all in the next one ^_^.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>You can read about these algorithms and their proofs</p>
<ul>
<li><a href="http://arxiv.org/abs/0803.2914">here</a> (for the smooth case)</li>
<li><a href="http://arxiv.org/abs/1009.5715">here</a> (for the singular case)</li>
</ul>
<p>There’s also Pemantle and Wilson’s
<a href="https://www.cambridge.org/core/books/analytic-combinatorics-in-several-variables/7FD6C5820465ECC25FBDF42236BFAEB2"><em>Analytic Combinatorics in Several Variables</em></a> or Melczer’s
<a href="https://melczer.ca/textbook/"><em>An Invitation to Analytic Combinatorics</em></a>, which both look
quite good. Unfortunately I haven’t had time to read either. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I’m curious if I can automate the search for these singularities…
I feel like it shouldn’t be too hard, but I need to think about it. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>As an aside, I keep reading this as “asymptomatic” out of the corner of my
eye, which definitely tells you something about my current mental state… <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Sorry I’m not making these evaluate inline like I normally do. You’ll have
to trust me (or check yourself!) that the computation really is quite fast.</p>
<p>The issue is that I don’t want to copy/paste the definition
of <code class="language-plaintext highlighter-rouge">multivar_asy</code> into each cell. I think that clutters things up.</p>
<p>I could use <em>linked</em> sage-cells, but linked cells have to be evaluated
in the right order, and aren’t allowed to be auto-evaluated. I think that’s
a bit user-unfriendly, so I’ve decided to just show the inputs/outputs
(which is what really matters in any case). <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>And part of me wants to go in and try to fix it <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>This is something I really think should be automate-able, though, and
I’ll probably fight with it some more when I have the time. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thu, 20 Jan 2022 00:00:00 +0000
https://grossack.site/2022/01/20/automatic-asymptotics.html
https://grossack.site/2022/01/20/automatic-asymptotics.html
An Explicit Example of the Proof of the Nullstellensatz<p>I’m in an algebraic geometry class right now, and a friend was struggling
conceptually with the proof of the strong nullstellensatz. I thought it might be
helpful to see a concrete example of the idea, since the proof is actually quite
constructive! Which brings us to this post:</p>
<p>Formally, we’re going to assume the weak nullstellensatz, and use it to show
the strong nullstellensatz. That is, we’ll assume</p>
<div class="boxed">
<p><span class="defn">The Weak Nullstellensatz</span></p>
<p>$V(\mathfrak{a}) = \emptyset$ if and only if $\mathfrak{a} = (1)$</p>
<p>Purely algebraically, this says:</p>
<p>“The only way $f_1, f_2, \ldots, f_r$ can fail to have a common zero is if
$(f_1, f_2, f_3, \ldots, f_r) = (1)$.”</p>
</div>
<p>and we’ll show</p>
<div class="boxed">
<p><span class="defn">The Strong Nullstellensatz</span></p>
<p>$I(V(\mathfrak{a})) = \sqrt{\mathfrak{a}}$</p>
<p>Again, purely algebraically, this says:</p>
<p>“The only way $g, f_1, f_2, \ldots, f_r$ can all be $0$ simultaneously is
if (for some $n$) $g^n$ is a linear combination of the $f_i$.”</p>
</div>
<p>Both of these are theorems of the form “the obvious issue is the only one”.
Obviously if
$1 = p_1 f_1 + p_2 f_2 + \ldots + p_r f_r$, then the $f_i$ cannot all be $0$
simultaneously. Indeed, if \(f_i(x^*) = 0\) for all $i$, then evaluating both
sides of the above at \(x^*\) gives $1 = 0$, which is a problem. The weak
nullstellensatz says that this is the <em>only</em> reason a family of polynomials
won’t have a common root.</p>
<p>Similarly, it’s obvious that if $g^n = p_1 f_1 + \ldots + p_r f_r$, then
at any common zero of the $f_i$, $g = 0$ too. Again, if we evaluate both
sides at $x^*$ we find \(g(x^*)^n = 0\), and so \(g(x^*) = 0\) too
(since a field has no nontrivial nilpotents). The strong nullstellensatz says
that this is the <em>only</em> way for $g$ to vanish at every common zero of the $f_i$.</p>
<hr />
<p>Now, it turns out the weak nullstellensatz has computational content. That
is, if $f_1, \ldots, f_r$ <em>don’t</em> have a common zero, there’s a computer
program<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> that will actually <em>find</em> the $p_i$ so that
$1 = p_1 f_1 + \ldots + p_r f_r$.</p>
<p>For instance, let’s take a simple example:</p>
\[f_1 = xy - 1 \quad \quad f_2 = x+y \quad \quad f_3 = xy^3 \quad \quad f_4 = yx^3\]
<p>First, let’s check that these polynomials really don’t have any points in common:</p>
<div class="auto">
<script type="text/x-sage">
# make a polynomial ring with generators x,y
# over QQbar, the algebraic closure of QQ.
R.<x,y> = QQbar[]
f1, f2, f3, f4 = x*y - 1, x+y, x*y^3, y*x^3
I = ideal(f1,f2,f3,f4)
I.variety() # should print out [], the empty set
</script>
</div>
<p>Next we can see that they generate the ideal $(1)$.</p>
<div class="auto">
<script type="text/x-sage">
R.<x,y> = QQbar[]
f1, f2, f3, f4 = x*y - 1, x+y, x*y^3, y*x^3
I = ideal(f1,f2,f3,f4)
I == ideal(1)
</script>
</div>
<p>Of course, this means we should be able to write $1$ as a linear combination
of the $f_i$:</p>
<div class="auto">
<script type="text/x-sage">
R.<x,y> = QQbar[]
f1, f2, f3, f4 = x*y - 1, x+y, x*y^3, y*x^3
I = ideal(f1,f2,f3,f4)
R(1).lift(I) # prints [y^2 - 1, y, -1, 0]
</script>
</div>
<p>and indeed, these are the coefficients to get $1$</p>
<div class="auto">
<script type="text/x-sage">
R.<x,y> = QQbar[]
f1, f2, f3, f4 = x*y - 1, x+y, x*y^3, y*x^3
# should give 1
(y^2 - 1) * f1 + y * f2 + (-1) * f3 + 0 * f4
</script>
</div>
<hr />
<p>So now what about the strong nullstellensatz?</p>
<p>Let’s take $g = yx + x + 1$, which vanishes at
every point of the variety defined by $x^2 + 2x + 1$ and $y$
(do you see why?).</p>
<p>Then we expect $g^n \in (x^2 + 2x + 1, y)$ for some $n$, and we’ll
get there by the <a href="https://en.wikipedia.org/wiki/Rabinowitsch_trick">Rabinowitsch trick</a>:</p>
<p>We’ll add a variable $z$ to the mix, and notice that
$x^2 + 2x + 1$, $y$, and $(yx + x + 1)z - 1$ don’t have any
common zeroes.</p>
<p>Indeed, if $x^2 + 2x + 1$ or $y$ is nonzero at some point, then we’re done.
But if they’re both zero, then we know $g = yx + x + 1$ is zero as well. Then
$(yx + x + 1)z - 1 = gz - 1 = 0z - 1 = -1 \neq 0$.</p>
<p>But then, by the weak nullstellensatz, that means these three polynomials
must generate the ideal $(1)$ in $k[x,y,z]$!</p>
<p>Indeed,</p>
<div class="auto">
<script type="text/x-sage">
R.<x,y,z> = QQbar[]
f1, f2 = x^2 + 2*x + 1, y
g = y*x + x + 1
p1,p2,p3 = var('p_1, p_2, p_3')
coeffs = R(1).lift(ideal(f1,f2,g*z-1))
show(p1 == coeffs[0])
show(p2 == coeffs[1])
show(p3 == coeffs[2])
</script>
</div>
<p>So we know that</p>
\[1 = p_1 f_1 + p_2 f_2 + p_3 (gz - 1)\]
<p>or</p>
\[1 = z^2 (x^2 + 2x + 1) + (x^2 z^2 + xz^2 + xz) y + (-xz - z - 1)(gz - 1)\]
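<p>If you’d like to double-check that identity without firing up sage, a quick
symbolic expansion confirms it. Here’s a small sympy sketch:</p>

```python
from sympy import symbols, expand

x, y, z = symbols('x y z')
g = y*x + x + 1

# the right hand side of the identity above
rhs = (z**2 * (x**2 + 2*x + 1)
       + (x**2*z**2 + x*z**2 + x*z) * y
       + (-x*z - z - 1) * (g*z - 1))

print(expand(rhs))  # 1
```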
<p>Now for the slick trick! We’re working in an ideal containing $zg - 1$,
which means that $z = \frac{1}{g}$ in all of our computations<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>! So let’s
take this expression and plug in $z = \frac{1}{g}$ to get</p>
\[1 =
\frac{1}{g^2} f_1 +
\left ( \frac{x^2}{g^2} + \frac{x}{g^2} + \frac{x}{g} \right ) f_2 +
\left ( -\frac{x}{g} - \frac{1}{g} - 1 \right ) 0\]
<p>Of course, we can clear the denominators by multiplying through by $g^2$ to see</p>
\[g^2 = f_1 + (x^2 + x + xg) f_2 \in (f_1, f_2)\]
<p>So we found that, for some $n$, $g^n \in (f_1, f_2)$. As desired.</p>
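<p>Again, this final identity is easy to verify symbolically. A sympy sanity
check, with the same $f_1$, $f_2$, and $g$ as above:</p>

```python
from sympy import symbols, expand

x, y = symbols('x y')
f1 = x**2 + 2*x + 1
f2 = y
g = y*x + x + 1

# g^2 really is f_1 + (x^2 + x + x*g) * f_2
print(expand(g**2 - (f1 + (x**2 + x + x*g) * f2)))  # 0
```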
<hr />
<p>It turns out that this is <em>exactly</em> how the proof goes in general!</p>
<p>Say you give me polynomials $f_1, \ldots, f_r, g \in k[x_1, \ldots, x_m]$
so that $g$ vanishes whenever all the $f_i$ do.</p>
<p>Then we look at the ideal (in $k[x_1, \ldots x_m, z]$)</p>
\[(f_1, \ldots, f_r, zg - 1)\]
<p>which must equal $(1)$ by the weak nullstellensatz.</p>
<p>Then a computation, which sage will happily do for us, gives us polynomials
$p_1, \ldots, p_{r+1} \in k[x_1, \ldots, x_m, z]$ so that</p>
\[1 = p_1 f_1 + \ldots + p_r f_r + p_{r+1} (zg - 1)\]
<p>Then we plug in $\frac{1}{g}$ for $z$. The last term vanishes
(since $\frac{1}{g} g - 1 = 0$), and we get a new expression</p>
\[1 =
p_1 \left ( \vec{x}, \frac{1}{g} \right ) f_1(\vec{x})
+ \ldots +
p_r \left ( \vec{x}, \frac{1}{g} \right ) f_r(\vec{x})\]
<p>This is an equation of rational functions with powers of $g$ in the
denominators. So we multiply both sides by $g^n$, where $n$ is at least the
$z$-degree of every $p_i$, to clear denominators, and we find</p>
\[g^n = g^n p_1 \left ( \vec{x}, \frac{1}{g} \right ) f_1 + \ldots + g^n p_r \left ( \vec{x}, \frac{1}{g} \right ) f_r\]
<p>Notice that each $g^n p_i \left ( \vec{x}, \frac{1}{g} \right )$ is an honest
polynomial in $x_1, \ldots, x_m$, since multiplying by $g^n$ clears every
$\frac{1}{g}$ that the substitution introduced. This means we’ve shown
$g^n$ is a linear combination of the $f_i$, so $g^n \in (f_1, \ldots, f_r)$,
as desired.</p>
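<p>The argument above doubles as an algorithm: $g$ vanishes on
$V(f_1, \ldots, f_r)$ exactly when the augmented ideal $(f_1, \ldots, f_r, zg - 1)$
is all of $k[x_1, \ldots, x_m, z]$, which we can detect by checking whether its
gröbner basis is just $\{ 1 \}$. Here’s a sketch in sympy (the helper name
<code>in_radical</code> is mine, not sage’s or sympy’s):</p>

```python
from sympy import symbols, groebner

def in_radical(g, fs, gens):
    """Rabinowitsch trick: g vanishes on V(f_1, ..., f_r) over an
    algebraically closed field iff 1 lies in (f_1, ..., f_r, z*g - 1)."""
    z = symbols('z_rabinowitsch')  # a fresh variable
    G = groebner(list(fs) + [z*g - 1], *gens, z, order='lex')
    return list(G) == [1]

x, y = symbols('x y')

# the example from this post: g = xy + x + 1 vanishes on V(x^2 + 2x + 1, y)
print(in_radical(y*x + x + 1, [x**2 + 2*x + 1, y], (x, y)))  # True

# but x itself doesn't: x = -1 at the only common zero (-1, 0)
print(in_radical(x, [x**2 + 2*x + 1, y], (x, y)))  # False
```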
<hr />
<p>Another quick post today! Hopefully other people find this helpful too ^_^.</p>
<p>I do have a few bigger ones in the pipeline, but I won’t say exactly what.
I’ve learned that saying what post you’re planning to write next is a
guaranteed way to not actually write it, haha.</p>
<p>See you soon!</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>If you’re interested in this, you’ll want to read about
<a href="https://en.wikipedia.org/wiki/Gr%C3%B6bner_basis">gröbner bases</a>. The actual algorithm for computing with
these is <a href="https://en.wikipedia.org/wiki/Buchberger%27s_algorithm">buchberger’s algorithm</a>.</p>
<p>I really liked Adams and Loustaunau’s
<em>An Introduction to Gröbner Bases</em>, which is a very polite
introduction. I’ve heard great things about Cox, Little, and O’Shea’s
<em>Ideals, Varieties, and Algorithms: An Introduction to
Computational Algebraic Geometry and Commutative Algebra</em>, though I
haven’t gotten around to reading it myself. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>There’s a lot to be said about precisely why this trick works. It’s
really because we’re looking at the homomorphism</p>
\[k[x,y,z] \to k(x,y)\]
<p>sending $x \mapsto x$, $y \mapsto y$, and $z \mapsto \frac{1}{g}$.</p>
<p>The kernel of this map is exactly $(zg - 1)$, so it descends to an embedding
of $k[x,y,z] / (zg - 1)$ into $k(x,y)$. We solve our problem in $k(x,y)$, but
after clearing denominators we recover a formula of polynomials
in $x$ and $y$, which holds in $k[x,y]$ since $k[x,y]$ itself embeds in $k(x,y)$.</p>
<p>For more information about this technique of “permanence of identities”,
you can see <a href="/2021/05/05/initial-polynomial-proofs.html">this</a> blog post of mine. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tue, 11 Jan 2022 00:00:00 +0000
https://grossack.site/2022/01/11/explicit-nullstellensatz.html
https://grossack.site/2022/01/11/explicit-nullstellensatz.html