
Hilbert Spaces

15 Sep 2021 - Tags: analysis-qual-prep

Hilbert spaces are banach spaces whose norms come from an inner product. This is fantastic, because an inner product is a very minimal amount of extra structure for the amount of geometry it buys us. Because of this new geometric structure, many of the pathologies of banach spaces are absent from the theory of hilbert spaces, and the rigidity of the category of hilbert spaces (there is a complete cardinal invariant describing hilbert spaces up to isometric isomorphism) makes it extremely easy to make an abstract hilbert space concrete. Moreover, this “concretization” is exactly the genesis of the fourier transform!

With that introduction out of the way, let’s get to it!


First, a word on conventions. I’ll be working with the inner product $\langle x,y \rangle$ which is linear in the second slot, and conjugate linear in the first. This is often called the “physicist” notation, but I have a reason for it:

Riesz Representation Theorem

If $\mathcal{H}$ is a hilbert space, then the map

\[x \mapsto \langle x, \cdot \rangle : \mathcal{H} \to \mathcal{H}^*\]

is a conjugate linear isometry $\mathcal{H} \cong \mathcal{H}^*$.

In particular, every linear functional on a hilbert space is of the form $\langle x, \cdot \rangle$ for some $x$1.

Since we write function application on the left2, it makes sense for the associated functional $\langle x, \cdot \rangle$ to also be “on the left”. With the standard “mathematician” convention, we instead have the linear map $\langle \cdot, y \rangle$ which acts on $x$ “on the right” by $x \mapsto \langle x, y \rangle$.

Obviously this doesn’t matter at all, but I feel the need to draw attention to it3.
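
To make this concrete, here's a quick numpy sanity check of the finite dimensional case $\mathcal{H} = \mathbb{C}^3$ (a sketch only; the functional below is my own toy choice, not part of the theorem): a functional $f(y) = \sum_i a_i y_i$ is represented by $x = \overline{a}$.

```python
import numpy as np

# Riesz representation on C^3, where <x, y> = sum(conj(x_i) * y_i)
# (linear in the second slot, matching the convention above).

rng = np.random.default_rng(0)

# An arbitrary linear functional f(y) = sum(a_i * y_i), encoded by its coefficients.
a = rng.normal(size=3) + 1j * rng.normal(size=3)
f = lambda y: a @ y

# The representing vector: f = <x, .> forces x = conj(a).
x = np.conj(a)

y = rng.normal(size=3) + 1j * rng.normal(size=3)
assert np.isclose(np.vdot(x, y), f(y))  # np.vdot conjugates its first argument
```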

There are two key examples of hilbert spaces:

  * $\ell^2(A)$, the space of square-summable sequences indexed by a set $A$, with $\langle x, y \rangle = \sum_{\alpha \in A} \overline{x_\alpha} y_\alpha$

  * $L^2(X, \mu)$, the space of square-integrable functions on a measure space, with $\langle f, g \rangle = \int \overline{f} g \ d\mu$

In fact, as we will see, every hilbert space is isometrically isomorphic to some $\ell^2(A)$, and just as we can distinguish vector spaces by the cardinality of their basis, we can distinguish hilbert spaces by the cardinality of their “hilbert basis”.

First, though, why should we care about inner products? How much extra structure does it really buy us? The answer is: lots!

As is so often the case in mathematics, theorems in a concrete setting become definitions in a more general setting, and once we do this much of the intuition for the concrete setting can be carried over. For us, then, let’s see what we can do with inner products.


First, it’s worth remembering that an inner product defines a norm

\[\lVert f \rVert \triangleq \sqrt{\langle f, f \rangle}\]

so every inner product preserving function also preserves the norm.

It turns out we can go the other way as well, and the polarization identity lets us write the inner product in terms of the norm4. This is fantastic, as it means any norm preserving function automatically preserves the inner product!
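
For the complex case (with our convention, linear in the second slot) one standard form of the identity is $\langle x, y \rangle = \frac{1}{4}\left( \lVert x+y \rVert^2 - \lVert x-y \rVert^2 - i \lVert x+iy \rVert^2 + i \lVert x-iy \rVert^2 \right)$, and here is a minimal numpy spot-check of it:

```python
import numpy as np

# Polarization: recover <x, y> from norms alone, checked on C^5.
rng = np.random.default_rng(1)
x = rng.normal(size=5) + 1j * rng.normal(size=5)
y = rng.normal(size=5) + 1j * rng.normal(size=5)

sq = lambda v: np.vdot(v, v).real  # ||v||^2 = <v, v>

polarized = 0.25 * (sq(x + y) - sq(x - y) - 1j * sq(x + 1j * y) + 1j * sq(x - 1j * y))
assert np.isclose(polarized, np.vdot(x, y))
```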

We can also define the angle between two vectors by

\[\cos \theta \triangleq \frac{\langle f, g \rangle}{\lVert f \rVert \lVert g \rVert}\]

and when $\theta = \pm \frac{\pi}{2}$ (that is, when $\langle f, g \rangle = 0$) we say that $f$ and $g$ are orthogonal, written $f \perp g$.

Of course, once we have orthogonality, we have a famous theorem from antiquity:

The Pythagorean Theorem

If $f_1, \ldots, f_n$ are pairwise orthogonal, then

\(\lVert \sum f_k \rVert^2 = \sum \lVert f_k \rVert^2\)

As a quick exercise, you should prove the Law of Cosines:

For any $f$ and $g$, if the angle between $f$ and $g$ is $\theta$, then:

\(\lVert f-g \rVert^2 = \lVert f \rVert^2 + \lVert g \rVert^2 - 2 \lVert f \rVert \lVert g \rVert \cos \theta\)
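
Here's a quick numeric check of both statements in $\mathbb{R}^3$ (a sanity check, not a proof; for the exercise, try expanding $\langle f-g, f-g \rangle$ in the real case):

```python
import numpy as np

rng = np.random.default_rng(2)

# Pythagoras, for a pair of orthogonal vectors.
f = np.array([1.0, 0.0, 0.0])
g = np.array([0.0, 2.0, 0.0])
assert np.isclose(np.linalg.norm(f + g) ** 2,
                  np.linalg.norm(f) ** 2 + np.linalg.norm(g) ** 2)

# Law of cosines: ||f - g||^2 = ||f||^2 + ||g||^2 - 2 ||f|| ||g|| cos(theta).
f, g = rng.normal(size=3), rng.normal(size=3)
cos_theta = (f @ g) / (np.linalg.norm(f) * np.linalg.norm(g))
lhs = np.linalg.norm(f - g) ** 2
rhs = (np.linalg.norm(f) ** 2 + np.linalg.norm(g) ** 2
       - 2 * np.linalg.norm(f) * np.linalg.norm(g) * cos_theta)
assert np.isclose(lhs, rhs)
```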

Once we have orthogonality, we also have the notion of orthogonal complements. You might remember from the finite dimensional setting that there’s (a priori) no distinguished complement to a subspace. For instance, any two distinct $1$-dimensional subspaces are complements in $\mathbb{R}^2$. But if we decomposed $\mathbb{R}^2$ as a direct sum of the subspaces shown below, we would likely feel there’s a “better choice” of complement for the orange subspace:

The blue subspace is one of many complements of the orange, so why should we choose it over anything else? Instead, we can use the inner product (in particular the notion of orthogonality) to choose a canonical complement:

Why is this complement canonical? Because it is the unique complement so that every blue vector is orthogonal to every orange vector.

In the banach space setting (where we don’t have access to an inner product) recall there are subspaces which have no complement. In the hilbert space setting this problem vanishes – the orthogonal complement always exists, and is a complement5!

If $U$ is a subspace of $\mathcal{H}$, the orthogonal complement of $U$ is

\[U^\perp \triangleq \{ x \mid \forall u \in U . \langle u, x \rangle = 0 \}.\]

$U^\perp$ is always a closed subspace of $\mathcal{H}$, and moreover

\[(U^\perp)^\perp = \overline{U}\]

so if $U$ is a closed subspace of $\mathcal{H}$, then $(U^\perp)^\perp = U$ and $\mathcal{H} = U \oplus U^\perp$.
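
In finite dimensions we can watch this decomposition happen with numpy (a sketch; the matrices here are arbitrary, and I'm using QR purely to get an orthonormal basis for $U$):

```python
import numpy as np

# Orthogonal complements in R^5: split x as u + v with u in U and v in U^perp.
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 2))            # U = column span of A (a 2-dim subspace)

Q, _ = np.linalg.qr(A)                 # orthonormal basis for U
P = Q @ Q.T                            # orthogonal projection onto U

x = rng.normal(size=5)
u, v = P @ x, x - P @ x                # the decomposition x = u + v

assert np.allclose(A.T @ v, 0)           # v is orthogonal to everything in U
assert np.allclose(u + v, x)             # ...and u + v really is x
assert np.isclose(x @ x, u @ u + v @ v)  # pythagoras again, for free
```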

You might also remember from the finite case that we can find particularly nice bases for inner product spaces (called orthonormal bases). It’s then natural to ask if there’s an analytic extension of this concept to the hilbert space setting.

The answer, of course, is “yes”:

If \(\{u_\alpha\} \subseteq \mathcal{H}\) is orthonormal (that is, pairwise orthogonal and of norm $1$), the following are equivalent:

  1. (Completeness) If $\langle u_\alpha, x \rangle = 0$ for every $\alpha$, then $x = 0$

  2. (Parseval’s Identity) $\lVert x \rVert^2 = \sum_\alpha \lvert \langle u_\alpha, x \rangle \rvert^2$

  3. (Density) $x = \sum_\alpha \langle u_\alpha, x \rangle u_\alpha$, and the sum converges no matter how the terms are ordered6

If any (and thus all) of the above are satisfied, we say that \(\{ u_\alpha \}\) is a Hilbert Basis for $\mathcal{H}$.

Moreover, every hilbert space admits a hilbert basis!

These conditions are foundational to hilbert space theory, and are worth remembering. The way that I like to remember them is with the following theorem:

Let \(\{u_\alpha\}_{\alpha \in A}\) be a hilbert basis for $\mathcal{H}$.

The map $x \mapsto (\langle u_\alpha, x \rangle)_{\alpha \in A}$ is a unitary map witnessing the (isometric!) isomorphism

\[\mathcal{H} \cong \ell^2(A).\]

Here Parseval’s Identity tells us that this map is isometric, and Density tells us that the obvious inverse map $(c_\alpha) \mapsto \sum c_\alpha u_\alpha$ really is an inverse.
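
Here's this unitary map in miniature, for a (randomly chosen) orthonormal basis of $\mathbb{C}^4$ (just a sketch of the finite dimensional case, where $A = \{1,2,3,4\}$):

```python
import numpy as np

# An orthonormal basis {u_a} of C^4 turns x into its coefficient sequence
# (<u_a, x>)_a, and this map is unitary: isometric with an obvious inverse.

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(M)                   # columns of U: an orthonormal basis

x = rng.normal(size=4) + 1j * rng.normal(size=4)
coeffs = U.conj().T @ x                  # c_a = <u_a, x>

# Parseval: the coefficient map is isometric.
assert np.isclose(np.vdot(x, x), np.vdot(coeffs, coeffs))

# Density: summing c_a u_a recovers x.
assert np.allclose(U @ coeffs, x)
```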

In fact, one can show that the size of a hilbert basis is a complete invariant for hilbert spaces. That is, any two hilbert bases for $\mathcal{H}$ have the same cardinality (often called the Hilbert Dimension), and $\mathcal{H}_1 \cong \mathcal{H}_2$ if and only if they have the same hilbert dimension7.


That’s a lot of information about hilbert spaces in the abstract. But why should we care about any of this? Let’s see how to solve some problems using this machinery!

Let’s work with $L^2(S^1)$, the hilbert space of square-integrable functions on the unit circle. Classically, we would extend these to be periodic functions on $\mathbb{R}$, but that turns out to be the wrong point of view8.

A hilbert space is separable if and only if it has a countable hilbert basis. This is nice since most hilbert spaces arising in practice (including $L^2(S^1)$) are separable, so we can work with a countable sum (even though the theorem is true in more generality). For us, we note that \(\{ e^{inx} \}_{n \in \mathbb{Z}}\) is a hilbert basis for $L^2(S^1)$ (orthonormal once we normalize the measure to $\frac{dx}{2\pi}$), so we would expect periodic functions to decompose into an infinite sum of these basic functions.

Historically, it was a very important problem to understand the convergence of “fourier series”. That is, if we define

\[\hat{f}(n) \triangleq \int e^{-inx} f\]

(which we, with the benefit of hindsight, recognize as $\langle e^{inx}, f \rangle$)

when is it the case that we can recover $f$ as the sum $\displaystyle \sum_{n=-\infty}^\infty \hat{f}(n) e^{inx}$? That is, as the $N \to \infty$ limit of the partial fourier series

\[S_N f \triangleq \sum_{n=-N}^N \hat{f}(n) e^{inx}\]

In particular, for “nice” functions $f$, is it the case that $S_N f(x) \to f(x)$ pointwise? If not, is there some sense in which $S_N f \to f$?

This problem was fundamental for hundreds of years, with Fourier publishing his treatise on the theory of heat in $1822$, and there are textbooks written as recently as $1957$ which say it is unknown whether the fourier series of a continuous function has to converge at even one point! See the historical survey here for more information, as well as the full course notes here by the same author.

The language of $L^p$ and Hilbert spaces wasn’t developed until the $1910$s, and they are integral (pun intended) in phrasing the solution of the fourier series convergence problem (Carleson’s Theorem).

Carleson’s theorem is famously hard to prove, but we can get a partial solution for free using the theory of hilbert spaces!

For any $L^2$ function $f$, we have

\(S_N f \overset{L^2}{\longrightarrow} f\)

This is exactly the “density” part of the equivalence above! With some work, one can show that $S_N f \to f$ in the $L^p$ norm for any $p \neq 1, \infty$9.
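
To see the theorem in action, here's a small numerical experiment with the square wave (my own toy example, using the $\frac{1}{2\pi}$-normalized coefficients so that the reconstruction formula holds): pointwise convergence misbehaves at the jumps (the Gibbs phenomenon!), but the $L^2$ error shrinks just as promised.

```python
import numpy as np

# S_N f for the square wave f = sign(sin x) on S^1, via a riemann sum on a
# fine grid. The L^2 error ||S_N f - f|| goes to 0 as N grows.

x = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
dx = x[1] - x[0]
f = np.sign(np.sin(x))

def partial_sum(N):
    Sf = np.zeros_like(x, dtype=complex)
    for n in range(-N, N + 1):
        fhat = np.sum(np.exp(-1j * n * x) * f) * dx / (2 * np.pi)
        Sf += fhat * np.exp(1j * n * x)
    return Sf.real

for N in [1, 10, 100]:
    err = np.sqrt(np.sum((partial_sum(N) - f) ** 2) * dx)
    print(N, err)  # the L^2 error decreases with N
```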

Hilbert spaces are also useful in the world of ergodic theory. Say we have a function $T$ from some space $X$ to itself, which we should consider as describing how $X$ evolves over one time-step. One might expect that if we average the position of a point $x$ over time, we should converge on a fixed point of the transformation10.

A result in hilbert space theory tells us we aren’t off base!

Von Neumann Ergodic Theorem

If $U : \mathcal{H} \to \mathcal{H}$ is a unitary operator on a separable hilbert space, then for every $x \in \mathcal{H}$ we have

\[\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} U^k x = \pi(x)\]

where $\pi$ is the orthogonal projection onto the subspace of $U$-fixed points \(\{ x \mid Ux = x \}\).
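
Here's a toy instance you can actually run (my own example): a unitary on $\mathbb{C}^3$ that fixes the first coordinate and rotates the other two. The Cesàro averages should kill the rotating coordinates and leave the fixed one, converging to $\pi(x)$.

```python
import numpy as np

# Von Neumann averaging for a diagonal unitary on C^3.
theta = np.sqrt(2)  # rotation angle, chosen so exp(i*theta) != 1
U = np.diag([1.0, np.exp(1j * theta), np.exp(-1j * theta)])

x = np.array([2.0, 1.0, 1.0], dtype=complex)

n = 100_000
avg = np.zeros(3, dtype=complex)
power = x.copy()      # holds U^k x
for _ in range(n):
    avg += power
    power = U @ power
avg /= n

print(avg)  # approximately [2, 0, 0] = pi(x)
```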

This gives us the Mean Ergodic Theorem as a corollary!

If $T : X \to X$ is measure preserving and $f \in L^2(X)$, then

\[\frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k \overset{L^2}{\longrightarrow} \pi(f)\]

where $\pi$ is (orthogonal) projection onto the $T$-invariant functions.
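
For an ergodic $T$ (like an irrational rotation of the circle) the invariant functions are just the constants, so $\pi(f)$ is the constant $\int f$. Here's a quick orbit-average check of that special case (my own toy example; for the rotation the pointwise averages also converge, which is what we compute):

```python
import numpy as np

# Ergodic averaging for the irrational rotation T(x) = x + alpha (mod 1).
alpha = np.sqrt(2) - 1                     # irrational rotation number
f = lambda x: np.cos(2 * np.pi * x) ** 2   # integral over [0, 1) is 1/2

x0, n = 0.123, 200_000
orbit = (x0 + alpha * np.arange(n)) % 1.0  # x, Tx, T^2 x, ...
print(np.mean(f(orbit)))                   # approximately 0.5 = pi(f)
```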


Alright! We’ve seen some of the foundational results in hilbert space theory, and it’s worth remembering that our techniques from the banach space world still apply. Hilbert spaces are very common in analysis, with applications in PDEs, Ergodic Theory, Fourier Theory, and more. The ability to basically do algebra as we would expect, and leverage our geometric intuition, is extremely useful in practice.

Next time, we’ll give a quick tour of applications of the Baire Category Theorem, and then it’s on to the Fourier Transform on $\mathbb{R}$!

The qual is a week from today, but I’m starting to feel better about the material. This has been super useful in organizing my thoughts, and if I’m lucky, you all might find them helpful as well.

If you have been, thanks for reading! And I’ll see you next time ^_^.


  1. The proof is pretty easy once we have a bit more machinery.

    Take a functional $f$. If $f = 0$, then $x=0$ works. Otherwise, $\text{Ker}(f)$ is a proper closed subspace of $\mathcal{H}$, and has a nontrivial orthogonal complement. If we take some $x \neq 0$ in the orthogonal complement, then $\frac{\overline{f(x)}}{\lVert x \rVert^2} x$ works (as you should verify). 

  2. Though we really shouldn’t, it seems to be unchangeable at this point.

    I tried switching over a few years ago, but it made communication terribly confusing. Monomorphisms, for instance, cancel on the left with the usual notation, but cancel on the right with the other notation. I tried to get around this by remembering monos “cancel after” and epis “cancel before”, but I got horribly muddled up anytime I tried to talk with another mathematician.

    Teaching and communication are extremely important to me, so I sacrificed my morals and went back to functions on the left. 

  3. And to evangelize. Until we switch over to functions on the right, this really is the correct convention.

    So I suppose the “mathematician” convention is correct, but inconsistent with how we (incorrectly) write function application on the left, while the “physicist” convention is incorrect, but consistent with the rest of our (incorrect) notation… What a world to live in :P.

    If you happen to have a convincing argument for using the other convention, though, I would love to hear it! 

  4. It’s worth taking a moment to ask yourself why we can’t use polarization to turn every normed space into an inner product space. The answer has to do with the parallelogram law $\lVert x+y \rVert^2 + \lVert x-y \rVert^2 = 2\lVert x \rVert^2 + 2\lVert y \rVert^2$, which holds in every inner product space but fails for most norms. 

  5. In fact, I mentioned this last time as well, but this feature characterizes hilbert spaces! If every subspace of a given banach space is complemented, then that banach space is actually a hilbert space! 

  6. This does not mean the terms are absolutely summable! 

  7. This isomorphism is in the category of hilbert spaces and unitary maps, so it is automatically an isometry. 

  8. The natural domain of a periodic function really is $S^1$, and the mathematics reflects this.

    There is a notion of fourier transform on arbitrary (locally compact) abelian groups. This “pontryagin duality” swaps “compactness” and “discreteness” (amongst other things) and we see this already. $S^1$ is compact and $\mathbb{Z}$ is discrete, and they are pontryagin duals of each other.

    Abstract harmonic analysis (the branch of math that studies this duality theory) seems really interesting, and I want to learn more about it when I have the time. It seems to have a lot of connections to representation theory, which is also on my to-learn list. 

  9. The cases of $L^1$ and $L^\infty$ are both interesting, and the disproofs proceed by the uniform boundedness principle!

    If $S_N f \to f$ in $L^1$ or $L^\infty$ for every $f$, then the maps $S_N$ would be pointwise bounded. But then by banach space-ness we have the uniform boundedness principle, and the $S_N$s would need to be uniformly bounded in operator norm. But we can find norm $1$ functions $f$ with $\lVert S_N f \rVert$ arbitrarily large in $L^1$ (resp. $L^\infty$), giving us the contradiction. 

  10. This idea of averaging to land on a fixed point is both common and powerful. It is the key idea behind Maschke’s Theorem and many other results. I’ve actually been meaning to write a blog post on these kinds of averaging arguments, but I haven’t gotten around to it yet…