My advisor just solved Kakeya in three dimensions, here is why that matters

by greenturtle3141, Mar 14, 2025, 8:14 PM

Reading Difficulty: anywhere between 2 and 5 out of 5

On March 10th, 3:30 pm, Room 1302 of the Courant Institute of Mathematics was at its liveliest in years. With every seat filled, the unfortunate late-comers were doomed to sitting on tables or simply standing along the edges of the lecture room. There was barely any room to breathe, but the air was no less vibrant with excitement and anticipation, with students and professors alike straining their necks in search of the seemingly absent speaker. The speaker's absence was no mystery, however --- the talk was not to start for another 15 long minutes.

As the room continued to fill up, it became increasingly clear to me that this would be no ordinary mathematical colloquium. After all, in a stunning 127-page paper, Hong Wang had recently proven the Kakeya Conjecture in $\mathbb{R}^3$, a notorious problem in the field of geometric measure theory. The proof took the mathematical world by storm, with renowned mathematicians in the field hailing the paper as a major breakthrough. With an achievement that few could ever hope to claim, it was no surprise that a horde of faculty and PhD students from a wide spectrum of mathematical subfields, from fluid dynamics to algebraic geometry to stochastic calculus, had come to witness history in the making. By 3:40 pm, the lecture room's occupancy was undoubtedly violating several federal fire codes, and at last, not too long after, Hong's eagerly-awaited appearance was greeted with thunderous applause from an audience of no less than 100.
https://i.ibb.co/ZzvCsKN3/hongtalk.jpg

I couldn't find the fire codes for the building online, but I'm quite confident that this room was not meant to house more than 50 people.

What is the Kakeya Conjecture?

Take a needle of length 1. If I want to spin it, one way to do that is to spin it around its center $360^\circ$. The region it sweeps out would thus be a disk of diameter $1$.
[asy]
size(3cm);
fill(circle((0,0),1/2), p=gray(0.9));
draw(circle((0,0),1/2), p=linewidth(1.5));
[/asy]

If you're feeling feisty, you might want to try to spin it in such a way that the area swept out by the needle is as small as possible. How small can this area get? Besicovitch proved that the area can be arbitrarily small. Below is an example (graphics stolen from Wikipedia) whose area is supposedly $\pi/8$.
https://upload.wikimedia.org/wikipedia/commons/2/2b/Kakeya_needle.gif

The above examples are nice for explaining this problem to laymen, but the Kakeya conjecture is about a more general class of shapes called Kakeya sets --- sets that contain a unit line segment in every direction. (Sometimes these are also called Besicovitch sets.) Such objects are of interest not just in the plane, but in arbitrary dimensions as well. Besicovitch's discovery shows that the size of a Kakeya set can be as close to $0$ as you want, and in fact, a Kakeya set can even have a size of exactly $0$. The Kakeya conjecture asks if Kakeya sets can be even smaller than that.
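
To pin down the definition in symbols (this is just my restatement of "contains a unit line segment in every direction"): a set $E \subseteq \mathbb{R}^n$ is a Kakeya set if
$$\forall e \in S^{n-1} \ \exists x \in \mathbb{R}^n : \{x + te : t \in [0,1]\} \subseteq E.$$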

To make this more precise, let's take the two dimensional case as an example.
  • An example of a shape with $0$ area could be something like a disk with so many tiny holes punched into it that no area survives. It has no area, but it's still a 2-dimensional object.
  • Another example is a line. Lines have zero area. But they're definitely "smaller" than 2-dimensional objects. In fact, lines have only 1 dimension.
  • Another example is a dot, a single point. A single point has no area, the same area as a line, but intuitively a single dot is far smaller than a line. Using the concept of dimension makes this intuition clearer: A single point has zero dimensions.

So, dimension can be used, in some sense, to quantify the size of sets that have zero measure. The larger the dimension, the "larger" such a set is. The Kakeya conjecture states that Kakeya sets are actually not that small at all.

Kakeya Conjecture in $\mathbb{R}^2$: Every Kakeya set in the plane is at least two-dimensional. (...thus exactly two-dimensional.)

This conjecture generalizes.

Kakeya Conjecture in $\mathbb{R}^d$: Every Kakeya set in $d$-dimensional space is $d$-dimensional.

(Actually the conjecture in $\mathbb{R}^2$ isn't a conjecture --- it's been fully solved for a while now.)

What is "dimension"?

The word dimension is actually quite vague to a mathematician. You really have to specify what type of dimension you're talking about. Even as it pertains to measure theory and/or geometry, there are still several different notions of "dimension". The main types of interest here are the Hausdorff dimension and the Minkowski dimension. The Hausdorff dimension is the more proper notion, but it's harder to explain, so let's stick with the Minkowski dimension.

Let's stay in 3D space to motivate this. How can we study the "size" of a 2-dimensional object (living in 3D space), like a filled-in unit square, using only the notion of volume (i.e. "3D size")? Taking the volume of the square gives you zero, so that's not good. What you can do instead is fatten up the object so that it has volume. If you take a neighborhood of the square, you get (ignoring the curvy parts) a rectangular prism of dimensions $1 \times 1 \times r$, where $r$ is how much we fatten up the square, so the volume of the fattened square is about $r$.

We can do the same idea for studying a (unit) line segment: If we fatten this up, we get (ignoring the parts at the ends) a cylinder with a base of radius $r$, so its volume is about $\pi r^2$.

Finally, we can try this out for studying just a single point. Fattening this up gives us a ball of radius $r$, with volume $\frac43\pi r^3$.

Notice that the fewer dimensions the object has, the bigger the exponent of $r$, with the maximal exponent of $3$ corresponding to the smallest possible dimension, $0$. This lets us define a notion of dimension.

Definition (Minkowski Dimension in $\mathbb{R}^3$): Let $E$ be a set in $\mathbb{R}^3$. The Minkowski dimension of $E$ is the number $d_M$ (if it exists) for which
$$d_M + \lim_{r \to 0^+} \log_r \text{vol}(E + B(0,r)) = 3.$$


This generalizes nicely to arbitrary dimensions.

Definition (Minkowski Dimension in $\mathbb{R}^n$): Let $E$ be a set in $\mathbb{R}^n$. The Minkowski dimension of $E$ is the number $d_M$ (if it exists) for which
$$d_M + \lim_{r \to 0^+} \log_r \text{vol}(E + B(0,r)) = n.$$


So a (slightly weaker) form of the Kakeya conjecture is that every Kakeya set in $\mathbb{R}^n$ has Minkowski dimension $n$. It turns out that if you prove this for Hausdorff dimension instead, you automatically get it for Minkowski dimension as well.
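
To make the definition feel less abstract, here's a quick numerical experiment (my own Python sketch, not from the talk) that estimates the Minkowski dimension of a unit segment in the plane (so $n = 2$) by literally measuring the area of the fattened set:

[code]
# Python sketch (my illustration): estimate the Minkowski dimension of
# the segment E = [0,1] x {0} in the plane by measuring the area of the
# fattened set E + B(0,r) on a fine grid.
import numpy as np

def fattened_area(r, n=2000):
    xs = np.linspace(-r, 1 + r, n)
    ys = np.linspace(-r, r, n)
    X, Y = np.meshgrid(xs, ys)
    # Distance from (X,Y) to the segment {(t,0) : 0 <= t <= 1}.
    dist = np.hypot(X - np.clip(X, 0.0, 1.0), Y)
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])
    return np.count_nonzero(dist <= r) * cell

for r in (0.1, 0.01, 0.001):
    area = fattened_area(r)                  # roughly 2r + pi r^2
    print(r, 2 - np.log(area) / np.log(r))   # estimate of d_M; tends to 1
[/code]

The estimates creep toward $1$ rather slowly: the error is about $\log 2/\log(1/r)$, which is exactly the kind of logarithmic junk that the limit in the definition washes out.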

Hong Wang (joint with Joshua Zahl) managed to prove this statement (for both notions of dimension) for $n = 3$. It's pretty complicated.

Why should I care about this result?

There is a certain hierarchy of seemingly unrelated conjectures that are in play here. The first is the Local Smoothing Conjecture for the wave equation, first formulated by Sogge.

Conjecture: Let $u$ solve the wave equation in $n+1$ dimensions with initial data $u\,|_{t=0} = u_0$ and $\partial_t u\,|_{t=0} = 0$. Let* $I$ be a compact subinterval of time. Then** for every $p \geq \frac{2n}{n-1}$ and $\alpha > \frac{n}{2} - \frac{n}{p} - \frac{1}{2}$, we have
$$\|u\|_{L^p(\mathbb{R}^n \times I)} \lesssim \|u_0\|_{W^{\alpha,p}(\mathbb{R}^n)}.$$


(* In the literature they always take this to be $I = [1,2]$. I don't know why but I don't think the choice is important.)
(** There is also stuff conjectured for $1 \leq p < \frac{2n}{n-1}$ but I don't really care.)

Essentially this conjecture is asking whether integrability and regularity are lost as waves travel. The conjecture says that you don't lose too much, with the caveat that you first average your solution over a small time interval (hence the word "local"). Hilariously, the first proof of this conjecture for some value of $p$ was given by Wolff: For dimension $n = 2$ and $p > 74$ (that number is not a typo!!!).

The next conjecture is the restriction conjecture.

Conjecture: The inequality
$$\|\hat{f}\|_{L^q(\partial B(0,1))} \lesssim \|f\|_{L^p(\mathbb{R}^n)}$$holds for all test functions $f \in \mathcal{S}(\mathbb{R}^n)$ if and only if $p' \geq \frac{n+1}{n-1}q$ and $p < \frac{2n}{n+1}$, where $p'$ is the Hölder conjugate of $p$.


Intuitively, this question asks if anything terrible happens if I look at the Fourier transform of a function over a surface. The conjecture states that it's not too terrible.

This sequence of conjectures can be made longer but I'll just say that Kakeya is the last one in the chain.

Conjecture: Every Kakeya set in $\mathbb{R}^n$ has Hausdorff and Minkowski dimension $n$.

For bizarre reasons, we have the following chain of implications:
$$\text{Local Smoothing} \implies \text{Restriction} \implies \text{Kakeya}$$It is not known if any of these implications can be reversed, though there are some partial results on this matter. Nevertheless, what this means is that if we can't solve Kakeya then we sure as heck can't solve the restriction conjecture or the local smoothing conjecture. Technically speaking, we would get more if Kakeya were somehow proven to be false. Since Kakeya was successfully proven in 3 dimensions, the other two conjectures are still wide open in 3 dimensions.

Many other wildly different parts of mathematics have been used to attack Kakeya, such as additive combinatorics. In the other direction, the theory surrounding Kakeya sets has also been used as the catalyst for proving things. The main such result (that I know of) is actually a negative result related to Fourier analysis. We know that for $f$ sufficiently nice we have
$$f(x) = \int_{\mathbb{R}^d} \hat{f}(\xi)e^{2\pi i x \cdot \xi}\,d\xi,$$and so it is very natural to ask whether
$$f_R(x) := \int_{B(0,R)} \hat{f}(\xi)e^{2\pi i x \cdot \xi}\,d\xi$$converges to $f$ in a nice way as $R \to +\infty$.

In dimension $d = 1$ it's very nice: We have that $f_R \to f$ in $L^p$ for all $1 < p < \infty$. (Here $f$ is assumed to be a Schwartz function so that its Fourier transform makes sense.) It's also true for $p=2$ in arbitrary dimensions, which is "trivial": We may write $f_R$ as $\mathcal{F}^{-1}(\hat{f} \cdot 1_{B(0,R)})$ (hence why this is called a "disk multiplier problem"), then since the Fourier transform is an $L^2$ isometry we have
$$\|f_R-f\|_{L^2} = \|\mathcal{F}^{-1}(\hat{f} \cdot 1_{B(0,R)}) - f\|_{L^2} = \|\hat{f} \cdot 1_{B(0,R)} - \hat{f}\|_{L^2},$$which obviously (to those for which this is obvious) goes to $0$ as $R \to +\infty$.
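
To see the easy $L^2$ case in action, here's a quick numerical illustration (my own sketch, in $d = 1$, with a discrete grid standing in for $\mathbb{R}$):

[code]
# Python sketch: truncate the Fourier integral to |xi| <= R and invert;
# the L^2 error should shrink as R grows (d = 1 here).
import numpy as np

N, L = 4096, 40.0
dx = L / N
x = (np.arange(N) - N // 2) * dx
f = np.sign(x) * np.exp(-x**2)            # an L^2 function with a jump at 0
xi = np.fft.fftfreq(N, d=dx)              # frequency grid (cycles per unit)
fhat = np.fft.fft(np.fft.ifftshift(f))

for R in (1.0, 4.0, 16.0):
    fR = np.fft.fftshift(np.fft.ifft(fhat * (np.abs(xi) <= R)))
    err = np.sqrt(np.sum(np.abs(fR - f)**2) * dx)
    print(R, err)                          # decreasing in R
[/code]

Of course, this only illustrates the harmless $L^2$ situation; Fefferman's counterexample lives in $d \geq 2$, where no such convergence picture is available for $p \neq 2$.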

So, what about for higher dimensions $d \geq 2$ and exponents $p \neq 2$? Mathematicians were stuck on proving this until Fefferman ruined everyone's day by showing that it's completely false. He did this by using a Kakeya set to build a function $f$ for which $\|f\|_{L^p}$ is small but $\|f_R\|_{L^p}$ is huge. Roughly speaking, this is done by designing $f$ to be a bunch of thin rectangles that are spread out, such that the disk multiplier shifts all these rectangles to overlap like a Kakeya set, causing an explosion.

In sum, Kakeya theory has a ton of fascinating connections to other fields. For now, most of these connections are of the form "this field can be applied to help prove Kakeya". Nevertheless, the methodologies developed for attacking Kakeya are themselves very important results with a variety of implications, and I do find myself quite curious about trying to reverse things and using Kakeya to attack other fields. It's quite possible that I'm being very naive and that such reversals are unrealistic to achieve. But I very much would like to try and find such results, since it would make this theory all the more rich.

Can you give a summary of how the proof works?

Reading Difficulty: 11 out of 5

I obviously did not actually read the paper because it's very long and I'm also not quite learned enough in this field to effectively digest it, aside from snippets of the introductory sections. Fortunately it turns out that I managed to snag a front-row seat for Hong's talk, so here's an overview of what's going on from what I can gather.

First, here's a history lesson. For a Kakeya set in $\mathbb{R}^3$, one can slightly fatten a selection of the line segments in this Kakeya set and study how much bigger it gets. Using this idea, one can find that the statement "every Kakeya set in $\mathbb{R}^3$ has dimension $\geq d$" reduces to proving an inequality of the following form: For every $\varepsilon > 0$ there is a constant $C_\varepsilon > 0$ such that for all sufficiently small $\delta > 0$,
$$\int_{\mathbb{R}^3} u^p \leq C_{\varepsilon}\delta^{-\varepsilon} \cdot \delta^{3-2p}$$for every $u = \sum_{T \in \mathbb{T}} 1_T$ where $\mathbb{T}$ is (an arbitrary) collection of $\delta$-separated (in angle) tubes of length 1 and radius $\delta$. Also $p = \frac{d}{d-1}$ (the Hölder conjugate).

There are a truckload of quantifiers here so we like to shorten this proposition as:
$$\int_{\mathbb{R}^3} u^p \lessapprox \delta^{3-2p}$$Note that the number of tubes is about $1/\delta^2$ so this is trivial for $p = +\infty$ (after raising each side to $1/p$ so that the LHS is actually a norm). What we want to do is get $p$ to be as small as possible, and we win if we can get down to $p = 1.5$.

Cordoba proved this inequality for $p = 2$ (corresponding to $d = 2$) using a dumb argument, which is now a classic trick. First you write
$$\int_{\mathbb{R}^3} u^2 = \sum_T \sum_{T'} |T \cap T'|.$$Then you split up the sum according to the scale of the angle between the tubes. Formally speaking we partition the range of possible angles $[\delta,1]$ dyadically as $[1/2,1]$, $[1/4,1/2]$, and so on. There are like $-\log \delta$ possible scales, and morally $-\log \delta \approx 1$, so we can toss out that factor and just look at tubes that intersect at an angle of approximately $1/2^k$. By a counting argument there are $\approx \frac{1}{\delta^2} \cdot \frac{(1/2^k)^2}{\delta^2} = \frac{1}{4^k\delta^4}$ pairs of tubes that intersect at an angle of (up to a factor of 2) $1/2^k$. If you draw what that looks like, you find that the volume of the intersection is at most $\frac{\delta^3}{1/2^k} = 2^k\delta^3$. So the sum of these intersections over all such pairs is like $\frac{1}{2^k\delta}$. Clearly only the $k=0$ term is dominant so we get the upper bound $\frac{1}{\delta}$, and the exponent we wanted was $3-2(2) = -1$, so we're chilling.
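
For reference, here is that bookkeeping compressed into one display (my condensation; the $\lessapprox$ absorbs the $-\log\delta$ factor from summing over the dyadic scales):
$$\int_{\mathbb{R}^3} u^2 = \sum_T\sum_{T'}|T \cap T'| \lessapprox \max_{k \geq 0}\ \underbrace{\frac{1}{4^k\delta^4}}_{\#\text{ pairs at angle }\sim 2^{-k}} \cdot \underbrace{2^k\delta^3}_{\text{volume per pair}} = \max_{k \geq 0} \frac{1}{2^k\delta} = \frac{1}{\delta} = \delta^{3-2\cdot 2}.$$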

That last sentence suggests that this was a pretty wasteful argument, and Bourgain agreed. He one-upped Cordoba by getting $p$ down to $1.75$, which corresponds to $d = 2 + \frac13$. I don't actually know what this argument is.

(A guy named Drury also one-upped Cordoba, but only in higher dimensions.)

Then Wolff woke up and chose violence. He skill-issued everyone by getting $p$ down to $1.666\ldots$, which corresponds to $d = 2.5$. He did this using some black magic semi-inspired by what Drury did. Basically, Drury took advantage of the fact that lines can only intersect once, and Wolff took this further by using the fact that three lines make a triangle. (If you're getting the sense that I'm oversimplifying, you are 100% correct.) Wolff's proof is now known as the hairbrush argument, and I won't describe it because it is very long.

Then people got stuck for a while. There's a really interesting reason for that: None of the above proofs really use the fact that the underlying field of $\mathbb{R}^3$ is $\mathbb{R}$. For example the Cordoba argument works perfectly fine if our setting was $\mathbb{C}^3$. (The finite field case is also of interest, and if you think about it for a minute you'll find that the Cordoba argument in this setting is super stupid because there are literally only two possible angles.) The reason why this philosophically represents a problem is that if we work in $\mathbb{C}^3$, then Wolff's bound of $d \geq 2.5$ is the (kinda) best possible!

A counterexample is given by what is called the Heisenberg group on $\mathbb{C}^3$ (which, as far as I can tell, is not exactly the same as the "Heisenberg group" you find on Wikipedia), given by the set
$$E := \{(x,y,z) \in \mathbb{C}^3 : \text{Im}(x) = \text{Im}(y\overline{z})\}.$$It turns out that $E$ is a $5$-dimensional set, which is morally $2.5$ (complex) dimensions. It's also kinda a Kakeya set, so this is a kinda-counterexample. (To be more precise, it is a counterexample to a stronger version of the Kakeya conjecture which has the more relaxed assumption called the Wolff axioms.)

So, philosophically, if you want to beat Wolff's hairbrush proof, you have to do something tricky that takes advantage of the exact structure of the real numbers. For example, the reals do not have a "half-dimensional" subfield like the complex numbers do. To wit, via a 64-page paper, Katz, Łaba and Tao managed to prove the bound $d \geq 2.5000000001$ in the year 2000.

(That extra $0.0000000001$ isn't an exaggeration by the way!)

Their paper essentially proves that if a Kakeya set's dimension is big enough then it needs to exhibit three properties, which they call stickiness, planiness, and graininess. (The word "sticky" means "stick-like", not "glue-like".) So sets of these sorts became of interest to study.

Mathematicians were stuck for another 25 years because this problem is very hard. Now let me try to talk about what Hong Wang did. I am not an expert on this (yet) so you probably shouldn't cite this post for anything.

So, the Kakeya conjecture roughly reduces to proving an inequality that looks kinda like
$$\left|\bigcup_{T \in \mathbb{T}} T\right| \gtrapprox 1$$where $\mathbb{T}$ is a collection of about $1/\delta^2$ tubes of size $\delta \times \delta \times 1$, contained in like a $10 \times 10 \times 10$ box, and satisfying the condition $K$: "All tubes are $\delta$-separated in angle". This turns out to be hard, so I believe they prove a different statement that's more complicated but still enough to resolve Kakeya. Don't ask me what it is. I said don't. Stop. Shush.

Anyways, they want to use something called an induction on scales argument. Essentially this boils down to something like this:
  • You have a bunch of $\delta \times 1$ tubes.
  • You want to group them together into big bundles of tubes. (I am very clearly oversimplifying.)
  • These big bundles now look something like, I dunno, $\delta^{0.9} \times 1$ tubes. (I didn't actually read the technicalities of this induction. Don't quote me.)
  • So in some sense you can reduce to solving the problem for tubes of slightly larger radius.
  • Now you induct upwards, getting larger and larger radii, until you hit a base case and you win.

Now, there are two problems here that kinda clash with each other in trying to execute this protocol. On one hand, the assumption $K$ is not preserved when we go through the "inductive step". So you might be tempted to throw it out. On the other hand, the conjecture is false without the assumption $K$: Take a $\delta \times 1 \times 1$ slab and draw every possible $\delta \times 1$ tube inside it. There are about $1/\delta^2$ such tubes. However the volume of their union is approximately $\delta \cdot 1 \cdot 1 = \delta$, which is certainly not $\gtrapprox 1$.

The idea to fix this is, instead of tossing $K$, we replace it with a pair of axioms that do get preserved under scaling shenanigans:
  • The Katz-Tao-Wolff axioms (KTW), in which $\mathbb{T}$ satisfies the following condition: For every rectangular prism $U$, the cardinality of the set
    $\mathbb{T}[U] := \{T \in \mathbb{T} : T \subseteq U\}$ satisfies the bound
    $$\# \mathbb{T}[U] \lesssim \frac{|U|}{|T|}.$$
  • The Frostman-Wolff axioms (FW), in which $\mathbb{T}$ satisfies the following condition: For every rectangular prism $U$, we have
    the bound
    $$\# \mathbb{T}[U] \lesssim |U| \cdot \#\mathbb{T}.$$
(Don't ask me why they're named like that. I don't know anything. My puny two degrees in math mean nothing in the face of this conjecture.)

At this point in the talk Hong gave us the exercise to find trivial upper (resp. lower) bounds on $\#\mathbb{T}$ assuming the KTW (resp. FW) axioms. And she made us think about it, which was incredibly funny. So I'll now give you the opportunity to think about them as well. (I managed to answer the second one so it's not hard, I swear!)

.

.

.

Alright let's talk about it.
  • If $\mathbb{T}$ satisfies the KTW axioms, then by taking $U$ to be the entire $10 \times 10 \times 10$ box, we have that $|U| = 1000$ and $1000$ is approximately $1$. So $\#\mathbb{T} = \#\mathbb{T}[U] \lesssim |U|/|T| = 1/|T|$. Hence a trivial upper bound is $1$ divided by the volume of a typical tube (i.e. like $\delta^2$). The intuition here is that in this case $\mathbb{T}$ is small enough that it is "sparse", and so we expect the tubes to be "essentially disjoint".
  • If $\mathbb{T}$ satisfies the FW axioms, then by taking $U$ to just be one of our tubes (in this field rectangular prisms and tubes are viewed as basically the same thing so this is kosher), we have $\#\mathbb{T}[U] = 1$ and so $1 = \#\mathbb{T}[U] \lesssim |U| \cdot \#\mathbb{T} = |T| \cdot \#\mathbb{T}$. So this time we get $\#\mathbb{T} \gtrsim 1/|T|$. The intuition here is that in this case $\mathbb{T}$ is big enough that it should "fill out the whole $10 \times 10 \times 10$ cube".

We are allowed to replace the assumption $K$ with these axioms because these axioms are both implied by $K$. Proof

Supposedly, if $\mathbb{T}$ satisfies the KTW axioms, then one can show that $\left|\bigcup_{\mathbb{T}}T\right| \approx |T| \cdot \#\mathbb{T}$, which I believe means you essentially win.

Now let's talk a bit more about how the induction on scales works, as I understand it. Define $\mu(\mathbb{T})$ to be the number of $\delta$-tubes in $\mathbb{T}$ that pass through a typical point in $\bigcup_{\mathbb{T}} T$. The goal essentially reduces to studying this $\mu$ quantity. Now, for each $\delta < \rho < 1$, which represents a larger "scale", gather up most of these $\delta$-tubes into large $\rho \times \rho \times 1$ tubes.

[asy]
size(3inches);
real r1 = 0.05;
real r2 = 0.3;
path t1 = (1,r1)--(-1,r1)--(-1,-r1)--(1,-r1)--cycle;
path t2 = (1,r2)--(-1,r2)--(-1,-r2)--(1,-r2)--cycle;
pair[] xs = intersectionpoints(rotate(35, (0,0)) * t1, rotate(-10, (0,0)) * t2);
fill(rotate(30, (0,0)) * t2, p=gray(0.9));
fill(rotate(-10, (0,0)) * t2, p=gray(0.9));
fill(rotate(35, (0,0)) * t1, p=gray(0.6));
fill(rotate(22, (0,0)) * t1, p=gray(0.6));
fill(rotate(-3, (0,0)) * t1, p=gray(0.6));
fill(rotate(-15, (0,0)) * t1, p=gray(0.6));
fill(rotate(-20, (0,0)) * t1, p=gray(0.6));
fill(xs[0]--xs[1]--xs[2]--xs[3]--cycle, p=red);
draw(rotate(30, (0,0)) * t2, p=linewidth(1.3));
draw(rotate(-10, (0,0)) * t2, p=linewidth(1.3));
draw(rotate(35, (0,0)) * t1);
draw(rotate(22, (0,0)) * t1);
draw(rotate(-3, (0,0)) * t1);
draw(rotate(-15, (0,0)) * t1);
draw(rotate(-20, (0,0)) * t1);
[/asy]

The larger $\rho$-tubes form a new collection of tubes $\mathbb{T}_\rho$, which we can also study with this $\mu$ function, defined in the same way for these new tubes. Now, in the "sticky case", $\mathbb{T}_\rho$ is not too big for every $\rho$, and I think in this case we're happy for some reason. In the "non-sticky case", there is some $\rho > \delta$ for which $\#\mathbb{T}_\rho \gg \frac{1}{\rho^2}$. I think this is the case that is assumed going forward.

Previously it was observed that
$$\mu(\mathbb{T}) \lesssim \mu(\mathbb{T}[T_\rho])\mu(\mathbb{T}_\rho)$$for a typical $\rho$-tube $T_\rho \in \mathbb{T}_\rho$. For some reason this isn't good enough for the induction on scales, so it appears that they managed to prove
$$\mu(\mathbb{T}) \lesssim \mu(\mathbb{T}[T_\rho]) \cdot \mu(G)$$where $G$ is a collection of tubes of size about $\delta \times \delta \times \delta/\rho$ that you get after taking the $\delta$-tubes and chopping off the parts that aren't in $T_\rho$. An example is shaded in red in the above diagram.

If you feel that $G$ is weird, your gut instinct is correct --- $G$ hasn't been studied very much by researchers because it has no a priori structure. Fortunately, today it can now be analyzed thanks to a certain factoring proposition. Essentially, it states that if you're given a bunch of tubes, you can throw out just a few of them so that the rest can be grouped in a nice way. More precisely (in the sense that this is what was written on the board during the talk), if $\mathbb{T}$ is a collection of $\delta$-tubes, then there is a subcollection $\mathbb{T}' \subseteq \mathbb{T}$ with $\# \mathbb{T}' \gtrsim \#\mathbb{T}$ and a set $\mathbb{W}$ of $A \times B \times 1$ boxes (where $\delta \leq A \leq B \lesssim 1$ are determined by $\mathbb{T}$) such that
  • $\mathbb{W}$ satisfies the KTW axioms, i.e. $\#\mathbb{W}[U] \leq \frac{|U|}{|W|} = \frac{|U|}{AB}$ for all $U$, and
  • For every $W \in \mathbb{W}$, $\mathbb{T}[W]$ "satisfies the FW axioms after rescaling", i.e. for every $U \subseteq W$, we have $\#\mathbb{T}[U] \lesssim \frac{|U|}{|W|} \cdot \#\mathbb{T}'[W]$.
As I understand it, this nice grouping (the "factoring" of the collection of tubes) gives you some nicer control when we attempt to do the induction on scales.

That's it for my overview. Happy $\pi$ day.

The effect of a constant heat source depends on dimension

by greenturtle3141, Mar 4, 2025, 9:51 PM

Reading Difficulty: 3-6/5

Prerequisites: You should probably know what the heat equation is, and hopefully you know how to solve it...

It is a well-known fact among those that have studied PDEs that the nature of waves depends wildly on the dimension of space. Particularly, it depends on the parity of the dimension $d$: If we take space to be $\mathbb{R}^d$ where $d \geq 3$ is odd, then solutions to the wave equation
$$\partial_t^2 u - \Delta u = 0$$behave the way we're used to: They travel in a direction and don't go backwards because they feel like it. In other words, if you yell at someone for a second, they'll only hear you for about a second because your yell will travel past them. However, if $d$ is even, then this is not true: If you yell at someone for a second, then your yell will keep traveling in all directions for some reason, and so they'll hear you forever! (But your yell will get quieter, decaying at a polynomial rate.)

For some bizarre reason, there is a similar fascinating phenomenon for the heat equation that I had never come across before. In fact, I've been unable to find a source for it.

A Thought Experiment

Go deep into the woods on a cold night and build a fire somewhere. Let's assume that the fire neither spreads nor dies out, so that it is constant in time as a heat source. Do you expect the distribution of heat to approach some stable distribution? Or will it keep accumulating and make the woods infinitely hot over time?

As someone who has seen a fire before, I'm pretty sure the latter doesn't happen. I mean, does a fireplace make a room infinitely hot? I thought not. This setting is the natural physical interpretation of the following PDE in one dimension:
$$\begin{cases}\partial_tu - \partial_x^2 u = 1_{(0,1)}\\ u\,|_{t=0} = 0\end{cases}$$Here the "fire" is built on the interval $(0,1)$ and persists as a heat source. If God is good, surely $u(\cdot,t)$ should converge pointwise to something as $t \to \infty$.

To my horror, it turns out that $\lim_{t \to \infty} u(\cdot,t) \equiv +\infty$, meaning that the fire makes space infinitely hot. Huh?

It turns out that the reason why this limit defies our intuition is because of dimension.

Theorem: Given a constant heat source on $\mathbb{R}^d$, the heat distribution approaches a stable distribution for $d \geq 3$. But for $d = 1,2$, heat accumulates infinitely in the sense that space gets arbitrarily hot.

An Explanation

Let us begin with a slightly scuffed argument for this principle by considering a "delta" heat source: That is, if $u$ solves
$$\begin{cases}\partial_t u - \Delta u = \delta_0 \\ u\,|_{t=0} = 0\end{cases}$$in the sense of distributions, where $\delta_0$ is the Dirac delta mass at $x = 0$, then $u(\cdot,t)$ converges pointwise iff $d \geq 3$ and explodes to $+\infty$ for $d=1,2$. This is not too hard. By Duhamel's principle we have the following explicit form for $u$ (where $\star$ is convolution and $\Phi$ is the heat kernel):
$$u(x,t) = \int_0^t (\Phi(\cdot,t-s) \star \delta_0)(x)\,ds$$$$ = \int_0^t \Phi(x,t-s)\,ds = \int_0^t \Phi(x,s)\,ds$$$$ = \int_0^t \frac{1}{(4\pi s)^{d/2}}e^{\frac{-|x|^2}{4s}}\,ds$$We are interested in what happens when we take $t = +\infty$. Morally speaking (the constants and $x$ are irrelevant) this reduces to studying the convergence of the integral $\int_0^\infty \frac{1}{s^{d/2}}e^{-1/s}\,ds$, which in turn (from $u = 1/s$) reduces to the convergence of $\int_0^\infty \frac{1}{u^{2-\frac{d}{2}}}e^{-u}\,du$, which is $< \infty$ iff $2 - \frac{d}{2} < 1$, i.e. $d > 2$.
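
If you want to see this dichotomy numerically, here is a quick sanity check (my own Python sketch, with the irrelevant constants dropped):

[code]
# Python sketch: the accumulated heat at a point is, up to constants,
# I(d, T) = int_0^T s^(-d/2) exp(-1/s) ds. It should blow up as T grows
# for d = 1, 2 and stabilize for d >= 3.
import numpy as np
from scipy.integrate import quad

for d in (1, 2, 3):
    vals = []
    for T in (1e2, 1e4, 1e6):
        val, _ = quad(lambda s: s**(-d / 2) * np.exp(-1.0 / s), 0, T, limit=500)
        vals.append(val)
    print(d, vals)
# d = 1 grows like 2*sqrt(T), d = 2 grows like log(T),
# d = 3 settles near Gamma(1/2) = sqrt(pi) ~ 1.77.
[/code]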

This is pretty much a complete proof, but if we wish to be more precise, the integral at the end is
$$\int_0^\infty \frac{1}{(4\pi )^{d/2}s^{2-d/2}}e^{\frac{-|x|^2s}{4}}\,ds = \frac{|x|^{2-d}}{4\pi^{d/2}}\begin{cases}\Gamma(d/2-1), & d/2 - 1 > 0 \\ +\infty, & \text{otherwise}\end{cases}.$$(Note that when $d/2 -1 < 0$, $\Gamma(d/2-1)$ technically has a value, but strictly speaking this is the analytic continuation of the integral as the integral diverges in this case. As a sanity check, for $d = 3$ this constant gives $\frac{1}{4\pi|x|}$, the familiar Green's function of the Laplacian.)

Hence if we light a fire somewhere in a space of at least three dimensions then the heat should approach a distribution with polynomial decay on the order $\frac{1}{|x|^{d-2}}$, which is not too surprising. In fact, this form makes some intuitive sense in that for large dimensions, heat should have more directions to disperse in, which matches with the fact that the exponent here increases with $d$. For the low dimensions $d=1,2$, heat does not have enough directions to spread out in to offset the speed at which a constant heat source increases the heat in a region, so the heat will grow infinitely.

The language used in this intuitive picture suggests a wildly different approach.

The Probabilist's View

The theorem we've proven here bears an uncanny resemblance to a theorem in probability: A symmetric random walk on the lattice $\mathbb{Z}^d$ will visit the origin infinitely often almost surely when $d = 1,2$, but not for $d \geq 3$. Since we can imagine that heat proliferates as a random walk with infinitesimally small steps, we may expect an interesting connection between these two results. Indeed, we can "use" the probabilistic result to prove our result for the heat equation. Though, I believe there is some unavoidable reliance on the integral $\int_0^\infty \frac{1}{s^{d/2}}e^{-1/s}\,ds$. If you can somehow dodge this, do let me know. The main tool we require is the theorem which establishes the connection between probability and the heat equation with a forcing term.

Theorem (Brownian motion solves the heat equation; forcing term variant): Let $\phi, f \in C_0(\mathbb{R}^d)$ and let $W_t$ be a Brownian motion on $\mathbb{R}^d$ started at $x$. Take
$$u(x,t) := \mathbb{E}[\phi(W_t)] + \mathbb{E}\int_0^t f(W_s)\,ds.$$Then $u$ solves the PDE
$$\begin{cases}\partial_t u - \frac12\Delta u = f \\ u\,|_{t=0} = \phi\end{cases}.$$


This makes a lot of intuitive sense: You start out with $\phi$ heat, and then move along and collect heat according to $f$.
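
In fact, this formula is directly simulable. Here's a Monte Carlo sketch (entirely my own illustration; I take $\phi = 0$, $f = 1_{B(0,1)}$, and ignore the factor of $\frac12$ in front of $\Delta$, which just rescales time):

[code]
# Python sketch: estimate u(0,t) = E int_0^t f(W_s) ds for f = 1_{B(0,1)}
# and W a Brownian motion started at the origin, in d = 2 and d = 3.
import numpy as np

rng = np.random.default_rng(0)

def heat_at_origin(d, t, dt=0.05, paths=400):
    n = int(t / dt)
    acc = 0.0
    for _ in range(paths):
        W = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, d)), axis=0)
        acc += dt * np.count_nonzero(np.sum(W * W, axis=1) < 1.0)
    return acc / paths

for t in (10.0, 100.0, 1000.0):
    print(t, heat_at_origin(2, t), heat_at_origin(3, t))
# d = 2: keeps creeping up (recurrence); d = 3: levels off (transience).
[/code]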

Proof. By Ito's lemma applied to $\phi$,
$$\phi(W_t) = \phi(x) + \int_0^t \nabla \phi(W_s) \cdot dW_s + \frac12\int_0^t \Delta \phi(W_s)\,ds.$$Take the expectation of both sides. Since $W_t$ is a continuous local martingale, so is $\int_0^t \nabla \phi(W_s) \cdot dW_s$, so this has zero mean. Thus
$$\mathbb{E}[\phi(W_t)] = \phi(x) + \frac12\int_0^t \Delta \mathbb{E}[\phi(W_s)]\,ds \qquad (1).$$On the other hand, the same reasoning applied to $f$ gives
$$\mathbb{E}[f(W_t)] = f(x) + \frac12\int_0^t \Delta \mathbb{E}[f(W_s)]\,ds \qquad (2).$$Adding the time derivative of $(1)$ to $(2)$ gives
$$\partial_t\mathbb{E}[\phi(W_t)] + \mathbb{E}[f(W_t)] = \frac12\Delta \mathbb{E}[\phi(W_t)] + \frac12\int_0^t \Delta \mathbb{E}[f(W_s)]\,ds + f(x).$$The LHS is $\partial_t u$, and the term $\frac12\Delta \mathbb{E}[\phi(W_t)] + \frac12\int_0^t \Delta \mathbb{E}[f(W_s)]\,ds$ is equal to $\frac12\Delta u$, so the above equality is $\partial_t u = \frac12\Delta u + f$. It's clear that $u(x,0) = \phi(x)$ from the definition of $u$ so this completes the proof. $\square$

The second tool we need is the continuous version of the transience/recurrence theorem for discrete random walks.

Theorem (Transience and Recurrence of Brownian motion): Let $W_t$ be a Brownian motion on $\mathbb{R}^d$ started at some $x$. For $d = 1,2$, $W_t$ will visit the unit ball infinitely often with probability 1. For $d \geq 3$, there is a positive probability that $W_t$ never visits the unit ball. (Quantitatively, the probability of ever visiting the ball $B(0,r)$ is $\frac{1}{|x/r|^{d-2}}$ for $|x| > r$.)

Proof

It would also be nice if there were a proof for this that involves appealing directly to the discrete version of this theorem (and using the fact that random walks on increasingly finer lattices converge to a Brownian motion in distribution), but I find it unclear how this can be done without a more quantitative bound on the probabilities involved in the discrete case.

Anyways, with these two results the connection is clearer and a probabilistic argument can be made. Let $f$ be a heat source somewhere on $\mathbb{R}^d$ --- say, an indicator on some open set $U$, for simplicity. If $d = 1,2$, take two balls $B_1 \Subset B_2 \Subset U$. Then any Brownian motion will visit $B_1$ infinitely often with probability 1, and with each such visit we expect $W_t$ to spend at least $\varepsilon > 0$ time inside $B_2$, hence accumulating at least $\varepsilon$ heat in expectation. Thus $u(x,t) = \int_0^t \mathbb{E} f(W_s)\,ds \xrightarrow{t \to \infty} \infty\varepsilon = \infty$ for every $x \in \mathbb{R}^d$.

If instead $d \geq 3$, we instead have that (for each $x$, where we recall $W_0 = x$) the random variable $\int_0^\infty f(W_s)\,ds$ is finite almost surely, as with probability 1 it will only visit $U$ finitely many times, with each visit accumulating only a finite amount of heat. But sadly, this is not quite enough to imply that the expectation is finite, and it's also not possible to prove that this random variable has an upper bound (it doesn't!). Unfortunately this means we have to be more delicate to complete this weird probabilistic painting, and here is where I believe we are forced back into considering the integral $\int_0^\infty \frac{1}{s^{d/2}}e^{-1/s}\,ds$. Indeed,
$$u(x,\infty) = \mathbb{E}\int_0^\infty f(W_s)\,ds = \int_0^\infty \mathbb{P}(W_s \in U)\,ds$$$$ = \int_0^\infty \int_{U - x} \frac{c_1}{s^{d/2}}e^{\frac{-|y|^2}{c_2s}}\,dy\,ds$$for some irrelevant constants $c_1$ and $c_2$. Interchanging the integrals again will bring this inevitable integral to light and show that $u(x,\infty) < \infty$.

I had a few ideas for trying to get this argument to work without this integral, relying as much as possible on the "random walk" result, but none of them seemed very feasible, and certainly not more elegant than what I've shown. If you have ideas for alternate approaches I'd be delighted to hear it.

Probability hurts my brain, let's go back to spamming integrals.

I'll close with generalizing the initial result, with $f$ not necessarily being an indicator.

Theorem: Let $f \geq 0$ everywhere, compactly supported, with $f > 0$ over a set of positive measure. Let $u$ solve the heat equation with $f$ as a heat source, constant in time. Then $u$ explodes for $d = 1,2$, and approaches a finite stable distribution a.e. for $d \geq 3$.

Proof. We have
$$u(x,t) = \int_0^t (\Phi(\cdot,t-s) \star f)(x)\,ds = \int_0^t \int_{\mathbb{R}^d} \frac{1}{(4\pi(t-s))^{d/2}}e^{\frac{-|x-y|^2}{4(t-s)}}f(y)\,dy\,ds$$$$ = \int_{\mathbb{R}^d} f(y)\int_0^t \frac{1}{(4\pi(t-s))^{d/2}}e^{\frac{-|x-y|^2}{4(t-s)}}\,ds\,dy$$$$ = \int_{\mathbb{R}^d} f(y)\int_0^t \frac{1}{(4\pi s)^{d/2}}e^{\frac{-|x-y|^2}{4s}}\,ds\,dy.$$If $d = 1,2$ then the inner integral explodes as $t \to \infty$ for every $(x,y)$, and so the outer integral explodes over a positive-measure subset where $f > 0$. (Note that funny things would happen if we allowed $f$ to be signed... better to not think about it. And interchanging all the integrals and limits is 100% safe because of this restriction on $f$, so...)

If $d \geq 3$, then, well, taking $t = +\infty$ and running through the computation again gives
$$ = C_d\int_{\mathbb{R}^d} \frac{f(y)}{|x-y|^{d-2}}\,dy,$$for a constant $C_d < \infty$ depending on $d$ that I could not care less about, and this is $<\infty$ since we assume $f$ is compactly supported (and because $d-2 < d$ is always true). The compact support condition can certainly be weakened here but I'm going to call the fire brigade if you manage to get your hands on an unbounded heat source. More importantly, the above is a convolution of $f$ with (a multiple of) the fundamental solution to the Laplacian, so we see that the stable distribution that we approach is the (possibly distributional) solution to the problem $-\Delta u = f$. This makes a lot of sense: If $u$ were constant in time then the time derivative term goes away, leaving us with $\partial_t u - \Delta u = -\Delta u = f$. That's cool.

How good are Riemann sums for periodic functions?

by greenturtle3141, Nov 14, 2024, 10:22 PM

Let $f:\mathbb{R} \to \mathbb{R}$ be periodic (say, 1-periodic) and smooth. Our goal is to approximate $\int_0^1 f\,dx$. Here are the first two ways that come to mind:
  • Use a left (or right) Riemann sum:
    $$\int_0^1 f\,dx \approx \frac{1}{n}\sum_{k=0}^{n-1} f(k/n)$$
  • Use a trapezoidal Riemann sum:
    $$\int_0^1 f\,dx \approx \frac{1}{n}\sum_{k=0}^{n-1} \frac{f(k/n)+f((k+1)/n)}{2}$$
How good is each approximation as $n \to \infty$? In other words, if $S_n$ is the approximation and $I$ is the integral, how does $|I-S_n|$ decay? What is the largest exponent $p$ such that $|I-S_n| \leq \frac{C_p}{n^p}$ for some constant $C_p > 0$ (that is allowed to depend on $f$)? Are there any schemes that induce an even faster convergence?

Try to guess the answer before moving on.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


Ok time's up.

Theorem: Let $f:\mathbb{R} \to \mathbb{R}$ be smooth and 1-periodic. Then
$$\left|\int_0^1 f\,dx - \frac{1}{n}\sum_{k=0}^{n-1} f(k/n)\right| = O(1/n^p)$$for every $p > 1$!


So you're not going to do better than the classic Riemann sum. I leave it as an exercise to show that "any scheme" will also achieve this extremely rapid decay.

(By the way, did you notice that left, right, and trapezoidal Riemann sums are all completely identical for periodic functions?)

Proof. By periodicity we may consider the Fourier series
$$f(x) = \sum_{m \in \mathbb{Z}} a_me^{-2\pi imx}.$$By smoothness the above relationship is a "true" pointwise equality, holding for every $x$. Now take the left Riemann sum of both sides to find
$$\frac{1}{n}\sum_{k=0}^{n-1} f(k/n) = \frac{1}{n}\sum_{k=0}^{n-1}\sum_{m \in \mathbb{Z}} a_me^{-2\pi imk/n}$$$$ = \sum_{m \in \mathbb{Z}} a_m \cdot \frac1n\sum_{k=0}^{n-1} e^{-2\pi imk/n}$$where the interchange is justified (why?). But observe that
$$\frac1n\sum_{k=0}^{n-1} e^{-2\pi imk/n} = \begin{cases}1, & n \mid m \\ 0, & \text{otherwise}\end{cases},$$hence
$$\frac{1}{n}\sum_{k=0}^{n-1} f(k/n) = \sum_{m \in \mathbb{Z}, n \mid m} a_m = \sum_{j \in \mathbb{Z}} a_{jn}$$$$ = \int_0^1 f\,dx + \sum_{j \in \mathbb{Z} \setminus \{0\}} a_{jn}.$$Since $f$ is smooth, the Fourier coefficients decay rapidly; say, $|a_m| \leq C_p/|m|^p$ for $p$ as large as we wish. Then the error is bounded as
$$\left|\sum_{j \in \mathbb{Z} \setminus \{0\}} a_{jn}\right| \lesssim \sum_{j=1}^\infty \frac{1}{j^pn^p} \sim \frac{1}{n^p}$$for $p$ as large as we wish. $\square$.

We can also try this trick for $f$ not necessarily periodic. If $f$ is integrable and continuous then we may safely apply Poisson summation to the function $x \mapsto f(x/n)$ to find
$$\frac1n\sum_{m \in \mathbb{Z}} f(m/n) = \sum_{m \in \mathbb{Z}} \hat{f}(nm).$$(Recall that if $g(x) = f(x/a)$ then $\hat{g}(\xi) = a\hat{f}(a\xi)$.) The LHS is a Riemann sum using rectangles of width $1/n$ and the error is then given by $\sum_{m \in \mathbb{Z} \setminus \{0\}} \hat{f}(nm)$, which is smaller the more regular $f$ is. In particular if $f \in C^k$ then we can expect the error to be of order $O(1/n^k)$.

[Humor] Quiz

by greenturtle3141, Oct 17, 2024, 6:53 AM


10 "Real" Applications of Complex Numbers (Part 2/2)

by greenturtle3141, Aug 23, 2024, 3:39 AM

(Part 1: Here)

7. Evaluation of integrals

The topic of using contour integration to evaluate certain integrals has been beaten to death, this time by me. For a full discussion please see https://artofproblemsolving.com/community/c2532359h3075016.

I'll add that contour integration isn't the only way for complex numbers to evaluate integrals. If there is a way to introduce complex numbers which can simplify matters, there's a good chance that this can be made to work. Let us, for example, evaluate the integral
$$\int_0^x \frac{1}{1+t^2}\,dt.$$The answer, of course, is $\arctan(x)$. Let us try to derive this using complex numbers.

As suggested in my previous writings concerning complex numbers, there are some dangerous pitfalls that we must beware of. It is easy to fall for such traps and end up with nonsense, but fortunately I am here to light the way. We begin by factoring over the complex numbers to rewrite the integral as
$$\frac{1}{2i}\int_0^x \frac{1}{t-i} - \frac{1}{t+i}\,dt.$$This is perfectly legal. Next, it is in fact true that this integral is simply equal to
$$ = \frac{1}{2i}\left[\left(\log(x-i) - \log(-i)\right) - \left(\log(x+i) - \log(i)\right)\right].$$But you will fall into a pit unless you know precisely what $\log(\cdot)$ means for complex numbers. Here, it refers to the principal logarithm (there is some reasoning that should be applied to justify that this works; namely one ought to avoid the perilous negative real line). A bit more precisely, we are choosing a particular branch of the complex logarithm that contains all the complex numbers we're working with. Working with the principal log (characterized by $\log(re^{i\theta}) = \log r + \theta i$ for $-\pi < \theta < \pi$) and drawing some pictures, we discover the following:
$$\log(x+i) = \log(\sqrt{x^2+1}) + \arctan(1/x)i$$$$\log(x-i) = \log(\sqrt{x^2+1}) - \arctan(1/x)i$$$$\log i = \pi i/2$$$$\log(-i) = -\pi i/2$$Hence our integral is
$$\int_0^x \frac{1}{1+t^2}\,dt = \frac{1}{2i}\left[-2\arctan(1/x)i + \pi i\right] = \boxed{\frac{\pi}{2} - \arctan(1/x)}.$$This may look different from the claimed integral of $\arctan x$, but fortunately these two expressions are the same, at least for $x > 0$ (why?).
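
If you don't trust the branch bookkeeping, note that numpy's complex log is exactly the principal branch, so we can spot-check the computation (my own sketch, for $x > 0$):

[code]
# Python sketch: verify (1/2i)[(Log(x-i) - Log(-i)) - (Log(x+i) - Log(i))]
# equals arctan(x), using the principal branch via numpy's complex log.
import numpy as np

for x in (0.3, 1.0, 7.2):
    val = ((np.log(x - 1j) - np.log(-1j)) - (np.log(x + 1j) - np.log(1j))) / 2j
    print(x, val.real, np.arctan(x))   # the two columns agree
[/code]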

Exercise: Evaluate the integral
$$\int_0^\pi e^x\cos x\,dx$$using complex numbers (instead of the usual integration by parts).

8. Evaluation of infinite sums

In a previous exercise I asked you to evaluate $\sum_{n=1}^\infty \frac{\cos n}{2^n}$ using complex numbers. This is one instance of complex numbers being the silver bullet for evaluating an infinite sum, but there are other (less elementary) applications.

Fourier series

The Fourier series of a periodic function is, morally speaking, its Fourier transform under the Fourier transform of "type" $\mathbb{T} \to \mathbb{Z}$. This seemingly "abstract nonsense" view of Fourier series is actually quite helpful for making many of the properties shared by both Fourier series and the "classic" Fourier transform more intuitive. A more extended discussion of Fourier series and/or the Fourier transform will be reserved for a separate post, but I felt motivated to provide a few insights here.

Taking the Fourier series of certain simple functions and plugging some values into them can often lead to interesting identities. Unfortunately the converse is, at least for me, far more difficult; it is a tall task to look at an infinite sum and proceed to conjure up a proof via Fourier series.

Let us, for example, take the function $f(x) = |x|$ over $(-1,1)$. Morally speaking we view this as a 2-periodic function on $\mathbb{R}$ by extending $f$ periodically. To obtain the Fourier series of $f$ with high probability of success, and without the help of the extreme "abstract" view, we can follow these steps.
  1. Determine what frequencies we need to use. We need to use all frequencies $x \mapsto e^{-i\xi x}$ that repeat every $2$, i.e. are invariant under a phase shift of $2$. It's not hard to reason out that the frequencies we should use are $x \mapsto e^{-n\pi i x}$ for every integer $n$.
  2. Scale the frequencies so that they have unit $L^2$ norm. That is, we want to define
    $$e_n(x) := c \cdot e^{-n\pi ix}$$where $c > 0$ is chosen so that $\int_{-1}^1 |e_n(x)|^2\,dx = 1$. Seeing that $|e_n(x)| = 1$, it's not hard to find that $c = 1/\sqrt{2}$. We need to do this so that the frequencies $\{e_n\}_{n \in \mathbb{Z}}$ form an orthonormal basis for $L^2(-1,1)$.
  3. Now the $n$th Fourier coefficient of $f$ is simply the $e_n$-component of $f$ under the orthonormal basis $\{e_n\}_{n \in \mathbb{Z}}$, which is given by the inner product
    $$\hat{f}(n) = \langle f, e_n\rangle = \int_{-1}^1 |x| \cdot \frac{1}{\sqrt{2}}e^{-n\pi i x}\,dx$$$$= \sqrt{2}\int_0^1 x\cos(n\pi x)\,dx = -\sqrt{2}\int_0^1 \frac{\sin n\pi x}{n \pi}\,dx = \frac{\sqrt{2}(\cos(n\pi) - 1)}{n^2\pi^2} = \begin{cases}0, & \text{$n$ even} \\ \frac{-2\sqrt{2}}{n^2\pi^2}, & \text{$n$ odd}\end{cases}.$$Oh but you'll notice that this actually doesn't work for $n=0$, so we need to compute separately that $\hat{f}(0) = \langle f, e_0\rangle = \int_{-1}^1 |x| \cdot\frac{1}{\sqrt{2}}\,dx = \frac{1}{\sqrt{2}}$.

Voila! You have successfully found the Fourier series
$$|x| = f(x) = \sum_{n=-\infty}^\infty \hat{f}(n)e_n(x) = \sum_{\text{$n < 0$, odd}} \frac{-2\sqrt{2}}{n^2\pi^2}e_n(x) + \frac{1}{\sqrt{2}}e_0 + \sum_{\text{$n > 0$, odd}}\frac{-2\sqrt{2}}{n^2\pi^2}e_n(x) \qquad (*)$$without memorization. (I could combine the positive and negative parts to write everything in terms of $\cos \pi nx$, but I don't want to for reasons that may become apparent. Also I think it's more instructive like this.)

Note that writing down $(*)$ willy-nilly is actually a bit reckless. It is not always going to be true that both sides of $(*)$ are equal for every $x$. To avoid pitfalls, it should first be shown that $f$ is continuous (which is obvious here) and that $\sum_{n \in \mathbb{Z}} |\hat{f}(n)| < \infty$ (which is not too hard). The motivation and rationale for these conditions will be deferred for another post.

Once you've shown that, it's safe to plug in numbers into $(*)$ and see what you get. Let's try $x = 0$ for fun. Noting that $e_n(0) = 1/\sqrt{2}$ for all $n$, we get
$$0 = \frac12 + 2\sum_{\text{$n \in \mathbb{N}$ odd}} \frac{-2}{n^2\pi^2},$$which rearranges to $\sum_{n \in \mathbb{N}\text{ odd}} \frac1{n^2} = \frac{\pi^2}{8}$. Wow! We have recovered the Basel sum, $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$ (and if you're not sure how this immediately follows, you should figure it out!).

But wait, there's more to glean from $(*)$! By Parseval's identity, or the fact that the Fourier transform is an isometry, we know that
$$\int_{-1}^1 |f(x)|^2\,dx = \sum_{n \in \mathbb{Z}} |\hat{f}(n)|^2.$$This gives
$$\frac{2}{3} = \frac12 + 2\sum_{\text{$n \in \mathbb{N}$ odd}} \frac{8}{n^4\pi^4},$$which rearranges to $\sum_{n \in \mathbb{N}\text{ odd}} \frac{1}{n^4} = \frac{\pi^4}{96}$. This recovers the formula
$$\zeta(4) = \sum_{n=1}^\infty \frac{1}{n^4}  =\frac{\pi^4}{90}.$$
Exercise: Let $a \neq 0$ be a real constant. Derive the identity
$$\sum_{n=-\infty}^\infty \frac{1}{a^2+n^2} = \frac{\pi\coth(\pi a)}{a}$$by abusing the power of complex numbers.

Hint 1
Hint 2
Hint 3
Hint 4

Poisson Summation

Poisson summation is a tool that can be used to tackle sums of the form
$$\sum_{n=-\infty}^\infty f(n),$$where $f:\mathbb{R} \to \mathbb{R}$ is a continuous function that "decreases rapidly" as $x \to \pm \infty$. For such functions, Poisson summation states that you can replace $f$ with its Fourier transform without changing the value of the sum. That is,
$$\sum_{n=-\infty}^\infty f(n) = \sum_{n=-\infty}^\infty \hat{f}(n).$$It is normal to feel that this identity is absurd. Two of the proofs I know of include using Fourier series and using contour integration of a cleverly-chosen function.
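
As a quick numerical check (my own sketch), take the Gaussian $f(x) = e^{-\pi a x^2}$, whose Fourier transform under the convention $\hat{f}(\xi) = \int f(x)e^{-2\pi ix\xi}\,dx$ is $a^{-1/2}e^{-\pi\xi^2/a}$:

[code]
# Python sketch: both sides of Poisson summation for a Gaussian.
import numpy as np

a = 0.37                        # arbitrary positive parameter
n = np.arange(-50, 51)
lhs = np.sum(np.exp(-np.pi * a * n**2))
rhs = np.sum(np.exp(-np.pi * n**2 / a)) / np.sqrt(a)
print(lhs, rhs)                 # agree to machine precision
[/code]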

Poisson summation happens to be an essential tool in analytic number theory, but I am not an expert in this field so I cannot say much on that matter. Amazingly, it is also used crucially for proving optimal results in the study of sphere packings (https://en.wikipedia.org/wiki/Sphere_packing), though it should be noted that such applications use the multi-dimensional form
$$\sum_{n \in \mathbb{Z}^d} f(n) = \sum_{n \in \mathbb{Z}^d} \hat{f}(n)$$or further generalizations.

Exercise: Use Poisson summation to solve the previous exercise. It is helpful to first prove the identity
$$\int_{-\infty}^\infty \frac{\cos x}{x^2+a^2}\,dx = \frac{e^{-a}\pi}{a}$$for $a > 0$, by using contour integration. Or you can just assume it's true if you aren't good at that yet.

9. The Central Limit Theorem

The Central Limit Theorem (CLT) states that with enough data, all distributions are normal (Gaussian) distributions, i.e. look like bell curves. More formally: If $X_n$ is a sequence of independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$, then
$$\frac{X_1+\ldots+X_n - n\mu}{\sqrt{n}\sigma}$$converges in distribution to a $N(0,1)$ random variable. This is a scary fact. But what's even more frightening is that this result can be proven using complex numbers. In fact, it is a very pretty, brief and elegant proof.

The proof uses a probabilistic version of the Fourier transform, called the characteristic function: If $X$ is a random variable, we can define its "Fourier transform" $\phi_X:\mathbb{R} \to \mathbb{C}$ via
$$\phi_X(t) := \mathbb{E} e^{itX}.$$The distribution of a random variable is captured fully by its characteristic function (this requires proof), and moreover the characteristic function of a sum of independent random variables turns into a product! This makes the characteristic function a powerful tool for tackling the CLT, which we will now provide (roughly) the proof of.

Assume for simplicity that $\mu = 0$ and $\sigma^2 = 1$. (In fact, this can be assumed WLOG.) Now if $Y_n := \frac{X_1+\ldots+X_n}{\sqrt{n}}$, then
$$\phi_{Y_n}(t) = \mathbb{E}\left[e^{it(X_1+\ldots+X_n)/\sqrt{n}}\right] = \mathbb{E}\left[e^{itX_1/\sqrt{n}}e^{itX_2/\sqrt{n}}\ldots e^{itX_n/\sqrt{n}}\right].$$Since the variables are independent, this splits into a product of expectations.
$$ = \mathbb{E} e^{itX_1/\sqrt{n}}\mathbb{E} e^{itX_2/\sqrt{n}}\ldots \mathbb{E} e^{itX_n/\sqrt{n}}$$Since the variables have the same distribution, all these expectations are the same.
$$ = \left(\mathbb{E} e^{itX_1/\sqrt{n}}\right)^n$$Now we rewrite $e^{itX_1/\sqrt{n}}$ by Taylor expansion.
$$e^{itX_1/\sqrt{n}} = 1 + \frac{iX_1}{\sqrt{n}}t - \frac{t^2}{2n}X_1^2 + \text{Remainder}$$The most difficult part of this proof is dealing with the remainder term, and it requires some technical bounds. I will omit these parts of the proof. But intuitively you should know that the remainder term here is on the order of $t^3/n^{3/2}$, in some sense.

Taking the expectation of each side (and using $\mathbb{E}X_1 = 0$, $\mathbb{E} X_1^2 = 1$) now gives
$$\mathbb{E}e^{itX_1/\sqrt{n}} = 1 - \frac{t^2}{2n} + \text{Remainder},$$and now raising to the $n$th power gives
$$\phi_{Y_n}(t) = \left(\mathbb{E} e^{itX_1/\sqrt{n}}\right)^n = \left(1 - \frac{t^2}{2n} + \text{Remainder}\right)^n.$$Now, since the remainder term is on the order of like $1/n^{3/2}$, we can consider it asymptotically insignificant compared to the leading term $t^2/(2n)$. Thus, by handwavingly ignoring the remainder, we find that
$$\lim_{n \to \infty} \phi_{Y_n}(t) = \lim_{n \to \infty} \left(1 - \frac{t^2}{2n}\right)^n = e^{-t^2/2}.$$(Rest assured that taming the remainder to make this argument rigorous isn't too bad to do.) The above limit basically means that $Y_n$ converges in distribution to a random variable whose characteristic function is $e^{-t^2/2}$. It turns out that $e^{-t^2/2}$ is exactly the characteristic function of a standard Gaussian! That's the proof. (...modulo proving all those properties of characteristic functions that we relied on.)
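
Since we're waving our hands anyway, here's an empirical version of this limit (my own Python sketch):

[code]
# Python sketch: the empirical characteristic function of
# Y_n = (X_1 + ... + X_n)/sqrt(n), for mean-0 variance-1 uniform X_i,
# should be close to exp(-t^2/2).
import numpy as np

rng = np.random.default_rng(1)
samples, n = 20000, 100
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(samples, n))  # mean 0, var 1
Y = X.sum(axis=1) / np.sqrt(n)

for t in (0.5, 1.0, 2.0):
    phi = np.mean(np.exp(1j * t * Y))
    print(t, phi.real, np.exp(-t**2 / 2))  # imaginary part ~0 by symmetry
[/code]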

Exercise: Use complex numbers to show that if $X$ and $Y$ are independent random variables such that $X$ and $X+Y$ have the same distribution, then $Y = 0$ (almost surely).

10. Fractional derivatives and Sobolev embeddings

First, do note that there are many notions of fractional derivatives, and they are not at all equivalent. It's also not that clear how they are useful. However, it sounds cool.

Complex numbers give one method of defining a notion of fractional differentiation. Let $f(x)$ be a nice function. Then it is well-known that the Fourier transform converts derivatives of $f$ into multiplication by a term:
$$\mathcal{F}(f')(\xi) = -2\pi i \xi \hat{f}(\xi)$$Of course, this works inductively for any number of derivatives.
$$\mathcal{F}(f^{(n)})(\xi) = (-2\pi i \xi)^n\hat{f}(\xi)$$In particular, we get the identity
$$f^{(n)}(x) = \mathcal{F}^{-1}\left((-2\pi i \xi)^n\hat{f}(\xi)\right),$$which gives a rather awkward method of taking $n$ derivatives of $f$.

But now this begs the question: What if in the above identity we allow $n$ to be a non-integer? Then this would give a way to define a notion of taking $s$ derivatives, where $s$ is any real number.
$$f^{(s)}(x) := \mathcal{F}^{-1}\left((-2\pi i \xi)^s\hat{f}(\xi)\right)$$This is clearly consistent with "classical" differentiation by definition, so this definition is reasonable provided that $f$ is nice. It should be noted that we should be a wee bit careful with what "$i^s$" means when $s$ is not an integer, but I don't feel like discussing this.
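
For the numerically inclined, here's this multiplier definition in action (my own sketch; note that numpy's FFT uses the opposite sign convention from the text, so the multiplier there becomes $(2\pi i\xi)^s$, and I use a smooth periodic function so that the discrete transform faithfully stands in for the continuous one):

[code]
# Python sketch: fractional derivative via the Fourier multiplier.
# The half-derivative of sin(x) should be sin(x + pi/4).
import numpy as np

N, L = 256, 2 * np.pi
x = np.arange(N) * L / N
xi = np.fft.fftfreq(N, d=L / N)            # frequencies, cycles per unit
s = 0.5
mult = (2j * np.pi * xi) ** s              # principal branch; 0^s = 0
half = np.fft.ifft(mult * np.fft.fft(np.sin(x))).real
print(np.max(np.abs(half - np.sin(x + np.pi / 4))))  # ~ 1e-13
[/code]

Setting $s = 1$ in the same script returns $\cos x$, and $s = 2$ returns $-\sin x$, so the multiplier really does interpolate between classical derivatives.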

I'm going to use this definition to compute the fractional derivatives of the insultingly simple function $f(x) = 1$. Before you complain, I have very good reason for keeping it simple --- if we try to use the above definition, we immediately run into a problem! Namely, this choice of $f$ does not have a Fourier transform because it does not have sufficient decay at $\pm \infty$.

While this is certainly a major issue, it can be sidestepped with the knowledge that such functions do have Fourier transforms, in the sense of distributions. As I've previously suggested, this is quite advanced, so it is not easy for me to explain precisely how this works at a low level. Rigorously, we can find that $\hat{f} = \delta_0$ where $\delta_0$ is the Dirac delta at $0$. Now $\mathcal{F}^{-1}((-2\pi i\xi)^s\delta_0)$ is the distribution acting as follows:
$$\langle \mathcal{F}^{-1}((-2\pi i\xi)^s\delta_0),\phi\rangle = \langle(-2\pi i\xi)^s\delta_0, \hat{\phi}\rangle = \langle \delta_0,(-2\pi i\xi)^s\hat{\phi}\rangle$$Aha! By definition of $\delta_0$, this expression is just $(-2\pi i \cdot 0)^s\hat{\phi}(0)$. If $s > 0$, this is just $0 = \langle 0, \phi\rangle$. If $s = 0$, this is $\hat{\phi}(0)$ (because $0^0 = 1$), which is equal to $\int_{\mathbb{R}} \phi = \langle 1,\phi \rangle$. Thus for $s \geq 0$,
$$f^{(s)}(x) = \begin{cases}1, & s = 0 \\ 0, & s > 0\end{cases}.$$
The next simplest function we can try differentiating is $f(x) = x$. The first immediate issue is, of course, that $x$ can't be Fourier'd in the usual sense, so we must again interpret $x$ distributionally. It's not trivial, but you can show that $\hat{x} = \frac{1}{-2\pi i}\delta'_0$ where $\delta_0'$ is the distributional derivative of $\delta_0$. (One way to demonstrate this is to work backwards by proving that $$\langle \mathcal{F}^{-1}(\delta_0'),\phi\rangle = \langle \delta_0', \hat{\phi}\rangle = \langle \delta_0, -(\hat{\phi})'\rangle = \left.\frac{d}{d\xi}\right|_{\xi=0} -\hat{\phi}(\xi) = \langle x,-2\pi i \phi\rangle.$$I'm not sure how I'd figure it out more directly.) Now
$$\langle x^{(s)}, \phi\rangle = \langle \mathcal{F}^{-1}((-2\pi i)^{s-1}\xi^s\delta_0'),\phi\rangle = \langle (-2\pi i)^{s-1}\xi^s\delta_0',\hat{\phi}\rangle = \langle \delta_0', (-2\pi i)^{s-1}\xi^s\hat{\phi}\rangle$$$$ = \left.\frac{d}{d\xi}\right|_{\xi = 0} (-2\pi i)^{s-1}\xi^s \hat{\phi}(\xi) = s(-2\pi i)^{s-1}0^{s-1}\hat{\phi}(0) + (-2\pi i)^{s-1}0^s\hat{\phi}'(0)$$$$ = s(-2\pi i)^{s-1}0^{s-1}\langle 1, \phi\rangle + (-2\pi i)^{s-1}0^s\langle -2\pi i x,\phi\rangle = \langle s(-2\pi i)^{s-1}0^{s-1} + x(-2\pi i)^{s}0^s, \phi\rangle.$$We conclude that
$$x^{(s)} = s \cdot 0^{s-1} + x \cdot 0^s.$$Let's check that this is consistent for integer values of $s$. When $s = 0$ (using the convention that $0 \cdot \infty = 0$), this is just $x$. When $s = 1$, this is $1$. When $s \geq 2$, this is $0$. Hence we did not mess up. We can also write this derivative piecewise in $s$ as
$$x^{(s)} = \begin{cases}x, & s = 0 \\ \text{DNE}, & 0 < s < 1 \\ 1, & s = 1 \\ 0, & s > 1\end{cases}.$$
This procedure can be used to take the (Fourier-wise) fractional derivative of various expressions, but as you can probably tell, computing an exact expression for such derivatives is rather cumbersome in practice. What's far more useful is the consideration of spaces of functions for which you can take a fractional derivative, which can pop up in the study of PDE, interpolation theory, and other deep regions of analysis. Of particular interest is the Hilbert space $H^s(\mathbb{R}^d)$ of functions $f \in L^2(\mathbb{R}^d)$ that satisfy
$$\int_{\mathbb{R}^d} (1+|\xi|^2)^{s}|\hat{f}(\xi)|^2\,d\xi < \infty,$$which, for $0 < s < 1$, happens to be equal to the fractional Sobolev space $W^{s,2}(\mathbb{R}^d)$, where $W^{s,p}(\mathbb{R}^d)$ is defined as the space of functions $f \in L^p(\mathbb{R}^d)$ for which
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{|f(x)-f(y)|^p}{|x-y|^{d+sp}}\,d(x,y) < \infty.$$For $s$ a positive integer, $H^s(\mathbb{R}^d)$ is instead just the Sobolev space of functions which admit $s$ weak derivatives in $L^2$, so these abstractions are certainly grounded in reality (provided you consider weak derivatives to be at all grounded in reality...).

In fact, $f \mapsto \left(\int_{\mathbb{R}^d} (1+|\xi|^2)^{s}|\hat{f}(\xi)|^2\,d\xi\right)^{1/2}$ gives an equivalent norm for the topology on $H^s(\mathbb{R}^d)$, which allows some very neat proofs in the realm of Sobolev spaces. A particularly easy result to get is Morrey's Embedding, which states that if $\alpha = s - d/2 \in (0,1)$ then $H^s(\mathbb{R}^d) \hookrightarrow C^{0,\alpha}(\mathbb{R}^d)$. Equivalently, for any Schwartz function $f \in \mathcal{S}(\mathbb{R}^d)$ we have the bound
$$|f|_{C^{0,\alpha}} \preceq \|f\|_{H^s}.$$To show this, first fix $x,y \in \mathbb{R}^d$ and use Fourier inversion to write
$$|f(x)-f(y)| = \left|\int_{\mathbb{R}^d} \hat{f}(\xi)\left(e^{2\pi i x\xi} - e^{2\pi i y\xi}\right)\,d\xi\right|$$$$ \leq \int_{\mathbb{R}^d} |\hat{f}(\xi)|\left|e^{2\pi i x\xi} - e^{2\pi i y\xi}\right|\,d\xi.$$Since we wish to use the funny Fourier norm on $H^s$, we now multiply and divide by the appropriate expression and apply Cauchy-Schwarz.
$$= \int_{\mathbb{R}^d} (1+|\xi|^2)^{s/2}|\hat{f}(\xi)| \cdot \frac{\left|e^{2\pi i x\xi} - e^{2\pi i y\xi}\right|}{(1+|\xi|^2)^{s/2}}\,d\xi$$$$\leq \left(\int_{\mathbb{R}^d} (1+|\xi|^2)^{s}|\hat{f}(\xi)|^2\,d\xi\right)^{1/2}\left(\int_{\mathbb{R}^d}\frac{\left|e^{2\pi i x\xi} - e^{2\pi i y\xi}\right|^2}{(1+|\xi|^2)^{s}}\,d\xi\right)^{1/2}$$$$\preceq \|f\|_{H^s}\left(\int_{\mathbb{R}^d}\frac{\left|e^{2\pi i x\xi} - e^{2\pi i y\xi}\right|^2}{(1+|\xi|^2)^{s}}\,d\xi\right)^{1/2}$$It remains to prove that $\int_{\mathbb{R}^d}\frac{\left|e^{2\pi i x\xi} - e^{2\pi i y\xi}\right|^2}{(1+|\xi|^2)^{s}}\,d\xi \preceq |x-y|^{2\alpha}$. Since integrals over $\mathbb{R}^d$ are kinda awful, it makes sense to split it as a sum of an integral over $|\xi| < R$ and an integral over $|\xi| > R$, with $R$ to be chosen later. Near $0$ we use the estimate
$$\left|e^{2\pi i x\xi} - e^{2\pi i y\xi}\right|^2 = \left|e^{2\pi i \xi \cdot (x-y)} - 1\right|^2 = (\cos(2\pi\xi \cdot (x-y))-1)^2 + \sin^2(2\pi\xi \cdot (x-y))$$$$ \preceq 1-\cos(2\pi\xi \cdot (x-y)) \leq 4\pi^2|\xi \cdot (x-y)|^2 \preceq |\xi|^2|x-y|^2.$$Far from $0$, we use the stupid estimate $\left|e^{2\pi i x\xi} - e^{2\pi i y\xi}\right|^2 \leq 4 \preceq 1$. Putting this together,
$$\int_{\mathbb{R}^d}\frac{\left|e^{2\pi i x\xi} - e^{2\pi i y\xi}\right|^2}{(1+|\xi|^2)^{s}}\,d\xi \preceq \int_{\mathbb{R}^d} \frac{1}{(1+|\xi|^2)^s} \cdot \left(1_{\{|\xi| < R\}}|x-y|^2|\xi|^2 + 1_{\{|\xi| > R\}}\right)\,d\xi$$$$\preceq \int_{\mathbb{R}^d} \frac{1}{|\xi|^{2s}} \cdot \left(1_{\{|\xi| < R\}}|x-y|^2|\xi|^2 + 1_{\{|\xi| > R\}}\right)\,d\xi$$$$\preceq \int_0^\infty \frac{r^{d-1}}{r^{2s}} \cdot \left(1_{(0,R)}r^2|x-y|^2 + 1_{(R,\infty)}\right)\,dr$$$$= |x-y|^2\int_0^R r^{d+1-2s}\,dr + \int_R^\infty r^{d-1-2s}\,dr \preceq |x-y|^2R^{d+2-2s} + R^{d-2s}.$$Now we choose $R$, and it makes sense to take $R = |x-y|^{-1}$. This gives the final upper bound (up to a constant) of $|x-y|^{2s-d} = |x-y|^{2\alpha}$, completing the proof.

Actually to really show the desired embedding we should control the $\|\cdot\|_\infty$ norm. I will leave this to you.

Exercise: Use complex numbers to prove that any $f \in H^s(\mathbb{R}^d)$, where $s > d/2$, is actually bounded with upper bound
$$\|f\|_\infty \preceq \|f\|_{H^s}$$(up to a constant depending only on $s$ and $d$).

I must stress that the above statement does not involve complex numbers!

One last (tough) exercise for you.

Exercise: Use complex numbers to prove that for any $f,g \in H^s(\mathbb{R}^d)$, where $s > d/2$, we have the inequality
$$\|fg\|_{H^s} \preceq \|f\|_{H^s}\|g\|_{H^s}.$$(Hint: You will need to use Young's Convolution Inequality. One form of this inequality is as follows: For $f,g,h:\mathbb{R}^d \to \mathbb{R}$ and $p,q,r \geq 1$ with $\frac1p + \frac1q + \frac1r = 2$, we have
$$\left|\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} f(x)g(x-y)h(y)\,dx\,dy\right| \leq \|f\|_{L^p}\|g\|_{L^q}\|h\|_{L^r}.$$)

Additional Hint


What I have presented here are only the applications that I am most familiar with. There are far more magical uses for complex numbers in "real" settings, many of which are likely unknown to me. What applications do you know of? Feel free to share.

10 "Real" Applications of Complex Numbers (Part 1/2)

by greenturtle3141, Aug 23, 2024, 3:33 AM

Reading Difficulty: Anywhere between 1/5 and 7/5
Prerequisites: Varies

Despite not being "real", complex numbers somehow have fascinating applications to "real" settings. That is, complex numbers can be introduced to solve problems that do not at all reference complex numbers. Despite having seen many such instances, I continue to be amused by these applications. Let's talk about 10 of them:
  1. Geometry (Reading Difficulty: 1/5)
  2. Trigonometric Identities (Reading Difficulty: 1-2/5)
  3. Sums of Two Squares (Reading Difficulty: 1/5)
  4. Roots of Unity Filtering (Reading Difficulty: 2-3/5)
  5. Physics (Reading Difficulty: 1/5)
  6. Solving Differential Equations (Reading Difficulty: 3-5/5)
  7. Integrals (Reading Difficulty: 3/5)
  8. Infinite sums (Reading Difficulty: 3-4/5)
  9. Proof of the Central Limit Theorem (Reading Difficulty: 4-5/5)
  10. Fractional Derivatives and Embeddings of Fractional Sobolev Spaces (Reading Difficulty: 5-7/5)

1. Geometry

Rotate points with ease!

This may be "backwards" in terms of which result depends on the other, but complex numbers are incredibly useful for rotations. This is because when multiplying complex numbers, you are really multiplying their magnitudes and adding their angles. This follows from, say, sine and cosine angle sum formulae, and it is also "easy to see" if you write complex numbers in polar form (though this is not at all a rigorous argument). But where this method really shines is when dealing with cartesian coordinates. If we have a point $(a,b) \in \mathbb{R}^2$, and we wish to rotate this about the origin by $60$ degrees, then complex numbers makes this incredibly simple to execute: Just multiply the complex number $a+bi$ by $\cos(60) + i\sin(60)$.

The second-best way to rotate points is via linear algebra. But even if you remember the rotation matrix, it's probably much faster to use complex numbers. Moreover, you can even use the complex number method to derive the rotation matrix! Since
$$(a+bi)(\cos \theta + i\sin\theta) = a\cos\theta-b\sin\theta + (a\sin\theta + b\cos\theta)i,$$we see that the rotation of $(a,b)$ about $(0,0)$ by an angle $\theta$ is given by
$$\begin{bmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix}.$$(Yes, we can argue this also by knowing that rotation is a linear transformation and then plugging in basis vectors. But that requires thinking. This method does not!)
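As a concrete illustration, here is the whole recipe in a few lines of Python (a throwaway sketch; the name rotate is made up):

[code]
import cmath

def rotate(point, degrees, center=0j):
    """Rotate `point` about `center` by multiplying by cos(t) + i*sin(t)."""
    w = cmath.exp(1j * cmath.pi * degrees / 180)  # e^{i*theta}
    return center + (point - center) * w

print(rotate(1 + 0j, 90))   # approximately 1j, i.e. the point (0, 1)
[/code]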

Exercise: Using complex methods, find the area of an equilateral triangle whose vertices have $x$-coordinates $0$, $1$, and $3$, in some order.

Complex numbers are so intertwined with the concept of rotation that Asymptote's coordinate system is literally complex numbers, which hence allows you to multiply points together.

Other applications to Euclidean geometry

Viewing the Euclidean plane as being the same as the complex plane $\mathbb{C}$ gives some useful algebraic structure to geometry that allows one to algebraically prove certain statements. It lets you "add" points, of course, though this isn't that thrilling since this is a "standard" vector space operation. The true power (and really the only reason why you'd want to be in the complex plane versus $\mathbb{R}^2$) is the ability to multiply points. For example, you can prove that a triangle with vertices $a,b,c \in \mathbb{C}$ is equilateral iff
$$a + \omega b + \omega^2 c = 0$$where $\omega$ is a primitive third root of unity, such as $\omega = e^{2\pi i/3} = \frac{-1}{2} + \frac{\sqrt{3}}{2}i$.

Exercise: Prove this fact. (Avoid "bashing" it, there are nice ways to argue both directions.)

Exercise: Use this fact to prove Napoleon's Theorem.

This is just the tip of the iceberg. A more in-depth study of this topic is best deferred to some works of Evan Chen.

2. Trigonometric identities made easy

The sine and cosine angle sum identities can be quite difficult to remember on a first encounter. The way I "memorized" them was quite silly: I simply derived them on the spot whenever I needed them, and each subsequent derivation got faster and faster until I could procure them so fast that I had effectively memorized them. Indeed, "proving" the identities reduces to nothing more than the following back-of-the-envelope calculation:
$$\cos(A+B)+i\sin(A+B) = e^{iA}e^{iB} = (\cos A+i\sin A)(\cos B + i\sin B) = \cos A \cos B - \sin A \sin B + (\cos A \sin B + \sin A \cos B)i.$$The angle sum identities can thus be derived in mere seconds!

Exercise: Derive the triple-angle formula $\cos(3\theta) = 4\cos^3\theta - 3\cos\theta$ using complex methods.

In general, you see a good bit of "complex bashing" in Olympiad settings that concern trigonometry. Essentially, if one wishes to prove some identity that involves $\sin$ and $\cos$, we can try writing
$$\sin x = \frac{e^{ix}-e^{-ix}}{2i}$$and
$$\cos x = \frac{e^{ix}+e^{-ix}}{2}.$$While this can be quite messy, it often ends up clarifying the situation since exponentials are more "flexible" to algebraically work with than sine and cosine.

As a last remark here, let's eviscerate a particularly "fearsome" trigonometric identity: The tangent addition formula for multiple angles. That is,
$$\tan(\theta_1+\theta_2+\ldots+\theta_n) = \frac{s_1-s_3+s_5-s_7 +\ldots}{1-s_2+s_4-s_6+\ldots},$$where
$$s_k = \sum_{S \subseteq \{1,\cdots,n\}, |S| = k} \prod_{j \in S} \tan \theta_j$$is the $k$th elementary symmetric polynomial in $\tan \theta_1, \cdots, \tan\theta_n$. For example,
$$\tan(A+B+C) = \frac{\tan A + \tan B + \tan C - \tan A \tan B \tan C}{1 - \tan A \tan B - \tan A \tan C - \tan B \tan C}.$$When this was first taught to me, I recall the argument being a bit of an inductive mess. But such pains are completely sidestepped using complex numbers!

Let's start by finding a complex number $a$ whose argument is $A$. How about $a = 1 + i\tan A$? Define $b$ and $c$ similarly. Then the product $abc$ has argument $A+B+C$, and $\tan(A+B+C)$ will be given by the quotient of the imaginary and real parts of $abc$. Computing,
$$abc = (1+i\tan A)(1+i\tan B)(1+i\tan C) = 1 - \tan A \tan B - \tan A \tan C - \tan B \tan C + (\tan A + \tan B + \tan C - \tan A \tan B \tan C)i.$$The tangent addition formula shows up immediately with practically no effort, and the argument is equally easy for the general case of $n$ angles.
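Here is a quick numerical confirmation of this (a sketch of mine; the angles, in radians, are arbitrary):

[code]
import math

A, B, C = 0.3, 0.5, 0.7
abc = (1 + 1j * math.tan(A)) * (1 + 1j * math.tan(B)) * (1 + 1j * math.tan(C))
print(abc.imag / abc.real)  # quotient of imaginary and real parts of abc
print(math.tan(A + B + C))  # the same number
[/code]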

Exercise: Evaluate the infinite sum
$$\sum_{n=1}^\infty \frac{\cos n}{2^n}$$using complex numbers.

Exercise: Let $\theta \in \mathbb{R}$ be a constant. Using complex numbers, obtain a closed form for the finite sum
$$a_n := \sum_{k=0}^n \sin(k\theta),$$and deduce that the sequence $\{a_n\}_{n=0}^\infty$ is bounded. (Note that if $\sin$ is replaced with $\cos$, the same statement holds except for when $\theta$ is an integer multiple of $2\pi$.)

3. Sums of two squares

A famous result in elementary number theory is that if positive integers $x$ and $y$ can each be written as the sum of two perfect squares, then the product $xy$ can also be written as the sum of two squares! This miraculous statement can be proven with the help of complex numbers.

Suppose $x = a^2+b^2$ and $y = c^2+d^2$. Then
$$x = |a+bi|^2$$and
$$y = |c+di|^2.$$Therefore
$$xy = |a+bi|^2|c+di|^2 = |(a+bi)(c+di)|^2 = |ac-bd + (bc+ad)i|^2 = (ac-bd)^2 + (bc+ad)^2.$$Voila.

Of course, one could have jumped straight to the identity $(a^2+b^2)(c^2+d^2) = (ac-bd)^2 + (bc+ad)^2$ without ever mentioning complex numbers, but good luck finding this without the help of our imaginary friends.
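In code, the identity is just one complex multiplication (an illustrative sketch; the name combine is made up):

[code]
def combine(a, b, c, d):
    """Given x = a^2 + b^2 and y = c^2 + d^2, read off a representation
    of x*y as a sum of two squares from the product (a+bi)(c+di)."""
    z = complex(a, b) * complex(c, d)
    return int(z.real), int(z.imag)

p, q = combine(1, 2, 2, 3)   # x = 5, y = 13
print(p, q, p * p + q * q)   # -4 7 65, and indeed 65 = 5 * 13
[/code]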

Exercise: Using the help of complex numbers, show that if $x$ is a positive even integer that can be written as the sum of two squares, then $x/2$ can also be written as the sum of two squares.

4. Roots of Unity Filtering

Suppose we have a (possibly infinite) polynomial $P(x) = a_0 + a_1x + a_2x^2 + \ldots.$ The sum of the coefficients of $P$ is given as $P(1)$, clearly. How about the sum of every other coefficient? If we wish to obtain a nice expression for $a_0+a_2+a_4+\ldots$, one trick is to observe that
$$P(1) = a_0+a_1+a_2+a_3+\ldots$$$$P(-1) = a_0-a_1+a_2-a_3+\ldots$$and then sum these equalities to obtain
$$P(1)+P(-1) = 2a_0 + 2a_2 + 2a_4 + \ldots.$$We have hence killed all non-even terms by introducing $(-1)$, which lets us conclude that the sum of every other coefficient, $a_0+a_2+\ldots$, is exactly $\frac{P(1)+P(-1)}{2}$.

What about every third coefficient? Is there a nice expression for $a_0+a_3+a_6+\ldots$? The answer is yes, and we can actually do the same trick to obtain it. Last time, we killed all non-even terms using the fact that $1 + (-1) = 0$. How do we kill all non-multiple-of-3 terms? We simply go into the complex plane, using the fact that $1+\omega+\omega^2 = 0$ where $\omega$ is a (primitive) third root of unity!

If $\omega$ is, say, $-\frac12 + \frac{\sqrt{3}}{2}i$, then $\omega^3 = 1$ and $1+\omega+\omega^2 = 0$. So $1+\omega^n+(\omega^2)^{n}$ is $3$ exactly when $n$ is a multiple of $3$, and otherwise it is $0$, vanishing into thin air. A good picture to have in mind for convincing yourself that this is true, and for visualizing what exactly is going on here, is to view $1^n$, $\omega^n$, and $(\omega^2)^n$ as "propellers" rotating in the complex plane at different speeds. The $1^n$ propeller stays at $1$ without ever moving. The $\omega^n$ propeller starts at $1$ and rotates $120$ degrees counter-clockwise every time $n$ increments. The $(\omega^2)^n$ propeller rotates at twice the speed, rotating $240$ degrees counter-clockwise every time $n$ goes up. In this visualization, the three propellers meet up exactly when $n$ is a multiple of $3$, but otherwise they are evenly spaced around $0$, which causes the cancellation.

To spell out the punchline, we plug in $1$, $\omega$, and $\omega^2$ into $P$ to get
$$P(1) = a_0+a_1+a_2+a_3+\ldots$$$$P(\omega) = a_0+a_1\omega+a_2\omega^2+a_3\omega^3+\ldots$$$$P(\omega^2) = a_0+a_1\omega^2+a_2\omega^4+a_3\omega^6+\ldots$$and summing these gives
$$P(1)+P(\omega)+P(\omega^2) = a_0(3) + a_1(0) + a_2(0) + a_3(3) + \ldots.$$Thus
$$a_0+a_3+a_6+\ldots = \frac{P(1)+P(\omega)+P(\omega^2)}{3}.$$This neat trick works when $3$ is replaced with any positive integer $n$: just use all $n$ of the $n$th roots of unity.

This method, known as "Roots of Unity Filtering", is especially useful when applied to certain polynomials $P$. A common choice is $P(x) = (1+x)^n = \sum_{k=0}^n \binom{n}{k}x^k$, which gives a beautiful expression for sums such as
$$\sum_{k=0}^{100} \binom{300}{3k}.$$If you haven't seen this little sleight of hand yet, it is worth computing. (And if you're stuck, this demonstration is literally on every other site that talks about roots of unity filtering.)
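If you'd like to see the filter in action without doing the algebra, here is a short numerical check (mine; it compares the filtered value against the sum computed directly):

[code]
import cmath
from math import comb

w = cmath.exp(2j * cmath.pi / 3)             # primitive third root of unity
P = lambda x: (1 + x) ** 300
filtered = (P(1) + P(w) + P(w * w)) / 3      # should equal the sum of C(300, 3k)
direct = sum(comb(300, 3 * k) for k in range(101))
print(abs(filtered.real - direct) / direct)  # essentially 0 (floating-point error)
[/code]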

Exercise: Use the methods of complex numbers to find a smooth function $f:\mathbb{R} \to \mathbb{R}$, written in closed form, such that the sequence of derivatives of $f$ has period $3$. (That is, $f''' = f$ but $f' \neq f$.) Your expression for $f$ should not involve non-real numbers.

5. Acceleration of uniform circular motion

Disclaimer: I am not a physicist.

This is rather informal and, as far as I'm aware, not an actual "application" of complex numbers. It's a fun connection to make, however!

If you took some form of physics, you are probably aware that the acceleration of an object undergoing uniform circular motion about a center $O$ is a vector that points towards $O$. If you're like me, you probably found this extremely unintuitive and/or remarkable. Why does it point exactly towards the center? Complex numbers can help us understand why. An object rotating about the complex origin $0$ with angular velocity $\omega$ will be at
$$e^{i\omega t}$$after $t$ seconds. This is the position of the object over time. Differentiating gives the velocity over time,
$$i\omega e^{i\omega t}.$$Lastly, differentiating this gives the acceleration of the object over time,
$$-\omega^2 e^{i\omega t}.$$The position vector $e^{i\omega t}$ and the acceleration vector $-\omega^2 e^{i\omega t}$ are pointing in opposite directions at all times. In other words, if we place the acceleration vector at the position of the position vector, then it will point towards $0$, the center of rotation.
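A one-line symbolic check of this computation (a SymPy sketch of mine):

[code]
import sympy as sp

t, w = sp.symbols('t omega', real=True)
z = sp.exp(sp.I * w * t)                         # position in the complex plane
print(sp.simplify(sp.diff(z, t, 2) + w**2 * z))  # 0: acceleration = -omega^2 * position
[/code]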

6. Differential Equations

Elementary Applications

Differential equations can be solved with the help of complex numbers. Consider, for example, the differential equation for periodic motion or whatever it's called,
$$u'' = -ku$$where $k > 0$ is the spring constant. Practically speaking the "best" way to solve this in my opinion is to make the substitution $u(t) = v(\sqrt{k}t)$ to eliminate the $k$, and then guess that $v(t) = a\sin t + b\cos t$. But you'd only guess this if you pretty much know what the solution is (and this would be expected from you in practice), so just for fun let's try and derive the solution "from scratch".

Let $D$ be the "differentiation operator". Then the ODE may be written as $(D^2+kI)u = 0$, where $I$ is the "identity". We can then make the factorization
$$(D+\sqrt{k}iI)(D-\sqrt{k}iI)u = 0.$$(If you haven't seen this trick before, it SHOULD feel suspicious. Fortunately it turns out that this is actually perfectly rigorous if you think hard enough.)

This lets us decouple the ODE into a simple system. Let $v = (D-\sqrt{k}iI)u$. Then we can start by solving $(D+\sqrt{k}iI)v = 0$, i.e.
$$v' + \sqrt{k}iv = 0. \qquad (*)$$Of course, it's crucial to recognize that we are now allowing our solutions to be complex-valued, i.e. of type $\mathbb{R} \to \mathbb{C}$ rather than just $\mathbb{R} \to \mathbb{R}$. This is not a mistake because finding all solutions of the former type will also find all solutions of the latter type.

Solving $(*)$ using your favorite method (such as the rather non-rigorous "separate and integrate", or perhaps using an integrating factor, or even by guessing, like I just did), we end up with $v(t) = v(0)e^{-\sqrt{k}i t}$. Now, from $v = (D-\sqrt{k}iI)u = u' - \sqrt{k}iu$, we get a first-order ODE for $u$,
$$v(0)e^{-\sqrt{k}i t} = u'(t) - \sqrt{k}iu(t).$$Using your favorite method (probably integrating factors), we get the solution
$$u(t) = Ae^{-\sqrt{k}i t} + Be^{\sqrt{k}i t}$$for constants $A$ and $B$. These are all the complex solutions. To recover the real solutions, some more work should be done, which I will omit. (Do beware that $A$ and $B$ are not necessarily real.)
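For a sanity check (a sketch of mine, not part of the derivation), SymPy solves the same ODE; it reports the real form $C_1\sin(\sqrt{k}t) + C_2\cos(\sqrt{k}t)$, which spans the same solution space as $Ae^{-\sqrt{k}i t} + Be^{\sqrt{k}i t}$:

[code]
import sympy as sp

t = sp.symbols('t', real=True)
k = sp.symbols('k', positive=True)
u = sp.Function('u')

sol = sp.dsolve(sp.Eq(u(t).diff(t, 2), -k * u(t)), u(t))
print(sol)  # u(t) = C1*sin(sqrt(k)*t) + C2*cos(sqrt(k)*t), up to SymPy's normal form

z = sp.exp(sp.I * sp.sqrt(k) * t)         # the complex exponential really solves it too
print(sp.simplify(z.diff(t, 2) + k * z))  # 0
[/code]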

Exercise: An ant is placed on each corner of a square of side length $2024$. Each ant walks towards its counter-clockwise neighbor at a speed of $42$ units per second. Eventually the ants will converge at the center of the square. Using complex numbers, prove that each ant will have travelled exactly $2024$ units.

Advanced Applications

This is not the only way to employ complex numbers, however. We can also try using the Fourier Transform, which in this post we define to be
$$\mathcal{F}(u)(\xi) = \hat{u}(\xi) := \int_{\mathbb{R}} u(t)e^{2\pi i\xi t}\,dt.$$(This sign convention in the exponent is the one that matches the identity below; the opposite convention simply flips that sign.) Provided sufficient regularity and decay properties on $u$, we have that
$$\mathcal{F}(u')(\xi) = -2\pi i \xi \mathcal{F}(u)(\xi).$$This allows us to effectively turn differentiation into multiplication. Applying this to the spring equation,
$$(-2\pi i \xi)^2\hat{u}(\xi) = -k\hat{u}(\xi).$$Strangely, this equation seems to suggest that the only solution is $\hat{u}(\xi) = 0$, so that $u(x) = 0$. What gives?

The issue is that you can only take the Fourier transform of $u$ provided that $u$ has sufficient decay properties, e.g. something like $|u(x)| \leq \frac{A}{1+|x|^2}$. So in a way, trying to solve the spring equation using Fourier methods has given us every possible solution that decays to 0 at infinity. Indeed, $u(x) = 0$ is the only such solution.

So is this approach doomed? It is not, but fixing it will (1) require some very advanced theory, and (2) not necessarily find all solutions either (the saving grace here, though, is that it does appear to find all solutions "symbolically"). The advanced theory we need is called the theory of distributions, particularly that of tempered distributions. A full discussion of this will be deferred to a different post because it really is quite advanced, but the gist is that distributions are a generalization of functions. Some properties include:
  • If $f$ does not grow too fast, then it is also a tempered distribution.
  • If $f$ is a tempered distribution, then you can take its Fourier transform $\hat{f}$ to get another tempered distribution.
  • You can add tempered distributions, and this interacts well with the Fourier transform.
  • You cannot multiply tempered distributions. But you can multiply tempered distributions by a smooth function. For example, if $f$ is a tempered distribution, then so is $xf(x)$.
  • We still have the nice identity $\mathcal{F}(u')(\xi) = -2\pi i \xi \mathcal{F}(u)(\xi)$, which holds "in the sense of distributions". That is, you shouldn't view each side as a function. Rather you should see each side as a tempered distribution.

Now let us solve the spring equation, and I will make exactly one (fatal!) error that I shall address at the end. Taking the Fourier transform again gives
$$(-2\pi i \xi)^2\hat{u}(\xi) = -k\hat{u}(\xi).$$This rearranges to
$$(2\pi\xi-\sqrt{k})(2\pi\xi+\sqrt{k})\hat{u}(\xi) = 0. \qquad (*)$$It might seem like we've wound up at the same problem. Isn't the only solution here $\hat{u}(\xi) = 0$? This would be true if this equation were an equality of functions. But it's an equality of distributions, so the intuitive "algebra" doesn't actually apply. It turns out that the solution to an equation such as $(x-a)f(x) = 0$ for $f$, in the sense of distributions, is given by $f = c \cdot \delta_a$ where $\delta_a$ is the Dirac delta distribution at $a$ and $c$ is any constant. This is not that easy to prove, but it is powerful here. Applying this twice to $(*)$, we find the distributional solution
$$\hat{u}(\xi) = c_1\delta_{-\sqrt{k}/(2\pi)} + c_2\delta_{\sqrt{k}/(2\pi)}$$for any constants $c_1$ and $c_2$. Applying the inverse Fourier transform on both sides recovers the solution to the spring equation. I'll defer the details (and the more rigorous argument) for another post.
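To at least make that last step plausible (a sketch, not the deferred rigorous argument): under the transform convention of this post, each Dirac delta inverts to a complex exponential,
$$\mathcal{F}^{-1}(\delta_a)(t) = e^{-2\pi i a t},$$so the distributional solution above inverts to $u(t) = c_1e^{i\sqrt{k}t} + c_2e^{-i\sqrt{k}t}$, which is exactly the family of solutions we found earlier by factoring the operator.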

I promised that there was a fatal error. Indeed, the error here was the assumption that the solution $u$ could be treated as a (tempered) distribution. Unfortunately, we do not know a priori that this is the case, since a solution such as $u(x) = e^x$ actually grows too fast to be interpreted as a tempered distribution. Hence, this method is unable to rule out candidates such as $u(x) = e^x$.

One silver lining is that if you just "pretend" that you could work with $e^x$ as a distribution and do the algebra to solve a differential equation such as $u' = u$, then you will find that the method still recovers the solution, even though the steps cease to be legal. Also, this tool can be used in conjunction with other theorems to find all solutions or to even just obtain some useful information, particularly in more complicated differential equations.

The Fourier transform (in multiple dimensions) can also be used to solve partial differential equations such as $\Delta u = f$, but I won't pursue this matter here.

(Part 2: Here)

An Alternate Proof of Ascoli-Arzela

by greenturtle3141, Apr 13, 2024, 4:02 AM

Reading Difficulty: 4.5/5

Prerequisites: Understand the title

I make no claim that this is superior to the "Bolzano-Weierstrass + diagonalization" proof.

Theorem (Ascoli-Arzela): Let $(K,d)$ be a compact metric space. Then a family $\mathcal{F} \subseteq C(K;\mathbb{R})$ is precompact iff it is pointwise bounded and equicontinuous.

Proof. $(\implies)$ Exercise.

$(\impliedby)$ We approach this by showing that $\mathcal{F}$ is totally bounded. Fix $\varepsilon > 0$.

Note that $\mathcal{F}$ is actually uniformly equicontinuous (why?). So pick $\delta > 0$ such that $|f(x)-f(y)| < \varepsilon/10$ for all $x,y \in K$ with $d(x,y) < \delta$ and for all $f \in \mathcal{F}$.

Find a finite cover of $K$ with $n$ balls $\{B(x_i,\delta)\}_{i=1}^n$ of radius $\delta > 0$. For each tuple $(y_1,\cdots,y_n)$ of real numbers, construct a continuous function $g_{y_1,\cdots,y_n}:K \to \mathbb{R}$ satisfying $g_{y_1,\cdots,y_n}(x_i) = y_i$ for all $i$ as follows:
$$g_{y_1,\cdots,y_n}(x) = \frac{\sum_{i=1}^n \omega(d(x,x_i))y_i}{\sum_{i=1}^n \omega(d(x,x_i))}$$where $\omega:(0,\infty) \to [0,\infty)$ is some decreasing continuous "weighting function" with the values $\omega(0^+) = +\infty$ and $\omega(\delta) = 0$ (and $\omega(t) = 0$ for $t \geq \delta$). Something like $\omega(t) = \left(\frac{1}{t}-\frac{1}{\delta}\right)1_{0 < t < \delta}$ would work. But whatever. The whole point of this construction (sketched in code just after the list below) is that:
  • $g_{y_1,\cdots,y_n}(x_i) = y_i$
  • $g_{y_1,\cdots,y_n}$ is continuous
  • $g_{y_1,\cdots,y_n}(x)$ is always expressed as a weighted average of the values $y_i$ for which $x_i \in B(x,\delta)$. In particular, among all $x_i \in B(x,\delta)$, there must be two such points $x_{\text{low}}$ and $x_{\text{high}}$ such that $y_\text{low} \leq g_{y_1,\cdots,y_n}(x) \leq y_\text{high}$. Ultimately this is the property that we'll be exploiting.
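Here is a minimal numerical sketch of this inverse-distance-weighting construction in one dimension (the names are made up; it assumes, as in the proof, that the balls $B(x_i,\delta)$ cover everything):

[code]
import numpy as np

def g(x, centers, ys, delta):
    """g(x_i) = y_i, g is continuous, and g(x) is always a weighted
    average of the y_i with |x - x_i| < delta."""
    d = np.abs(np.asarray(centers, dtype=float) - x)
    if d.min() == 0:                    # x is a center: return its value exactly
        return float(ys[int(d.argmin())])
    w = np.where(d < delta, 1.0 / d - 1.0 / delta, 0.0)  # omega(t) = (1/t - 1/delta)_+
    return float(np.dot(w, ys) / w.sum())

centers, ys = [0.0, 0.3, 0.6, 1.0], [1.0, 2.0, 0.0, 1.5]
print([g(c, centers, ys, 0.5) for c in centers])  # recovers ys exactly
print(g(0.45, centers, ys, 0.5))                  # a weighted average of nearby ys
[/code]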

Now for each $x_i$ find $M_i > 0$ for which $\sup_{f \in \mathcal{F}} |f(x_i)| \leq M_i < \infty$, let $L_i$ be the set of all multiples of $\varepsilon/10$ in the interval $[-M_i,M_i]$, and define the family
$$\mathcal{G} := \{g_{y_1,\cdots,y_n} : y_i \in L_i\}.$$Note that $\mathcal{G}$ is a finite subset of $C(K;\mathbb{R})$. We are done if we can show that any element of $\mathcal{F}$ is within $\varepsilon$ of some member of $\mathcal{G}$.

Indeed, take $f \in \mathcal{F}$ and for each $i$, find $y_i \in L_i$ for which $|f(x_i)-y_i| < \varepsilon/10$. We claim that $\|f-g_{y_1,\cdots,y_n}\|_\infty < \varepsilon$. Well, for any $x \in K$, we can find two points among $\{x_1,\cdots,x_n\} \cap B(x,\delta)$, $x_{\text{low}}$ and $x_{\text{high}}$, such that
$$f(x_\text{low}) - \varepsilon/10 < y_\text{low} \leq g_{y_1,\cdots,y_n}(x) \leq y_\text{high} < f(x_\text{high}) + \varepsilon/10.$$But $d(x,x_\text{low}) < \delta \implies |f(x)-f(x_\text{low})| < \varepsilon/10$, and ditto for $x_\text{high}$. So in fact,
$$f(x) - \varepsilon/5 < g_{y_1,\cdots,y_n}(x) < f(x) + \varepsilon/5.$$We're done. $\square$

Elevator Math!

by greenturtle3141, Apr 5, 2024, 8:30 PM

Reading Difficulty: 1.5/5

Can you solve a real life math puzzle?

I currently live in an apartment complex for my PhD program. My apartment building has $13$ floors: $T$ ("Terrace"), $M$ ("Mezzanine"), $1$, $2$, $\cdots$, $10$, and $11$. Since it's a tall building, there are two elevators.

One day, I called an elevator and stepped in, but forgot to press a button to tell it to go to another floor. The elevator proceeded to automatically go to the Terrace. Interesting. This elevator seems to have a default floor, $T$, that it goes to whenever it thinks nobody is using it.

I guess it made sense for one elevator to default to the bottom-most floor. What about the other elevator? After stepping outside and messing around a bit, I found that the other elevator also had a default floor.

What floor does the other elevator default to when nobody is using the elevators?

(It may seem like you have basically no information to work with, and that's true! In real life, we often have to make some reasonable assumptions before math can be done. Good luck!)

Hint 1

Hint 2

Hint 3

A more complete problem statement

Answer


Elevator math!

Alright let's get some math done.

The floor names are annoying so let's rename them to $0,1,\cdots,12$. Then the default floor of the "first elevator" is $0$ and the default floor of the "second elevator" is $a$, and we want to solve for $a$ such that travel time is optimized.

Choose a floor $X$ from these 13 floors uniformly at random. Take a resident of floor $X$ that wants to use the elevators at some random point in the day. There are two cases.
  • There is a $50\%$ chance that this resident is coming home, in which case they start at floor $0$ and use the first elevator to go up $X$ floors.
  • There is a $50\%$ chance that this resident is leaving home, in which case they start at floor $X$ and call the nearest elevator, which has to travel $\min(|a-X|, |X-0|)$ floors to reach floor $X$, and then they use the elevator to go down $X$ floors.

Thus the expected travel time given a random resident at a random time of day (in terms of how many floors the elevator they use needs to travel) is
$$\mathbb{E}\left[\frac12 \cdot X + \frac12 \cdot \left(\min(|a-X|, |X-0|) + X\right)\right].$$We want to find the value of $a$ that minimizes this.

First, note that we can rewrite this as
$$\mathbb{E}\left[X + \frac12 \cdot \min(|a-X|, |X-0|) \right].$$This makes sense because no matter what, a resident of a floor $X$ will always have to travel at least $X$ floors. So all we need to do is minimize the expression
$$\mathbb{E} \min(|a-X|,X),$$because the only thing we can try to minimize is how long it would take for the nearest elevator to reach floor $X$. All other time expenses are inevitable.

Now, using the definition of expected value, we want to minimize the quantity
$$\sum_{k=0}^{12} \min(|a-k|,k).$$At this point there's not much we can do besides listing out the values of this sum over all $a$.
\begin{tabular}{c|c}
$a$ & Sum\\
\hline
0 & 78 \\
1 & 66 \\
2 & 56 \\
3 & 47 \\
4 & 40 \\
5 & 34 \\
6 & 30 \\
7 & 27 \\
8 & 26 \\
9 & 26 \\
10 & 28 \\
11 & 31 \\
12 & 36 
\end{tabular}
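If you'd rather not grind out this arithmetic by hand, a two-line script (mine) reproduces the table:

[code]
for a in range(13):
    print(a, sum(min(abs(a - k), k) for k in range(13)))
[/code]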
So it looks like the sum is minimized when $a = 8$ or $a=9$. In the original problem, this corresponds to floors $7$ and $8$. So if you answered $8$, you get partial credit!

...so why is the answer $7$? Was it just arbitrary?


Continuous abstraction

Let's put that question aside and instead try to make some general assertions when there are more floors. If there were, say, $1000$ floors, what's a very good educated guess for where the second elevator should be?

Obviously I don't want to list out $1000$ different values. But $1000$ is such a big number that we can effectively study the problem by making a sort of abstract "idealization": Let's just pretend that there are uncountably many floors!

That is, let's model the building as the interval $[0,1]$. We choose a random "floor" $X \in [0,1]$ uniformly at random, and we now want to minimize the expected value of $\min(|X-a|,X)$, where $a$ is where the second elevator is. The expected value is given by the integral
$$\int_0^1 \min(|x-a|,x)\,dx.$$Integrals are so much easier to deal with than sums, which is why we're motivated to make this abstraction. To evaluate this, we split up into three intervals to get
$$\int_0^1 \min(|x-a|,x)\,dx = \int_0^{a/2} \min(|x-a|,x)\,dx + \int_{a/2}^a\min(|x-a|,x)\,dx + \int_a^1 \min(|x-a|,x)\,dx.$$$$ = \int_0^{a/2} x\,dx + \int_{a/2}^a a-x\,dx + \int_a^1 x-a\,dx$$$$ = \frac{a^2}{8} + \frac{a^2}{2} - \frac{3a^2}{8} + \frac12 - \frac12 a^2 - a(1-a)$$$$ = \frac{3}{4}a^2 - a + \frac12.$$This is minimized when $a = \frac{1}{2 \cdot \frac34} = \boxed{\frac23}$! So the general heuristic is that the second elevator should be $2/3$ of the way up the building.
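Here is a short SymPy check of this computation (my sketch; the split of $[0,1]$ is the one above, valid for $0 < a < 1$):

[code]
import sympy as sp

a, x = sp.symbols('a x', positive=True)
# min(|x - a|, x) is x on [0, a/2], a - x on [a/2, a], and x - a on [a, 1].
cost = (sp.integrate(x, (x, 0, a / 2))
        + sp.integrate(a - x, (x, a / 2, a))
        + sp.integrate(x - a, (x, a, 1)))
print(sp.expand(cost))            # 3*a**2/4 - a + 1/2
print(sp.solve(cost.diff(a), a))  # [2/3]
[/code]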

So, in a building with $13$ floors, numbered $0$ through $12$, we should expect that the second elevator defaults to the floor $2/3$ of the way up, which is $8$. In the original problem, this corresponds to floor $7$. That's why $7$ is the most reasonable answer.


Even more elevators

Now, what if instead of just two elevators, there were actually $n$ elevators? Again, just for the sake of heuristics, let's assume that the apartment building's floors are given by the interval $[0,1]$.

I leave it as an exercise to show that one of the elevators has to be at floor $0$ (intuitively, this is because half the time, people are coming home, and placing one elevator at $0$ minimizes all their travel times). So it remains to place the other $n-1$ elevators.

Let's say that the other elevators are at floors $a_1,\cdots,a_{n-1}$ with $a_1 < a_2 < \ldots < a_{n-1}$. We can draw a graph of how much time it takes for the nearest elevator to get to you.

[asy]
size(6cm);
real[] a = {0.1, 0.3, 0.4, 0.7, 0.85};
int n = a.length;
real L = 5;
path gr = (0,0)--(a[0]/2*L,a[0]/2*L)--(a[0]*L,0);
for(int i=0; i<n-1; ++i){
gr = gr--((a[i]+a[i+1])/2*L,(a[i+1]-a[i])/2*L)--(a[i+1]*L,0);
}
gr = gr -- (L, L*(1-a[n-1])) -- (L,0);
fill(gr--cycle, p=rgb(1,0.8,0.8));
draw(gr,p=red);
draw((0,0)--(5,0));
dot("$0$", (0,0), S);
label("$1$", (L,0),S);
for(int i=0; i<n; ++i){
dot("$a_"+((string) (i+1))+"$", (a[i]*L,0),S);
}
[/asy]

We're trying to minimize the area under the graph. This area consists of $n-1$ triangular "mountains" and $1$ more triangle at the very end.

Let's consider two adjacent mountains: The one between $a_{i-1}$ and $a_i$, and the one between $a_i$ and $a_{i+1}$, for some $i$.

The base of the first mountain is $x = a_i- a_{i-1}$, and the base of the second mountain is $y = a_{i+1}-a_i$.

[asy]
size(6cm);
path gr = (0,0)--(1,1)--(2,0)--(3.5,1.5)--(5,0);
fill(gr--cycle, p=rgb(1,.8,.8));
draw(gr, p=red);
draw((0,0)--(5,0));
label("$x$", (1,0), N);
label("$y$", (3.5,0),N);
dot("$a_i$", (2,0), S);
dot("$a_{i-1}$", (0,0),S);
dot("$a_{i+1}$", (5,0),S);
[/asy]

The combined area of the mountains is then $\frac{x^2+y^2}{4}$. Where could we move $a_i$ so that this is minimized?

Aha! We are trying to minimize $\frac{x^2+y^2}{4}$ subject to the constraint $x+y = a_{i+1}-a_{i-1}$. By the QM-AM inequality, the minimum is obtained exactly when $x=y$. Therefore, in the minimum configuration, each pair of adjacent mountains must be of the same size! Because otherwise, there must be two adjacent mountains with different sizes, and then we can move the point in between them so that their area goes down.

Let's update what the full graph looks like now.

[asy]
size(6cm);
real[] a = {0.15, 0.3, 0.45, 0.6, 0.75};
int n = a.length;
real L = 5;
path gr = (0,0)--(a[0]/2*L,a[0]/2*L)--(a[0]*L,0);
for(int i=0; i<n-1; ++i){
gr = gr--((a[i]+a[i+1])/2*L,(a[i+1]-a[i])/2*L)--(a[i+1]*L,0);
}
gr = gr -- (L, L*(1-a[n-1])) -- (L,0);
fill(gr--cycle, p=rgb(1,0.8,0.8));
draw(gr,p=red);
draw((0,0)--(5,0));
dot("$0$", (0,0), S);
label("$1$", (L,0),S);
for(int i=0; i<n; ++i){
dot("$a_"+((string) (i+1))+"$", (a[i]*L,0),S);
}
[/asy]

With the above positions for the elevators, we can't move $a_1, a_2,\cdots,a_{n-2}$ to make the area go down. So we can only move the elevator at the end, $a_{n-1}$ (in the above picture, this is $a_5$).

Where do we move the final elevator to minimize the area of the two triangles next to it? We could do another QM-AM-type argument. But we actually don't have to do any work: We've done this before! This just reduces to the case with two elevators (restricted to just the interval $[a_{n-2},1]$)! So, it is optimal to put $a_{n-1}$ exactly $2/3$ of the way between $a_{n-2}$ and $1$.

Therefore, in the optimal configuration, we must have
$$a_1 = a_2-a_1 = a_3 - a_2 = \ldots = a_{n-1} - a_{n-2},$$and
$$a_{n-1} - a_{n-2} = \frac23(1 -a_{n-2}).$$
[asy]
size(6cm);
real[] a = {2/11, 4/11, 6/11, 8/11, 10/11};
int n = a.length;
real L = 5;
path gr = (0,0)--(a[0]/2*L,a[0]/2*L)--(a[0]*L,0);
for(int i=0; i<n-1; ++i){
gr = gr--((a[i]+a[i+1])/2*L,(a[i+1]-a[i])/2*L)--(a[i+1]*L,0);
}
gr = gr -- (L, L*(1-a[n-1])) -- (L,0);
fill(gr--cycle, p=rgb(1,0.8,0.8));
draw(gr,p=red);
draw((0,0)--(5,0));
dot("$0$", (0,0), S);
label("$1$", (L,0),S);
for(int i=0; i<n; ++i){
dot("$a_"+((string) (i+1))+"$", (a[i]*L,0),S);
}
[/asy]

Some basic algebra reveals the final answer: When there are $n$ elevators, it is optimal to place them at the following floors:
$$0, \frac{2}{2n-1}, \frac{4}{2n-1}, \cdots, \frac{2(n-1)}{2n-1}$$
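A brute-force sanity check under the same idealization (my sketch): pin one elevator at $0$, grid-search the other two positions for $n = 3$, and compare against the predicted $(0, 2/5, 4/5)$:

[code]
import numpy as np
from itertools import combinations

xs = np.linspace(0, 1, 2001)   # a fine grid of "floors"; X is uniform on [0, 1]

def cost(stops):
    """E[min_i |X - a_i|]: average distance from X to the nearest elevator."""
    d = np.abs(xs[:, None] - np.asarray(stops)[None, :]).min(axis=1)
    return d.mean()

grid = np.arange(0.0, 1.0001, 0.02)
best = min((cost((0.0, a, b)), a, b) for a, b in combinations(grid, 2))
print(best)   # cost is minimized near a = 0.4, b = 0.8, i.e. (0, 2/5, 4/5)
[/code]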


It's always a great joy to discover compelling and intricate mathematics in the wild!

Morrey's Embedding via Campanato's Criterion

by greenturtle3141, Apr 2, 2024, 3:23 AM

Reading Difficulty: 6/5
Prerequisites: Know what a Sobolev space is

When $p > N$ we have that $W^{1,p}(\mathbb{R}^N)$ continuously embeds into $C^{0,1-N/p}(\mathbb{R}^N)$. For the sake of my fingers I'm going to drop the $(\mathbb{R}^N)$; all spaces referenced in this post are over $\mathbb{R}^N$ unless otherwise specified. Anyways, I have a hard time remembering the proof. I think the following approach is far easier to remember and more useful since it relies on a result which is useful in other areas.

The first "step" is to prove the following very cool characterization of Holder continuity. Throughout, we use the notation $\bar{u}_{x,r}$ to mean the average integral
$$\bar{u}_{x,r} := \frac{1}{\omega r^N}\int_{B(x,r)} u(y)\,dy$$where $\omega$ is the measure of the unit $N$-ball. We also write $A \preceq B$ to mean that there exists a constant $C$ depending only on $N$ such that $A \leq CB$. Also, to avoid writing $C'$ or $C_1$ et cetera, all unimportant "constants" hereafter are referred to as $C$.

Theorem (Campanato's Criterion): Let $\alpha > 0$, $u \in L^1_{\text{loc}}$. Then $u \in C^{0,\alpha}$ iff there exists $C_* > 0$ such that
$$\frac{1}{r^N}\int_{B(x,r)} |u(y)-\bar{u}_{x,r}|\,dy \leq C_*r^\alpha \qquad (*)$$for all $x \in \mathbb{R}^N$ and all $r > 0$.


Written plainly: A function is $\alpha$-Holder iff the $B(x,r)$-average of the deviation from the $B(x,r)$-average is controlled by $r^\alpha$.

(For a reason that we'll see in the remarks, we'll be keeping track of the $C_*$ constant, which is why it's special.)

Proof.

The forward direction is a simple exercise, so we'll just consider the backward direction. Assume that $(*)$ holds for every $x$ and $r > 0$.

Step 1

We begin by investigating how much the average changes when the radius changes. To wit, fix $x$ and take $r,R$ with $r < R$. What can we say about $|\bar{u}_{x,r} - \bar{u}_{x,R}|$? Since we want to use $(*)$, it makes sense to take an arbitrary $y$ and go up with the triangle inequality:
$$|\bar{u}_{x,r} - \bar{u}_{x,R}| \leq |\bar{u}_{x,r} - u(y)| + |u(y) - \bar{u}_{x,R}|$$Now we average the RHS over $y \in B(x,r)$ (the smaller ball). From there, the bounding here is quite natural:
$$|\bar{u}_{x,r} - \bar{u}_{x,R}| \preceq \frac{1}{r^N}\int_{B(x,r)}|\bar{u}_{x,r} - u(y)|\,dy + \frac{1}{r^N}\int_{B(x,r)}|u(y) - \bar{u}_{x,R}|\,dy$$$$\leq \frac{1}{r^N}\int_{B(x,r)}|\bar{u}_{x,r} - u(y)|\,dy + \frac{1}{r^N}\int_{B(x,R)}|u(y) - \bar{u}_{x,R}|\,dy$$$$= \frac{1}{r^N}\int_{B(x,r)}|\bar{u}_{x,r} - u(y)|\,dy + \frac{R^N}{r^N}\cdot\frac{1}{R^N}\int_{B(x,R)}|u(y) - \bar{u}_{x,R}|\,dy$$$$\stackrel{(*)}\preceq C_*r^\alpha + C_*\left(\frac{R}{r}\right)^NR^\alpha \leq 2C_*\left(\frac{R}{r}\right)^NR^\alpha.$$So $|\bar{u}_{x,r} - \bar{u}_{x,R}| \preceq C_*\left(\frac{R}{r}\right)^NR^\alpha$. That's a nice estimate.

Step 2

Now we use this estimate to get information on how $u(x)$ differs from the average around $x$. Fix a Lebesgue point $x$ of $u$ (since they are the only points where a pointwise value of $u$ makes sense) and $r > 0$. What can we say about $|u(x) - \bar{u}_{x,r}|$? The natural idea is that the value $u(x)$ is recovered by the limit of $\bar{u}_{x,s}$ as $s \to 0^+$. So we apply Step 1 for very small radii. Particularly, we apply Step 1 for $(r,R) = (r/2,r), (r/4, r/2), (r/8, r/4)$, etc. An "infinite triangle inequality" gives
$$|u(x) - \bar{u}_{x,r}| \leq \sum_{k=0}^\infty |\bar{u}_{x,r/2^k} - \bar{u}_{x,r/2^{k+1}}|,$$and each summand is bounded via Step 1 as
$$\stackrel{(1)}\preceq C_*\left(\frac{r/2^k}{r/2^{k+1}}\right)^N(r/2^k)^\alpha = C_*2^{N-k\alpha}r^\alpha.$$Summing the geometric series in $k$ gives $|u(x) - \bar{u}_{x,r}| \preceq C_*r^\alpha$. Good.

Step 3

We're basically ready to show $u$ is $\alpha$-Holder now. Fix Lebesgue points $x,y$. We're done if we show $|u(x)-u(y)| \preceq \|x-y\|^\alpha$ (because then there is clearly an $\alpha$-Holder extension from the Lebesgue points by density, and this is the desired Holder continuous representative). Step 2 gives
$$|u(x)-u(y)| \leq |u(x) - \bar{u}_{x,r}| + |\bar{u}_{x,r} - \bar{u}_{y,r}| + |\bar{u}_{y,r} - u(y)| \preceq C_*r^\alpha + |\bar{u}_{x,r} - \bar{u}_{y,r}|.$$Well hey, we've never actually chosen $r$. The obvious value to take is $r = \|x-y\|$, so let's do that. Now it remains to prove that $|\bar{u}_{x,r} - \bar{u}_{y,r}|$ is controlled by $r^\alpha$ for this value of $r$. In particular we'll show that it's $\preceq C_*r^\alpha$.

Again, we need to use $(*)$, and to do that we use the same trick of taking an arbitrary point and going up by triangle inequality.
$$|\bar{u}_{x,r} - \bar{u}_{y,r}| \leq |\bar{u}_{x,r} - u(z)| + |u(z) - \bar{u}_{y,r}|$$Now average the RHS over $z \in B(x,r) \cap B(y,r)$. Note that since $r = \|x-y\|$, the measure of $B(x,r) \cap B(y,r)$ is $r^N$ times some constant depending only on $N$. Hence
$$|\bar{u}_{x,r} - \bar{u}_{y,r}| \preceq \frac{1}{r^N}\int_{B(x,r) \cap B(y,r)} |\bar{u}_{x,r} - u(z)|\,dz + \frac{1}{r^N}\int_{B(x,r) \cap B(y,r)}|u(z) - \bar{u}_{y,r}|\,dz.$$Now grow the domain of integration to get
$$\leq \frac{1}{r^N}\int_{B(x,r)} |\bar{u}_{x,r} - u(z)|\,dz + \frac{1}{r^N}\int_{B(y,r)}|u(z) - \bar{u}_{y,r}|\,dz,$$and this is $\preceq C_*r^\alpha$ by $(*)$. $\square$

A few important remarks:
  • Thanks to our diligence in keeping track of $C_*$, we've actually shown that
    $$|u(x)-u(y)| \leq C \cdot C_*\|x-y\|^\alpha$$for a constant $C$ depending only on $N$. That is, $|u|_{C^{0,\alpha}} \preceq C_*$. So a more descriptive statement of the criterion is that the quantity
    $$|u|_{\text{camp}} := \inf\{C_* : \text{$(*)$ holds}\}$$is "equivalent" to the Holder seminorm $|u|_{C^{0,\alpha}}$ in the sense that $|u|_{C^{0,\alpha}} \preceq |u|_{\text{camp}} \preceq |u|_{C^{0,\alpha}}$.
  • Campanato generalizes to other open domains. I'm pretty sure that for any $U$, we have that $(*)$ holds with $\mathbb{R}^N$ replaced with $U$ iff $u$ is Holder continuous on $U$. I can't guarantee that the previous remark still holds though, I'll leave that to you to figure out.

With this criterion proven, Morrey's embedding falls out pretty much immediately.

Proof. Take $u \in W^{1,p}$. Then by the Poincare inequality,
$$\int_{B(x,r)} |u(y)-\bar{u}_{x,r}|\,dy \preceq r\int_{B(x,r)}\|\nabla u\|\,dy$$$$\preceq r(r^N)^{1-1/p} \|\nabla u\|_{L^p(B(x,r))} \leq r^{1+N-N/p}\|\nabla u\|_{L^p}.$$So
$$\frac{1}{r^N}\int_{B(x,r)} |u(y)-\bar{u}_{x,r}|\,dy \preceq r^{1-N/p}\|\nabla u\|_{L^p},$$giving us that $u \in C^{0,1-N/p}$ by Campanato. In fact, the above inequality tells us that $|u|_{\text{camp}} \preceq \|\nabla u\|_{L^p}$, so by the first remark we may deduce that
$$|u|_{C^{0,1-N/p}} \preceq |u|_{\text{camp}} \preceq \|\nabla u\|_{L^p}.$$So we've tamed the seminorm. As for the sup norm, we write
$$|u(x)| \leq |u(y)| + |u(x)-u(y)| \leq |u(y)| + |u|_{C^{0,1-N/p}}\|x-y\|^{1-N/p},$$and averaging the RHS over $y \in B(x,1)$ gives
$$|u(x)| \preceq \|u\|_{L^p} + |u|_{C^{0,1-N/p}} \preceq \|u\|_{L^p} + \|\nabla u\|_{L^p}.$$Thus $\|u\|_{C^{0,1-N/p}} \preceq \|u\|_{W^{1,p}}$, which completes the proof that $W^{1,p} \hookrightarrow C^{0,1-N/p}$. $\square$

The Mathematical Convention Survey

by greenturtle3141, Mar 14, 2024, 6:04 AM

This was a gargantuan effort, but I finally finished writing up the results to this funny survey. Please enjoy.

https://cims.nyu.edu/~tjl8195/survey/results.html
