Three approaches to sums of squares

by t0rajir0u, Aug 6, 2008, 2:12 AM

The following two theorems are good examples of number-theoretic results that are quite easy to state but quite difficult to prove (by comparison). They both suggest the need for stronger tools than, say, just modular arithmetic, and in this post we will be looking at three different sets of tools that can be brought to bear on both theorems.

Fermat's Two-Square Theorem: Given a prime $ p$, there exist integers $ a, b$ such that $ a^2 + b^2 = p$ iff $ p = 2$ or $ p \equiv 1 \bmod 4$. Consequently, a number $ n$ is expressible in the form $ a^2 + b^2$ iff the primes congruent to $ 3 \bmod 4$ in its prime factorization each divide $ n$ an even number of times.

Lagrange's Four-Square Theorem: Every non-negative integer can be written in the form $ a^2 + b^2 + c^2 + d^2$ for integers $ a, b, c, d$.

What is perhaps more interesting than the proofs of these two theorems is the fact that the corresponding statement about three squares is harder than either of the statements above, but that's a different story entirely.


Toolkit 1: Algebraic number theory. Our basis for the exploration of this toolkit is the observation that both the set $ \mathcal{S}_2$ of sums of two squares and the set $ \mathcal{S}_4$ of sums of four squares are closed under multiplication. The first result is due to the identity

$ (a^2 + b^2)(c^2 + d^2) = (ac \pm bd)^2 + (ad \mp bc)^2$

and it has as an underlying concept the notion of multiplicativity of the norm $ N(a + bi) = a^2 + b^2$ in our good friend the Gaussian integers. Specifically, the above is a consequence of taking the norms of

$ (a + bi)(c \mp di) = (ac \pm bd) + (ad \mp bc)i$.

This is also clear from our discussion of the complex numbers as 2D rotations.

The corresponding identity for four squares is far more tedious to write out, but its basis is the multiplicativity of the norm $ N(a + bi + cj + dk) = a^2 + b^2 + c^2 + d^2$ in what is called the Lipschitz integers, a (sub-)ring of integers in the quaternions. You may have seen the quaternions by following the links from the complex numbers post; they are a non-commutative extension of the real numbers generated by $ 1$ and the imaginary units $ i, j, k$, each of which can be interpreted to correspond to counterclockwise rotation $ 90^{\circ}$ in the $ xy, yz$, and $ zx$-planes, respectively. It turns out, therefore, that the quaternions are essentially 3D rotations.

$ \mathbb{Z}[\imath]$ is easier to understand than the Lipschitz integers, so we'll tackle the two-square theorem first. It is clearly necessary that $ p = 2$ or $ p \equiv 1 \bmod 4$ in order for a prime to be represented in the form $ a^2 + b^2$ because $ p | a^2 + b^2 \implies \left( \frac {a}{b} \right)^2 \equiv - 1 \bmod p$. But now we are trying to prove that this condition is also sufficient. We proceed as follows: if $ p$ is a prime with $ - 1$ as a quadratic residue, then there exists $ x$ such that $ p | x^2 + 1$. But over the Gaussian integers,

$ p | (x + i)(x - i)$.

If $ p$ is prime in the Gaussian integers, then it must divide either $ x + i$ or $ x - i$, but this is absurd. Hence $ p$ is composite, so there exist Gaussian integers $ z, w$ such that $ p = zw$. But since $ p$ is purely real, (WLOG) $ w$ must be a rational integer multiple of $ \bar{z}$, and since $ p$ is a rational prime $ w = \bar{z}$ and the result follows by letting $ z = a + bi$. Consequently, rational primes congruent to $ 3 \bmod 4$ are prime in the Gaussian integers (or else they could be written in the above form). It then follows that if $ q \equiv 3 \bmod 4$ and $ q | a^2 + b^2$ then $ q | a, q | b$. Repeated application of this lemma gives the full statement of the two-square theorem.

In an algebraic context, the two-square theorem can therefore be understood as reflecting the structure of the Gaussian primes. This notion of primes factoring into particular patterns when we pass to larger rings of integers generalizes significantly, and this is therefore just one application of a rather general concept involving representability by quadratic forms like $ a^2 + db^2$. Note that it is very important that the Gaussian integers have unique factorization.

The four-square theorem requires a little more work: what we want to prove is that every prime can be represented as a sum of four squares, and then by multiplicativity we are done. To do so, we will need the following

Lemma: Every prime $ p$ divides an integer of the form $ a^2 + b^2 + 1$.

Proof. This is obvious when $ p = 2$; when $ p$ is odd, there are $ \frac {p + 1}{2}$ possible values of $ a^2 \bmod p$ and $ \frac {p + 1}{2}$ possible values of $ - b^2 - 1 \bmod p$, so the two sets are not disjoint.


Excursion: Let's discuss what's going on here geometrically some more, since, as easy as the above proof is, the choice of $ a^2 + b^2 + 1$ appears to have been pulled out of the air. Geometric intuition over $ \mathbb{Q}$ tells us very little about how the variety $ x^2 + y^2 + 1 = 0$ behaves since it is an imaginary ellipse and has no real component. In contrast, earlier I showed that the rational projective unit circle $ x^2 + y^2 - 1 = 0$ was isomorphic to the rational projective line via the parameterization $ \left( \frac {2t}{1 + t^2}, \frac {1 - t^2}{1 + t^2} \right)$, which is in fact a parameterization valid over any field (not of characteristic $ 2$), in particular $ \mathbb{Z}/p\mathbb{Z}$!

As $ t$ ranges through the values $ \{ 0, 1, 2, ... p - 1 \}$ we can easily count that there are $ p$ solutions if $ p \equiv 3 \bmod 4$ and $ p - 2$ solutions if $ p \equiv 1 \bmod 4$ (since two values of $ t$ make the denominator vanish). Recall that the above parameterization, as given, misses exactly the point $ (0, - 1)$; this gives, succinctly, $ p - ( - 1)^{\frac {p - 1}{2} }$ solutions. The error term disappears if we consider the projective solutions, that is, the solutions to

$ X^2 + Y^2 = Z^2, (X : Y : Z) \in \mathbb{F}_p P^2$;

then there are always exactly $ p + 1$ solutions, which is exactly the number of points on the projective line $ \mathbb{F}_p P$, two of which are at infinity when $ p \equiv 1 \bmod 4$.

Note that the above curve is essentially identical to $ x^2 + y^2 + 1 \equiv 0$ when $ p \equiv 1 \bmod 4$; on the other hand, an explicit rational parameterization of the solutions to $ x^2 + y^2 + 1 \equiv 0 \bmod p$ is not as easily achieved when $ p \equiv 3 \bmod 4$ because the technique we used earlier relies on projecting from a point on the curve, which requires that we already know a point on the curve, and it's not at all clear how we should construct such a point. (Nevertheless, the above argument shows that such a point exists, so it should still be true that there are $ p + 1$ projective solutions to $ X^2 + Y^2 + Z^2 \equiv 0$.)

This makes the problem of determining points on conic sections over finite fields fairly straightforward. If one point exists, the conic section is isomorphic to the projective line over $ \mathbb{F}_p$, so there are $ p + 1$ solutions; otherwise, there are none. For more complicated curves (that is, curves of positive genus) it is no longer true that non-empty curves have exactly $ p + 1$ points, but this is still a reasonable approximation, and the error term is described by the Riemann hypothesis over finite fields, but that's going too far afield.


It is tempting to conclude, in complete analogy with the first proof, that every prime is composite in the Lipschitz integers, but there is a major problem: the Lipschitz integers do not have unique factorization (even up to units). In particular, we made implicit use of the fact that if $ p | ab$ and $ p$ divides neither $ a$ nor $ b$, then $ p$ is composite. In other words,

If an element of a ring of integers is not prime, then it is also reducible.

The distinction between prime and irreducible elements only appears in more general integral domains; over the integers, we often use the terms interchangeably. The above turns out to be false in general if we do not have unique factorization. (It's the reverse implication - that reducible implies composite - that holds.) So what we have to use is not actually the Lipschitz integers, but the Hurwitz integers, which can be constructed by adjoining the unit $ \frac {1 + i + j + k}{2}$ to the Lipschitz integers.

To be more specific, the Hurwitz integers consist of the quaternions $ a + bi + cj + dk$ where either $ a, b, c, d \in \mathbb{Z}$ or $ a, b, c, d \in \mathbb{Z} + \frac {1}{2}$. Conway and Smith discuss some rather fascinating geometric consequences of this definition, but we will focus on the number theory for now. A version of prime factorization holds here (although it is more difficult to state because the quaternions aren't commutative) that is enough for us to carry out the above argument; what we have shown, therefore, is that for every prime $ p$ there exist a Hurwitz integer $ a + bi + cj + dk$ such that

$ p = a^2 + b^2 + c^2 + d^2$.

If $ a, b, c, d$ are all integers, great. Otherwise, there are odd integers $ a' = 2a, b' = 2b, c' = 2c, d' = 2d$ such that

$ 4p = a'^2 + b'^2 + c'^2 + d'^2$.

Now note that since $ a', b', c', d'$ are all odd we can write $ x = \frac {a' + b'}{2}, y = \frac {a' - b'}{2}, z = \frac {c' + d'}{2}, w = \frac {c' - d'}{2}$ to conclude that

$ 2p = x^2 + y^2 + z^2 + w^2$.

Since primes are congruent to $ 0, 1 \bmod 4$ and the LHS is congruent to $ 2 \bmod 4$ (except in the trivial case that $ p = 2$) exactly two terms on the RHS are odd, say $ x$ and $ y$, and the others are even, so again we can write $ u = \frac {x + y}{2}, v = \frac {x - y}{2}, r = \frac {z + w}{2}, s = \frac {z - w}{2}$ to conclude that

$ p = u^2 + v^2 + r^2 + s^2$

as desired. (What we just described was essentially an algorithm for dividing by $ 2$ in the Hurwitz integers.)

Two rather remarkable generalizations of these results are the 15 and 290 theorems. In any case, factorization and primality are solidly number-theoretic concepts, and so we might expect this to be the most "natural" toolkit by which to attack this problem. But the geometric interpretation of the Gaussian and Hurwitz integers as rotations begs for a geometric proof.


Toolkit 2: The geometry of numbers. By the geometry of numbers we mean results like Pick's Theorem that can be used to prove number-theoretic facts using geometric reasoning on lattice points. In fact, we will need only one innocent-looking theorem that goes by the name of

Minkowski's Theorem: If a convex region $ R$ contains and is symmetric about the origin and has area strictly greater than $ 4$, then it contains a lattice point besides the origin.

This is intuitively because $ R$ cannot be much bigger than the square with vertices $ (1, 1), (1, - 1), ( - 1, 1), ( - 1, - 1)$ (the proof is left as an exercise) and can be phrased as follows: $ R$ can be at most four times the area of the unit square $ (0, 0), (0, 1), (1, 0), (1, 1)$. We will actually be strengthening this theorem slightly by considering the linear transformation $ (x, y) \mapsto (ax + by, cx + dy)$ on lattice points that takes the usual lattice to the generalized lattice spanned by $ (a, c)$ and $ (b, d)$ (as opposed to $ (0, 1), (1, 0)$). Such a transformation takes convex sets to convex sets and takes the unit square to what is called the fundamental parallelogram of the lattice; moreover, it multiplies all areas by the determinant of the corresponding matrix, which is $ ad - bc$ (and since we're considering unsigned areas, we'll take the absolute value). Therefore Minkowski's Theorem can be stated as follows:

Minkowski's Theorem (generalized lattices): Given a lattice $ L$ with a fundamental parallelogram of area $ D$, a convex region symmetric about and containing the origin with area strictly greater than $ 4D$ contains a lattice point besides the origin.


The vectors $ (a, b), (c, d)$ that generate a particular lattice are not unique. For example, the standard integer lattice can be generated by the vectors $ (1, 0), (2, 1)$. This is straightforward since it's possible to express the vectors $ (1, 0), (0, 1)$ in terms of these two vectors. It turns out that two sets of vectors generate the same lattice if and only if they are related by an element of the modular group. As the Wikipedia article mentions, this fact has deep consequences for the study of elliptic functions, which turn out to be closely related to modular forms (see the modularity theorem).


What lattice is relevant to the two-square theorem? Our starting point will again be the observation that $ p | x^2 + 1$. This means that the lattice point $ (x, 1)$ has distance from the origin whose square is divisible by $ p$. Now consider the lattice $ L$ spanned by $ (x, 1)$ and $ (p, 0)$, which consists of points with distance from the origin whose square is divisible by $ p$. (Perhaps now the connection is clearer!) The convex region we will consider is the circle $ x^2 + y^2 < 2p$, which contains the origin and has area $ 2 \pi p$. But the fundamental parallelogram of $ L$ has area $ |x \cdot 0 - p| = p$ and $ 2 \pi > 4$; it therefore follows by Minkowski's theorem that $ x^2 + y^2 < 2p$ contains a lattice point besides the origin, which is a point $ (a, b)$ such that

- $ a^2 + b^2 < 2p$, but
- $ p | a^2 + b^2$,

and so must be a point such that $ a^2 + b^2 = p$.

This is a marvelously clever proof and is (apparently) of a very different flavor from the one we demonstrated above. While in the first proof the answer to the question "why is the two-square theorem true" is "because the Gaussian primes behave a particular way," in the second proof the answer is "because $ \pi$ is greater than $ 2$"! But the two proofs are more similar than they first appear. The lattice spanned by $ (x, 1)$ and $ (p, 0)$ can be identified with the complex numbers of the form $ a(x + i) + b(p)$; since we know $ x + i$ and $ p$ both to be divisible by $ a + bi$ (where $ p = a^2 + b^2$), setting up the lattice boils down to the Euclidean algorithm in the Gaussian integers and Minkowski's Theorem boils down to the statement that the norm defines a division algorithm, which we will discuss later. The cleverness of this solution, then, is implicitly performing Euclidean algorithm-like operations to deduce the existence of $ a + bi$.

Yes, this proof adapts to the four-square theorem. We will need yet another strengthening of

Minkowski's Theorem ($ n$ dimensions): Given a lattice $ L$ spanned by $ n$ linearly independent vectors in $ \mathbb{Z}^n$ with a fundamental region of hypervolume $ V$, if a convex region $ R \in \mathbb{R}^n$ contains and is symmetric about the origin and has hypervolume strictly greater than $ 2^n V$, then it contains a point of $ L$ besides the origin.

We will then need to set $ n = 4$, find $ a, b$ such that $ p | a^2 + b^2 + 1$, and consider the lattice spanned by the four vectors

$ (p, 0, 0, 0), (b, 1, 0, a), (a, 0, 1, - b), (0, 0, 0, p)$.

It's slightly less trivial to construct this lattice so that the norm (the square of the distance from the origin) of any lattice point remains divisible by $ p$, but the above manages it. The fundamental region of this lattice has hypervolume the determinant of the above matrix, which is $ p^2$. Now we consider the convex region

$ x^2 + y^2 + z^2 + w^2 < 2p$

which is a 4-ball with radius $ r = \sqrt {2p}$ and hypervolume

$ \frac {\pi^2}{2} r^4 = 2 \pi^2 p^2 > 16p^2$

(the formulas for the volume of the $ n$-ball being here) and as before, it follows by Minkowski's theorem that the 4-ball contains a lattice point $ (u, v, r, s)$ such that

- $ u^2 + v^2 + r^2 + s^2 < 2p$, but
- $ p | u^2 + v^2 + r^2 + s^2$

and therefore $ u^2 + v^2 + r^2 + s^2 = p$.

As before, we can think of this as a sort of Euclidean algorithm that "filters out" the existence of $ u + vi + rj + sk$, and our answer to the question "why is the four-square theorem true" is the amusing "because $ \pi^2$ is greater than $ 8$!" I find the above proof particularly cute, though. Legend has it that Jason Bland (scorpius119) devised this proof independently on a PROMYS test, although it's likely to be classic.


I've postponed a clearer discussion of how unique factorization does or does not come about for until the relationship between different rings of integers and their geometries became clearer. First, let's clarify the notion of a generic division algorithm in an integral domain. Given such a domain $ D$, one wishes to define a function $ N : D \mapsto \mathbb{Z}_{\ge 0}$ with the property that for any $ a, b \in D$, there exists $ q, r$ such that

$ a = qb + r$

and either $ r = 0$ or $ N(r) < N(b)$. Such a function ensures that we can define the Euclidean algorithm: repeated division forces the sequence $ N(r_k)$ to be a strictly decreasing sequence of non-negative integers, so the Euclidean algorithm terminates. This turns out to be one of the most useful properties of the integers, and whenever it occurs we have what is called a Euclidean domain, which automatically has prime factorization (because the Euclidean algorithm is fundamental to its proof). A few common examples:

- $ \mathbb{Z}$ is a Euclidean domain: we take $ N(a) = |a|$.
- $ k[x]$ (the ring of polynomials with coefficients in a field $ k$) is a Euclidean domain: we take $ N(P) = \text{deg } P$.
- $ \mathbb{Z}[\imath]$ is a Euclidean domain: we take $ N(a + bi) = a^2 + b^2$.

A non-example: $ k[x, y]$, a ring of polynomials in two variables, is not a Euclidean domain. Although it's easiest to prove this by exhibiting an ideal that is not principal, for now what this means is that we don't have a division algorithm on bivariate polynomials with the same properties as the ones above. (This is not to say that we can't talk about division at all; Gröbner bases allow one to do pretty well.)

The fact that the Gaussian integers are Euclidean, in contrast, is not immediately obvious. Geometrically, the norm above gives the following statement: in the lattice of multiples of some $ z = a + bi$ (which is the lattice spanned by $ (a, b)$ and $ ( - b, a)$, in real terms), any other Gaussian integer $ c + di$ differs from a lattice point by a distance of at most $ \sqrt {a^2 + b^2}$. This is straightforward; the lattice of multiples of $ a + bi$ consists of squares of side length $ \sqrt {a^2 + b^2}$, and given an arbitrary point in such a square the closest corner is at most $ \frac {1}{\sqrt {2} } \sqrt {a^2 + b^2}$ away. This is in fact a stronger statement than we need.

The above argument extends directly to $ \mathbb{Z}[\sqrt { - 2}]$ and $ \mathbb{Z}[ \frac {1 + \sqrt { - 3}}{2} ]$ but fails for $ \mathbb{Z}[ \sqrt { - 5} ]$ because the lattice of multiples becomes composed of large rectangles. Correspondingly, the first two are Euclidean domains and therefore unique factorization domains (the second being the Eisenstein integers we know and love), and the third is not (for example, $ 6 = 2 \cdot 3 = (1 + \sqrt { - 5})(1 - \sqrt { - 5})$.) Of course, this doesn't rule out the possibility that some more exotic definition of $ N(x)$ could work, and that's the limitation of this line of reasoning. In general, prime factorization still turns out to fail most of the time, even though it is possible to recover prime factorization without the Euclidean algorithm; in fact, it is known that there is a finite list of negative integers $ d$ such that the ring of integers in $ \mathbb{Q}[ \sqrt {d} ]$ has prime factorization. These very special numbers are called the Heegner numbers. It is worth mentioning that the same question for positive $ d$ is much harder, but I am not qualified to say more on this point. (Well, let me just mention that because we like prime factorization so much we often do not look at factorizations in the integers themselves, but in their ideals, which turns out to be more useful anyway.)

Why didn't we look at $ \mathbb{Z}[ \sqrt { - 3} ]$? The straightforward way of defining the ring of integers in a quadratic field is as the intersection of that field (here $ \mathbb{Q}[ \sqrt { - 3} ]$) with the ring of algebraic integers, complex numbers that are the roots of monic polynomials with integer coefficients. When $ d$ is odd and congruent to $ 1 \bmod 4$ it is always the case that $ \frac {1 + \sqrt {d}}{2}$ is an algebraic integer (a straightforward computation), and therefore that the correct ring of integers to look at is actually $ \mathbb{Z}\left[ \frac {1 + \sqrt {d}}{2} \right]$ and not $ \mathbb{Z}[ \sqrt {d} ]$ as would be expected. When $ d = 5$ this is $ \mathbb{Z}[\varphi]$ and when $ d = - 3$ this is the Eisenstein integers. (This is the same kind of thing we did when we constructed the Hurwitz integers.)

In any case, we can now explain the discrepancy between the Lipschitz and Hurwitz integers. The problem with the Lipschitz integers is that the fundamental region (a hypercube) of the lattice of multiples of some $ a + bi + cj + dk$ has side length $ \sqrt {a^2 + b^2 + c^2 + d^2}$ and diagonal twice that. In other words, it can turn out that the "remainder" we want can be in the exact center of a fundamental region and have the same norm. (For example, this occurs if we try to divide $ 2$ by $ 1 + i + j + k$, which have the same norm.) This discrepancy is resolved by introducing the center of the unit hypercube - $ \frac {1 + i + j + k}{2}$ - into the lattice. (In $ n$ dimensions, the diagonal of the unit hypercube is $ \sqrt {n}$ - in other words, as $ n$ becomes large, the diagonal of a hypercube becomes much larger than its side length. High-dimensional geometry is full of strange behavior like this.)


Toolkit 3: Generating functions. Define $ r_k(n)$ to be the number of representations of $ n$ as a sum of $ k$ squares. We are then asking the question of when $ r_2(n)$ and $ r_4(n)$, respectively, are nonzero. It turns out that we can do better - we can give explicit formulas for both, and these explicit formulas then imply the weaker existence theorems we stated above, but correspondingly require a little more work.

The motivation for the expression we're about to derive for $ r_2(n)$ is that, in the identity

$ (a^2 + b^2)(c^2 + d^2) = (ac \pm bd)^2 + (ad \mp bc)^2$

the two different choices of signs on the RHS correspond to different ways to write the RHS as a sum of two squares (except when one of the factors on the LHS is $ 2$). In other words, two different Gaussian integers can have the same norm. (What happens is that some of the prime factors with norms that are primes congruent to $ 1 \bmod 4$ get exchanged for their conjugates.) $ r_2(n)$ therefore counts the number of Gaussian integers whose norm is $ n$, and this count boils down to counting how many ways we can exchange a prime factor for its conjugate.

This should already suggest the correct answer, but first of all, it's not (immediately) clear that different choices of signs will actually lead to different expressions, and second of all the counting can be elegantly done using the Dirichlet series

$ \sum_{n = 1}^{\infty} \frac {r_2(n)}{n^s}$

which, by the above discussion, has Euler product

$ \prod_{a + bi \text{ prime}} \frac {1}{\left( 1 - 1/N(a + bi)^s \right) }$.

(This argument appears in Ireland and Rosen and is quite nice.) (By comparison, letting $ N(a) = |a|$ in the integers we get the usual Euler product for the Zeta function, which counts the number of positive integers with a given norm (one).) Note that we are implicitly restricting ourselves to $ a, b$ non-negative; taking signs into account is straightforward. Now, the Gaussian primes come in one of three categories (up to multiplication by units):

- The prime $ 1 + i$. (Note that $ 1 - i = (1 + i)( - i)$.)
- The primes $ p$ where $ p \equiv 3 \bmod 4$.
- The primes $ a + bi, b + ai$ where $ a^2 + b^2$ is a prime congruent to $ 1 \bmod 4$.

The norm is easily computed in each case, and we therefore find that the above product can be written

$ \frac {1}{1 - 1/2^s} \left( \prod_{p \equiv 1 \bmod 4} \frac {1}{1 - 1/p^s} \right)^2 \prod_{q \equiv 3 \bmod 4} \frac {1}{1 - 1/q^{2s}}$.

The square in the second factor appears because $ a + bi$ and $ b + ai$ are not unit multiples of each other and are counted separately as primes. The square in the second factor appears because $ N(q) = q^2$. But we can clearly factor out the Euler product of the usual Zeta function $ \zeta(s)$, and so the above can be written

$ \zeta(s) \prod_{p \equiv 1 \bmod 4} \frac {1}{1 - 1/p^s} \prod_{q \equiv 3 \bmod 4} \frac {1}{1 + 1/q^s}$.

The remaining factor should remind you of the sum I called $ L(s, \chi)$ in my original post on Dirichlet's theorem. It is in fact also a Dirichlet L-series, but instead of a Dirichlet character with respect to $ 3$ it is constructed by a Dirichlet character $ \chi$ with respect to $ 4$. The relevant character here takes $ \chi(3) = - 1, \chi(2k) = 0$ and is periodic $ \bmod 4$. The above can therefore be written as

$ \zeta(s) \sum_{n = 1}^{\infty} \frac {\chi(n)}{n^s}$

which is the Dirichlet convolution $ 1 * \chi$. It then follows that

$ \boxed{ r_2(n) = \sum_{d | n} \chi(d) }$.

With less notation, $ r_2(n)$ is the number of factors of $ n$ congruent to $ 1 \bmod 4$ minus the number of factors of $ n$ congruent to $ 3 \bmod 4$. This is consistent with everything we've said about sums of squares so far; if a prime congruent to $ 3 \bmod 4$ divides $ n$ an odd number of times, these numbers will be equal, essentially because of a pairing argument. An alternate expression for $ r_2(n)$ is the subject of a Practice Problem. Note that we do not have to talk about convergence or any other analytic properties; all we're interested in is formal manipulation of series. Indeed, Dirichlet convolution can be thought of as a framework that unites various counting arguments that can be used to prove identities involving multiplicative functions.

Will the same trick work on $ r_4(n)$? The problem here is that the Hurwitz integers aren't commutative, so we can't talk about factorizations in terms of the primes involved alone but also need to take their order into account, so an argument using Dirichlet series would be problematic (though perhaps not impossible). It's not even clear what $ r_4(p)$ is when $ p$ is prime. The original proof of the explicit formula for $ r_4(n)$ (and for that matter, $ r_2(n)$), due to Jacobi, was derived by working with another kind of generating function: the theta function

$ \theta(\tau) = \sum_{n = 0}^{\infty} e^{\pi i \tau n^2}$.

Writing $ q = e^{\pi i \tau}$ for shorthand, it can be seen that the powers of $ \theta$ are generating functions for the ways a number can be represented as the sum of various numbers of squares; in particular, the function we're interested in is

$ \theta^4(\tau) = \sum_{n = 0}^{\infty} r_4(n) q^n$.

I am not qualified to discuss the rest of the proof in depth (it involves more complex analysis than I'm comfortable with; consult Stein and Shakarchi), but it turns out that the theta function has modular symmetries, and as I mentioned in my previous post the general strategy here is to find another function with the same symmetries and compare their values in a fundamental domain (not to be confused with the fundamental region of a lattice). The relevant function is

$ - \frac {1}{\pi^2} E^{*}_2(\tau) = - \frac {1}{\pi^2} \left( \sum_m \sum_n \frac {1}{( \frac {m \tau}{2} + n)^2} - \sum_m \sum_n \frac {1}{(m \tau + \frac {n}{2})^2} \right)$

and the relevant symmetries are $ \theta^4(\tau + 2) = \theta^4(\tau), \theta^4(\tau) = - \frac {1}{\tau^2} \theta^4 \left( - \frac {1}{\tau} \right)$. The above function is related to the notion of an Eisenstein series. Without getting too in-depth, the symmetries involved generate a certain subgroup of the modular group with a somewhat larger fundamental domain, and it turns out that on this domain it's not too hard to prove that the above two functions are equal everywhere (specifically, that their ratio is bounded (hence, through complex-analytic magic, constant) and it's asymptotically $ 1$). It also turns out that

$ - \frac {1}{\pi^2} E^{*}_2(\tau) = 1 + \sum_{n = 1}^{\infty} 8 \sigma_1^{*}(n) q^n$

where $ \sigma_1^{*}(n)$ denotes the sum of the divisors of $ n$ not divisible by $ 4$. It therefore follows that $ \boxed{ r_4(n) = 8 \sigma_1^{*}(n) }$.

Unlike the proof for $ r_2(n)$, we have to do a little more than formally manipulate series; deep properties of the series involved had to be invoked, and the above proof is best understood within the context of a rich theory of which I understand little. I'm reasonably certain a simpler proof is possible, though (perhaps involving Dirichlet series). In any case, this is a great example of what I meant when I marveled at the power of complex analysis in deducing number-theoretic results.


The three approaches outlined above are quite different in philosophy, but perhaps the more interesting thing is that they're related. A lattice is a geometric depiction of an ideal, an essentially algebraic object, and generating functions are ways of letting us manipulate sequences as ensembles. Indeed, the analytic viewpoint is now indispensable to modern number theory.

If you looked at the sums of squares Mathworld page, the difficulty of the three-squares theorem is indicated by the fact that, unlike the other expressions, $ r_3(n)$ is given by an expression involving class numbers, which is both quite deep and much less amenable to any of the techniques we've discussed so far. There's a lot of fascinating theory to explore here.


Practice Problem 1: Prove Minkowski's theorem (in the plane).

Practice Problem 2: Prove that if $ p$ is prime and $ m, a, b$ are integers such that $ mp = a^2 + b^2 + 1$, then

1) $ m$ can be taken to be less than $ p$, and
2) If $ 1 < m < p$ and $ mp$ is a sum of four squares, then there exists $ n < m$ such that $ np$ is also a sum of four squares.

Conclude the four-square theorem. (This is a generalization of what we did when $ m$ was $ 4$.)

Practice Problem 3: Let the prime factorization of $ n$ be $ 2^a \prod_{p \equiv 1 \bmod 4} p^{a_i} \prod_{q \equiv 3 \bmod 4} q^{2b_i}$ and compute $ r_2(n)$ directly. Verify that this expression agrees with the other.

Practice Problem 4: Verify that the numbers $ 4^n (8k + 7), n, k \ge 0$ cannot be written as a sum of three squares. (The converse is Gauss' Three-Square Theorem.)

Comment

8 Comments

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
Why is it abusrd that $ p|(x+i)$ over $ \mathbb{Z}[i]$

(also there is a nice proof of the $ p=a^2+b^2$ theorem using pidgeonhole)

by Altheman, Aug 6, 2008, 3:23 AM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
very nice post. I was shown the geometry of numbers ideas once at MOP some years ago and am happy to have a refresher here; the ideas seem very beautiful to me.

by MysticTerminator, Aug 6, 2008, 4:20 AM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
Al: $ p | x + i \implies \exists z = a + bi : p(a + bi) = pa + pbi = x + i$, but clearly $ pb$ cannot be equal to $ 1$.

Arnav: They are indeed beautiful ideas :)

by t0rajir0u, Aug 6, 2008, 5:54 AM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
hey qiaochu,
cool post! i think your statements of minkowski's theorem and its various generalizations need an extra condition on the convex body. currently, at least the first version is false. symmetry about the origin seems like a good candidate.

by themonster, Aug 7, 2008, 5:18 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
Whoops. You're absolutely right. Fixed.

by t0rajir0u, Aug 7, 2008, 7:59 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
Minkowski's Theorem: an arnavstyle proof

this proof was probably shown to me originally by alex zhai

at least I strongly suspect that this would have been the proof he showed to me

cause I forgot it

since it was a while ago

so the open condition is kinda weird. I do not think I have seen that condition is before. I will pretend "open" means "measurable". so suppose we have some measurable set with measure more than $ 2^n$ that is symmetric about the origin and convex but doesn't contain a lattice point other than the origin. take a slightly smaller compact set $ S$ that still has measure more than $ 2^n$, take its union with reflection of itself about the origin, and take its convex closure. this should be a compact convex symmetric set with measure more than $ 2^n$ contained within the original set, i.e. still containing no other lattice points. so, it's bounded, say within some ball of radius $ R$. it also cannot contain any segments of length $ 2$ parallel to the coordinate axes because if it contains, say, the two points $ x$ and $ x + 2e_i$, it contains $ -x$ and $ x + 2e_i$ and thus $ e_i$ which is FAILURE. So translate $ S$ by all vectors with even integer entries such that each component is less than or equal to some specified $ D$. The total volume of the union of all these translates will be exactly the area of $ S$, denoted by $ A$, times the number of copies of $ S$, or $ (2D+1)^n$. However, all these copies are contained within radius $ R$ of the box with side length $ 4D$ about the origin, so they're certainly contained within the box with side length $ 4D + 2R$ about the origin, which has volume $ (4D + 2R)^n$. Hence $ A \le \Big(\frac{4D + 2R}{2D + 1}\Big)^n$ and as $ D \to \infty$, this shows $ A \le 2^n$ which is totally a contradiction?

by MysticTerminator, Aug 18, 2008, 6:05 AM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
A little late for a response, I know, but: you're right that the open condition can be dropped, but it turns out (although it is not at all obvious to me) that every convex subset of $ \mathbb{R}^n$ is Lebesgue measurable.

by t0rajir0u, Sep 1, 2008, 6:54 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
I never understood why everything was mod4.

by Luminescence, Feb 21, 2009, 11:36 PM

Various mathematical thoughts, ranging from problem-solving techniques to attempts to tie together related concepts.

avatar

t0rajir0u
Archives
+ March 2009
+ October 2007
+ May 2007
Shouts
Submit
  • orz $~~~~$

    by clarkculus, Jan 10, 2025, 4:13 PM

  • Insanely auraful

    by centslordm, Jan 1, 2025, 11:17 PM

  • Fly High :(

    by Siddharthmaybe, Oct 22, 2024, 8:34 PM

  • Dang it he is gone :(( /revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive.

    by 799786, Aug 4, 2022, 1:56 PM

  • annoying precision

    by centslordm, May 16, 2021, 7:34 PM

  • rip t0rajir0u

    by OlympusHero, Dec 5, 2020, 9:29 PM

  • Shoutbox bump xD

    by DuoDuoling0, Oct 4, 2020, 2:25 AM

  • dang hes gone :(

    by OlympusHero, Jul 28, 2020, 3:52 AM

  • First shout in July

    by smartguy888, Jul 20, 2020, 3:08 PM

  • https://artofproblemsolving.com/community/c2448

    has more.

    -πφ

    by piphi, Jun 12, 2020, 8:20 PM

  • wait hold up 310,000
    people visited this man?!?!??

    by srisainandan6, May 29, 2020, 5:16 PM

  • first shout in 2020

    by OlympusHero, Apr 4, 2020, 1:15 AM

  • in his latest post he says he moved to wordpress

    by MelonGirl, Nov 16, 2019, 2:43 AM

  • Please revive!

    by AopsUser101, Oct 30, 2019, 7:10 PM

  • first shout in october fj9odiais

    by bulbasaur., Oct 14, 2019, 1:14 AM

128 shouts
Tags
About Owner
  • Posts: 12167
  • Joined: Nov 20, 2005
Blog Stats
  • Blog created: Dec 5, 2006
  • Total entries: 48
  • Total visits: 321828
  • Total comments: 202
Search Blog
a