Three approaches to sums of squares
by t0rajir0u, Aug 6, 2008, 2:12 AM
The following two theorems are good examples of number-theoretic results that are quite easy to state but quite difficult to prove (by comparison). They both suggest the need for stronger tools than, say, just modular arithmetic, and in this post we will be looking at three different sets of tools that can be brought to bear on both theorems.
Fermat's Two-Square Theorem: Given a prime $p$, there exist integers $a, b$ such that $p = a^2 + b^2$ iff $p = 2$ or $p \equiv 1 \pmod 4$. Consequently, a number $n$ is expressible in the form $a^2 + b^2$ iff the primes congruent to $3 \pmod 4$ in its prime factorization each divide $n$ an even number of times.
Lagrange's Four-Square Theorem: Every non-negative integer can be written in the form $a^2 + b^2 + c^2 + d^2$ for integers $a, b, c, d$.
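Both statements are easy to check numerically for small inputs. A brute-force sketch (the function names are mine, and the search is unoptimized):

```python
import math

def is_sum_of_two_squares(n):
    """Brute force: does n = a^2 + b^2 for some integers a, b >= 0?"""
    for a in range(math.isqrt(n) + 1):
        b2 = n - a * a
        b = math.isqrt(b2)
        if b * b == b2:
            return True
    return False

def two_square_criterion(n):
    """Fermat's criterion: every prime q = 3 (mod 4) divides n an even
    number of times."""
    m = n
    q = 2
    while q * q <= m:
        e = 0
        while m % q == 0:
            m //= q
            e += 1
        if q % 4 == 3 and e % 2 == 1:
            return False
        q += 1
    return m % 4 != 3  # any leftover factor of m is prime

def is_sum_of_four_squares(n):
    """Brute-force check of Lagrange's theorem for a single n."""
    squares = [t * t for t in range(math.isqrt(n) + 1)]
    sums2 = {x + y for x in squares for y in squares}  # sums of two squares
    return any(n - t in sums2 for t in sums2 if t <= n)
```

Running the two-square test against the criterion for a range of $n$, and the four-square test for every small $n$, confirms both statements on small cases.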
What is perhaps more interesting than the proofs of these two theorems is the fact that the corresponding statement about three squares is harder than either of the statements above, but that's a different story entirely.
Toolkit 1: Algebraic number theory. Our basis for the exploration of this toolkit is the observation that both the set of sums of two squares and the set of sums of four squares are closed under multiplication. The first result is due to the identity
$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$
and it has as an underlying concept the notion of multiplicativity of the norm $N(a + bi) = a^2 + b^2$ in our good friend the Gaussian integers $\mathbb{Z}[i]$. Specifically, the above is a consequence of taking the norms of $(a + bi)(c + di)$.
This is also clear from our discussion of the complex numbers as 2D rotations.
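The multiplicativity of the norm, and with it the closure of sums of two squares under multiplication, can be verified mechanically. A minimal sketch, with Gaussian integers represented as integer pairs (names are mine):

```python
def gauss_mul(z, w):
    """Multiply Gaussian integers written as pairs: (a, b) stands for a + bi."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def norm(z):
    """N(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b
```

Checking `norm(gauss_mul(z, w)) == norm(z) * norm(w)` over a grid of small pairs verifies the identity, since the real and imaginary parts of the product are exactly $ac - bd$ and $ad + bc$.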
The corresponding identity for four squares is far more tedious to write out, but its basis is the multiplicativity of the norm $N(a + bi + cj + dk) = a^2 + b^2 + c^2 + d^2$ in what is called the Lipschitz integers, a (sub-)ring of integers in the quaternions. You may have seen the quaternions by following the links from the complex numbers post; they are a non-commutative extension of the real numbers generated by $1$ and the imaginary units $i, j, k$, each of which can be interpreted to correspond to a counterclockwise rotation by $90^{\circ}$ in the $xy$-, $yz$-, and $zx$-planes, respectively. It turns out, therefore, that the quaternions are essentially 3D rotations.
$\mathbb{Z}[i]$ is easier to understand than the Lipschitz integers, so we'll tackle the two-square theorem first. It is clearly necessary that $p = 2$ or $p \equiv 1 \pmod 4$ in order for a prime to be represented in the form $a^2 + b^2$, because squares are congruent to $0$ or $1 \pmod 4$. But now we are trying to prove that this condition is also sufficient. We proceed as follows: if $p$ is a prime with $-1$ as a quadratic residue, then there exists $x$ such that $p \mid x^2 + 1$. But over the Gaussian integers, $x^2 + 1 = (x + i)(x - i)$.
If $p$ is prime in the Gaussian integers, then it must divide either $x + i$ or $x - i$, but this is absurd (comparing imaginary parts, $p$ would have to divide $1$). Hence $p$ is composite, so there exist non-unit Gaussian integers $a + bi$ and $c + di$ such that $p = (a + bi)(c + di)$. But since $p$ is purely real, (WLOG) $c + di$ must be a rational integer multiple of $a - bi$, and since $p$ is a rational prime that multiple is a unit, and the result follows by letting $c + di = a - bi$, so that $p = a^2 + b^2$. Consequently, rational primes congruent to $3 \pmod 4$ are prime in the Gaussian integers (or else they could be written in the above form). It then follows that if $p \equiv 3 \pmod 4$ and $p \mid a^2 + b^2$ then $p \mid a$ and $p \mid b$. Repeated application of this lemma gives the full statement of the two-square theorem.
In an algebraic context, the two-square theorem can therefore be understood as reflecting the structure of the Gaussian primes. This notion of primes factoring into particular patterns when we pass to larger rings of integers generalizes significantly, and this is therefore just one application of a rather general concept involving representability by quadratic forms like $x^2 + ny^2$. Note that it is very important that the Gaussian integers have unique factorization.
The four-square theorem requires a little more work: what we want to prove is that every prime can be represented as a sum of four squares, and then by multiplicativity we are done. To do so, we will need the following
Lemma: Every prime $p$ divides an integer of the form $x^2 + y^2 + 1$.
Proof. This is obvious when $p = 2$; when $p$ is odd, there are $\frac{p + 1}{2}$ possible values of $x^2 \bmod p$ and $\frac{p + 1}{2}$ possible values of $-1 - y^2 \bmod p$, so the two sets are not disjoint.
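The pigeonhole argument in the proof is effectively an algorithm: tabulate one of the two sets, then scan the other for the collision the counting guarantees. A sketch (names are mine):

```python
def lemma_witness(p):
    """Find x, y with x^2 + y^2 + 1 = 0 (mod p): tabulate the (p + 1)/2
    values of x^2 mod p, then scan the (p + 1)/2 values of -1 - y^2 mod p
    for a collision, which the counting argument guarantees exists."""
    squares = {(x * x) % p: x for x in range(p // 2 + 1)}
    for y in range(p // 2 + 1):
        x = squares.get((-1 - y * y) % p)
        if x is not None:
            return x, y
```

For example, `lemma_witness(7)` returns a pair $(x, y)$ with $7 \mid x^2 + y^2 + 1$.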
Excursion: Let's discuss what's going on here geometrically some more, since, as easy as the above proof is, the choice of $x^2 + y^2 + 1$ appears to have been pulled out of the air. Geometric intuition over $\mathbb{R}$ tells us very little about how the variety $x^2 + y^2 + 1 = 0$ behaves, since it is an imaginary ellipse and has no real component. In contrast, earlier I showed that the rational projective unit circle $x^2 + y^2 = 1$ was isomorphic to the rational projective line via the parameterization $\left( \frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2} \right)$, which is in fact a parameterization valid over any field (not of characteristic $2$), in particular $\mathbb{F}_p$!
As $t$ ranges through the values $0, 1, \ldots, p - 1$ we can easily count that there are $p$ solutions if $p \equiv 3 \pmod 4$ and $p - 2$ solutions if $p \equiv 1 \pmod 4$ (since two values of $t$ make the denominator vanish). Recall that the above parameterization, as given, misses exactly the point $(-1, 0)$; this gives, succinctly, $p - \left( \frac{-1}{p} \right)$ solutions. The error term disappears if we consider the projective solutions, that is, the solutions to
$x^2 + y^2 = z^2$;
then there are always exactly $p + 1$ solutions, which is exactly the number of points on the projective line $\mathbb{P}^1(\mathbb{F}_p)$, two of which are at infinity when $p \equiv 1 \pmod 4$.
Note that the above curve is essentially identical to $x^2 + y^2 + 1 = 0$ when $p \equiv 1 \pmod 4$; on the other hand, an explicit rational parameterization of the solutions to $x^2 + y^2 + 1 = 0$ is not as easily achieved when $p \equiv 3 \pmod 4$, because the technique we used earlier relies on projecting from a point on the curve, which requires that we already know a point on the curve, and it's not at all clear how we should construct such a point. (Nevertheless, the above argument shows that such a point exists, so it should still be true that there are $p + 1$ projective solutions to $x^2 + y^2 + z^2 = 0$.)
This makes the problem of determining points on conic sections over finite fields fairly straightforward. If one point exists, the conic section is isomorphic to the projective line over $\mathbb{F}_p$, so there are $p + 1$ solutions; otherwise, there are none. For more complicated curves (that is, curves of positive genus) it is no longer true that non-empty curves have exactly $p + 1$ points, but this is still a reasonable approximation, and the error term is described by the Riemann hypothesis over finite fields, but that's going too far afield.
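The point counts claimed in the excursion are easy to confirm by brute force for small primes. A sketch, with `legendre_minus_one` standing in for the Legendre symbol $\left( \frac{-1}{p} \right)$ (names are mine):

```python
def circle_points(p):
    """Count affine solutions to x^2 + y^2 = 1 over F_p by brute force."""
    return sum((x * x + y * y) % p == 1 for x in range(p) for y in range(p))

def legendre_minus_one(p):
    """The Legendre symbol (-1/p) for an odd prime p, via Euler's criterion:
    (-1)^((p-1)/2) computed mod p."""
    return 1 if pow(p - 1, (p - 1) // 2, p) == 1 else -1
```

For every odd prime tested, the count is exactly $p - \left( \frac{-1}{p} \right)$, matching the parameterization argument.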
It is tempting to conclude, in complete analogy with the first proof, that every prime is composite in the Lipschitz integers, but there is a major problem: the Lipschitz integers do not have unique factorization (even up to units). In particular, we made implicit use of the fact that if $p \mid (x + i)(x - i)$ and $p$ divides neither $x + i$ nor $x - i$, then $p$ is composite. In other words,
If an element of a ring of integers is not prime, then it is also reducible.
The distinction between prime and irreducible elements only appears in more general integral domains; over the integers, we often use the terms interchangeably. The above turns out to be false in general if we do not have unique factorization. (It's the reverse implication - that primes are always irreducible - that holds.) So what we have to use is not actually the Lipschitz integers, but the Hurwitz integers, which can be constructed by adjoining the unit $\frac{1 + i + j + k}{2}$ to the Lipschitz integers.
To be more specific, the Hurwitz integers consist of the quaternions $a + bi + cj + dk$ where either $a, b, c, d \in \mathbb{Z}$ or $a, b, c, d \in \mathbb{Z} + \frac{1}{2}$. Conway and Smith discuss some rather fascinating geometric consequences of this definition, but we will focus on the number theory for now. A version of prime factorization holds here (although it is more difficult to state because the quaternions aren't commutative) that is enough for us to carry out the above argument; what we have shown, therefore, is that for every prime $p$ there exists a Hurwitz integer $a + bi + cj + dk$ such that $p = a^2 + b^2 + c^2 + d^2$.
If $a, b, c, d$ are all integers, great. Otherwise, there are odd integers $w, x, y, z$ (namely $2a, 2b, 2c, 2d$) such that $4p = w^2 + x^2 + y^2 + z^2$.
Now note that since $w, x, y, z$ are all odd we can write
$\left( \frac{w + x}{2} \right)^2 + \left( \frac{w - x}{2} \right)^2 + \left( \frac{y + z}{2} \right)^2 + \left( \frac{y - z}{2} \right)^2 = \frac{w^2 + x^2 + y^2 + z^2}{2}$
to conclude that $2p = e^2 + f^2 + g^2 + h^2$ for some integers $e, f, g, h$.
Since odd squares are congruent to $1 \pmod 4$ and the LHS is congruent to $2 \pmod 4$, (except in the trivial case that $p = 2$) exactly two terms on the RHS are odd, say $e$ and $f$, and the others are even, so again we can write
$\left( \frac{e + f}{2} \right)^2 + \left( \frac{e - f}{2} \right)^2 + \left( \frac{g + h}{2} \right)^2 + \left( \frac{g - h}{2} \right)^2 = \frac{e^2 + f^2 + g^2 + h^2}{2} = p$
as desired. (What we just described was essentially an algorithm for dividing by $2$ in the Hurwitz integers.)
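The halving step just described can be written out as a small routine, under the convention that a representation is a 4-tuple whose squares sum to an even number (`halve_four_square` is my own name):

```python
def halve_four_square(rep):
    """Given a 4-tuple whose squares sum to an even number m, pair up the
    entries of equal parity (evens first, odds last; an even square-sum
    forces an even number of odd entries) and apply the identity above to
    return a 4-tuple whose squares sum to m/2."""
    w, x, y, z = sorted(rep, key=lambda t: t % 2)
    return ((w + x) // 2, abs(w - x) // 2, (y + z) // 2, abs(y - z) // 2)

def square_sum(rep):
    """Sum of the squares of the entries."""
    return sum(t * t for t in rep)
```

Starting from $4 \cdot 7 = 28 = 5^2 + 1^2 + 1^2 + 1^2$ (all odd), two applications descend through $14$ to a four-square representation of $7$ itself.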
Two rather remarkable generalizations of these results are the 15 and 290 theorems. In any case, factorization and primality are solidly number-theoretic concepts, and so we might expect this to be the most "natural" toolkit by which to attack this problem. But the geometric interpretation of the Gaussian and Hurwitz integers as rotations begs for a geometric proof.
Toolkit 2: The geometry of numbers. By the geometry of numbers we mean results like Pick's Theorem that can be used to prove number-theoretic facts using geometric reasoning on lattice points. In fact, we will need only one innocent-looking theorem that goes by the name of
Minkowski's Theorem: If a convex region $R$ contains and is symmetric about the origin and has area strictly greater than $4$, then it contains a lattice point besides the origin.
This is intuitively because $R$ cannot be much bigger than the square with vertices $(\pm 1, \pm 1)$ (the proof is left as an exercise) and can be phrased as follows: a convex symmetric region containing no lattice point besides the origin can have at most four times the area of the unit square. We will actually be strengthening this theorem slightly by considering the linear transformation on lattice points that takes the usual lattice to the generalized lattice spanned by two vectors $(a, b)$ and $(c, d)$ (as opposed to $(1, 0)$ and $(0, 1)$). Such a transformation takes convex sets to convex sets and takes the unit square to what is called the fundamental parallelogram of the lattice; moreover, it multiplies all areas by the determinant of the corresponding matrix, which is $ad - bc$ (and since we're considering unsigned areas, we'll take the absolute value). Therefore Minkowski's Theorem can be stated as follows:
Minkowski's Theorem (generalized lattices): Given a lattice $L$ with a fundamental parallelogram of area $A$, a convex region symmetric about and containing the origin with area strictly greater than $4A$ contains a lattice point besides the origin.
The vectors that generate a particular lattice are not unique. For example, the standard integer lattice can be generated by the vectors $(1, 1)$ and $(2, 1)$ (an illustrative choice). This is straightforward since it's possible to express the vectors $(1, 0)$ and $(0, 1)$ in terms of these two vectors. It turns out that two sets of vectors generate the same lattice if and only if they are related by an element of the modular group. As the Wikipedia article mentions, this fact has deep consequences for the study of elliptic functions, which turn out to be closely related to modular forms (see the modularity theorem).
What lattice is relevant to the two-square theorem? Our starting point will again be the observation that $p \mid x^2 + 1$. This means that the lattice point $(x, 1)$ has distance from the origin whose square is divisible by $p$. Now consider the lattice $L$ spanned by $(x, 1)$ and $(p, 0)$, which consists of points with distance from the origin whose square is divisible by $p$: indeed, $m(p, 0) + n(x, 1)$ has norm $(mp + nx)^2 + n^2 \equiv n^2 (x^2 + 1) \equiv 0 \pmod p$. (Perhaps now the connection is clearer!) The convex region we will consider is the circle $a^2 + b^2 < 2p$, which contains the origin and has area $2 \pi p$. But the fundamental parallelogram of $L$ has area $p$ and $2 \pi p > 4p$; it therefore follows by Minkowski's theorem that the circle contains a lattice point besides the origin, which is a point $(a, b)$ such that
- $p \mid a^2 + b^2$, but
- $0 < a^2 + b^2 < 2p$,
and so must be a point such that $a^2 + b^2 = p$.
This is a marvelously clever proof and is (apparently) of a very different flavor from the one we demonstrated above. While in the first proof the answer to the question "why is the two-square theorem true" is "because the Gaussian primes behave a particular way," in the second proof the answer is "because $2\pi$ is greater than $4$"! But the two proofs are more similar than they first appear. The lattice spanned by $(p, 0)$ and $(x, 1)$ can be identified with the complex numbers of the form $mp + n(x + i)$; since we know $p$ and $x + i$ both to be divisible by $\pi$ (where $p = \pi \bar{\pi}$), setting up the lattice boils down to the Euclidean algorithm in the Gaussian integers, and Minkowski's Theorem boils down to the statement that the norm defines a division algorithm, which we will discuss later. The cleverness of this solution, then, is implicitly performing Euclidean algorithm-like operations to deduce the existence of $\pi$.
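The lattice proof is also effectively an algorithm: for each candidate second coordinate $n$, the centered residue of $nx \bmod p$ is the only possible first coordinate of a lattice point of norm $p$. A sketch for primes $p \equiv 1 \pmod 4$ (names are mine):

```python
import math

def two_square_via_lattice(p):
    """For a prime p = 1 (mod 4): take x with x^2 = -1 (mod p) and search
    the lattice spanned by (p, 0) and (x, 1) for a point of norm exactly p,
    which Minkowski's theorem guarantees exists inside the disk of radius
    sqrt(2p)."""
    x = next(t for t in range(2, p) if (t * t + 1) % p == 0)
    for n in range(1, math.isqrt(p) + 1):
        a = n * x % p
        if a > p // 2:
            a -= p              # centered representative of n*x mod p
        if a * a + n * n == p:
            return abs(a), n
```

For example, `two_square_via_lattice(13)` returns a pair summing (in squares) to $13$, here found via $x = 5$.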
Yes, this proof adapts to the four-square theorem. We will need yet another strengthening of
Minkowski's Theorem ($n$ dimensions): Given a lattice $L$ spanned by $n$ linearly independent vectors in $\mathbb{R}^n$ with a fundamental region of hypervolume $V$, if a convex region contains and is symmetric about the origin and has hypervolume strictly greater than $2^n V$, then it contains a point of $L$ besides the origin.
We will then need to set $n = 4$, find $x, y$ such that $p \mid x^2 + y^2 + 1$, and consider the lattice spanned by the four vectors $(p, 0, 0, 0)$, $(0, p, 0, 0)$, $(x, y, 1, 0)$, $(y, -x, 0, 1)$.
It's slightly less trivial to construct this lattice so that the norm (the square of the distance from the origin) of any lattice point remains divisible by $p$, but the above manages it. The fundamental region of this lattice has hypervolume the determinant of the above matrix, which is $p^2$. Now we consider the convex region
$a^2 + b^2 + c^2 + d^2 < 2p$
which is a 4-ball with radius $\sqrt{2p}$ and hypervolume
$\frac{\pi^2}{2} (2p)^2 = 2 \pi^2 p^2 > 2^4 p^2$
(the formulas for the volume of the $n$-ball being here) and as before, it follows by Minkowski's theorem that the 4-ball contains a lattice point $(a, b, c, d)$ such that
- $p \mid a^2 + b^2 + c^2 + d^2$, but
- $0 < a^2 + b^2 + c^2 + d^2 < 2p$,
and therefore $a^2 + b^2 + c^2 + d^2 = p$.
As before, we can think of this as a sort of Euclidean algorithm that "filters out" the existence of $\pi$, and our answer to the question "why is the four-square theorem true" is the amusing "because $2 \pi^2$ is greater than $16$!" I find the above proof particularly cute, though. Legend has it that Jason Bland (scorpius119) devised this proof independently on a PROMYS test, although it's likely to be classic.
I've postponed a clearer discussion of how unique factorization does or does not come about until the relationship between different rings of integers and their geometries became clearer. First, let's clarify the notion of a generic division algorithm in an integral domain. Given such a domain $D$, one wishes to define a function $d : D \setminus \{ 0 \} \to \mathbb{Z}_{\ge 0}$ with the property that for any $a, b \in D$ with $b \neq 0$, there exist $q, r$ such that
$a = qb + r$
and either $r = 0$ or $d(r) < d(b)$. Such a function ensures that we can define the Euclidean algorithm: repeated division forces the sequence $d(r_1) > d(r_2) > \cdots$ of remainders to be a strictly decreasing sequence of non-negative integers, so the Euclidean algorithm terminates. This turns out to be one of the most useful properties of the integers, and whenever it occurs we have what is called a Euclidean domain, which automatically has prime factorization (because the Euclidean algorithm is fundamental to its proof). A few common examples:
- $\mathbb{Z}$ is a Euclidean domain: we take $d(n) = |n|$.
- $k[x]$ (the ring of polynomials with coefficients in a field $k$) is a Euclidean domain: we take $d(p) = \deg p$.
- $\mathbb{Z}[i]$ is a Euclidean domain: we take $d(a + bi) = a^2 + b^2$.
A non-example: $k[x, y]$, a ring of polynomials in two variables, is not a Euclidean domain. Although it's easiest to prove this by exhibiting an ideal that is not principal, for now what this means is that we don't have a division algorithm on bivariate polynomials with the same properties as the ones above. (This is not to say that we can't talk about division at all; Gröbner bases allow one to do pretty well.)
The fact that the Gaussian integers are Euclidean, in contrast, is not immediately obvious. Geometrically, the norm above gives the following statement: in the lattice of multiples of some $z = a + bi$ (which is the lattice spanned by $z$ and $iz$ - in real terms, by $(a, b)$ and $(-b, a)$), any other Gaussian integer $w$ differs from a lattice point by a distance of at most $\frac{|z|}{\sqrt 2}$. This is straightforward; the lattice of multiples of $z$ consists of squares of side length $|z|$, and given an arbitrary point in such a square the closest corner is at most $\frac{|z|}{\sqrt 2}$ away. This is in fact a stronger statement than we need.
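The nearest-corner argument translates directly into a division algorithm: round the exact quotient $\frac{a \bar{b}}{N(b)}$ to the nearest Gaussian integer. A sketch, with pairs $(x, y)$ standing for $x + yi$ (names are mine):

```python
def nearest_int(t, n):
    """Nearest integer to the fraction t/n, for n > 0 (ties round up)."""
    return (2 * t + n) // (2 * n)

def gauss_divmod(a, b):
    """Division with remainder in Z[i]: returns (q, r) with a = q*b + r and
    N(r) <= N(b)/2 < N(b), by rounding the exact ratio a/b = a*conj(b)/N(b)
    to the nearest Gaussian integer."""
    (x1, y1), (x2, y2) = a, b
    n = x2 * x2 + y2 * y2                      # N(b), assumed nonzero
    q = (nearest_int(x1 * x2 + y1 * y2, n),    # Re(a * conj(b)) / N(b)
         nearest_int(y1 * x2 - x1 * y2, n))    # Im(a * conj(b)) / N(b)
    r = (x1 - (q[0] * x2 - q[1] * y2), y1 - (q[0] * y2 + q[1] * x2))
    return q, r
```

Since each coordinate of the quotient is off by at most $\frac{1}{2}$, the remainder satisfies $N(r) \le \frac{N(b)}{2}$, the "stronger statement" above.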
The above argument extends directly to $\mathbb{Z}[\sqrt{-2}]$ and $\mathbb{Z}\left[ \frac{1 + \sqrt{-3}}{2} \right]$ but fails for $\mathbb{Z}[\sqrt{-5}]$ because the lattice of multiples becomes composed of large rectangles. Correspondingly, the first two are Euclidean domains and therefore unique factorization domains (the second being the Eisenstein integers we know and love), and the third is not (for example, $6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$.) Of course, this doesn't rule out the possibility that some more exotic definition of $d$ could work, and that's the limitation of this line of reasoning. In general, prime factorization still turns out to fail most of the time, even though it is possible to recover prime factorization without the Euclidean algorithm; in fact, it is known that there is a finite list of negative integers $d$ such that the ring of integers in $\mathbb{Q}[\sqrt d]$ has prime factorization. These very special numbers are called the Heegner numbers. It is worth mentioning that the same question for positive $d$ is much harder, but I am not qualified to say more on this point. (Well, let me just mention that because we like prime factorization so much we often do not look at factorizations in the integers themselves, but in their ideals, which turns out to be more useful anyway.)
Why didn't we look at $\mathbb{Z}[\sqrt{-3}]$? The straightforward way of defining the ring of integers in a quadratic field is as the intersection of that field (here $\mathbb{Q}[\sqrt{-3}]$) with the ring of algebraic integers, complex numbers that are the roots of monic polynomials with integer coefficients. When $d$ is odd and congruent to $1 \bmod 4$ it is always the case that $\frac{1 + \sqrt d}{2}$ is an algebraic integer (a straightforward computation), and therefore that the correct ring of integers to look at is actually $\mathbb{Z}\left[ \frac{1 + \sqrt d}{2} \right]$ and not $\mathbb{Z}[\sqrt d]$ as would be expected. When $d = 5$ this is $\mathbb{Z}[\varphi]$ and when $d = -3$ this is the Eisenstein integers. (This is the same kind of thing we did when we constructed the Hurwitz integers.)
In any case, we can now explain the discrepancy between the Lipschitz and Hurwitz integers. The problem with the Lipschitz integers is that the fundamental region (a hypercube) of the lattice of multiples of some quaternion $q$ has side length $|q|$ and diagonal twice that. In other words, it can turn out that the "remainder" we want can be in the exact center of a fundamental region and have the same norm. (For example, this occurs if we try to divide $1 + i + j + k$ by $2$, which have the same norm.) This discrepancy is resolved by introducing the center of the unit hypercube - $\frac{1 + i + j + k}{2}$ - into the lattice. (In $n$ dimensions, the diagonal of the unit hypercube is $\sqrt n$ - in other words, as $n$ becomes large, the diagonal of a hypercube becomes much larger than its side length. High-dimensional geometry is full of strange behavior like this.)
Toolkit 3: Generating functions. Define $r_k(n)$ to be the number of representations of $n$ as a sum of $k$ squares. We are then asking the question of when $r_2(n)$ and $r_4(n)$, respectively, are nonzero. It turns out that we can do better - we can give explicit formulas for both, and these explicit formulas then imply the weaker existence theorems we stated above, but correspondingly require a little more work.
The motivation for the expression we're about to derive for $r_2(n)$ is that, in the identity
$(a^2 + b^2)(c^2 + d^2) = (ac \mp bd)^2 + (ad \pm bc)^2$
the two different choices of signs on the RHS correspond to different ways to write the RHS as a sum of two squares (except when one of the factors on the LHS is $1$ or $2$). In other words, two different Gaussian integers can have the same norm. (What happens is that some of the prime factors with norms that are primes congruent to $1 \pmod 4$ get exchanged for their conjugates.) $r_2(n)$ therefore counts the number of Gaussian integers whose norm is $n$, and this count boils down to counting how many ways we can exchange a prime factor for its conjugate.
This should already suggest the correct answer, but first of all, it's not (immediately) clear that different choices of signs will actually lead to different expressions, and second of all the counting can be elegantly done using the Dirichlet series
$\sum_{n \ge 1} \frac{r_2(n)/4}{n^s}$
which, by the above discussion, has Euler product
$\prod_{\pi} \frac{1}{1 - N(\pi)^{-s}},$
the product running over the Gaussian primes $\pi$ taken up to units.
(This argument appears in Ireland and Rosen and is quite nice.) (By comparison, carrying this out in the integers we get the usual Euler product for the Zeta function, which counts the number of positive integers with a given norm (one).) Note that we are implicitly restricting ourselves to one Gaussian integer per orbit under the units - hence the factor of $4$; taking the units into account is straightforward. Now, the Gaussian primes come in one of three categories (up to multiplication by units):
- The prime $1 + i$. (Note that $2 = -i(1 + i)^2$.)
- The primes $a + bi$ and $a - bi$ where $a^2 + b^2 = p$ is a prime congruent to $1 \pmod 4$.
- The primes $q$ where $q$ is a prime congruent to $3 \pmod 4$.
The norm is easily computed in each case, and we therefore find that the above product can be written
$\frac{1}{1 - 2^{-s}} \prod_{p \equiv 1 \bmod 4} \frac{1}{(1 - p^{-s})^2} \prod_{q \equiv 3 \bmod 4} \frac{1}{1 - q^{-2s}}.$
The square in the second factor appears because $a + bi$ and $a - bi$ are not unit multiples of each other and are counted separately as primes. The square in the third factor appears because $N(q) = q^2$. But we can clearly factor out the Euler product of the usual Zeta function $\zeta(s)$, and so the above can be written
$\zeta(s) \prod_{p \equiv 1 \bmod 4} \frac{1}{1 - p^{-s}} \prod_{q \equiv 3 \bmod 4} \frac{1}{1 + q^{-s}}.$
The remaining factor should remind you of a sum from my original post on Dirichlet's theorem. It is in fact also a Dirichlet L-series, but here it is constructed from a Dirichlet character $\chi$ with respect to the modulus $4$. The relevant character here takes $\chi(1) = 1$ and $\chi(3) = -1$, vanishes on even numbers, and is periodic $\bmod 4$. The above can therefore be written as
$\zeta(s) L(s, \chi) = \left( \sum_{n \ge 1} \frac{1}{n^s} \right) \left( \sum_{n \ge 1} \frac{\chi(n)}{n^s} \right)$
which is the Dirichlet convolution $1 * \chi$. It then follows that
$\frac{r_2(n)}{4} = \sum_{d \mid n} \chi(d).$
With less notation, $\frac{r_2(n)}{4}$ is the number of factors of $n$ congruent to $1 \bmod 4$ minus the number of factors of $n$ congruent to $3 \bmod 4$. This is consistent with everything we've said about sums of squares so far; if a prime congruent to $3 \bmod 4$ divides $n$ an odd number of times, these numbers will be equal, essentially because of a pairing argument. An alternate expression for $r_2(n)$ is the subject of a Practice Problem. Note that we do not have to talk about convergence or any other analytic properties; all we're interested in is formal manipulation of series. Indeed, Dirichlet convolution can be thought of as a framework that unites various counting arguments that can be used to prove identities involving multiplicative functions.
Will the same trick work on $r_4(n)$? The problem here is that the Hurwitz integers aren't commutative, so we can't talk about factorizations in terms of the primes involved alone but also need to take their order into account, so an argument using Dirichlet series would be problematic (though perhaps not impossible). It's not even clear what $r_4(p)$ is when $p$ is prime. The original proof of the explicit formula for $r_4(n)$ (and for that matter, $r_2(n)$), due to Jacobi, was derived by working with another kind of generating function: the theta function
$\theta(\tau) = \sum_{n = -\infty}^{\infty} e^{\pi i n^2 \tau}.$
Writing $q = e^{\pi i \tau}$ for shorthand, it can be seen that the powers of $\theta$ are generating functions for the ways a number can be represented as the sum of various numbers of squares; in particular, the function we're interested in is
$\theta(\tau)^4 = \sum_{n \ge 0} r_4(n) q^n.$
I am not qualified to discuss the rest of the proof in depth (it involves more complex analysis than I'm comfortable with; consult Stein and Shakarchi), but it turns out that the theta function has modular symmetries, and as I mentioned in my previous post the general strategy here is to find another function with the same symmetries and compare their values in a fundamental domain (not to be confused with the fundamental region of a lattice). The relevant function is a certain series $E^*(\tau)$ built out of the divisor sums below, and the relevant symmetries are
$f(\tau + 2) = f(\tau), \qquad f\left( -\frac{1}{\tau} \right) = -\tau^2 f(\tau).$
The above function is related to the notion of an Eisenstein series. Without getting too in-depth, the symmetries involved generate a certain subgroup of the modular group with a somewhat larger fundamental domain, and it turns out that on this domain it's not too hard to prove that the above two functions are equal everywhere (specifically, that their ratio is bounded (hence, through complex-analytic magic, constant) and it's asymptotically $1$). It also turns out that
$E^*(\tau) = 1 + 8 \sum_{n \ge 1} \sigma^*(n) q^n$
where $\sigma^*(n)$ denotes the sum of the divisors of $n$ not divisible by $4$. It therefore follows that
$r_4(n) = 8 \sigma^*(n).$
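Jacobi's formula can likewise be checked by brute force for small $n$; a sketch (names are mine):

```python
import math

def r4(n):
    """Direct count of 4-tuples (a, b, c, d) in Z^4 with square-sum n."""
    s = math.isqrt(n)
    count = 0
    for a in range(-s, s + 1):
        for b in range(-s, s + 1):
            for c in range(-s, s + 1):
                d2 = n - a * a - b * b - c * c
                if d2 < 0:
                    continue
                d = math.isqrt(d2)
                if d * d == d2:
                    count += 1 if d == 0 else 2     # d and -d
    return count

def jacobi_formula(n):
    """8 times the sum of the divisors of n not divisible by 4."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)
```

Note that since $\sigma^*(n) \ge 1$ for every $n \ge 1$, the formula immediately implies the four-square theorem.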
Unlike the proof for $r_2(n)$, we have to do a little more than formally manipulate series; deep properties of the series involved had to be invoked, and the above proof is best understood within the context of a rich theory of which I understand little. I'm reasonably certain a simpler proof is possible, though (perhaps involving Dirichlet series). In any case, this is a great example of what I meant when I marveled at the power of complex analysis in deducing number-theoretic results.
The three approaches outlined above are quite different in philosophy, but perhaps the more interesting thing is that they're related. A lattice is a geometric depiction of an ideal, an essentially algebraic object, and generating functions are ways of letting us manipulate sequences as ensembles. Indeed, the analytic viewpoint is now indispensable to modern number theory.
If you looked at the sums of squares Mathworld page, the difficulty of the three-squares theorem is indicated by the fact that, unlike the other expressions, $r_3(n)$ is given by an expression involving class numbers, which is both quite deep and much less amenable to any of the techniques we've discussed so far. There's a lot of fascinating theory to explore here.
Practice Problem 1: Prove Minkowski's theorem (in the plane).
Practice Problem 2: Prove that if $p$ is prime and $x, y$ are integers such that $p \mid x^2 + y^2 + 1$, then
1) $x^2 + y^2 + 1$ can be taken to be less than $p^2$, and
2) If $1 < m < p$ and $mp$ is a sum of four squares, then there exists $1 \le n < m$ such that $np$ is also a sum of four squares.
Conclude the four-square theorem. (This is a generalization of what we did when $m$ was $2$.)
Practice Problem 3: Let the prime factorization of $n$ be $2^{a_0} p_1^{a_1} \cdots p_j^{a_j} q_1^{b_1} \cdots q_k^{b_k}$, where $p_i \equiv 1 \bmod 4$ and $q_i \equiv 3 \bmod 4$, and compute $\sum_{d \mid n} \chi(d)$ directly. Verify that this expression agrees with the other.
Practice Problem 4: Verify that the numbers $4^a (8b + 7)$ cannot be written as a sum of three squares. (The converse is Gauss' Three-Square Theorem.)