China Post 1: The problem of repeated roots
by t0rajir0u, Jul 19, 2008, 12:00 PM
This is the first in a series of three posts I'll have written while on vacation in China. Enjoy!
In my previous discussion of linear recurrences I restricted my attention to characteristic polynomials with no repeated roots. The structure of the corresponding solutions (sums of geometric series) was then straightforward and seen to be an exact analogue of partial fraction decomposition, among other phenomena. Let's explore what happens when this restriction is lifted.
The solutions to the corresponding recurrences surprised me when I first encountered them in the context of linear ODEs. For example, the characteristic polynomial $x^2 - 2x + 1 = (x - 1)^2$ has repeated roots, and the general solution to $a_{n+2} = 2a_{n+1} - a_n$ is given by $a_n = A + Bn$ for some constants $A$ and $B$. Correspondingly, the general solution to the differential equation $y'' - 2y' + y = 0$ is given by $y(t) = (A + Bt)e^t$. Plugging in and verifying is simple; are there deeper reasons why the repeated-root solutions take this form?
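(As a quick sanity check - a minimal sketch in Python, not part of the original post; the initial conditions are arbitrary choices:)

```python
# Verify that a_n = A + B*n solves a_{n+2} = 2*a_{n+1} - a_n.
# (Illustrative sketch, not from the original post.)
A, B = 3, 5
a = [A, A + B]  # a_0 = A, a_1 = A + B
for n in range(2, 10):
    a.append(2 * a[-1] - a[-2])

assert all(a[n] == A + B * n for n in range(10))
print(a)  # an arithmetic progression: 3, 8, 13, ...
```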
Another of my discussions shows that the first solution is actually something we should expect. Again letting $\mathbf{S}$ denote the shift operator on sequences, the recurrence $a_{n+2} - 2a_{n+1} + a_n = 0$ can be "factored" (as per the ideas here) as $(\mathbf{S} - \mathbf{I})^2 a_n = 0$. The operator $\mathbf{S} - \mathbf{I}$ (where $\mathbf{I}$ denotes the identity operator) is in fact the forward difference operator (on the space of sequences, not on the space we get after performing a "Newton transform"!), and the set of sequences with second finite difference zero is clearly the set of linear sequences $a_n = A + Bn$.
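(In code - again a small sketch, not from the original post - applying the forward difference twice annihilates exactly the linear sequences:)

```python
# Forward difference of (a finite prefix of) a sequence; applying it twice
# annihilates linear sequences, matching the factorization (S - I)^2.
def diff(seq):
    return [seq[i + 1] - seq[i] for i in range(len(seq) - 1)]

linear = [3 + 5 * n for n in range(10)]
print(diff(diff(linear)))  # all zeros
```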
So this particular result is no longer unexpected and leads to the corresponding result in linear ODEs. (The analogies between the forward difference and the derivative (i.e. the fact that the functions with second derivative zero are the linear functions), and the corresponding natural bases, are explored in the field of umbral calculus.) But this is as far as a repeated root of $1$ goes; the solutions to $(\mathbf{S} - \lambda \mathbf{I})^2 a_n = 0$ are given by $a_n = (A + Bn)\lambda^n$, the product of a linear and a geometric sequence, and now we must develop a more general technique.
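(A quick numerical check of this claim, with arbitrary constants:)

```python
# Check that a_n = (A + B*n) * lam**n solves the recurrence with
# characteristic polynomial (x - lam)^2, i.e.
# a_{n+2} = 2*lam*a_{n+1} - lam**2 * a_n. (Illustrative sketch.)
lam, A, B = 3, 2, 5
a = [(A + B * n) * lam**n for n in range(10)]
assert all(a[n + 2] == 2 * lam * a[n + 1] - lam**2 * a[n] for n in range(8))
```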
We saw before that using ordinary generating functions (i.e. thinking about ODEs and applying a Laplace transform) made life simple because it translated these problems to problems of partial fraction decomposition. Here our partial fraction decompositions involve terms of the form $\frac{1}{(1 - \lambda x)^k}$ for $k > 1$, but the corresponding sequences are simple: they are given by

$$\frac{1}{(1 - \lambda x)^k} = \sum_{n \ge 0} \binom{n + k - 1}{k - 1} \lambda^n x^n$$

as per this post. The important point here (that is exactly the property that we want) is that the coefficient of $x^n$ here is $\lambda^n$ times a polynomial in $n$ of degree $k - 1$, which is exactly the behavior we observed above for $k = 2$.
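(A series expansion makes this concrete; the following sketch assumes sympy, which is not a tool used in the original post.)

```python
# Expand 1/(1 - lam*x)**k as a power series in x and compare the coefficient
# of x**n against binomial(n + k - 1, k - 1) * lam**n. (Sketch using sympy.)
import sympy as sp

x, lam = sp.symbols('x lam')
k = 3
series = sp.series(1 / (1 - lam * x)**k, x, 0, 8).removeO()
for n in range(8):
    coeff = series.coeff(x, n)
    assert sp.simplify(coeff - sp.binomial(n + k - 1, k - 1) * lam**n) == 0
```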
We are now prepared to state the following
Proposition: Let $a_n$ be a sequence satisfying a recurrence given by a characteristic polynomial $P(x)$ with roots $\lambda_1, \lambda_2, \dots, \lambda_j$ and corresponding multiplicities $m_1, m_2, \dots, m_j$ (in other words, $P(x) = \prod_{i=1}^{j} (x - \lambda_i)^{m_i}$). Then

$$a_n = \sum_{i=1}^{j} Q_i(n) \lambda_i^n$$

where $Q_i$ is a polynomial of degree at most $m_i - 1$. (There are then $m_1 + m_2 + \cdots + m_j = \deg P$ coefficients to work with, exactly the number of initial conditions we expect to uniquely determine the solution.)
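(To see the Proposition in action on the example that appears later in this post, we can ask a computer algebra system to solve the recurrence directly; `rsolve` is sympy's recurrence solver, an assumption of this sketch rather than anything from the original.)

```python
# Solve a_{n+3} = 6*a_{n+2} - 12*a_{n+1} + 8*a_n, whose characteristic
# polynomial is (x - 2)^3; the answer is a quadratic in n times 2**n.
import sympy as sp

n = sp.symbols('n', integer=True)
a = sp.Function('a')
sol = sp.rsolve(a(n + 3) - 6 * a(n + 2) + 12 * a(n + 1) - 8 * a(n),
                a(n), {a(0): 1, a(1): 0, a(2): 0})
print(sp.expand(sol))  # of the form (A + B*n + C*n**2) * 2**n
```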
The proof by ordinary generating functions is straightforward and follows more or less the route I have given above. It is interesting that our proof above has a combinatorial interpretation: the appearance of polynomial coefficients becomes a consequence of a counting argument (the balls-and-urns problem) that amounts to taking powers of a particular generating function. This proof is uniquely easy in this particular domain: if we try to translate it back to the space of bare sequences, we see that the special property here is that the vector space structure common to all three domains we discussed earlier has been augmented by a convolution

$$(a * b)_n = \sum_{k=0}^{n} a_k b_{n-k}$$

that is the product operation on two ordinary generating functions (and is also the proper way to make the space of terminating sequences behave like polynomials). In other words, what we have now is a (commutative and) associative algebra. While the definition of the convolution of two sequences in this way has a natural algebraic motivation in terms of ordinary generating functions, note that its motivation in terms of sequences alone is combinatorial in nature: if $a_n$ counts the number of subsets of some set $A$ with "weight" $n$ (for an appropriate definition of "weight") and $b_n$ counts the number of subsets of some set $B$ with weight $n$, then $(a * b)_n$ counts the number of subsets of the disjoint union $A \sqcup B$ with weight $n$. Nevertheless, it is far easier to deal with this convolution as it applies to ordinary generating functions on a purely algebraic basis.
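(A direct implementation of this convolution, sketched for finite prefixes of sequences:)

```python
# Convolution of two (finite prefixes of) sequences, i.e. the coefficient
# sequence of the product of their ordinary generating functions.
def convolve(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# (1 + x)^2 * (1 + x) = (1 + x)^3, i.e. rows of Pascal's triangle multiply.
print(convolve([1, 2, 1], [1, 1]))  # [1, 3, 3, 1]
```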
So much for a combinatorial proof. Let me motivate a second proof by working with exponential generating functions (ODEs) instead.
In my previous post I connected the Fibonacci sequence to a matrix I called the Fibonacci matrix. This connection can be generalized, and its construction can be motivated in two ways, both of which I'd prefer to demonstrate by example. The recurrence $a_{n+3} = 6a_{n+2} - 12a_{n+1} + 8a_n$ (which has characteristic polynomial $x^3 - 6x^2 + 12x - 8 = (x - 2)^3$) is equivalent to the differential equation $y''' = 6y'' - 12y' + 8y$, which we will solve with the substitutions (exactly the ones I used before)

$$x = y'$$

$$z = x' = y''$$

which gives the system (written in matrix notation)

$$\left[ \begin{array}{c} y' \\ x' \\ z' \end{array} \right] = \left[ \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 8 & -12 & 6 \end{array} \right] \left[ \begin{array}{c} y \\ x \\ z \end{array} \right].$$
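(As a sanity check on this translation, sympy's ODE solver - an assumption of this sketch - returns exactly the promised solution shape:)

```python
# Solve y''' = 6*y'' - 12*y' + 8*y symbolically; the general solution is
# (C1 + C2*t + C3*t**2) * exp(2*t), matching the repeated root 2.
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')
ode = sp.Eq(y(t).diff(t, 3),
            6 * y(t).diff(t, 2) - 12 * y(t).diff(t) + 8 * y(t))
print(sp.dsolve(ode))  # y(t) = (C1 + C2*t + C3*t**2)*exp(2*t)
```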
The matrix appearing in this system, which we will call $\mathbf{M}$, is (the transpose of) the companion matrix of $x^3 - 6x^2 + 12x - 8$. Its characteristic and minimal polynomials are both $(x - 2)^3$ (this generalizes). As before, the solution to the above system is

$$\left[ \begin{array}{c} y(t) \\ x(t) \\ z(t) \end{array} \right] = e^{\mathbf{M} t} \left[ \begin{array}{c} y(0) \\ x(0) \\ z(0) \end{array} \right],$$

so the powers of the companion matrix give us the solution to our linear recurrence, which we already know to be in the form $a_n = (A + Bn + Cn^2) 2^n$. What we are interested in is how this form can be derived from the form of $\mathbf{M}$ alone.
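(Concretely - a numpy sketch, with numpy itself being an assumption here - the powers of $\mathbf{M}$ march the recurrence forward:)

```python
# M maps the window (a_n, a_{n+1}, a_{n+2}) to (a_{n+1}, a_{n+2}, a_{n+3})
# for the recurrence a_{n+3} = 6*a_{n+2} - 12*a_{n+1} + 8*a_n.
import numpy as np

M = np.array([[0, 1, 0],
              [0, 0, 1],
              [8, -12, 6]])

a = [1, 0, 0]  # arbitrary initial conditions a_0, a_1, a_2
for _ in range(7):
    a.append(6 * a[-1] - 12 * a[-2] + 8 * a[-3])

state = np.array(a[:3])
for n in range(7):
    assert np.array_equal(np.linalg.matrix_power(M, n) @ state, a[n:n + 3])
```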
To determine the way the Fibonacci matrix behaved (and therefore to prove Binet's formula) we diagonalized it by finding its eigendecomposition. Unfortunately, it's not possible to diagonalize $\mathbf{M}$; the eigenspace associated with the eigenvalue $2$ has dimension $1$ (in fact, it is spanned by $(1, 2, 4)^T$), not $3$. So what can we do instead of diagonalizing?
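(A numerical confirmation of this failure to diagonalize, again assuming numpy:)

```python
# The eigenvalue 2 has algebraic multiplicity 3 but geometric multiplicity 1.
import numpy as np

M = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [8., -12., 6.]])

print(np.linalg.eigvals(M))                          # approximately [2, 2, 2]
print(3 - np.linalg.matrix_rank(M - 2 * np.eye(3)))  # eigenspace dimension: 1
print(M @ np.array([1., 2., 4.]))                    # [2, 4, 8] = 2 * (1, 2, 4)
```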
It turns out that in general the "closest" we can get to diagonalizing is a nearly-diagonal form called Jordan normal form. The Jordan normal form of $\mathbf{M}$ is composed of the single Jordan block

$$\left[ \begin{array}{ccc} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{array} \right]$$

and therefore to study the powers of $\mathbf{M}$ (and companion matrices of polynomials with repeated roots in general) it suffices to study the powers of Jordan blocks. In fact, we should expect that such matrices are not diagonalizable; if they were, the corresponding linear recurrences would be describable as if their characteristic polynomials did not have repeated roots. Hence the need for polynomial coefficients in the general solution to the recurrence problem is intimately related to the need for Jordan blocks in the general diagonalization problem.
We expect that the powers of Jordan blocks are describable in terms of polynomials (equivalently, binomial coefficients) of degree at most $m - 1$, where $m$ is the size of the block. A few computations suggest that

$$\left[ \begin{array}{ccc} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{array} \right]^n = \left[ \begin{array}{ccc} 2^n & {n \choose 1} 2^{n - 1} & {n \choose 2} 2^{n - 2} \\ 0 & 2^n & {n \choose 1} 2^{n - 1} \\ 0 & 0 & 2^n \end{array} \right]$$

which is certainly a pattern we expect from our previous discussion of partial fractions, but can we justify this pattern using matrix methods alone? (The inductive proof offers some insight - the recurrence involved is just Pascal's rule - but I'd like to go deeper.)
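(The "few computations" are easy to automate; a brief check with numpy and `math.comb`:)

```python
# Compare the conjectured formula for powers of the Jordan block against
# direct matrix exponentiation. (Illustrative sketch.)
import numpy as np
from math import comb

J = np.array([[2, 1, 0],
              [0, 2, 1],
              [0, 0, 2]])

for n in range(2, 9):
    predicted = np.array([
        [2**n, comb(n, 1) * 2**(n - 1), comb(n, 2) * 2**(n - 2)],
        [0,    2**n,                    comb(n, 1) * 2**(n - 1)],
        [0,    0,                       2**n],
    ])
    assert np.array_equal(np.linalg.matrix_power(J, n), predicted)
```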
The motivation we need is to go back to the notion of forward difference, which corresponds to Jordan blocks with eigenvalue $1$. We saw that the forward difference operator could be written as $\mathbf{S} - \mathbf{I}$ where $\mathbf{S}$ is the shift operator (on infinite sequences). Now note that our Jordan block can be written as

$$\left[ \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array} \right] + 2 \mathbf{I}_3$$

where the operator on the left takes $(x, y, z)$ to $(y, z, 0)$ - in other words, it is a finite shift operator (as opposed to the infinite shift operator we have been working with!). We will denote this matrix $\mathbf{S}_3$, as it is closely related to $\mathbf{S}$. Unlike the infinite shift operator, a finite shift operator is nilpotent - in particular, $\mathbf{S}_3^3 = \mathbf{0}$, which makes the powers of the Jordan block immediately transparent. Now just the observation that
$$\mathbf{S}_3^2 = \left[ \begin{array}{ccc} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]$$

is enough for us to deduce the form of the powers of a Jordan block:

$$(2 \mathbf{I}_3 + \mathbf{S}_3)^n = 2^n \mathbf{I}_3 + {n \choose 1} 2^{n - 1} \mathbf{S}_3 + {n \choose 2} 2^{n - 2} \mathbf{S}_3^2.$$

(Since $\mathbf{I}_3$ and $\mathbf{S}_3$ commute, the binomial theorem applies, and every term involving $\mathbf{S}_3^k$ for $k \ge 3$ vanishes.)
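(A symbolic check of this truncated binomial expansion, assuming sympy:)

```python
# The binomial expansion of (2*I + S_3)**n truncates after three terms
# because S_3**3 = 0; check it against explicit powers for small n.
import sympy as sp

n = sp.symbols('n', integer=True, nonnegative=True)
S3 = sp.Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
I3 = sp.eye(3)
J = 2 * I3 + S3

expansion = (2**n * I3
             + sp.binomial(n, 1) * 2**(n - 1) * S3
             + sp.binomial(n, 2) * 2**(n - 2) * S3**2)

for k in range(2, 8):
    assert expansion.subs(n, k) == J**k
```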
Of course, all this discussion generalizes to arbitrary Jordan blocks, which are really just the matrices $\lambda \mathbf{I}_m + \mathbf{S}_m$ and are intimately related to our discussion of the shift operator on sequences (in particular, note that a linear recurrence can be factored into operators of the form $(\mathbf{S} - \lambda \mathbf{I})^m$). This is the essential ingredient in the matrix proof of our Proposition that we wanted, and we get a little extra - we started off characterizing homogeneous linear recurrences and we have now characterized the powers of any matrix (through Jordan normal form). This justifies the statement in the Wikipedia article that "in principle the Jordan form could give a closed-form expression for the exponential exp(A)."
So we've shown that the behavior of a homogeneous linear recurrence is determined by the powers of the companion matrix of its characteristic polynomial. The natural question now is whether every matrix is similar to a companion matrix, but the answer is clearly no. A companion matrix has the property that its characteristic and minimal polynomials are identical, which is far from the case in general, so the matrix solution to the problem of homogeneous linear recurrences is just a special case of the description of the powers of matrices in general. In other words, disappointingly enough, our Proposition above cannot be used to prove the Jordan normal form theorem (so I am not quite Terence Tao!).
Nevertheless, the connection between partial fraction decomposition and powers of Jordan blocks is instructive. While I cannot offer a full proof of the Jordan normal form theorem along these lines, let me discuss a useful technique. To analyze an $n \times n$ matrix $\mathbf{A}$, pick a vector $\mathbf{v}$. The sequence of vectors $\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2 \mathbf{v}, \dots$ sits in a vector space of dimension $n$, so there exists a nontrivial linear dependence among them. In particular, some polynomial $P(x)$ (which we may take to be monic of minimal degree) exists such that $P(\mathbf{A}) \mathbf{v} = \mathbf{0}$.
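(Here is this technique in computational form - a sympy sketch that finds the first linear dependence among $\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \dots$ for the matrix $\mathbf{M}$ above:)

```python
# Find the minimal polynomial of M relative to v by detecting the first
# linear dependence among v, M*v, M^2*v, ... (Sketch using sympy.)
import sympy as sp

M = sp.Matrix([[0, 1, 0], [0, 0, 1], [8, -12, 6]])
v = sp.Matrix([1, 0, 0])

krylov = [v]
while True:
    krylov.append(M * krylov[-1])
    dependence = sp.Matrix.hstack(*krylov).nullspace()
    if dependence:
        coeffs = dependence[0]  # coefficients of the annihilating polynomial
        break

x = sp.symbols('x')
P = sum(c * x**i for i, c in enumerate(coeffs))
print(sp.factor(P / coeffs[-1]))  # (x - 2)**3 after making it monic
```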
The span of the vectors $\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \dots$ is (I found out) known as the $\mathbf{A}$-cyclic subspace generated by $\mathbf{v}$. Relative to the basis $\mathbf{v}, \mathbf{A}\mathbf{v}, \dots, \mathbf{A}^{d-1}\mathbf{v}$ (where $d = \deg P$), the matrix of $\mathbf{A}$ (restricted to this subspace) is precisely the companion matrix of $P(x)$. (These ideas lead to a very natural interpretation of the rational canonical form or Frobenius normal form, which generalizes the companion matrix construction.) If the base field contains the eigenvalues of $\mathbf{A}$ and if $\mathbf{v}$ is an eigenvector of $\mathbf{A}$ with eigenvalue $\lambda$, then $P(x) = x - \lambda$, a linear polynomial. It is readily seen, therefore, that

$$\prod_{\lambda} (\mathbf{A} - \lambda \mathbf{I}) \, \mathbf{v} = \mathbf{0}$$

where the product is taken over all eigenvalues $\lambda$ of $\mathbf{A}$ and $\mathbf{v}$ is any vector in the span of the eigenvectors of $\mathbf{A}$. In the case that $\mathbf{A}$ has $n$ distinct eigenvalues, the eigenvectors span all of $\mathbb{R}^n$ and the polynomial on the LHS is clearly the characteristic (and minimal) polynomial of $\mathbf{A}$ (in other words, here is a special case of the Cayley-Hamilton theorem). As before, however, if the eigenvectors of $\mathbf{A}$ do not span $\mathbb{R}^n$ (which is equivalent to the presence of repeated roots in the minimal polynomial of $\mathbf{A}$), we have at best a polynomial that divides the minimal polynomial of $\mathbf{A}$, which in turn divides the characteristic polynomial of $\mathbf{A}$.
We already know the structure that the characteristic and minimal polynomials have. The natural extension from here is to augment the set of eigenvectors with the set of generalized eigenvectors, that is, vectors $\mathbf{v}$ such that $(\mathbf{A} - \lambda \mathbf{I})^k \mathbf{v} = \mathbf{0}$ for some power $k$. The appropriate values of $k$ then give us both the minimal and characteristic polynomials of $\mathbf{A}$, and the appropriate generalized eigenvectors form a basis for $\mathbb{R}^n$ which can be used to prove the Jordan normal form theorem computationally. Unfortunately, it is here that I am out of details.
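(For our running example, the kernels of $(\mathbf{M} - 2\mathbf{I})^k$ grow until $k = 3$, and sympy's `jordan_form` - again an assumption of this sketch - recovers the single Jordan block directly:)

```python
# Generalized eigenvectors of M for the eigenvalue 2: the kernel of
# (M - 2I)**k has dimension k for k = 1, 2, 3. (Sketch using sympy.)
import sympy as sp

M = sp.Matrix([[0, 1, 0], [0, 0, 1], [8, -12, 6]])
N = M - 2 * sp.eye(3)
for k in range(1, 4):
    print(k, len((N**k).nullspace()))  # prints 1 1, then 2 2, then 3 3

P, J = M.jordan_form()
print(J)  # the single 3x3 Jordan block with eigenvalue 2
```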
Practice Problem 1: Find a matrix whose minimal polynomial is not its characteristic polynomial (without cheating and starting from its Jordan normal form!).
Practice Problem 2: The main proposition proven here appears in the following guise in the Princeton Lectures in Analysis:
Let $f = p/q$ be a rational function with $\deg q \ge \deg p + 2$ and with no poles on the real axis. Prove that if $\beta_1, \dots, \beta_k$ are the roots of $q$ in the upper half-plane, then there exist polynomials $P_1, \dots, P_k$, where $P_j$ has degree less than the multiplicity of $\beta_j$, such that

$$\hat{f}(\xi) = \sum_{j=1}^{k} P_j(\xi) e^{-2\pi i \beta_j \xi}$$

when $\xi < 0$, where $\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) e^{-2\pi i x \xi} \, dx$ denotes the Fourier transform.