Coefficients in the closed forms of linear recurrences
by t0rajir0u, Jan 12, 2009, 10:39 PM
Terence Tao recently wrote a post that describes, among other things, how to use test cases to quickly recover parts of formulas. Exercise 1, in particular, should remind you of a recent post here where the same trick was used to tease out the coefficients of the DFT matrix of order $n$, but using it to derive Lagrange interpolation is something I had missed when writing up the Lagrange interpolation post, and it reminded me of another context in which I'd seen the same trick.
But first, to clarify: let $P(x)$ be a polynomial of degree at most $n - 1$. Suppose we are given that $P(x_i) = y_i$ for $1 \le i \le n$, where the $x_i$ are distinct (so the interpolation problem is well-posed). Write

$$\frac{P(x)}{(x - x_1)(x - x_2) \cdots (x - x_n)} = \frac{c_1}{x - x_1} + \frac{c_2}{x - x_2} + \cdots + \frac{c_n}{x - x_n}.$$
What are the coefficients $c_i$? The test case $x \to x_i$ for a particular $i$ tells us that the residue at $x_i$ on both sides must be equal; equivalently, multiplying by $x - x_i$ on both sides gives

$$\frac{P(x)}{\prod_{j \neq i} (x - x_j)} = c_i + (x - x_i) \sum_{j \neq i} \frac{c_j}{x - x_j}$$

and substituting $x = x_i$ gives

$$c_i = \frac{y_i}{\prod_{j \neq i} (x_i - x_j)},$$

precisely the same Lagrange interpolation coefficient we can find by other methods. The actual interpolation is given by multiplying back through:

$$P(x) = \sum_{i=1}^{n} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}$$

as usual.
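As a quick numerical check of the residue formula (a minimal sketch; the nodes and values below are made up for illustration and are not from the original post):

```python
# Check that the partial fraction coefficients c_i = y_i / prod_{j != i} (x_i - x_j)
# reconstruct the same polynomial as the Lagrange form.
import numpy as np

nodes = np.array([0.0, 1.0, 3.0, 4.0])    # distinct interpolation points x_i
values = np.array([2.0, -1.0, 5.0, 0.5])  # prescribed values y_i = P(x_i)

c = np.array([y / np.prod([xi - xj for xj in nodes if xj != xi])
              for xi, y in zip(nodes, values)])

t = 2.5  # an arbitrary test point
P_lagrange = sum(y * np.prod([(t - xj) / (xi - xj) for xj in nodes if xj != xi])
                 for xi, y in zip(nodes, values))
P_residues = np.prod(t - nodes) * np.sum(c / (t - nodes))
print(P_lagrange, P_residues)  # the two agree up to roundoff
```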
The DFT problem is recovered by considering the interpolation question of what polynomial $P(x)$ of degree at most $n - 1$ satisfies

$$P(\omega^k) = s_k, \quad k = 0, 1, \ldots, n - 1$$

where $\omega = e^{2 \pi i / n}$, which is, of course, the discrete Fourier transform, and consequently finding the partial fraction decomposition of $\frac{P(x)}{x^n - 1}$.
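A sketch of this correspondence in numpy: under numpy's FFT sign convention, recovering the coefficients of $P$ from the values $s_k = P(\omega^k)$ amounts to a single FFT divided by $n$ (the polynomial below is illustrative):

```python
import numpy as np

a = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # coefficients a_0, ..., a_4 of P
n = len(a)
w = np.exp(2j * np.pi / n)
s = np.array([np.polyval(a[::-1], w**k) for k in range(n)])  # values P(w^k)
print(np.allclose(np.fft.fft(s) / n, a))  # True: interpolation at roots of unity = DFT/n
```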
Note the similarity between this and the way the DFT problem was posed earlier in terms of finding the "closed form" of a sequence satisfying $s_{k+n} = s_k$. In my previous discussions of the general issue of understanding homogeneous linear recurrences, I've neglected the exact coefficients of the various terms in the closed form in favor of a clear understanding of the bases of the exponentials that appear in those terms. It's been enough to know that the coefficients are a linear function of the initial values. There are certain special cases besides the DFT case, however, where the value of the coefficient becomes important.
Problem: A six-sided die is rolled repeatedly and its values tallied up. Let $p_n$ denote the probability that at some point the sum of the values of the rolls is $n$. Compute $\lim_{n \to \infty} p_n$.
Setup. Before I get into the computation, let's take a moment to think about this problem from an elementary perspective. As $n$ gets large, the partial sums increase by $\frac{7}{2}$ on average (the expected value of a dice roll) and it's intuitive (although not immediately obvious) that the probability $p_n$ becomes uniform (I should say that the limit exists); in fact, the limit should clearly be $\frac{2}{7}$. Generally, given an arbitrary die we should expect the limit to be one over the expected value.
Can we be more rigorous? A moment's thought reveals that the probability $p_n$ satisfies a linear recurrence: the partial sum can only be $n$ if it was one of $n - 1, n - 2, \ldots, n - 6$ before, so we have the straightforward recurrence

$$p_n = \frac{p_{n-1} + p_{n-2} + p_{n-3} + p_{n-4} + p_{n-5} + p_{n-6}}{6}.$$
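Before doing anything clever, it's worth iterating this recurrence numerically; a minimal sketch, assuming the convention $p_0 = 1$ (the sum $0$ is hit before any rolls) and $p_n = 0$ for $n < 0$:

```python
# Iterate p_n = (p_{n-1} + ... + p_{n-6}) / 6 and watch p_n approach 2/7.
window = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # p_{-5}, ..., p_0
for n in range(1, 61):
    p = sum(window) / 6
    window = window[1:] + [p]
    if n % 10 == 0:
        print(n, p)   # tends to 2/7 = 0.285714...
```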
The characteristic polynomial of this recurrence is

$$\chi(x) = x^6 - \frac{x^5 + x^4 + x^3 + x^2 + x + 1}{6}.$$

Here we run into a problem: we don't know the roots of this polynomial (other than the obvious one, $x = 1$). The root $x = 1$ should control the limiting behavior of the sequence, so we want to show a few things:

1) All the other roots have absolute value strictly less than $1$. This (almost) ensures that the limit exists.

2) The root $x = 1$ occurs with multiplicity $1$. This ensures that the limit exists. (If you're not sure why any of this is true, review the previous post on this subject.)

3) We can compute the coefficient of $1^n$ in the closed form of $p_n$. This tells us what the limit is.
At this point, there are two related approaches we can take.
Solution 1. 1) can be reworded as follows: the companion matrix $M$ of $\chi(x)$ (which acts on the column vector with components $p_n, p_{n+1}, \ldots, p_{n+5}$) has eigenvalue $1$ with multiplicity $1$, and all other eigenvalues have absolute value less than $1$. This sort of statement generalizes significantly; the Perron-Frobenius theorem states that this is true of any matrix with strictly positive entries (that is, there exists a strictly maximal eigenvalue). Of course, $M$ isn't of this form, but it's enough to show that some power of it is.
Regard $M$ as the adjacency matrix of a weighted, directed graph on $6$ vertices. We begin with a directed path of length $5$ (where each vertex $v_i$ points to the next vertex $v_{i+1}$) and in addition specify weighted edges from $v_6$ to every vertex, each of weight $\frac{1}{6}$. The powers $M^k$ describe the number of paths of length $k$ between vertices.
These paths are easy to exhibit explicitly: the only paths between two vertices $v_i, v_j$ occur when we either travel directly between the two (when $i < j$) or when we travel to $v_6$, stay there for a while, and travel back. In other words, for $k \geq 6$ there exist paths between any two vertices of length exactly $k$. (It is instructive to count exactly how many paths there are as a polynomial in $k$; the actual entry in the corresponding power of $M$ is weighted by the number of times the path stays at $v_6$.) This lets us apply the Perron-Frobenius theorem to $M^6$ and then we have our result.
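A quick numerical confirmation of both claims (the indexing of $M$ follows the graph above):

```python
# Build M, confirm M^6 already has strictly positive entries, and inspect
# the eigenvalue moduli.
import numpy as np

M = np.zeros((6, 6))
for i in range(5):
    M[i, i + 1] = 1.0        # the directed path v_1 -> v_2 -> ... -> v_6
M[5, :] = 1.0 / 6.0          # edges of weight 1/6 from v_6 to every vertex

print((np.linalg.matrix_power(M, 6) > 0).all())  # True: Perron-Frobenius applies
print(sorted(abs(np.linalg.eigvals(M))))         # one modulus equals 1, the rest are < 1
```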
$M$ has a special property that admits an additional interpretation: it is the transition matrix of a finite Markov chain, and the vertices of the above graph describe its state space. That is, our Markov process consists of $6$ states where the first five states each transition to the next and the sixth state transitions, with uniform probability, to any of the six states (including itself). The Perron-Frobenius theorem applied to a stochastic matrix tells us that $1$ is a strictly maximal eigenvalue, which is exactly the nice result we'd like for a Markov process: it tells us that there exists a stationary distribution.
We are done here if we can compute the stationary distribution. The Perron-Frobenius theorem gets us 2) and, in fact, also gives us 3): it says that there is a unique row eigenvector $\mathbf{v}$ with entries summing to $1$ such that

$$\mathbf{v} M = \mathbf{v},$$

and $\mathbf{v}$ is the row eigenvector associated with the eigenvalue $1$. At this point it's only a quick calculation to verify that this eigenvector is

$$\mathbf{v} = \frac{1}{21} \left[ \begin{array}{cccccc} 1 & 2 & 3 & 4 & 5 & 6 \end{array} \right]$$

and now we only have to compute the initial values. Considering how much work I will do to avoid this calculation, it is not in fact a difficult one; nevertheless, I will motivate a different calculation that will get us the answer without needing to compute the initial values. The important result to keep in mind is that we did not have to compute the other five eigenvectors of $M$ (or the corresponding eigenvalues).
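A quick check of the stationary vector, using the same $M$ as above (note that the last entry, $\frac{6}{21} = \frac{2}{7}$, is exactly the limit we are after):

```python
# Verify v M = v for v = (1, 2, 3, 4, 5, 6)/21 and watch powers of M
# converge to the rank-one matrix whose rows are all v.
import numpy as np

M = np.zeros((6, 6))
for i in range(5):
    M[i, i + 1] = 1.0
M[5, :] = 1.0 / 6.0

v = np.arange(1, 7) / 21.0
print(np.allclose(v @ M, v))               # True: v is stationary
print(np.linalg.matrix_power(M, 200)[0])   # every row of M^k tends to v
```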
Solution 2. In the interest of pursuing the original point and avoiding the use of the Perron-Frobenius theorem, the proof of 1) is fairly straightforward. Suppose $r$ is a root of $\chi(x)$. If $|r| \geq 1$, then

$$|r|^6 = \left| \frac{r^5 + r^4 + r^3 + r^2 + r + 1}{6} \right| \leq \frac{|r|^5 + |r|^4 + |r|^3 + |r|^2 + |r| + 1}{6} \leq |r|^5 \leq |r|^6$$

with equality if and only if $|r| = 1$ and $1, r, r^2, r^3, r^4, r^5$ all point in the same direction (the key step is the triangle inequality), which is true if and only if $r = 1$. Alternately, $r^6$ is a convex combination of points that lie on the unit circle, which itself lies on the unit circle if and only if it is one of the vertices of the convex hull of $1, r, r^2, r^3, r^4, r^5$ (which is just the points themselves). This implies that $r$ must be a root of unity of order at most $6$, and then one can verify directly that only $r = 1$ works. 2) can be proven by a similar argument applied to the quotient $\frac{6 \chi(x)}{x - 1} = 6x^5 + 5x^4 + 4x^3 + 3x^2 + 2x + 1$, or by dividing by $x - 1$ and substituting in $x = 1$ to see that this quotient doesn't vanish there.
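As a numerical sanity check on 1) and 2):

```python
# Roots of 6x^6 - x^5 - x^4 - x^3 - x^2 - x - 1, i.e. of 6*chi(x).
import numpy as np

roots = np.roots([6, -1, -1, -1, -1, -1, -1])
for r in sorted(roots, key=abs, reverse=True):
    print(r, abs(r))   # the root 1 appears exactly once; all others have modulus < 1
```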
How should we calculate 3)? Considering I built up this discussion by talking about partial fraction decompositions, let's take the generating functions approach and try to figure out what $F(x) = \sum_{n \geq 0} p_n x^n$ should look like. The generating function that describes the probability distribution of the sums of $k$ independent dice rolls is just

$$\left( \frac{x + x^2 + x^3 + x^4 + x^5 + x^6}{6} \right)^k$$

and since the number of independent dice rolls takes values over all $k \geq 0$, the generating function we want is

$$F(x) = \sum_{k \geq 0} \left( \frac{x + x^2 + x^3 + x^4 + x^5 + x^6}{6} \right)^k = \frac{1}{1 - \frac{x + x^2 + x^3 + x^4 + x^5 + x^6}{6}}.$$
It's worth pointing out that $F(x)$ is easy to calculate even if the initial conditions aren't because of the way the problem is set up. Its denominator has roots which are the reciprocals of those of $\chi(x)$, as expected. Now, the important observation: we don't have to compute the complete partial fraction decomposition to isolate the coefficient of $\frac{1}{1 - x}$! All we need to do is to factor $1 - x$ out of the denominator, like so:

$$F(x) = \frac{6}{(1 - x)(x^5 + 2x^4 + 3x^3 + 4x^2 + 5x + 6)}$$

and then we can compute the partial fraction decomposition using these factors alone. To be even lazier, the residue at $x = 1$ can be evaluated using the same trick as we used before: multiplying by $1 - x$ and substituting $x = 1$, this is just

$$\frac{6}{1 + 2 + 3 + 4 + 5 + 6} = \frac{6}{21} = \frac{2}{7},$$

exactly as we expected. Note that the computation we are performing here is, by l'Hopital's rule, the evaluation of the derivative of the denominator of $F(x)$ at $x = 1$, which is (up to sign) just the expected value $\frac{7}{2}$ of a single dice roll; the limit is its reciprocal. This observation, as well as the rest of the argument, generalizes (see Practice Problem 1).
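One can let a CAS do this bookkeeping; a sketch in sympy:

```python
import sympy as sp

x = sp.symbols('x')
den = 1 - (x + x**2 + x**3 + x**4 + x**5 + x**6) / 6
F = 1 / den

# Coefficient of 1/(1 - x) in the partial fraction decomposition of F:
print(sp.limit((1 - x) * F, x, 1))   # 2/7
# The same computation via l'Hopital: the derivative of the denominator at 1
print(sp.diff(den, x).subs(x, 1))    # -7/2, minus the expected value of a roll
```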
The main result to take away from this discussion is the following

Proposition: Let $s_k$ be a sequence satisfying a homogeneous linear recurrence with characteristic polynomial $\chi(x)$ of degree $n$, and suppose that $r$ is a known root of $\chi(x)$. Then the coefficient of $r^k$ (which is in general a polynomial in $k$) in the closed form of $s_k$ is a rational function of the initial conditions and $r$ (and in particular does not depend on the nature of the rest of the roots of $\chi(x)$).
I mention this largely because it violated my intuition: I had previously thought that it wasn't possible to find the coefficients of the closed form of a given recurrence without computing every root of the characteristic polynomial.
Despite the manner in which we first arrived at this statement, it can be proven in a very straightforward manner. We'll suppose for brevity that $r$ has multiplicity $1$. Write $\chi(x) = (x - r) \mu(x)$ and write $t_k = s_{k+1} - r s_k$. Then $t_k$ satisfies the (shorter) recurrence with characteristic polynomial $\mu(x)$ rather than $\chi(x)$. Given, therefore, the initial conditions

$$t_0 = s_1 - r s_0$$
$$t_1 = s_2 - r s_1$$
$$\vdots$$
$$t_{n-2} = s_{n-1} - r s_{n-2},$$

the expression of the $t_k$ in terms of the $s_k$ provides a system of equations which we can solve. In particular, $\mu(x)$ has some coefficients $m_0, m_1, \ldots, m_{n-1}$ such that

$$\mu(x) = m_{n-1} x^{n-1} + \cdots + m_1 x + m_0,$$

hence such that (adding appropriate multiples of the equations above; the point is that $s_k - c r^k$ satisfies the recurrence with characteristic polynomial $\mu(x)$, so the combination $\sum_k m_k (s_k - c r^k)$ vanishes) the coefficient $c$ of $r^k$ in the closed form of $s_k$ is

$$c = \frac{m_0 s_0 + m_1 s_1 + \cdots + m_{n-1} s_{n-1}}{m_0 + m_1 r + \cdots + m_{n-1} r^{n-1}}.$$

The numerator can be understood as "quotienting out" the initial values by the roots of $\mu(x)$, whereas the denominator can be understood either as a Lagrange interpolation coefficient or as $\mu(r) = \chi'(r)$.
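A sketch of the Proposition in code; the helper name `leading_coefficient` is mine, and Fibonacci with its golden-ratio root is just an illustrative test case (the answer should be the $\frac{1}{\sqrt{5}}$ from Binet's formula):

```python
import numpy as np

def leading_coefficient(chi, r, s):
    """Coefficient of r^k in the closed form of a sequence with monic
    characteristic polynomial chi (coefficients highest degree first),
    simple known root r, and initial values s = [s_0, ..., s_{n-1}]."""
    mu, _ = np.polydiv(chi, [1.0, -r])   # chi(x) = (x - r) mu(x)
    m = mu[::-1]                         # m[k] = coefficient of x^k in mu
    return sum(mk * sk for mk, sk in zip(m, s)) / np.polyval(mu, r)

phi = (1 + np.sqrt(5)) / 2
c = leading_coefficient([1.0, -1.0, -1.0], phi, [0.0, 1.0])  # x^2 - x - 1
print(c, 1 / np.sqrt(5))   # both 0.4472...; the other root was never computed
```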
Clearly there is an interesting connection between the Lagrange interpolation problem and the problem of figuring out the coefficients of the closed form of a linear recurrence. Well, perhaps it's not too interesting. Approaching the problem "forwardly" (that is, using the most basic tools first), let's set up a system. If the roots of $\chi(x)$ are given by $r_1, r_2, \ldots, r_n$ (forgive me for switching around notation all the time), then the problem of determining the coefficients in

$$s_k = c_1 r_1^k + c_2 r_2^k + \cdots + c_n r_n^k$$

(where again I have assumed that all roots occur with multiplicity $1$) is simply the problem of solving the system
$$\left[ \begin{array}{cccc} 1 & 1 & \cdots & 1 \\ r_1 & r_2 & \cdots & r_n \\ r_1^2 & r_2^2 & \cdots & r_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ r_1^{n - 1} & r_2^{n - 1} & \cdots & r_n^{n - 1} \end{array} \right] \left[ \begin{array}{c} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{array} \right] = \left[ \begin{array}{c} s_0 \\ s_1 \\ s_2 \\ \vdots \\ s_{n - 1} \end{array} \right]$$
and what is the matrix that should appear but the transpose of the Vandermonde matrix of the roots! We have already seen that the finding-coefficients problem and the Lagrange interpolation problem are in some sense adjoint to each other in the special case of the DFT problem; how should we generalize?
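For example, with numpy (the roots and initial values here are illustrative):

```python
import numpy as np

r = np.array([2.0, -1.0, 0.5])          # distinct roots of chi
s = np.array([1.0, 3.0, 2.0])           # initial values s_0, s_1, s_2
V = np.vander(r, increasing=True).T     # V[i, j] = r_j ** i, the matrix above
c = np.linalg.solve(V, s)

for k in range(6):
    print(k, np.sum(c * r**k))          # reproduces s_0, s_1, s_2, then extends the sequence
```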
One interpretation uses the partial fractions approach. In the Lagrange interpolation problem we are given the values of a polynomial $P(x)$ at some points; these values determine the coefficients of the partial fraction decomposition of a certain rational function with numerator $P(x)$, from which the coefficients of $P(x)$ can be computed. In the linear recurrence problem we are given the initial values of a recurrent sequence $s_k$; these values determine the coefficients of the numerator of a certain rational function, from which the coefficients of its partial fraction decomposition can be computed.
Practice Problem 1: Generalize the conclusion of the solution to Problem 1. That is, given a die with probability distribution $q_1, q_2, q_3, \ldots$ (where $q_i$ describes the probability of rolling an $i$) such that $q_i \geq 0$ and $\sum_i q_i = 1$, what is the limiting behavior of the probability that the partial sums of repeated rolls hit a value of $n$? (In particular, when does this sequence converge? The answer is not "always.") This is equivalent to the problem of finding the outcome of a sequence of repeated weighted averages

$$p_n = q_1 p_{n-1} + q_2 p_{n-2} + q_3 p_{n-3} + \cdots$$

when the number of faces of the die is finite. (The first thing you should do is generalize the verification of condition 1) and examine the cases in which it fails.)
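If you want to experiment before proving anything, a minimal sketch (the helper `probs` and the face distributions are mine, not from the post):

```python
def probs(q, N):
    """q: dict mapping face value -> probability.  Returns [p_0, ..., p_N]."""
    p = [0.0] * (N + 1)
    p[0] = 1.0
    for n in range(1, N + 1):
        p[n] = sum(qi * p[n - i] for i, qi in q.items() if i <= n)
    return p

print(probs({1: 0.3, 2: 0.7}, 2000)[-3:])  # converges to 1/E = 1/1.7 = 0.588...
print(probs({2: 0.5, 4: 0.5}, 2000)[-3:])  # no limit: odd sums are never hit
```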
Practice Problem 2: Interpret the $n \times n$ Jordan block with eigenvalue $\lambda$ as an adjacency matrix and compute its powers by a counting argument. How does this prove 1) the binomial theorem and 2) the balls-and-urns formula? Compare with the ordinary generating functions proof.
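A quick numerical check of the counting (the parameters are illustrative): the entries of $J^k$ are $\binom{k}{j - i} \lambda^{k - (j - i)}$, which is what the path count predicts.

```python
import numpy as np
from math import comb

n, lam, k = 5, 2.0, 7
J = lam * np.eye(n) + np.diag(np.ones(n - 1), 1)   # Jordan block: loops of weight lam plus a path
Jk = np.linalg.matrix_power(J, k)
expected = [[comb(k, j - i) * lam**(k - (j - i)) if j >= i else 0.0
             for j in range(n)] for i in range(n)]
print(np.allclose(Jk, expected))   # True
```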
Practice Problem 3: In keeping with our observation that for special cases the ordinary generating function is easier to compute than the initial conditions, verify that the identity

$$\frac{\chi'(x)}{\chi(x)} = \sum_{k \geq 0} \frac{t_k}{x^{k+1}}$$

(where $t_k$ denotes the sum of the $k^{th}$ powers of the roots of $\chi(x)$) is equivalent to Newton's sums.

(Can you find a proof of Newton's sums by computing the trace of the $k^{th}$ power of the companion matrix of $\chi(x)$ in a clever way? I can't.)
Practice Problem 4: One way to state the fundamental theorem of symmetric polynomials is that any polynomial function of the roots of an irreducible polynomial $\chi(x)$ invariant under any permutation of the roots is a function of its coefficients. The discriminant is an example of such a function. Prove this fact by taking tensor powers of the companion matrix of $\chi(x)$. (I don't actually know if this works.)
