Coefficients in the closed forms of linear recurrences

by t0rajir0u, Jan 12, 2009, 10:39 PM

Terence Tao recently wrote a post that describes, among other things, how to use test cases to quickly recover parts of formulas. Exercise 1, in particular, should remind you of a recent post here where the same trick was used to tease out the coefficients of the DFT matrix of order $ n$, but using it to derive Lagrange interpolation is something I had missed when writing up the Lagrange interpolation post, and it reminded me of another context in which I'd seen the same trick.

But first, to clarify: let $ P(x)$ be a polynomial of degree $ n$. Suppose we are given that $ P(x_i) = y_i$ for $ i = 0, 1, 2, ... n$ where the $ x_i$ are distinct (so the interpolation problem is well-posed). Write

$ \frac {P(x)}{\prod (x - x_i)} = \sum \frac {c_i}{x - x_i}$.

What are the coefficients $ c_i$? The test case $ x \to x_k$ for a particular $ k$ tells us that the residues at $ x_k$ on both sides must be equal; equivalently, multiplying both sides by $ x - x_k$ gives

$ \frac {P(x)}{\prod_{i \neq k} (x - x_i)} = c_k + \sum_{i \neq k} \frac {c_i(x - x_k)}{x - x_i}$

and substituting $ x = x_k$ gives

$ \frac {y_k}{\prod_{i \neq k} (x_k - x_i)} = c_k$,

precisely the same Lagrange interpolation coefficient we can find by other methods. The actual interpolation is given by multiplying

$ P(x) = \sum c_k \prod_{i \neq k} (x - x_i)$

as usual. The DFT problem is recovered by considering the interpolation question of what polynomial $ P(x)$ of degree at most $ n - 1$ satisfies

$ P(e^{\frac {2 \pi i k}{n} }) = e^{\frac {2 \pi i km}{n} }, k = 0, 1, ... n - 1$

which is, of course, $ P(x) = x^m$, and consequently finding the partial fraction decomposition of $ \frac {x^m}{x^n - 1}$.
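The residue formula for the coefficients $ c_k$ is easy to sanity-check numerically. Here is a minimal sketch in Python; the function names and sample points are mine, chosen for illustration.

```python
# A numerical sanity check of the residue formula c_k = y_k / prod_{i != k} (x_k - x_i)
# and of the resulting interpolation formula P(x) = sum_k c_k prod_{i != k} (x - x_i).

def lagrange_coeffs(xs, ys):
    """The residues c_k of P(x) / prod_i (x - x_i) at the points x_k."""
    cs = []
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        denom = 1.0
        for i, xi in enumerate(xs):
            if i != k:
                denom *= xk - xi
        cs.append(yk / denom)
    return cs

def interpolate(xs, ys, x):
    """Evaluate the interpolating polynomial at x via the c_k."""
    cs = lagrange_coeffs(xs, ys)
    total = 0.0
    for k, ck in enumerate(cs):
        term = ck
        for i, xi in enumerate(xs):
            if i != k:
                term *= x - xi
        total += term
    return total

# Interpolating x^2 through three points recovers x^2 exactly.
xs, ys = [0.0, 1.0, 3.0], [0.0, 1.0, 9.0]
assert abs(interpolate(xs, ys, 2.0) - 4.0) < 1e-9
```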

Note the similarity between this and the way the DFT problem was posed earlier in terms of finding the "closed form" of a sequence satisfying $ s_{k + n} = s_k$. In my previous discussions of the general issue of understanding homogeneous linear recurrences, I've neglected the exact coefficients of the various terms in the closed form in favor of a clear understanding of the bases of the exponentials that appear in those terms. It's been enough to know that the coefficients are a linear function of the initial values. There are certain special cases, however, where the value of the coefficient becomes important besides the DFT case.

Problem: A six-sided die is rolled repeatedly and its values tallied up. Let $ p_n$ denote the probability that at some point the sum of the values of the rolls is $ n$. Compute $ \lim_{n \to \infty} p_n$.

Setup. Before I get into the computation, let's take a moment to think about this problem from an elementary perspective. As $ n$ gets large, the partial sums increase by $ \frac {7}{2}$ on average (the expected value of a single roll), so a large value of $ n$ is hit roughly once every $ \frac {7}{2}$ integers. It's intuitive (although not immediately obvious) that the limit of $ p_n$ exists, and in that case it should clearly be $ \frac {2}{7}$. Generally, given an arbitrary die, we should expect the limit to be the reciprocal of the expected value of a single roll.

Can we be more rigorous? A moment's thought reveals that the probability $ p_n$ satisfies a linear recurrence: the partial sum can only be $ n$ if it was one of $ n - 1, n - 2, ... n - 6$ before, so we have the straightforward recurrence

$ p_n = \frac {1}{6} \left( p_{n - 1} + p_{n - 2} + ... + p_{n - 6} \right)$.
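Before analyzing the recurrence, it can be iterated directly as a sanity check. A quick sketch (the helper name is mine); note $ p_0 = 1$, since the empty sum is $ 0$, and $ p_n = 0$ for $ n < 0$:

```python
# Iterate p_n = (p_{n-1} + p_{n-2} + ... + p_{n-6}) / 6 with p_0 = 1 and p_n = 0
# for n < 0. The probabilities settle down to 2/7 = 0.2857... as n grows.

def hit_probability(n):
    p = [1.0]  # p_0 = 1
    for m in range(1, n + 1):
        # sum over the (up to six) previous terms, treating negative indices as 0
        p.append(sum(p[max(0, m - 6):m]) / 6.0)
    return p[n]

assert abs(hit_probability(300) - 2.0 / 7.0) < 1e-9
```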

The characteristic polynomial of this recurrence is $ f(x) = x^6 - \frac {x^5 + x^4 + x^3 + x^2 + x + 1}{6}$. Here we run into a problem: we don't know the roots of this polynomial (other than the obvious one, $ 1$). The root $ 1$ should control the limiting behavior of the sequence, so we want to show a few things:

1) All the other roots have absolute value strictly less than $ 1$. This (almost) ensures that the limit exists.

2) The root $ 1$ occurs with multiplicity $ 1$. This ensures that the limit exists. (If you're not sure why any of this is true, review the previous post on this subject.)

3) We can compute the coefficient of $ 1$ in the closed form of $ p_n$. This tells us what the limit is.

At this point, there are two related approaches we can take.

Solution 1. 1) can be reworded as follows: the companion matrix $ \mathbf{A}$ of $ f(x)$ (which acts on the column vector with components $ p_{1 + k}, p_{2 + k}, ... p_{6 + k}$) has eigenvalue $ 1$ with multiplicity $ 1$ and all other eigenvalues have absolute value less than $ 1$. This sort of statement generalizes significantly; the Perron-Frobenius theorem states that this is true of any matrix with strictly positive entries (that is, there exists a strictly maximal eigenvalue). Of course, $ \mathbf{A}$ isn't of this form, but it's enough to show that some power of it is.

Regard $ \mathbf{A}$ as the adjacency matrix of a weighted, directed graph on $ 6$ vertices. We begin with a directed path on the vertices $ v_1, v_2, ... v_6$ (where each vertex $ v_i$ points to the next vertex $ v_{i + 1}$) and in addition specify weighted edges from $ v_6$ to every vertex (itself included), each of weight $ \frac {1}{6}$. The entries of the powers $ \mathbf{A}^k$ describe the total weights of paths of length $ k$ between vertices.

These paths are easy to exhibit explicitly: the only paths between two vertices $ v_i, v_j$ with $ i, j < 6$ occur when we either travel directly between the two (when $ i < j$) or when we travel to $ v_6$, stay there for a while (using the self-loop), and travel back. In particular, for $ k > 6$ there exist paths of length exactly $ k$ between any two vertices, so every entry of $ \mathbf{A}^k$ is strictly positive. (It is instructive to count exactly how many such paths there are; the actual entry in the corresponding power of $ \mathbf{A}$ weights each path by a power of $ \frac {1}{6}$ according to how many of its steps start at $ v_6$.) This lets us apply the Perron-Frobenius theorem to $ \mathbf{A}^k$ and then we have our result.

$ \mathbf{A}$ has a special property that admits an additional interpretation: it is the transition matrix of a finite Markov chain, and the vertices of the above graph describe its state space. That is, our Markov process consists of $ 6$ states where each of the first five states transitions to the next and the sixth state transitions, with uniform probability, to any of the six states. The Perron-Frobenius theorem applied to such a stochastic matrix tells us that $ 1$ is a strictly maximal eigenvalue, which is exactly the nice result we'd like for a Markov process: it tells us that there exists a stationary distribution.

We are done here if we can compute the stationary distribution. The Perron-Frobenius theorem gets us 2) and, in fact, also gives us 3): it says that there is a unique row eigenvector $ \mathbf{v}$ with entries summing to $ 1$ such that

$ \lim_{n \to \infty} \mathbf{A}^n = \mathbf{1} \mathbf{v}$

and $ \mathbf{v}$ is the row eigenvector associated with the eigenvalue $ 1$. At this point it's only a quick calculation to verify that this eigenvector is

$ \mathbf{v} = \left < \frac {1}{21}, \frac {2}{21}, ... \frac {6}{21} \right >$

(with the components ordered to match $ p_{1 + k}, p_{2 + k}, ... p_{6 + k}$)

and now we only have to compute the initial values $ p_1, p_2, ... p_6$. That calculation is not in fact a difficult one, but instead I will motivate a different calculation that will get us the answer without needing the initial values at all. The important result to keep in mind is that we did not have to compute the other five eigenvectors of $ \mathbf{A}$ (or the corresponding eigenvalues).
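The "quick calculation" can be carried out in exact arithmetic. A sketch, assuming the state vector is ordered as $ (p_{1+k}, ... p_{6+k})$ (listing the components the other way around reverses the fixed row vector):

```python
from fractions import Fraction

# A acts on the column vector (p_{1+k}, ..., p_{6+k}): the first five rows shift
# the window forward, and the last row averages all six entries (each new term is
# the mean of the previous six). Exact arithmetic via Fraction.
A = [[Fraction(1) if j == i + 1 else Fraction(0) for j in range(6)] for i in range(5)]
A.append([Fraction(1, 6)] * 6)

# Candidate stationary row vector (1/21, 2/21, ..., 6/21), entries summing to 1.
v = [Fraction(k, 21) for k in range(1, 7)]

# Check v A = v.
vA = [sum(v[i] * A[i][j] for i in range(6)) for j in range(6)]
assert vA == v
assert sum(v) == 1

# The limit of p_n is then v . (p_1, ..., p_6); one checks p_k = 7^(k-1)/6^k for k <= 6.
p = [Fraction(7 ** (k - 1), 6 ** k) for k in range(1, 7)]
assert sum(vi * pi for vi, pi in zip(v, p)) == Fraction(2, 7)
```

The last assertion confirms that feeding the actual initial values into the stationary row vector does give the limit $ \frac {2}{7}$.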


Solution 2. In the interest of pursuing the original point and avoiding the use of the Perron-Frobenius theorem, the proof of 1) is fairly straightforward. Suppose $ z$ is a root of $ f(x)$. If $ |z| \ge 1$, then

$ |z|^6 = \frac {|z|^6 + ... + |z|^6}{6} \ge \frac {|z|^5 + ... + 1}{6} \ge \left| \frac {z^5 + ... + 1}{6} \right|$

with equality if and only if $ |z| = 1$ and $ z^5, z^4, ... z, 1$ all point in the same direction (the last step is the triangle inequality), which is true if and only if $ z = 1$. Alternately, $ \frac {z^5 + ... + 1}{6}$ is a convex combination of points that lie on the unit circle, and such a combination itself lies on the unit circle if and only if it is one of the points $ z^5, z^4, ... 1$ themselves (the vertices of their convex hull). This implies that $ z$ must be a root of unity of order at most $ 6$, and then one can verify directly that only $ 1$ works. 2) can be proven by a similar argument applied to $ f'(x)$ or by dividing $ f(x)$ by $ x - 1$ and substituting $ x = 1$ into the quotient. How should we calculate 3)?
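Before turning to 3), claims 1) and 2) are easy to sanity-check numerically: $ f(1) = 0$ with $ f'(1) = \frac {7}{2} \neq 0$, and no other root of unity of order at most $ 6$ is a root of $ f$. A quick sketch:

```python
import cmath

# f is the characteristic polynomial x^6 - (x^5 + ... + 1)/6 and fprime its derivative.
def f(z):
    return z**6 - (z**5 + z**4 + z**3 + z**2 + z + 1) / 6

def fprime(z):
    return 6 * z**5 - (5 * z**4 + 4 * z**3 + 3 * z**2 + 2 * z + 1) / 6

assert abs(f(1)) < 1e-12
assert abs(fprime(1) - 3.5) < 1e-12  # f'(1) = 7/2, the expected value of a roll

# The convex combination argument narrows the |z| = 1 roots down to roots of unity
# of order at most 6; a direct check shows only z = 1 survives.
for order in range(2, 7):
    for m in range(1, order):
        z = cmath.exp(2j * cmath.pi * m / order)
        if abs(z - 1) > 1e-9:
            assert abs(f(z)) > 1e-3  # not a root
```

The incidental fact that $ f'(1)$ equals the expected value of a roll is exactly the observation that reappears below via l'Hopital's rule.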

Considering I built up this discussion by talking about partial fraction decompositions, let's take the generating functions approach and try to figure out what $ F(x) = \sum_{n \ge 0} p_n x^n$ should look like. The generating function that describes the probability distribution of the sums of $ k$ independent dice rolls is just

$ \left( \frac {x^6 + ... + x}{6} \right)^k$

and since the number of independent dice rolls takes values over all $ k$, the generating function we want is

$ F(x) = \sum_{k \ge 0} \left( \frac {x^6 + ... + x}{6} \right)^k = \frac {1}{1 - \frac {x^6 + ... + x}{6} }$.

It's worth pointing out that $ F(x)$ is easy to calculate even when the initial conditions aren't, because of the way the problem is set up. Its denominator has roots which are the reciprocals of those of $ f(x)$, as expected. Now, the important observation: we don't have to compute the complete partial fraction decomposition to isolate the coefficient of $ 1$! All we need to do is to factor $ 1 - x$ out of the denominator, like so:

$ 1 - \frac {x + x^2 + ... + x^6}{6} = \frac {1}{6} (1 - x)(6 + 5x + 4x^2 + 3x^3 + 2x^4 + x^5)$

and then we can compute the partial fraction decomposition using these factors alone. To be even lazier, the residue at $ 1$ can be evaluated using the same trick as we used before: multiplying by $ 1 - x$, this is just

$ \frac {6}{6 + 5 + 4 + ... + 1} = \frac {2}{7}$

exactly as we expected. Note that the computation we are performing here is, by l'Hopital's rule, the evaluation of the reciprocal of the derivative of $ \frac {x + x^2 + ... + x^6}{6}$ at $ 1$, and that derivative is just the expected value of a single die roll. This observation, as well as the rest of the argument, generalizes (see the Practice Problem).
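As a quick check of the generalization, here is the same iteration on a hypothetical three-sided die with faces $ 1, 2, 3$, each with probability $ \frac {1}{3}$: the expected roll is $ 2$, and the limit comes out to $ \frac {1}{2}$. (The helper name is mine.)

```python
# Reciprocal-of-the-expectation check on a hypothetical fair die with faces 1, 2, 3.

def limiting_probability(faces, n=400):
    """Iterate p_m = (1/len(faces)) * sum of p_{m-f} over faces f, with p_0 = 1."""
    w = 1.0 / len(faces)
    p = [1.0]
    for m in range(1, n + 1):
        p.append(sum(w * p[m - f] for f in faces if m - f >= 0))
    return p[n]

expected = (1 + 2 + 3) / 3  # E = 2
assert abs(limiting_probability([1, 2, 3]) - 1 / expected) < 1e-9
```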


The main result to take away from this discussion is the following

Proposition: Let $ s_k$ be a sequence satisfying a homogeneous linear recurrence with characteristic polynomial $ P(x)$ of degree $ n$, and suppose that $ a$ is a known root of $ P(x)$. Then the coefficient of $ a^k$ (which is in general a polynomial in $ k$) in the closed form of $ s_k$ is a rational function of the initial conditions and $ a$ (and in particular does not depend on the nature of the rest of the roots of $ P(x)$).

I mention this largely because it violated my intuition: I had previously thought that it wasn't possible to find the coefficients of the closed form of a given recurrence without computing every root of the characteristic polynomial.

Despite the manner in which we first arrived at this statement, it can be proven in a very straightforward manner. We'll suppose for brevity that $ a$ has multiplicity $ 1$. Write $ P(x) = (x - a) Q(x)$ and write $ s_k = ca^k + t_k$. Then $ t_k$ satisfies the (shorter) recurrence with characteristic polynomial $ Q(x)$ rather than $ P(x)$. Given, therefore, the initial conditions

$ s_0 = c + t_0$
$ s_1 = ca + t_1$
...
$ s_{n - 1} = ca^{n - 1} + t_{n - 1}$

together with the expression of $ t_{n - 1}$ in terms of $ t_0, t_1, ... t_{n - 2}$, we obtain a system of $ n + 1$ equations in the $ n + 1$ unknowns $ c, t_0, t_1, ... t_{n - 1}$, which we can solve. In particular, $ Q(x)$ has coefficients $ q_1, q_2, ... q_{n - 1}$ such that

$ t_{n - 1} = q_1 t_{n - 2} + q_2 t_{n - 3} + ... + q_{n - 1} t_0$

hence such that (adding appropriate multiples of the equations above)

$ s_{n - 1} - q_1 s_{n - 2} - ... = c Q(a) \implies$
$ c = \frac { s_{n - 1} - q_1 s_{n - 2} - ... }{Q(a)}$.

The numerator can be understood as "quotienting out" the initial values by the roots of $ Q(x)$ whereas the denominator can be understood either as a Lagrange interpolation coefficient or as $ P'(a)$.
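A quick check of this formula on a familiar example: the Fibonacci numbers, with $ P(x) = x^2 - x - 1$, known root $ a = \frac {1 + \sqrt 5}{2}$, and $ s_0 = 0, s_1 = 1$. Note that $ Q(x)$ is obtained by dividing $ P(x)$ by $ x - a$, so the other root never has to be found explicitly.

```python
import math

# Proposition check on the Fibonacci recurrence.
a = (1 + math.sqrt(5)) / 2

# Synthetic division of P(x) = x^2 - x - 1 by (x - a) gives Q(x) = x + (a - 1);
# writing Q(x) = x - q_1 as in the text, q_1 = 1 - a.
q1 = 1 - a

s0, s1 = 0, 1
c = (s1 - q1 * s0) / (a - q1)  # Q(a) = a - q_1

# c should be the Binet coefficient 1/sqrt(5).
assert abs(c - 1 / math.sqrt(5)) < 1e-12

# Sanity check: c * a^k reproduces Fibonacci numbers up to the decaying term.
fib = [0, 1]
for _ in range(18):
    fib.append(fib[-1] + fib[-2])
assert round(c * a**19) == fib[19]
```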

Clearly there is an interesting connection between the Lagrange interpolation problem and the problem of figuring out the coefficients of the closed form of a linear recurrence. Well, perhaps it's not too interesting. Approaching the problem "forwardly" (that is, using the most basic tools first), let's set up a system. If the roots of $ P(x)$ are given by $ r_1, r_2, ... r_n$ (forgive me for switching around notation all the time), then the problem of determining the coefficients in

$ s_k = c_1 r_1^k + c_2 r_2^k + ... + c_n r_n^k$

(where again I have assumed that all roots occur with multiplicity $ 1$) is simply the problem of solving the system

$ \left[ \begin{array}{cccc} 1 & 1 & \cdots & 1 \\
r_1 & r_2 & \cdots & r_n \\
r_1^2 & r_2^2 & \cdots & r_n^2 \\
\vdots & \vdots & \ddots & \vdots \\
r_1^{n - 1} & r_2^{n - 1} & \cdots & r_n^{n - 1} \end{array} \right] \left[ \begin{array}{c} c_1 \\
c_2 \\
c_3 \\
\vdots \\
c_n \end{array} \right] = \left[ \begin{array}{c} s_0 \\
s_1 \\
s_2 \\
\vdots \\
s_{n - 1} \end{array} \right]$

and what is the matrix that should appear but the transpose of the Vandermonde matrix of the roots! We have already seen that the finding-coefficients problem and the Lagrange interpolation problem are in some sense adjoint to each other in the special case of the DFT problem; how should we generalize?
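Solving this system is routine once the roots are known. A sketch in exact arithmetic, on a hypothetical recurrence with roots $ 1, 2, 3$ (so $ s_k = 6s_{k - 1} - 11s_{k - 2} + 6s_{k - 3}$), with initial values generated from the closed form $ 1 \cdot 1^k + 2 \cdot 2^k + 1 \cdot 3^k$:

```python
from fractions import Fraction

def solve(M, b):
    """Gauss-Jordan elimination in exact arithmetic (square system, invertible)."""
    n = len(M)
    M = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

roots = [Fraction(r) for r in (1, 2, 3)]
V_T = [[r ** k for r in roots] for k in range(3)]  # transposed Vandermonde rows
s = [Fraction(v) for v in (4, 8, 18)]              # s_0, s_1, s_2 from the closed form

# The solve recovers the coefficients (1, 2, 1) of 1^k, 2^k, 3^k.
assert solve(V_T, s) == [Fraction(1), Fraction(2), Fraction(1)]
```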

One interpretation uses the partial fractions approach. In the Lagrange interpolation problem we are given the values of a polynomial $ P(x)$ at some points; these values determine the coefficients of the partial fraction decomposition of a certain rational function with $ P(x)$, from which the coefficients of $ P(x)$ can be computed. In the linear recurrence problem we are given the initial values of a recurrence $ s_n$; these values determine the coefficients of the numerator of a certain rational function, from which the coefficients of its partial fraction decomposition can be computed.


Practice Problem 1: Generalize the conclusion of the solution to Problem 1. That is, given a die with probability distribution $ q_1, q_2, ...$ (where $ q_k$ describes the probability of rolling a $ k$) such that $ q_k \ge 0$ and $ \sum q_k = 1$, what is the limiting behavior of the probability that the partial sums of repeated rolls hit a value of $ n$? (In particular, when does this sequence converge? The answer is not "always.") This is equivalent to the problem of finding the outcome of a sequence of repeated weighted averages

$ p_n = q_1 p_{n - 1} + q_2 p_{n - 2} + ...$

when the number of faces of the die is finite. (The first thing you should do is generalize the verification of condition 1) and examine the cases in which it fails.)

Practice Problem 2: Interpret the $ n \times n$ Jordan block with eigenvalue $ 1$ as an adjacency matrix and compute its powers by a counting argument. How does this prove 1) the binomial theorem and 2) the balls-and-urns formula? Compare with the ordinary generating functions proof.

Practice Problem 3: In keeping with our observation that for special cases the ordinary generating function is easier to compute than the initial conditions, verify that the identity $ \frac {P'(x)}{P(x)} = \sum_{P(a) = 0} \frac {1}{x - a}$ is equivalent to Newton's sums.

(Can you find a proof of Newton's sums by computing the trace of the $ n^{th}$ power of the companion matrix of $ P$ in a clever way? I can't. :))

Practice Problem 4: One way to state the fundamental theorem of symmetric polynomials is that any function of the roots of an irreducible polynomial $ P(x)$ invariant under any permutation of the roots is a function of its coefficients. The discriminant is an example of such a function. Prove this fact by taking tensor powers of the companion matrix of $ P(x)$. (I don't actually know if this works. :) )


5 Comments

I believe taking the reciprocal of the expected value in the initial problem always works. I am trying to prove it. I would assume it is quite a trivial result, but I have yet to find a particularly elegant proof, as my methodology has comprised solely of using vectors in the complex plane, and I arrive at a quite nasty result which I am unable to simplify.

by SicilianFury, Jan 13, 2009, 10:04 PM

That might be true in the case that the limit exists, but consider a die with faces 2, 2, 4, 4, 6, 6. Then $ p_n = 0$ whenever n is odd.

In a further generalization of Problem 1, take the initial conditions to be arbitrary instead of the first few probabilities.

by t0rajir0u, Jan 14, 2009, 1:56 AM

In your above case, as n goes to infinity does the parity matter? If so, the limit would not exist.

by SicilianFury, Jan 14, 2009, 11:14 PM

Yes. Limits have to be the same no matter what "route" you take to the answer - in this case, whether you go through the even numbers or the odd numbers (or any other subsequence).

It turns out that in the finite case, periodicity considerations (for example, if every face of the die is divisible by $ 3$) are the only obstructions to the limit existing. This can be stated in the language of the rest of the post as follows: if

$ P(x) = x^n - u_1 x^{n-1} - u_2 x^{n-2} - ... - u_n$

is a polynomial such that $ u_i \ge 0$ and $ \sum u_i = 1$, then

i) $ P(1) = 0$ and every other root of $ P(x)$ has absolute value at most $ 1$,

ii) If $ |z| = 1$ such that $ P(z) = 0$, then $ z$ is a root of unity.

(The convex combination argument I gave earlier works nicely here. It's also possible, although I think a lot less trivial, to prove this result using the Perron-Frobenius theorem.) I don't know what happens in the infinite case.

by t0rajir0u, Jan 15, 2009, 1:45 AM

hey can you show how to do #3?

by bigman, May 27, 2009, 3:25 AM
