Structure-preserving maps, Part II: Vector spaces
by t0rajir0u, Jun 30, 2008, 7:30 AM
The theory of vector spaces (i.e. linear algebra) has a well-known motivation in high school calculations: it derives initially from the need to solve systems of linear equations. In order to give perspective on vector spaces, let me therefore approach linear algebra from the opposite extreme of abstraction. Our starting point will be the notion of a finitely-generated abelian (commutative) group.
Note that many groups are not finitely generated: the generators of $\mathbb{Q}^{+}$ (under multiplication), for example, are the primes. On the other hand, cyclic groups are generated by a single element. A finitely-generated abelian group with generators $g_1, g_2, \dots, g_n$ (written in additive notation) has the property that every element can be written in the form

$a_1 g_1 + a_2 g_2 + \dots + a_n g_n$

where each $a_i$ is an integer and $a_i g_i$ denotes repeated addition (or, if $a_i$ is negative, subtraction). Note that this representation need not be unique; if it is, such a group is called free abelian. The interesting thing we have done here is introduce a notation: $a_i$ and $g_i$ are variables being multiplied but they do not belong to the same set; $a_i$ is an integer and $g_i$ is a member of some group, but this operation has some meaning in terms of the group.
Here we have the notion of a module. The elements of a module $M$ behave like a group (and we'll denote the group operation by addition), but in addition there is a second set of scalars $R$ such that "multiplication" of a member of $M$ by a scalar gives another member of $M$. $R$ itself has a structure; here it is a ring, but we won't worry about the general case; for our purposes $R$ will be a simple ring like the integers, which just means that you can both add and multiply.
When $R$ also has the property that you can divide by anything that isn't zero (in other words, $R$ is a field), the object you end up getting is a wonderful, wonderful thing called a vector space.
That was a pretty tortured chain of implication. Let's recap. A vector space requires specifying two sets: an abelian group $V$ of vectors (additive notation) and a field $F$ of scalars. (In the discussions that follow, $F$ will usually be either $\mathbb{R}$, in which case $V$ is called a real vector space, or $\mathbb{C}$, in which case $V$ is called a complex vector space.) In addition to vector addition, a scalar multiplication $F \times V \to V$ is required that satisfies several axioms which boil down to agreeing with your intuitive notion of scalar multiplication (stretching vectors and all that).
With that out of the way, let's talk about what we mean by structure-preserving maps between vector spaces. A vector space homomorphism (or linear map, or linear operator, or linear transformation) has the property of "linearity"; in other words, it is a group homomorphism with respect to vector addition and is compatible with scalar multiplication. Two vector spaces are isomorphic if a bijective linear map exists between them. Then we have vector space endomorphisms and automorphisms.
That's a lot of words to digest, but before I work through concrete examples as in the last post I'd like to throw a few more at you. The notion of generators is much stronger in a vector space than in a group (since most groups are neither finitely-generated nor free abelian). A set of vectors $v_1, v_2, \dots, v_n$ such that every element of a vector space $V$ can be written uniquely as a linear combination $a_1 v_1 + a_2 v_2 + \dots + a_n v_n$ where $a_1, a_2, \dots, a_n$ are scalars is called a basis, and the number $n$ is called the dimension of the vector space, which is (surprisingly enough) unique.
It follows that a given vector $v$ can be uniquely represented by an $n$-tuple of scalars $(a_1, a_2, \dots, a_n)$; in other words, over a field of scalars $F$, an $n$-dimensional vector space is isomorphic to $F^n$. (When $F = \mathbb{R}$, we get Euclidean space.) Elementary linear algebra is essentially the study of linear transformations between finite-dimensional real (or complex) vector spaces.
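To make the isomorphism with $F^n$ concrete, here is a minimal Python sketch (using numpy; the basis and the vector are made up purely for illustration). The coordinates of a vector with respect to a basis are found by solving one linear system, and that tuple of scalars is exactly the representation described above.

    import numpy as np

    # A made-up basis of R^3; the columns of B are the basis vectors.
    B = np.column_stack(([1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]))
    v = np.array([2.0, 3.0, 5.0])

    # Coordinates of v in this basis: solve B a = v.
    a = np.linalg.solve(B, v)

    # v really is the linear combination a_1 b_1 + a_2 b_2 + a_3 b_3.
    assert np.allclose(B @ a, v)
    print(a)   # the 3-tuple of scalars representing v in this basis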
A linear transformation between a vector space $V$ of dimension $n$ and a vector space $W$ of dimension $m$ can be represented in the following way: since a linear transformation $T$ respects addition and scalar multiplication, $T(a_1 v_1 + \dots + a_n v_n) = a_1 T(v_1) + \dots + a_n T(v_n)$; in other words, the effect of $T$ on $V$ is uniquely described by the effect of $T$ on the basis vectors of $V$. The vectors $T(v_1), \dots, T(v_n)$ can in turn be described in terms of the $m$ basis vectors of $W$, using $m$ scalars per vector. $T$ is therefore uniquely described by $mn$ components; in other words, by a matrix.
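To see this bookkeeping in action, here is a short Python sketch (numpy again; the particular map $T$ is invented for the example). The columns of the matrix are the images of the standard basis vectors, and by linearity applying the matrix then agrees with applying the map to any vector.

    import numpy as np

    # A made-up linear map T : R^3 -> R^2.
    def T(v):
        x, y, z = v
        return np.array([x + 2*y, 3*z - y])

    # Columns of the matrix are the images of the standard basis vectors.
    basis = np.eye(3)
    M = np.column_stack([T(e) for e in basis])   # a 2 x 3 matrix

    # By linearity, T(v) = M v for every v.
    v = np.array([1.0, -2.0, 4.0])
    assert np.allclose(M @ v, T(v))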
Composition of linear transformations is the most natural way to motivate matrix multiplication and leads to some extremely rich ideas. The set of $n \times n$ real (or complex) matrices (equivalently, the set of linear transformations $T : V \to V$, or the set of linear operators) forms a ring, and the set of $n \times n$ invertible matrices (equivalently, the set of invertible linear transformations) forms a group called the general linear group. The study of representing a group as a group of matrices, i.e. group representation theory, could take up the rest of this article, but I have other things to talk about.
When I talked about group homomorphisms, I centered discussion on isomorphisms to show that groups that appeared to be different were in fact the same. When studying linear transformations, however, the invertible ones form such a small subset of the rich set of linear transformations that it makes more sense to consider the linear transformations themselves, rather than the vector spaces, as the underlying object of study.
Interesting things start here.
We begin with the notion of a linear recurrence describing a sequence $a_0, a_1, a_2, \dots$

$a_{n+k} = c_{k-1} a_{n+k-1} + c_{k-2} a_{n+k-2} + \dots + c_0 a_n$

in terms of some coefficients $c_0, c_1, \dots, c_{k-1}$.
Such a recurrence is called a $k^{th}$-order homogeneous recurrence. Note that the set of sequences $\{ a_n \}$ satisfying such a recurrence is a vector space. Define the shift operator $S$ to take the sequence $a_0, a_1, a_2, \dots$ to the sequence $a_1, a_2, a_3, \dots$. Recall that we can define sums of linear operators and compositions, so let $S^k$ denote the shift operator applied $k$ times. Rewrite the recurrence as

$S^k a = c_{k-1} S^{k-1} a + c_{k-2} S^{k-2} a + \dots + c_0 a$

and consider the polynomial $P(x) = x^k - c_{k-1} x^{k-1} - \dots - c_1 x - c_0$. This polynomial is called the characteristic polynomial of the recurrence; the underlying reason why is that we can write the recurrence as

$P(S) a = 0$.
The above requires some explanation of notation. $P(S)$ refers to the sum of linear operators $S^k - c_{k-1} S^{k-1} - \dots - c_1 S - c_0$; in other words, first we shift the sequence $k$ times, then we shift the sequence $k-1$ times and subtract $c_{k-1}$ times that from what we had before, then... and so forth. $P(S) a$ refers to a sequence (not indexed by $n$!), that is, a vector, that is the result of the linear operator $P(S)$ acting on the sequence (vector) $a$. Note that here $0$ refers to the zero sequence $0, 0, 0, \dots$.
Phew! If it helps, think of a sequence as a vector with an infinite number of components; in other words, a function (on a countable set).
To wit: we have rephrased the problem of finding a sequence that satisfies a homogeneous linear recurrence as the problem of finding a sequence that is taken by a particular linear operator to zero, in other words, that is in the nullspace (or kernel) of $P(S)$. Recall that the kernel of a group homomorphism was itself a group; here we see, analogously, that the nullspace of a linear transformation is itself a vector space. Now what?
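In code this point of view is very compact. Here is a Python sketch (the truncation to a finite prefix of the sequence is only so that we can compute): for the Fibonacci recurrence $a_{n+2} = a_{n+1} + a_n$, the characteristic polynomial is $P(x) = x^2 - x - 1$, and applying $P(S) = S^2 - S - 1$ to the Fibonacci sequence really does give the zero sequence.

    # The shift operator on (a finite prefix of) a sequence.
    def shift(a):
        return a[1:]

    # A prefix of the Fibonacci sequence 0, 1, 1, 2, 3, ...
    fib = [0, 1]
    for _ in range(20):
        fib.append(fib[-1] + fib[-2])

    # Apply P(S) = S^2 - S - 1, truncating everything to a common length.
    S1 = shift(fib)
    S2 = shift(shift(fib))
    n = len(S2)
    result = [S2[i] - S1[i] - fib[i] for i in range(n)]

    assert all(r == 0 for r in result)   # fib lies in the nullspace of P(S)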
Well, when studying a linear operator we frequently study its eigenvectors and eigenvalues. Let's start with $S$ before studying $P(S)$. The eigenvectors of $S$ are the sequences $a$ that are multiples of themselves when they are shifted; in other words, $a_{n+1} = \lambda a_n$, where $\lambda$ is the eigenvalue. But this is obviously a geometric sequence with common ratio $\lambda$!
This is good news (especially if you already know the answer to this particular problem). Recall that an eigenvector $a$ with eigenvalue $\lambda$ (a scalar) satisfies $Sa = \lambda a$, hence $S^k a = \lambda^k a$ (remember, $a$ is a vector here, which is a sequence). It follows that

$P(S) a = P(\lambda) a$

for any eigenvector (that is, geometric sequence) $a$. In the case that $P(x)$ has distinct roots $r_1, r_2, \dots, r_k$ (which is the only case I want to tackle for now), the geometric sequences $1, r_i, r_i^2, \dots$ with common ratios $r_i$ are in the nullspace of $P(S)$ because $P(r_i) = 0$.
There are $k$ of these vectors and they are clearly independent. On the other hand, the vector space of sequences that satisfy a $k^{th}$-order homogeneous linear recurrence is clearly of dimension $k$ because it is spanned by the sequences $(1, 0, \dots, 0, \dots), (0, 1, 0, \dots, 0, \dots), \dots$ (where all of the first $k$ terms are zero except one and the rest are determined by the recurrence); in other words, $P(S)$ has nullity $k$, so we have described the nullspace completely. We therefore have the following
Proposition. The set of sequences satisfying a $k^{th}$-order homogeneous linear recurrence with characteristic polynomial $P(x)$ with distinct roots $r_1, r_2, \dots, r_k$ is given by

$a_n = b_1 r_1^n + b_2 r_2^n + \dots + b_k r_k^n$

where $b_1, b_2, \dots, b_k$ are scalars.
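The proposition translates directly into a computation. Here is a hedged Python sketch (numpy; it assumes the roots really are distinct, and the "tribonacci" example is just an illustration): find the roots of the characteristic polynomial, solve a small linear system for the $b_i$, and check the resulting closed form against direct iteration.

    import numpy as np

    # Example: the tribonacci recurrence a_{n+3} = a_{n+2} + a_{n+1} + a_n.
    c = [1.0, 1.0, 1.0]          # c_{k-1}, ..., c_0
    initial = [0.0, 0.0, 1.0]    # a_0, a_1, a_2
    k = len(c)

    # Roots of the characteristic polynomial x^k - c_{k-1} x^{k-1} - ... - c_0.
    roots = np.roots([1.0] + [-ci for ci in c])

    # Solve for b_1, ..., b_k from the first k terms: a_n = sum_i b_i r_i^n.
    V = np.array([[r**n for r in roots] for n in range(k)])
    b = np.linalg.solve(V, initial)

    # Compare the closed form with direct iteration.
    a = list(initial)
    for n in range(k, 20):
        a.append(sum(ci * a[n - i - 1] for i, ci in enumerate(c)))
    closed = [sum(bi * r**n for bi, r in zip(b, roots)).real for n in range(20)]
    assert np.allclose(a, closed)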
The characteristic-equation method is one of the most general methods for proving (generalizations of) Binet's formula. Note that the restriction of $S$ to the nullspace of $P(S)$ gives a linear operator whose matrix has characteristic polynomial $P(x)$; hence in an important sense the two notions of characteristic polynomial (of matrices on the one hand and linear recurrences on the other) are the same!
Excursion: Find an "algebraic" (that is, not involving floor functions or fractional parts) equation for a sequence of numbers with period $n$, i.e. $a_{m+n} = a_m$ for all $m$.
Solution 1. Let $\omega$ denote a primitive solution to $\omega^n = 1$. Then $a_m = c_0 + c_1 \omega^m + c_2 \omega^{2m} + \dots + c_{n-1} \omega^{(n-1)m}$ for some numbers $c_0, c_1, \dots, c_{n-1}$. To find them, we solve the system

$a_0 = c_0 + c_1 + c_2 + \dots + c_{n-1}$
$a_1 = c_0 + c_1 \omega + c_2 \omega^2 + \dots + c_{n-1} \omega^{n-1}$
...
$a_{n-1} = c_0 + c_1 \omega^{n-1} + c_2 \omega^{2(n-1)} + \dots + c_{n-1} \omega^{(n-1)(n-1)}$.

Using the fact that $1 + \omega^j + \omega^{2j} + \dots + \omega^{(n-1)j} = n$ when $n \mid j$ and $0$ otherwise, verify that

$c_j = \frac{1}{n} \left( a_0 + a_1 \omega^{-j} + a_2 \omega^{-2j} + \dots + a_{n-1} \omega^{-(n-1)j} \right)$.
If we accept that the characteristic equation method is valid, this argument is fairly straightforward.
Solution 2. Introduce an inner product on two sequences $a, b$ of period $n$ by

$\langle a, b \rangle = \frac{1}{n} \sum_{m=0}^{n-1} a_m \overline{b_m}$.

(Note the analogue to integration.) The basis vectors $e_j = 1, \omega^j, \omega^{2j}, \dots$ can therefore be verified to have norm $1$ and to be orthogonal, and therefore form an orthonormal basis. The expression above that gives $c_j$ in terms of $a_0, a_1, \dots, a_{n-1}$ is therefore a consequence of the expression of a vector in an orthonormal basis.
The operation we are doing above is important enough to have a special name: it is called the discrete Fourier transform, which is itself a linear transformation. It takes a sequence with period $n$ (that is, $n$ numbers) to $n$ coefficients decomposing the sequence into waves, and is (in my opinion) the most general approach to the problem I gave in my previous post.
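Concretely, the coefficients $c_j$ are (up to the $\frac{1}{n}$ normalization) what numpy's FFT routine computes. A quick Python sketch, with a made-up sequence of period $5$:

    import numpy as np

    # One period of a made-up sequence of period 5.
    a = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
    n = len(a)

    # numpy's fft computes sum_m a_m * omega^{-jm} with omega = e^{2 pi i / n},
    # so the coefficients c_j above are fft(a) / n.
    c = np.fft.fft(a) / n

    # Reconstruct: a_m = sum_j c_j * omega^{jm}.
    omega = np.exp(2j * np.pi / n)
    recon = np.array([sum(c[j] * omega**(j*m) for j in range(n)) for m in range(n)])
    assert np.allclose(recon, a)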
There are two other rather general approaches to proving (generalizations of) Binet's formula. One is by considering the ordinary generating function $A(x) = \sum_{n \ge 0} a_n x^n$, and the other is by considering the exponential generating function $\hat{A}(x) = \sum_{n \ge 0} a_n \frac{x^n}{n!}$. These approaches can be unified. The one thing they all share in common is a shift operator $S$.
The shift operator was an operator on sequences. The operator that assigns a generating function (of either type) to a sequence is itself a linear operator - it takes the "sequence space" to a function space - and it translates the shift operator into a different operator. On ordinary generating functions, the shift operator takes $f(x)$ to $\frac{f(x) - f(0)}{x}$, and on exponential generating functions, the shift operator takes $f(x)$ to $f'(x)$ (recall that differentiation is a linear operator)!
This is very deep. The problem of finding the nullspace of $P(S)$ translates (with ordinary generating functions) to the problem of solving for the generating function by multiplying out the $x$'s (in other words, the shift operator becomes very simple) and then computing the Taylor series of the result, and (with exponential generating functions) to a (linear homogeneous ordinary) differential equation.
Those of you who are familiar with differential equations might recall the "characteristic equations" technique for solving linear homogeneous ODEs, which is in fact exactly the above technique for solving linear homogeneous recurrences, translated to an exponential generating function. So in fact these three approaches boil down to different ways of realizing the shift operator $S$, and the three notions of characteristic equation (of a recurrence, of a linear ODE, of a linear operator) boil down to different characteristic equations of restrictions of $S$ to certain spaces.
If you're not sure what I'm talking about, let's prove Binet's formula using the above techniques.
Proof 1. With $F(x) = \sum_{n \ge 0} F_n x^n$ as our vector we have the recurrence

$F_{n+2} = F_{n+1} + F_n$
$F_0 = 0$
$F_1 = 1$.

Now, $SF(x) = \frac{F(x) - F_0}{x} = \frac{F(x)}{x}$ and $S^2 F(x) = \frac{F(x) - x}{x^2}$, so

$\frac{F(x) - x}{x^2} = \frac{F(x)}{x} + F(x) \implies F(x) = \frac{x}{1 - x - x^2}$.
We then find the Taylor series of $F(x)$ using partial fractions. Let $\phi, \varphi$ be the roots of $x^2 - x - 1$ as usual. Then
$ \begin{eqnarray*} F(x) & = & \frac {x}{(1 - \phi x)(1 - \varphi x)} \\ & = & \frac {1}{\phi - \varphi} \left( \frac {1}{1 - \phi x} - \frac {1}{1 - \varphi x} \right) \\ & = & \frac {1}{\phi - \varphi} \left( \sum_{k = 0}^{\infty} (\phi^k - \varphi^k) x^k \right) \\ \end{eqnarray*}$
which is exactly the answer we're looking for as usual.
(I gave this proof some time ago in a thread on AoPS if it looks familiar. It's a standard exercise in generating functions.) Just as the eigenvectors of the shift operator on sequences were geometric sequences, here the eigenvectors are the ordinary generating functions of geometric sequences; in other words, the functions $\frac{1}{1 - \lambda x}$. The value of this approach is that $F(x)$ can be found algebraically and can be naturally decomposed into eigenvectors of the shift operator.
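If you want to watch this happen symbolically, here is a short sympy sketch: expand $\frac{x}{1 - x - x^2}$ as a Taylor series and compare the coefficients with Binet's formula and with the Fibonacci numbers themselves.

    from sympy import symbols, series, sqrt, simplify, fibonacci

    x = symbols('x')
    F = x / (1 - x - x**2)

    # Taylor coefficients of the ordinary generating function.
    coeffs = series(F, x, 0, 10).removeO()
    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2   # psi stands in for varphi

    for n in range(10):
        binet = (phi**n - psi**n) / sqrt(5)
        assert coeffs.coeff(x, n) == fibonacci(n)
        assert simplify(binet - fibonacci(n)) == 0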
Proof 2. With $G(x) = \sum_{n \ge 0} F_n \frac{x^n}{n!}$ as our vector we have

$G''(x) = G'(x) + G(x)$.

Several techniques for solving such differential equations are available; the standard method is that the roots to the characteristic equation $t^2 = t + 1$ describe the solutions

$G(x) = c_1 e^{\phi x} + c_2 e^{\varphi x}$

where the coefficients $c_1, c_2$ can be solved for using the initial conditions (here $G(0) = 0, G'(0) = 1$). Again, note that the eigenvectors of the differentiation operator are the exponential generating functions of geometric sequences; in other words, the functions $e^{\lambda x}$. The proof of the characteristic equation technique is essentially the proof for linear recurrences I outlined earlier.
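As a sanity check, we can also let a computer algebra system solve the differential equation and read the Fibonacci numbers off of the Taylor coefficients of the solution. A sympy sketch:

    from sympy import Function, Eq, dsolve, symbols, series, factorial, simplify, fibonacci

    x = symbols('x')
    G = Function('G')

    # G'' = G' + G with G(0) = 0, G'(0) = 1.
    ode = Eq(G(x).diff(x, 2), G(x).diff(x) + G(x))
    sol = dsolve(ode, G(x), ics={G(0): 0, G(x).diff(x).subs(x, 0): 1}).rhs

    # The coefficient of x^n/n! in the exponential generating function is F_n.
    expansion = series(sol, x, 0, 8).removeO()
    for n in range(8):
        assert simplify(expansion.coeff(x, n) * factorial(n) - fibonacci(n)) == 0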
Another common technique for solving a second-order linear differential equation is to introduce a second function $H = G'$. We then have the system of differential equations

$H' = H + G$
$G' = H$.

Such systems are usually solved by writing them in matrix notation. Letting $\mathbf{P} = \left[ \begin{array}{c} H \\ G \end{array} \right]$, we have

$\mathbf{P}' = \left[ \begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array} \right] \mathbf{P}$.
The matrix $\left[ \begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array} \right]$ that appears here should be familiar to you as what I called the Fibonacci operator $\mathbf{A}$. The solution to the above system is given by

$\mathbf{P}(x) = e^{\mathbf{A} x} \mathbf{P}(0)$

where $e^{\mathbf{A} x}$ is the matrix exponential. Note that the coefficients of the matrix exponential here are exactly the Fibonacci numbers (because that's how we defined $G$), which lets us prove the lemma about the powers of $\mathbf{A}$ without any computation. Marvelous!
(Binet's formula can be proven from the powers of $\mathbf{A}$ alone by eigendecomposition, which is the more general way to phrase my argument in my old post.)
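Here is a small Python sketch of that lemma about powers (exact integer arithmetic, nothing clever): the entries of $\mathbf{A}^n$ are consecutive Fibonacci numbers, which is exactly the statement the matrix exponential argument delivers without computation.

    # Powers of the Fibonacci operator A = [[1, 1], [1, 0]], with exact integers.
    def matmul(X, Y):
        return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
                [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

    A = [[1, 1], [1, 0]]
    fib = [0, 1]
    for _ in range(20):
        fib.append(fib[-1] + fib[-2])

    P = [[1, 0], [0, 1]]   # A^0
    for n in range(1, 15):
        P = matmul(P, A)
        # A^n = [[F_{n+1}, F_n], [F_n, F_{n-1}]]
        assert P == [[fib[n+1], fib[n]], [fib[n], fib[n-1]]]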
There is another rather ingenious way to prove that the powers of $\mathbf{A}$ give the Fibonacci numbers. It begins with interpreting $\mathbf{A}$ as an adjacency matrix. An $n \times n$ adjacency matrix describes the location of (directed) edges in a graph on $n$ vertices; if the graph is undirected, the matrix is symmetric. Since I introduced matrices in terms of linear transformations, an adjacency matrix is a linear operator on the vector space of functions on the vertices of a graph. One way of interpreting such an operator is that it takes some numbers on each vertex that represent the number of ways of moving to that vertex (along directed edges) and returns some numbers on each vertex that represent the number of ways of moving to that vertex in one more step.
Now, $\mathbf{A}$ is the adjacency matrix of a graph on two vertices $V_1, V_2$ with an edge from $V_1$ to itself and an undirected edge from $V_1$ to $V_2$. Let's call this the "Fibonacci graph." It's well-known that the $n^{th}$ power of an adjacency matrix gives the number of paths of length $n$ between vertices.
The number of paths of length $n$ from $V_1$ to itself can be counted as follows: each path consists of two types of possible steps, one of which is to take the first edge from $V_1$ to itself (a step of length one) and the other of which is to take the edge from $V_1$ to $V_2$ and back to $V_1$ (a step of length two). But there is a famous counting proof that tells us that the number of paths of length $n$ where we are allowed to take steps of length one or two is just $F_{n+1}$! (Briefly: let $P_n$ be the number of paths of length $n$. The last step is of length one or two. The rest of the path is of length $n-1$ or $n-2$, so $P_n = P_{n-1} + P_{n-2}$.)
The number of paths from $V_1$ to $V_2$, from $V_2$ to $V_1$, and from $V_2$ to $V_2$ can be counted in a very similar manner and give exactly the four components of the powers of $\mathbf{A}$ as seen before. Using eigendecomposition to find Binet's formula here is now a step towards understanding the structure of the Fibonacci graph; such techniques belong to the fascinating subject of spectral graph theory. Eigenvalues and eigenvectors of an adjacency matrix give information about the behavior of paths on a graph, which in turn give information about the graph itself.
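If you prefer to see the path-counting claim checked directly, here is a brute-force Python sketch: enumerate every walk on the Fibonacci graph and compare the counts with the entries of $\mathbf{A}^n$, i.e. with Fibonacci numbers.

    from itertools import product

    # The Fibonacci graph: vertices 0 and 1, a loop at 0 and an undirected edge 0 -- 1.
    edges = {(0, 0), (0, 1), (1, 0)}

    def count_walks(start, end, length):
        # Enumerate all vertex sequences of the given length and keep the legal walks.
        total = 0
        for middle in product([0, 1], repeat=length - 1):
            walk = (start,) + middle + (end,)
            if all((walk[i], walk[i+1]) in edges for i in range(length)):
                total += 1
        return total

    fib = [0, 1]
    for _ in range(20):
        fib.append(fib[-1] + fib[-2])

    for n in range(1, 10):
        # Entries of A^n: walks 0->0, 0->1, 1->0, 1->1 of length n.
        assert count_walks(0, 0, n) == fib[n + 1]
        assert count_walks(0, 1, n) == count_walks(1, 0, n) == fib[n]
        assert count_walks(1, 1, n) == fib[n - 1]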
The different proofs of Binet's formula can now be seen to all be essentially the same technique. Whether we are considering the eigenvectors of the shift operator or of $\mathbf{A}$, it's linear transformations and "eigen"-reasoning that let us manipulate what was once a mysterious sequence into a totally clear sum of geometric sequences.
But we are still not done! If a linear operator takes sequences to ordinary generating functions and another one takes sequences to exponential generating functions, how do we translate directly from ordinary to exponential and vice versa? If you noticed how the shift operator related differentiation on the one hand to division by $x$ on the other, you might already know the answer; it is (almost) given by the Laplace transform.
Specifically, the Laplace transform of an exponential generating function $f(t) = \sum_{n \ge 0} a_n \frac{t^n}{n!}$ is given by

$\int_0^{\infty} f(t) e^{-st} \, dt = \sum_{n \ge 0} \frac{a_n}{s^{n+1}}$.

Then we just change variables, letting $x = \frac{1}{s}$, and remove the extraneous factor of $x$ and we have the ordinary generating function of $a_n$!
The Laplace transform is used to convert a linear ODE into an algebraic equation. If you're like me and took a differential equations course, it might seem unmotivated. But the intermediate step (where differentiation, the shift operator, and multiplying by $s$ or dividing by $x$ are all related) hopefully uncovers more of what's really going on here.
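Here is a sympy sketch of that translation for the Fibonacci numbers (just a check, not a derivation): take the Laplace transform of the exponential generating function $\frac{e^{\phi t} - e^{\varphi t}}{\sqrt{5}}$, substitute $s = \frac{1}{x}$, strip the extra factor of $x$, and the ordinary generating function $\frac{x}{1 - x - x^2}$ appears.

    from sympy import symbols, exp, sqrt, laplace_transform, simplify

    t, s, x = symbols('t s x', positive=True)
    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2   # psi stands in for varphi

    # Exponential generating function of the Fibonacci numbers (Binet's formula).
    egf = (exp(phi * t) - exp(psi * t)) / sqrt(5)

    # Laplace transform, then the change of variables s = 1/x.
    L = laplace_transform(egf, t, s, noconds=True)
    ogf = simplify(L.subs(s, 1 / x) / x)   # remove the extraneous factor of x

    assert simplify(ogf - x / (1 - x - x**2)) == 0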
We therefore have the rather interesting table of correspondences.
$ \begin{tabular}{ | l | c | r | } Algebra & Linear recurrences & Linear ODEs \\ \hline (f(x) - f(0))/x & Shift operator & Differentiation \\ 1/(1 - \lambda x) & Geometric sequences & e^{\lambda x} \\ ??? & Adjacency matrix & Matrix exponential \\ \end{tabular}$.
Ordinary generating functions take the middle column to the left, exponential generating functions take the middle column to the right, and the Laplace transform and its inverse take the left and right columns to each other.
What's the missing entry? The third row describes different interpretations of $\mathbf{A}$. We interpreted $\mathbf{A}$ as the adjacency matrix of a graph where its powers counted paths on a graph (that satisfy a second-order linear recurrence) and as the coefficient matrix of a system of differential equations where its powers form the matrix exponential that solves it.
Well, the left-column analogue to the matrix exponential $e^{\mathbf{A} x}$ is the matrix function $(\mathbf{I} - \mathbf{A} x)^{-1}$, where $\mathbf{I}$ is the $2 \times 2$ identity matrix. The determinant of this matrix is the Ihara zeta function of the Fibonacci graph, which dovetails nicely with the middle column, but it is here that I am out of my league.
Still, we have made a lot of progress: the different approaches to understanding the Fibonacci numbers have all come together nicely thanks to the power of linear transformations; we translate operations in one vector space to operations in another and structures that were before opaque (differential equations) became transparent (partial fractions).
Practice Problem 1: The operation that assigns a "Newton generating function" to a sequence $a_n$ as $\sum_{n \ge 0} a_n \binom{x}{n}$ is a linear operator. The shift operator becomes the forward difference operator as described here and the eigenvectors are the functions $(1 + \lambda)^x$. What happens when we try to use this technique to prove Binet's formula?
Practice Problem 1.5: Compute $\binom{n}{0} + \binom{n}{3} + \binom{n}{6} + \cdots$. (This is a value of a Newton generating function. What is the corresponding sequence? What is its discrete Fourier transform?) This is the "roots of unity filter."
Practice Problem 2: The operation that assigns a "Dirichlet generating function" to a sequence $a_n$ as $\sum_{n \ge 1} \frac{a_n}{n^s}$ is a linear operator. Just as Laplace transformations take convolutions to products, this transformation takes Dirichlet convolutions to products. Using this knowledge, prove the Mobius inversion formula.
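Not a proof, but a numerical illustration of the identity at the heart of Mobius inversion (the helper functions below are written just for this sketch): the Dirichlet convolution of the Mobius function with the all-ones sequence is the identity element for Dirichlet convolution, i.e. $1$ at $n = 1$ and $0$ elsewhere.

    from sympy import factorint

    def mobius(n):
        # mu(n): 0 if n has a squared prime factor, else (-1)^(number of prime factors).
        f = factorint(n)
        if any(e > 1 for e in f.values()):
            return 0
        return (-1) ** len(f)

    def dirichlet_convolution(f, g, n):
        # (f * g)(n) = sum over divisors d of n of f(d) g(n / d).
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

    # mu * 1 = e, the identity for Dirichlet convolution.
    for n in range(1, 50):
        assert dirichlet_convolution(mobius, lambda m: 1, n) == (1 if n == 1 else 0)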
Practice Problem 3: Compute

$\left[ \begin{array}{cccccc} 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right]^2$

by interpreting it as an adjacency matrix. Does the resulting graph look familiar? Generalize.