Polynomial ring

Given a ring $R$, the polynomial ring $R[x]$ is, informally, "the ring of all polynomials in a commuting indeterminate $x$ with coefficients in $R$." That is, it is the ring of all sums of the form \[\sum_{k=0}^N a_k x^k\] where each $a_k$ is an element of $R$ and $N$ is a nonnegative integer that varies from sum to sum.

The ring $R[x]$ is also an $R$-module.

Formal Definition

We can rigorously define $R[x]$ to be the set of all sequences of elements of $R$ with only finitely many terms nonzero: \[R[x] = \{(a_0,a_1,a_2,\ldots)|\text{the set }\{i|a_i\neq 0\} \text{ is finite }\}\] We then call the elements of $R[x]$ polynomials (over $R$). For a polynomial $p=(a_0,a_1,a_2,\ldots)$, the terms $a_0,a_1,a_2,\ldots$ are called the coefficients of $p$.

For example, $(0,0,0,\ldots), (0,1,0,0,\ldots), (1,4,0,3,0,0,\ldots)$ are polynomials, but $(1,1,1,1,\ldots)$ is not a polynomial.

At this point, our formal definition of a polynomial may seem unrelated to our intuitive notion of a polynomial. To relate these two concepts, we introduce some notation.

We denote the polynomial $(a_0,a_1,a_2,\ldots)$ by $a_0+a_1x+a_2x^2+\cdots$. For instance, we write: \begin{align*} (0,0,0,\ldots) &= 0+0x+0x^2+\cdots\\ (0,1,0,0,\ldots) &= 0+1x+0x^2+0x^3+\cdots\\ (1,4,0,3,0,0,\ldots) &= 1+4x+0x^2+3x^3+0x^4+0x^5+\cdots \end{align*} Typically, we suppress the terms with coefficient $0$ and we do not write the coefficient on terms with coefficient $1$. We also do not care about the order in which the terms are written, and indeed often list them in descending order of power. So we would write: \begin{align*} (0,0,0,\ldots) &= 0\\ (0,1,0,0,\ldots) &= x\\ (1,4,0,3,0,0,\ldots) &= 3x^3+4x+1 \end{align*}

We can now define addition and multiplication in $R[x]$ in the canonical way: \[\sum_i a_ix^i + \sum_i b_ix^i = \sum_i (a_i+b_i)x^i;\] \[\biggl(\sum_i a_ix^i\biggr)\cdot \biggl(\sum_j b_jx^j\biggr) = \sum_k\biggl(\sum_{i=0}^k a_ib_{k-i}\biggr)x^k\] It is now a simple matter to verify that $R[x]$ indeed constitutes a ring under these operations, and that it is commutative when $R$ is commutative. This ring has additive identity $0=(0,0,0,\ldots)$ and multiplicative identity $1 = (1,0,0,\ldots)$.
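As an illustration, the two formulas above can be sketched in Python, representing a polynomial by the finite list of its coefficients $[a_0, a_1, \ldots, a_N]$ (with the zero polynomial as the empty list). The function names `trim`, `poly_add`, and `poly_mul` are chosen here for illustration only, and the coefficients are assumed to be ordinary Python integers standing in for elements of $R$.

```python
def trim(coeffs):
    """Drop trailing zeros so that each polynomial has exactly one list form."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def poly_add(p, q):
    """Coefficientwise sum: the k-th coefficient is a_k + b_k."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))


def poly_mul(p, q):
    """Product: the k-th coefficient is the sum of a_i * b_(k-i) over i."""
    if not p or not q:
        return []
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return trim(result)


f = [1, 4, 0, 3]   # 3x^3 + 4x + 1
g = [0, 1]         # x
print(poly_add(f, g))  # [1, 5, 0, 3], i.e. 3x^3 + 5x + 1
print(poly_mul(f, g))  # [0, 1, 4, 0, 3], i.e. 3x^4 + 4x^2 + x
```

Trimming trailing zeros mirrors the fact that a polynomial is determined by its nonzero coefficients, so two lists are equal exactly when the polynomials are.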

The ring $R$ can be thought of as a subring of $R[x]$ via the embedding $r\mapsto (r,0,0,\ldots)$.

For a nonzero polynomial $p = (a_0, a_1, a_2, \dotsc)$, the greatest integer $N$ such that $a_N \neq 0$ is called the degree of $p$. It is often denoted $\deg(p)$. By convention, the degree of the zero polynomial (i.e., of the polynomial $0 = (0,0,0,\dotsc)$) is either undefined, or $-1$, or $-\infty$ depending on the author.

Polynomials and Functions

Polynomials are not functions. The symbol $x$ does not represent a variable (in the usual sense of this word), but rather a commutative indeterminate, that is, a formal symbol that commutes with the elements of $R$ and whose powers are independent of each other over $R$. However, polynomials are associated with functions, called polynomial functions. This is a historically important association: originally, the two concepts were almost inseparable. Indeed, polynomial functions were almost certainly the first functions studied. The concept of "function" was not articulated until the 12th to 14th centuries. By Euler's time, "functions" were explicit rules of association built from elementary expressions, though Euler himself generalized the concept to what we now call continuous functions. This began a long debate over how "function" should be defined that was not resolved until the 20th century, when the modern, abstract definition of "function" became standard. (Although the classical concept of a function as a "deterministic rule to compute an output based on an input" has survived in constructive mathematics and functional programming!) The history of the concept of polynomial is more obscure, but polynomials were almost certainly not divorced from their function roots until the beginnings of modern algebra in the 19th century.

Specifically, each element $p \in R[x]$ is associated with a function mapping $R$ into itself; this function is evaluated at a value $a \in R$ by replacing the symbol $x$ with the element $a$ in $p$.

More formally, we can prove by induction on the degree of the elements of $R[x]$ that for any $a\in R$ and any $p \in R[x]$, there is a unique element of $R$ that is equivalent to $p$ modulo $(x-a)$. This unique element is sometimes denoted $p(a)$. Thus we may associate each element $p \in R[x]$ with the mapping $a \mapsto p(a)$ of $R$ into itself. (Alternatively, we can associate with each element $a\in R$ a homomorphism of $R[x]$ into $R$ that is the composition of the canonical homomorphism of $R[x]$ into $R[x]/(x-a)$ and the canonical isomorphism of $R[x]/(x-a)$ onto $R$.)
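To make the reduction modulo $(x-a)$ concrete, here is a minimal Python sketch of synthetic division by $x-a$. It assumes integer coefficients (so that $R$ is commutative) and stores a polynomial as the list $[a_0, a_1, \ldots, a_N]$; the name `divide_by_x_minus_a` is hypothetical. The remainder it returns is the element written $p(a)$ above.

```python
def divide_by_x_minus_a(p, a):
    """Return (q, r) with p = q * (x - a) + r, where r is a constant.

    Polynomials are lists of coefficients, constant term first.
    """
    running = []          # Horner partial results, highest power first
    carry = 0
    for c in reversed(p):
        carry = carry * a + c
        running.append(carry)
    # The last partial result is the remainder; the rest form the quotient.
    remainder = running.pop() if running else 0
    quotient = running[::-1]
    return quotient, remainder


p = [1, 4, 0, 3]                      # 3x^3 + 4x + 1
q, r = divide_by_x_minus_a(p, 2)
print(q, r)                           # [16, 6, 3] 33
```

Here $3x^3+4x+1 = (3x^2+6x+16)(x-2) + 33$, and substituting $2$ for $x$ directly also gives $24 + 8 + 1 = 33$.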

It is important to note that although each polynomial $p\in R[x]$ is associated with a function mapping $R$ into itself, it is not always possible to uniquely reconstruct $p$ from this function. In particular, if $R$ is finite, then the set of functions mapping $R$ into itself is finite, whereas $R[x]$ is infinite (unless $R$ is a trivial ring), so some functions must be associated with infinitely many different polynomials. (In fact, it follows from the theory of cosets, applied to the additive groups involved, that every function that is associated with a polynomial must be associated with infinitely many polynomials.)

For example, if $R$ is the ring of integers modulo $p$, for $p$ a prime, then Fermat's Little Theorem states that the polynomials $x^p$ and $x$ are associated with the same function mapping $R$ into itself, even though they are different polynomials.
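This can be checked by brute force for a small prime. The following sketch, with the hypothetical helper `as_function_mod` and the prime $5$ chosen arbitrarily, tabulates the function on the integers modulo $5$ induced by each of the two polynomials, again using coefficient lists with the constant term first.

```python
def as_function_mod(coeffs, prime):
    """Tabulate the function Z/prime -> Z/prime induced by a polynomial."""
    def evaluate(a):
        value = 0
        for c in reversed(coeffs):      # Horner's rule, reduced mod prime
            value = (value * a + c) % prime
        return value
    return [evaluate(a) for a in range(prime)]


prime = 5
x_poly = [0, 1]                # the polynomial x
x_to_p = [0] * prime + [1]     # the polynomial x^5

print(as_function_mod(x_poly, prime))  # [0, 1, 2, 3, 4]
print(as_function_mod(x_to_p, prime))  # [0, 1, 2, 3, 4]
print(x_poly == x_to_p)                # False: different coefficient sequences
```

The two value tables agree, while the coefficient lists differ, which is exactly the distinction between a polynomial and its polynomial function.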

Nevertheless, over many infinite rings (such as the ring of integers), this association of polynomials with functions is one-to-one: distinct polynomials give distinct functions. In such contexts, the polynomials are often identified with their functions, by abuse of language. The association of polynomials with functions is an important one: polynomials were first studied as polynomial functions, and indeed it was not until recently that functions gained their modern definition, quite divorced from polynomials.

There is yet another reason why polynomials should not be regarded as the functions they represent. Namely, if $R$ is a commutative ring, then a polynomial $p \in R[x]$ can be evaluated not only at an element of $R$, but also at an element $a$ of any $R$-algebra $A$ (by replacing every appearance of $x$ by $a$ in $p$). For instance, $p$ can be applied to any square matrix with entries in $R$, or to any other polynomial over $R$ (this results in the composite of the two polynomials), or to a linear operator on an $R$-module. This is in contrast to actual functions, which come with a predetermined domain and are not defined outside of it. Notice that the commonly used notation $p(x)$ for a polynomial $p \in R[x]$ is a particular case of evaluation: When one evaluates a polynomial $p \in R[x]$ at the indeterminate $x$, one gets $p$ back (because replacing all $x$'s by $x$ in $p$ does not change the polynomial). Thus, $p(x) = p$ (and not just by convention).
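As a small illustration of evaluating a polynomial outside of $R$, the sketch below applies an integer polynomial to a square integer matrix using Horner's rule. The helper names `mat_mul` and `eval_at_matrix` are hypothetical, matrices are plain nested lists, and the constant term $a_0$ is interpreted as $a_0$ times the identity matrix.

```python
def mat_mul(A, B):
    """Product of two n x n matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def eval_at_matrix(coeffs, A):
    """Evaluate a_0 + a_1 x + ... + a_N x^N at x = A (constant term first)."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)     # multiply the partial result by A
        for i in range(n):
            result[i][i] += c           # add c times the identity matrix
    return result


A = [[1, 2], [3, 4]]
p = [-2, -5, 1]                 # x^2 - 5x - 2
print(eval_at_matrix(p, A))     # [[0, 0], [0, 0]]
```

The polynomial $x^2 - 5x - 2$ used here is the characteristic polynomial of $A$, so the zero result is an instance of the Cayley-Hamilton theorem.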

There is a yet more general context in which polynomials are defined. Namely, one can define $R[x]$ whenever $R$ is an additive abelian group (not necessarily a ring). Then $R[x]$ will be just an additive group, not a ring; nevertheless such constructions are occasionally of use (see, e.g., the definition of a [loop algebra]). The usefulness mainly stems from the fact that if $R$ is an $S$-module for some commutative ring $S$, then $R[x]$ becomes an $S[x]$-module. Here, again, trying to regard polynomials as functions leaves one hopelessly lost.

The above discussion was concerned with univariate polynomials (i.e., polynomials in one variable). One can define polynomials in multiple (even infinitely many) variables. The definition is similar to the above, although it requires some more technical bookkeeping, at least if one wants to keep track of variable names. There is a way to avoid some of the technicalities for finitely many variables by an inductive construction (i.e., defining a polynomial in $n+1$ variables to be a polynomial in $1$ variable over a polynomial ring in $n$ variables), although here again some care has to be taken (when you define a polynomial in two variables, you do not want to call both of them $x$).

Finitude of Degree

"Polynomials of infinite degree" are properly called formal power series. The set of formal power series over a ring $R$ constitutes a ring, denoted $R[[x]]$, of which the ring of polynomials is a subring. In general, formal power series are not associated with mappings of $R$ into itself, as infinitely iterated addition is not generally well-defined unless the sum converges.

Differential operators

Given a commutative ring $R$, one can define an $R$-linear map $\partial : R[x] \to R[x]$ as follows: \[\partial \left(a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots\right) = 1 a_1 + 2 a_2 x + 3 a_3 x^2 + \cdots .\] This operator $\partial$ is called the (formal) derivative (with respect to $x$), and behaves much like the derivative of a function in analysis, and in fact commutes with the natural map from polynomials to polynomial functions (although, once again, polynomials are not per se functions). For example, the Leibniz rule $\partial \left(fg\right) = f \cdot \partial g + \left(\partial f\right) \cdot g$ and the chain rule $\partial\left(f\left(g\right)\right) = \left(\partial f\right)\left(g\right) \cdot \left(\partial g\right)$ hold for any two polynomials $f$ and $g$. Unlike the derivative in analysis, the formal derivative does not rely on any limits or topology (in particular, $R$ can be any commutative ring, not necessarily $\mathbb{R}$ or $\mathbb{C}$), although it does have a property mimicking the "difference quotient" definition of the analytic derivative: If $f \in R[x]$, then the polynomial $f(x+y)-f(x) \in R[x,y]$ is divisible by $y$, and evaluating the quotient $\dfrac{f(x+y)-f(x)}{y}$ at $y = 0$ (that is, substituting $0$ for $y$) yields precisely $\partial f$. (The second variable $y$ here is the analogue of the infamous $\varepsilon$ in analysis, but here we need not take any limits.)
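The formal derivative is purely a manipulation of coefficients, as the following Python sketch suggests. It assumes integer coefficients stored as lists with the constant term first, and the helper names `derivative`, `poly_add`, and `poly_mul` are hypothetical. It spot-checks the Leibniz rule on one pair of polynomials, which is of course an example rather than a proof.

```python
def trim(coeffs):
    """Drop trailing zeros so equal polynomials have equal lists."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def derivative(p):
    """Formal derivative: the coefficient a_k of x^k becomes k * a_k on x^(k-1)."""
    return trim(k * a for k, a in enumerate(p) if k >= 1)


def poly_add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))


def poly_mul(p, q):
    if not p or not q:
        return []
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return trim(result)


f = [1, 4, 0, 3]      # 3x^3 + 4x + 1
g = [0, 1, 1]         # x^2 + x

lhs = derivative(poly_mul(f, g))
rhs = poly_add(poly_mul(f, derivative(g)), poly_mul(derivative(f), g))
print(derivative(f))  # [4, 0, 9], i.e. 9x^2 + 4
print(lhs == rhs)     # True
```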

See also