Difference between revisions of "Chain Rule"

Revision as of 13:27, 21 June 2006

Statement

Informally, the Chain Rule says that if $h(x) = f(g(x))$, then $h'(x)=f'(g(x))g'(x)$.

For example, if $f(x)=\sin{x}$, $g(x)=x^2$, and $h(x)=f(g(x))=\sin{(x^2)}$, then $h'(x) = \cos{(x^2)}\cdot(2x)$.
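As a quick numerical sanity check of this example, the sketch below compares the formula $\cos(x^2)\cdot 2x$ with a symmetric difference quotient. The helper name `central_difference`, the step size, and the sample point are illustrative choices, not part of the article.

```python
import math

def h(x):
    """h(x) = f(g(x)) with f = sin and g(x) = x^2."""
    return math.sin(x ** 2)

def h_prime(x):
    """Derivative predicted by the Chain Rule: cos(x^2) * 2x."""
    return math.cos(x ** 2) * (2 * x)

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) with a symmetric difference quotient (illustrative helper)."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 1.3  # arbitrary sample point
numeric = central_difference(h, x0)
exact = h_prime(x0)
print(abs(numeric - exact))  # should be tiny compared with |exact|
```

Agreement at one point is of course not a proof, only a check that the formula is plausible.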

Here are some more precise statements for the single-variable and multi-variable cases.

Single-variable Chain Rule:

Let each of $I \subset \mathbb{R}$ and $J \subset \mathbb{R}$ be an open interval, and suppose $g:I \to J$ and $f:J \to \mathbb{R}$. Let $h:I \to \mathbb{R}$ be defined by $h(x) = f(g(x))$ for all $x \in I$. If $x_0 \in I$, $g$ is differentiable at $x_0$, and $f$ is differentiable at $g(x_0)$, then $h$ is differentiable at $x_0$, and $h'(x_0) = f'(g(x_0))\cdot g'(x_0)$.

Multi-dimensional Chain Rule:

Let $g:\mathbb{R}^n \to \mathbb{R}^m$ and $f:\mathbb{R}^m \to \mathbb{R}^p$. (Here each of $n$, $m$, and $p$ is a positive integer.) Let $h: \mathbb{R}^n \to \mathbb{R}^p$ be defined by $h(x) = f(g(x))$ for all $x \in \mathbb{R}^n$. Let $x_0 \in \mathbb{R}^n$. If $g$ is differentiable at $x_0$, and $f$ is differentiable at $g(x_0)$, then $h$ is differentiable at $x_0$ and $h'(x_0) = f'(g(x_0))\cdot g'(x_0)$. (Here $h'(x_0)$ is a $p \times n$ matrix, $f'(g(x_0))$ is a $p \times m$ matrix, and $g'(x_0)$ is an $m \times n$ matrix; the product on the right is matrix multiplication.)
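To make the matrix form concrete, here is a minimal sketch with $n=2$, $m=3$, $p=2$. The particular functions `g` and `f`, their hand-computed Jacobians, and the helpers `matmul` and `numeric_jacobian` are all assumptions made for illustration. The code multiplies the two Jacobians and compares the result with a finite-difference Jacobian of $h$.

```python
import math

def g(x):
    """g: R^2 -> R^3 (example function chosen for illustration)."""
    a, b = x
    return [a * b, a + b, a ** 2]

def f(u):
    """f: R^3 -> R^2 (example function chosen for illustration)."""
    p, q, r = u
    return [p + q * r, math.sin(p)]

def g_jacobian(x):
    """3 x 2 matrix of partial derivatives of g."""
    a, b = x
    return [[b, a], [1.0, 1.0], [2 * a, 0.0]]

def f_jacobian(u):
    """2 x 3 matrix of partial derivatives of f."""
    p, q, r = u
    return [[1.0, r, q], [math.cos(p), 0.0, 0.0]]

def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_jacobian(func, x, step=1e-6):
    """Finite-difference Jacobian, built one input coordinate at a time."""
    columns = []
    for j in range(len(x)):
        forward = list(x)
        forward[j] += step
        backward = list(x)
        backward[j] -= step
        columns.append([(fp - fm) / (2 * step)
                        for fp, fm in zip(func(forward), func(backward))])
    # Each entry of `columns` holds the derivatives with respect to one input,
    # so transpose to get one row per output coordinate.
    return [list(row) for row in zip(*columns)]

def h(x):
    return f(g(x))

x0 = [0.7, -1.2]  # arbitrary sample point
chain_rule = matmul(f_jacobian(g(x0)), g_jacobian(x0))  # (2 x 3)(3 x 2) = 2 x 2
numeric = numeric_jacobian(h, x0)
max_error = max(abs(chain_rule[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

Note that the order of the product matters: $f'(g(x_0))$ goes on the left, which is the only order in which the shapes are compatible here.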

Intuitive Explanation

The single-variable Chain Rule is often explained by pointing out that

$\frac{f(g(x+\Delta x)) - f(g(x))}{\Delta x} = \frac{f(g(x+\Delta x)) - f(g(x))}{g(x+ \Delta x)-g(x)}\cdot \frac{g(x+ \Delta x)-g(x)}{\Delta x}$.

The first factor on the right approaches $f'(g(x))$, and the second factor on the right approaches $g'(x)$, as $\Delta x$ approaches $0$. (The first limit uses the fact that $g$ is continuous at $x$, so that $g(x+\Delta x)$ approaches $g(x)$.) This can be made into a rigorous proof, but we do have to worry about the possibility that $g(x+\Delta x) - g(x)=0$, in which case we would be dividing by $0$.
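The division-by-zero problem is not just a technicality. As a minimal illustration (the functions here are assumptions chosen for the example), take $g$ to be constant: then $g(x+\Delta x)-g(x)=0$ for every $\Delta x$, so the first factor is never defined, yet the difference quotient of $h$ itself is fine and the Chain Rule still gives the right value, $f'(g(x))\cdot 0 = 0$.

```python
import math

def g(x):
    """A constant inner function, so g(x + dx) - g(x) is always 0."""
    return 3.0

def f(u):
    return math.sin(u)

def first_factor(x, dx):
    """(f(g(x+dx)) - f(g(x))) / (g(x+dx) - g(x)); undefined when g does not change."""
    return (f(g(x + dx)) - f(g(x))) / (g(x + dx) - g(x))

x, dx = 1.0, 1e-3
try:
    first_factor(x, dx)
    factor_defined = True
except ZeroDivisionError:
    factor_defined = False

# The ordinary difference quotient of h(x) = f(g(x)) is still well defined.
h_quotient = (f(g(x + dx)) - f(g(x))) / dx

# Chain Rule value f'(g(x)) * g'(x), using g'(x) = 0 for a constant g.
chain_rule_value = math.cos(g(x)) * 0.0

print(factor_defined, h_quotient, chain_rule_value)
```

A rigorous proof has to treat such inputs separately, or be organized so that it never divides by $g(x+\Delta x)-g(x)$.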

This explanation of the chain rule fails in the multi-dimensional case, because in the multi-dimensional case $\Delta x$ is a vector, as is $g(x+\Delta x) - g(x)$, and we can't divide by a vector.

However, there's another way to look at it.

Suppose a function $F$ is differentiable at $x$, and $\Delta x$ is "small". Question: How much does $F$ change when its input changes from $x$ to $x+ \Delta x$? (In other words, what is $F(x+ \Delta x) - F(x)$?) Answer: approximately $F'(x) \cdot \Delta x$. This is true in the multi-dimensional case as well as in the single-variable case.
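What "approximately" means here is that the error is small even compared with $\Delta x$. The sketch below illustrates this for one assumed example, $F(x)=e^x$ at $x=0.5$, by printing the error divided by $\Delta x$ for a few shrinking step sizes (the function, the point, and the step sizes are illustrative choices).

```python
import math

def F(x):
    return math.exp(x)

def F_prime(x):
    return math.exp(x)

x = 0.5
errors = []
for dx in (1e-1, 1e-2, 1e-3):
    actual_change = F(x + dx) - F(x)
    linear_estimate = F_prime(x) * dx
    # Error of the linear estimate, measured relative to the size of dx.
    errors.append(abs(actual_change - linear_estimate) / dx)

print(errors)  # the relative errors shrink as dx shrinks
```

The relative error tending to $0$ with $\Delta x$ is exactly what differentiability at $x$ provides.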

Well, suppose that (as above) $h(x) = f(g(x))$, and $\Delta x$ is "small", and someone asks you how much $h$ changes when its input changes from $x$ to $x+ \Delta x$. That is the same as asking how much $f$ changes when its input changes from $g(x)$ to $g(x+ \Delta x)$, which is the same as asking how much $f$ changes when its input changes from $g(x)$ to $g(x) + \Delta g$, where $\Delta g = g(x+ \Delta x) - g(x)$. And what is the answer to this question? The answer is: approximately, $f'(g(x)) \cdot \Delta g$.

But what is $\Delta g$? In other words, how much does $g$ change when its input changes from $x$ to $x+ \Delta x$? Answer: approximately $g'(x) \cdot \Delta x$.

Therefore, the amount that $h$ changes when its input changes from $x$ to $x+ \Delta x$ is approximately $f'(g(x)) \cdot g'(x) \cdot \Delta x$.
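The two approximation steps can be followed numerically. In the sketch below, the functions $f(u)=\ln u$ and $g(x)=1+x^2$, the point $x=2$, and the step $\Delta x = 10^{-4}$ are assumptions chosen for illustration. It computes the actual change in $h$, the estimate $f'(g(x))\cdot\Delta g$, and the final estimate $f'(g(x))\cdot g'(x)\cdot\Delta x$.

```python
import math

def g(x):
    return 1.0 + x ** 2

def g_prime(x):
    return 2.0 * x

def f(u):
    return math.log(u)

def f_prime(u):
    return 1.0 / u

x, dx = 2.0, 1e-4

# Actual change in h(x) = f(g(x)).
actual_change = f(g(x + dx)) - f(g(x))

# Step 1: replace the change in f by f'(g(x)) * delta_g.
delta_g = g(x + dx) - g(x)
one_step_estimate = f_prime(g(x)) * delta_g

# Step 2: additionally replace delta_g by g'(x) * dx.
two_step_estimate = f_prime(g(x)) * g_prime(x) * dx

print(actual_change, one_step_estimate, two_step_estimate)
```

All three numbers should agree to several significant digits, and the agreement improves as `dx` is made smaller.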

We know that $h'(x)$ is supposed to be a matrix (or number, in the single-variable case) such that $h'(x) \cdot \Delta x$ is a good approximation to $h(x+ \Delta x) - h(x)$. Thus, it seems that $f'(g(x)) \cdot g'(x)$ is a good candidate for being the matrix (or number) that $h'(x)$ is supposed to be.

This can be made into a rigorous proof. The standard proof of the multi-dimensional chain rule can be thought of in this way.

Revision as of 12:08, 21 June 2006, by DVO (older edit)
Revision as of 13:27, 21 June 2006, by Inscrutableroot (minor edit: proofreading) (newer edit)
Line 7:		Line 7:


−	Here are some more precise statements, for the single-variable and multi-variable cases.	+	Here are some more precise statements for the single-variable and multi-variable cases.


Line 42:		Line 42:


−	Well, suppose that (as above) <math>h(x) = f(g(x))</math>, and <math>\Delta x</math> is "small", and someone asks you how much <math>h</math> changes when its input changes from <math>x</math> to <math>x+ \Delta x</math>. That is the ''same'' as asking how much <math>f</math> changes when its input changes from <math>g(x)</math> to <math>g(x+ \Delta x)</math>. ~~Which~~ is the same as asking how much <math>f</math> changes when its input changes from <math>g(x)</math> to <math>g(x) + \Delta g</math>, where <math>\Delta g = g(x+ \Delta x) - g(x)</math>. And what is the answer to this question? The answer is: approximately, <math>f'(g(x)) \cdot \Delta g</math>.	+	Well, suppose that (as above) <math>h(x) = f(g(x))</math>, and <math>\Delta x</math> is "small", and someone asks you how much <math>h</math> changes when its input changes from <math>x</math> to <math>x+ \Delta x</math>. That is the ''same'' as asking how much <math>f</math> changes when its input changes from <math>g(x)</math> to <math>g(x+ \Delta x)</math>, which is the same as asking how much <math>f</math> changes when its input changes from <math>g(x)</math> to <math>g(x) + \Delta g</math>, where <math>\Delta g = g(x+ \Delta x) - g(x)</math>. And what is the answer to this question? The answer is: approximately, <math>f'(g(x)) \cdot \Delta g</math>.
