Difference between revisions of "Chain Rule"
(added a proof of multi-variable chain rule) |
m (→Statement) |
Revision as of 20:50, 30 June 2006
Statement
The Chain Rule is a theorem of calculus which states that if <math>h(x) = f(g(x))</math>, then <math>h'(x)=f'(g(x))\cdot g'(x)</math>.
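For example, if <math>f(x) = \sin x</math>, <math>g(x) = x^2</math>, and <math>h(x) = \sin(x^2)</math>, then <math>h'(x) = \cos(x^2)\cdot 2x</math>.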
Here are some more precise statements for the single-variable and multi-variable cases.
Single variable Chain Rule:
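Let each of <math>I, J, K</math> be an open interval, and suppose <math>f: J \to K</math> and <math>g: I \to J</math>. Let <math>h: I \to K</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in I</math>. If <math>x_0 \in I</math>, <math>g</math> is differentiable at <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>h</math> is differentiable at <math>x_0</math>, and <math>h'(x_0) = f'(g(x_0))\cdot g'(x_0)</math>.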
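The single-variable statement can be checked numerically. The sketch below assumes the illustrative choices f(u) = sin u and g(x) = x^2 (chosen here for the check, not prescribed by the theorem) and compares a finite-difference estimate of h'(x_0) with f'(g(x_0)) g'(x_0). The helper name central_difference and the point x0 = 0.7 are likewise assumptions made for this example.

```python
import math

def f(u):
    return math.sin(u)

def g(x):
    return x * x

def h(x):
    return f(g(x))

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + step) - func(x - step)) / (2 * step)

def chain_rule_value(x):
    """f'(g(x)) * g'(x), using f'(u) = cos(u) and g'(x) = 2x."""
    return math.cos(g(x)) * 2 * x

x0 = 0.7
numeric = central_difference(h, x0)   # estimate of h'(x0)
exact = chain_rule_value(x0)          # value predicted by the Chain Rule
print(numeric, exact)
```

The two printed numbers should agree up to the accuracy of the finite-difference estimate.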
Multi-dimensional Chain Rule:
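Let <math>g:\mathbb{R}^n \to \mathbb{R}^m</math> and <math>f:\mathbb{R}^m \to \mathbb{R}^p</math>. (Here each of <math>n</math>, <math>m</math>, and <math>p</math> is a positive integer.) Let <math>h: \mathbb{R}^n \to \mathbb{R}^p</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in \mathbb{R}^n</math>. Let <math>x_0 \in \mathbb{R}^n</math>. If <math>g</math> is differentiable at <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>h</math> is differentiable at <math>x_0</math> and <math>h'(x_0) = f'(g(x_0))\cdot g'(x_0)</math>. (Here, each of <math>h'(x_0)</math>, <math>f'(g(x_0))</math>, and <math>g'(x_0)</math> is a matrix.)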
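In the multi-dimensional statement the derivatives are Jacobian matrices and the product is a matrix product. The following sketch uses hypothetical maps g from R^2 to R^3 and f from R^3 to R^2 (both invented for this check) and compares a finite-difference Jacobian of h with the product of the finite-difference Jacobians of f and g. The helpers jacobian and matmul are written out here so the example needs only the standard library.

```python
import math

def g(x):
    """g: R^2 -> R^3 (illustrative choice)."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def f(y):
    """f: R^3 -> R^2 (illustrative choice)."""
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) - y3]

def h(x):
    return f(g(x))

def jacobian(func, point, step=1e-6):
    """Finite-difference Jacobian: entry [i][j] approximates d func_i / d x_j."""
    rows = len(func(point))
    cols = len(point)
    jac = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        forward = list(point)
        backward = list(point)
        forward[j] += step
        backward[j] -= step
        f_plus = func(forward)
        f_minus = func(backward)
        for i in range(rows):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * step)
    return jac

def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

x0 = [0.5, -1.2]
lhs = jacobian(h, x0)                                # p x n, here 2 x 2
rhs = matmul(jacobian(f, g(x0)), jacobian(g, x0))    # (2 x 3) times (3 x 2)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

Note the shapes: f'(g(x_0)) is p by m and g'(x_0) is m by n, so the product is p by n, matching h'(x_0).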
Intuitive Explanation
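The single-variable Chain Rule is often explained by pointing out that
<math>\frac{f(g(x+\Delta x)) - f(g(x))}{\Delta x} = \frac{f(g(x+\Delta x)) - f(g(x))}{g(x+\Delta x) - g(x)} \cdot \frac{g(x+\Delta x) - g(x)}{\Delta x}</math>.
The first factor on the right approaches <math>f'(g(x))</math>, and the second factor on the right approaches <math>g'(x)</math>, as <math>\Delta x</math> approaches <math>0</math>. This can be made into a rigorous proof. (But we do have to worry about the possibility that <math>g(x+\Delta x) - g(x) = 0</math>, in which case we would be dividing by <math>0</math>.)
This explanation of the chain rule fails in the multi-dimensional case, because in the multi-dimensional case <math>\Delta x</math> is a vector, as is <math>g(x+\Delta x) - g(x)</math>, and we can't divide by a vector.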
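The division-by-zero caveat in the single-variable argument is easy to see numerically. The sketch below splits the difference quotient of f(g(x)) into the two quotients described above, using hypothetical functions chosen for this example. With a non-constant g the product reproduces the direct quotient; with a constant g the change in g is exactly zero and the first quotient cannot be formed.

```python
def split_quotient(f, g, x, dx):
    """Return the two quotients whose product is the difference quotient of f(g(x)).

    Raises ZeroDivisionError when g(x + dx) == g(x).
    """
    dg = g(x + dx) - g(x)
    first = (f(g(x + dx)) - f(g(x))) / dg   # tends to f'(g(x)) when dg != 0
    second = dg / dx                        # tends to g'(x)
    return first, second

def f(u):
    return u ** 3

def g_linear(x):
    return 2 * x + 1

def g_constant(x):
    return 5.0

dx = 1e-6
first, second = split_quotient(f, g_linear, 1.0, dx)
product = first * second
direct = (f(g_linear(1.0 + dx)) - f(g_linear(1.0))) / dx

try:
    split_quotient(f, g_constant, 1.0, dx)
    constant_case_failed = False
except ZeroDivisionError:
    # g(x + dx) - g(x) is exactly 0, so the first quotient is undefined.
    constant_case_failed = True

print(product, direct, constant_case_failed)
```

For the constant g the derivative of the composite is simply 0, so the Chain Rule still holds there; it is only this particular way of arguing that breaks down.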
However, there's another way to look at it.
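Suppose a function <math>F</math> is differentiable at <math>x_0</math>, and <math>\Delta x</math> is "small". Question: How much does <math>F</math> change when its input changes from <math>x_0</math> to <math>x_0 + \Delta x</math>? (In other words, what is <math>F(x_0 + \Delta x) - F(x_0)</math>?) Answer: approximately <math>F'(x_0)\Delta x</math>. This is true in the multi-dimensional case as well as in the single-variable case.
Well, suppose that (as above) <math>h(x) = f(g(x))</math>, and <math>\Delta x</math> is "small", and someone asks you how much <math>h</math> changes when its input changes from <math>x_0</math> to <math>x_0 + \Delta x</math>. That is the same as asking how much <math>f</math> changes when its input changes from <math>g(x_0)</math> to <math>g(x_0 + \Delta x)</math>, which is the same as asking how much <math>f</math> changes when its input changes from <math>g(x_0)</math> to <math>g(x_0) + \Delta g</math>, where <math>\Delta g = g(x_0 + \Delta x) - g(x_0)</math>. And what is the answer to this question? The answer is: approximately, <math>f'(g(x_0))\Delta g</math>.
But what is <math>\Delta g</math>? In other words, how much does <math>g</math> change when its input changes from <math>x_0</math> to <math>x_0 + \Delta x</math>? Answer: approximately <math>g'(x_0)\Delta x</math>.
Therefore, the amount that <math>h</math> changes when its input changes from <math>x_0</math> to <math>x_0 + \Delta x</math> is approximately <math>f'(g(x_0))g'(x_0)\Delta x</math>.
We know that <math>h'(x_0)</math> is supposed to be a matrix (or number, in the single-variable case) such that <math>h'(x_0)\Delta x</math> is a good approximation to <math>h(x_0 + \Delta x) - h(x_0)</math>. Thus, it seems that <math>f'(g(x_0))g'(x_0)</math> is a good candidate for being the matrix (or number) that <math>h'(x_0)</math> is supposed to be.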
This can be made into a rigorous proof. The standard proof of the multi-dimensional chain rule can be thought of in this way.
Proof
Here's a proof of the multi-variable Chain Rule. It's kind of a "rigorized" version of the intuitive argument given above.
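I'll use the following fact. Assume <math>F:\mathbb{R}^n \to \mathbb{R}^m</math>, and <math>x_0 \in \mathbb{R}^n</math>. Then <math>F</math> is differentiable at <math>x_0</math> if and only if there exists an <math>m</math> by <math>n</math> matrix <math>M</math> such that the "error" function <math>E_F(\Delta x) = F(x_0 + \Delta x) - F(x_0) - M\Delta x</math> has the property that <math>\frac{|E_F(\Delta x)|}{|\Delta x|}</math> approaches <math>0</math> as <math>\Delta x</math> approaches <math>0</math>. (In fact, this can be taken as a definition of the statement "<math>F</math> is differentiable at <math>x_0</math>.") If such a matrix <math>M</math> exists, then it is unique, and it is called <math>F'(x_0)</math>. Intuitively, the fact that <math>\frac{|E_F(\Delta x)|}{|\Delta x|}</math> approaches <math>0</math> as <math>\Delta x</math> approaches <math>0</math> just means that <math>F(x_0 + \Delta x) - F(x_0)</math> is approximated well by <math>M\Delta x</math>.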
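The error-function criterion can be explored numerically. The sketch below takes a hypothetical map F(x, y) = (x^2 y, x + sin y) at the point (1, 2), invented for this illustration, and computes the length of the error divided by the length of the input change along one fixed direction. It does this for the matrix of partial derivatives M and for a deliberately wrong matrix M_wrong. Sampling a single direction only illustrates the limit; it does not prove it.

```python
import math

def F(v):
    """Illustrative map F: R^2 -> R^2."""
    x, y = v
    return [x * x * y, x + math.sin(y)]

x0 = [1.0, 2.0]
# Matrix of partial derivatives of F at x0: [[2xy, x^2], [1, cos y]].
M = [[4.0, 1.0], [1.0, math.cos(2.0)]]
# Same matrix with one entry changed on purpose.
M_wrong = [[3.0, 1.0], [1.0, math.cos(2.0)]]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def error_ratio(matrix, dx):
    """|F(x0 + dx) - F(x0) - matrix * dx| / |dx|."""
    shifted = [a + b for a, b in zip(x0, dx)]
    change = [p - q for p, q in zip(F(shifted), F(x0))]
    linear = [sum(m * d for m, d in zip(row, dx)) for row in matrix]
    error = [c - l for c, l in zip(change, linear)]
    return norm(error) / norm(dx)

direction = [0.6, 0.8]          # unit vector
scales = [1e-1, 1e-2, 1e-3]     # shrinking step lengths
good = [error_ratio(M, [t * d for d in direction]) for t in scales]
bad = [error_ratio(M_wrong, [t * d for d in direction]) for t in scales]
print(good)
print(bad)
```

With M the ratios shrink roughly in proportion to the step length, while with M_wrong they level off at a nonzero value, which is the behaviour the uniqueness claim leads one to expect.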
Okay, here's the proof.
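Let <math>g:\mathbb{R}^n \to \mathbb{R}^m</math> and <math>f:\mathbb{R}^m \to \mathbb{R}^p</math>. (Here each of <math>n</math>, <math>m</math>, and <math>p</math> is a positive integer.) Let <math>h: \mathbb{R}^n \to \mathbb{R}^p</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in \mathbb{R}^n</math>. Let <math>x_0 \in \mathbb{R}^n</math>, and suppose that <math>g</math> is differentiable at <math>x_0</math> and <math>f</math> is differentiable at <math>g(x_0)</math>.
In the intuitive argument, we said that if <math>\Delta x</math> is "small", then <math>h(x_0 + \Delta x) - h(x_0) \approx f'(g(x_0))\Delta g</math>, where <math>\Delta g = g(x_0 + \Delta x) - g(x_0)</math>. In this proof, we'll fix that statement up and make it rigorous. What we can say is, if <math>\Delta x \in \mathbb{R}^n</math>, then <math>h(x_0 + \Delta x) - h(x_0) = f'(g(x_0))\Delta g + E_f(\Delta g)</math>, where <math>E_f</math> is a function which has the property that <math>\lim_{\Delta g \to 0} \frac{|E_f(\Delta g)|}{|\Delta g|} = 0</math>.
Now let's work on <math>\Delta g</math>. In the intuitive argument, we said that <math>\Delta g \approx g'(x_0)\Delta x</math>. In this proof, we'll make that rigorous by saying <math>\Delta g = g'(x_0)\Delta x + E_g(\Delta x)</math>, where <math>E_g</math> has the property that <math>\lim_{\Delta x \to 0} \frac{|E_g(\Delta x)|}{|\Delta x|} = 0</math>.
Putting these pieces together, we find that
<math>h(x_0 + \Delta x) - h(x_0) = f'(g(x_0))\left[g'(x_0)\Delta x + E_g(\Delta x)\right] + E_f(\Delta g) = f'(g(x_0))g'(x_0)\Delta x + f'(g(x_0))E_g(\Delta x) + E_f(\Delta g) = f'(g(x_0))g'(x_0)\Delta x + E_h(\Delta x)</math>, where I have taken that messy error term <math>f'(g(x_0))E_g(\Delta x) + E_f(\Delta g)</math> and called it <math>E_h(\Delta x)</math>.
Now, we just need to show that <math>\frac{|E_h(\Delta x)|}{|\Delta x|} \to 0</math> as <math>\Delta x \to 0</math>, in order to prove that <math>h</math> is differentiable at <math>x_0</math> and that <math>h'(x_0) = f'(g(x_0))g'(x_0)</math>.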
I believe we've hit a point where intuition no longer guides us. In order to finish off the proof, we just need to look at <math>E_h(\Delta x)</math> and play around with it a bit. It's not that bad. For the time being, I'll leave the rest of the proof as an exercise for the reader. (Hint: If <math>A</math> is an <math>m</math> by <math>n</math> matrix, then there exists a number <math>k > 0</math> such that <math>|Ax| \le k|x|</math> for all <math>x \in \mathbb{R}^n</math>.)
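One concrete constant that works in the hint, for a nonzero matrix, is the square root of the sum of the squared entries, which follows from applying the Cauchy-Schwarz inequality to each row. The sketch below checks this bound on random vectors for a hypothetical 2 by 3 matrix A. The names mat_vec, norm and frobenius_bound are assumptions of this example, and this choice of k is one valid option, not necessarily the smallest.

```python
import math
import random

def mat_vec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def frobenius_bound(a):
    """Square root of the sum of squared entries; satisfies |Ax| <= k|x|."""
    return math.sqrt(sum(entry * entry for row in a for entry in row))

A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0]]
k = frobenius_bound(A)

rng = random.Random(0)   # fixed seed so the run is repeatable
worst_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-10, 10) for _ in range(3)]
    if norm(x) > 0:
        worst_ratio = max(worst_ratio, norm(mat_vec(A, x)) / norm(x))

print(k, worst_ratio)
```

In the proof, a bound of this kind applied to the matrix f'(g(x_0)) controls the size of f'(g(x_0)) E_g(Δx) in terms of the size of E_g(Δx).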