Chain Rule
Revision as of 13:27, 21 June 2006
Statement
Basically, the Chain Rule says that if <math>h(x) = f(g(x))</math>, then <math>h'(x) = f'(g(x)) \cdot g'(x)</math>.
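For example, if <math>f(x) = \sin x</math>, <math>g(x) = x^2</math>, and <math>h(x) = f(g(x)) = \sin(x^2)</math>, then <math>h'(x) = f'(g(x)) \cdot g'(x) = \cos(x^2) \cdot 2x</math>.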
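A quick numerical sanity check of this kind of computation can be sketched as follows. It assumes, for illustration, the functions <math>f(x) = \sin x</math> and <math>g(x) = x^2</math>. It compares the Chain Rule formula with a central difference quotient. The helper names h_prime_chain_rule and central_difference, as well as the step size 1e-6, are hypothetical choices made for this sketch.

```python
import math


def h(x):
    # h(x) = f(g(x)) with f = sin and g(x) = x**2 (illustrative choice)
    return math.sin(x ** 2)


def h_prime_chain_rule(x):
    # Chain Rule: h'(x) = f'(g(x)) * g'(x) = cos(x**2) * 2x
    return math.cos(x ** 2) * 2 * x


def central_difference(func, x, dx=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + dx) - func(x - dx)) / (2 * dx)


for x in [0.0, 0.5, 1.0, 2.0]:
    # The two columns should agree to many decimal places
    print(x, h_prime_chain_rule(x), central_difference(h, x))
```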
Here are some more precise statements for the single-variable and multi-variable cases.
Single-variable Chain Rule:
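Let each of <math>I_1, I_2 \subseteq \mathbb{R}</math> be an open interval, and suppose <math>g: I_1 \to I_2</math> and <math>f: I_2 \to \mathbb{R}</math>. Let <math>h: I_1 \to \mathbb{R}</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in I_1</math>. If <math>x_0 \in I_1</math>, <math>g</math> is differentiable at <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>h</math> is differentiable at <math>x_0</math>, and <math>h'(x_0) = f'(g(x_0)) \cdot g'(x_0)</math>.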
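One way to see the single-variable Chain Rule "in action" is forward-mode automatic differentiation with dual numbers. Each intermediate value <math>u(x_0)</math> carries its derivative <math>u'(x_0)</math> along with it. Every elementary function then maps that pair to <math>f(u(x_0))</math> and <math>f'(u(x_0)) \cdot u'(x_0)</math>. The following is a minimal sketch of that idea, not a full autodiff library. The class Dual and the helpers sin, exp, and derivative are hypothetical names introduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value u(x0) paired with its derivative u'(x0)."""
    val: float
    der: float

    def __mul__(self, other):
        # Product rule: (u * v)' = u' * v + u * v'
        if not isinstance(other, Dual):
            other = Dual(other, 0.0)  # constants have derivative 0
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def sin(u: Dual) -> Dual:
    # Chain Rule: (sin(u))' = cos(u) * u'
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)


def exp(u: Dual) -> Dual:
    # Chain Rule: (exp(u))' = exp(u) * u'
    e = math.exp(u.val)
    return Dual(e, e * u.der)


def derivative(func, x0: float) -> float:
    # Seed the input with dx/dx = 1, then read off the output's derivative
    return func(Dual(x0, 1.0)).der


def h(x):
    # h = f o g with outer f = sin and inner g = exp
    return sin(exp(x))


x0 = 0.5
print(derivative(h, x0))                       # propagated via the Chain Rule
print(math.cos(math.exp(x0)) * math.exp(x0))   # f'(g(x0)) * g'(x0) by hand
```

Because every function applies the Chain Rule locally, deeper compositions such as sin(3 * x * x) are differentiated with no extra work.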
Multi-dimensional Chain Rule:
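Let <math>g: \mathbb{R}^n \to \mathbb{R}^m</math> and <math>f: \mathbb{R}^m \to \mathbb{R}^p</math>. (Here each of <math>n</math>, <math>m</math>, and <math>p</math> is a positive integer.) Let <math>h: \mathbb{R}^n \to \mathbb{R}^p</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in \mathbb{R}^n</math>. Let <math>x_0 \in \mathbb{R}^n</math>. If <math>g</math> is differentiable at <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>h</math> is differentiable at <math>x_0</math> and <math>h'(x_0) = f'(g(x_0)) \cdot g'(x_0)</math>. (Here, <math>h'(x_0)</math>, <math>f'(g(x_0))</math>, and <math>g'(x_0)</math> are matrices of sizes <math>p \times n</math>, <math>p \times m</math>, and <math>m \times n</math>, respectively, so the product on the right makes sense.)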
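The multi-dimensional statement can be checked numerically in the same spirit. The sketch below uses hypothetical example functions <math>g: \mathbb{R}^2 \to \mathbb{R}^3</math> and <math>f: \mathbb{R}^3 \to \mathbb{R}^2</math>, so <math>n = 2</math>, <math>m = 3</math>, and <math>p = 2</math>. It multiplies their hand-computed Jacobian matrices and compares the product with a finite-difference estimate of the Jacobian of <math>h = f \circ g</math>. The specific functions, the helper names, and the step size are assumptions made for illustration only. Only the standard library is used, so matrices are plain lists of rows.

```python
import math


def g(x):
    # g: R^2 -> R^3 (hypothetical example function)
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]


def jac_g(x):
    # 3 x 2 Jacobian matrix of g, computed by hand
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2 * x2]]


def f(y):
    # f: R^3 -> R^2 (hypothetical example function)
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1)]


def jac_f(y):
    # 2 x 3 Jacobian matrix of f, computed by hand
    y1, y2, y3 = y
    return [[1.0, y3, y2],
            [math.exp(y1), 0.0, 0.0]]


def h(x):
    return f(g(x))


def matmul(A, B):
    # Product of a (p x m) matrix A and an (m x n) matrix B
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def numerical_jacobian(func, x, dx=1e-6):
    # Central differences, one input coordinate (one column) at a time
    cols = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += dx
        xm = list(x)
        xm[j] -= dx
        cols.append([(a - b) / (2 * dx) for a, b in zip(func(xp), func(xm))])
    # Transpose the list of columns into a (p x n) matrix of rows
    return [list(row) for row in zip(*cols)]


x0 = [0.7, -1.2]
chain = matmul(jac_f(g(x0)), jac_g(x0))  # f'(g(x0)) g'(x0), a 2 x 2 matrix
numeric = numerical_jacobian(h, x0)      # h'(x0) estimated directly
print(chain)
print(numeric)
```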
Intuitive Explanation
The single-variable Chain Rule is often explained by pointing out that
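<math>\frac{h(x + \Delta x) - h(x)}{\Delta x} = \frac{f(g(x + \Delta x)) - f(g(x))}{g(x + \Delta x) - g(x)} \cdot \frac{g(x + \Delta x) - g(x)}{\Delta x}</math>.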
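The first factor on the right approaches <math>f'(g(x))</math>, and the second factor approaches <math>g'(x)</math>, as <math>\Delta x</math> approaches <math>0</math>. This can be made into a rigorous proof. (But we do have to worry about the possibility that <math>g(x + \Delta x) = g(x)</math> for some small nonzero <math>\Delta x</math>, in which case the first factor would require dividing by <math>0</math>.)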
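To watch the two factors converge, and to see the division-by-zero issue concretely, one can evaluate them for shrinking <math>\Delta x</math>. The sketch below assumes <math>f = \sin</math> and uses two hypothetical inner functions. The first is <math>g(x) = x^3</math>, where both factors behave well near <math>x = 1</math>. The second is a constant <math>g</math>, the simplest case where <math>g(x + \Delta x) - g(x) = 0</math> for every <math>\Delta x</math>. The helper name factors is hypothetical.

```python
import math


def factors(f, g, x, dx):
    """Return the two factors of the difference quotient of f(g(x)).

    The first factor is None when g(x + dx) == g(x), because computing
    it would mean dividing by zero.
    """
    dg = g(x + dx) - g(x)
    second = dg / dx
    first = None if dg == 0 else (f(g(x + dx)) - f(g(x))) / dg
    return first, second


def cube(x):
    return x ** 3


def constant(x):
    return 2.0


for dx in [1e-1, 1e-3, 1e-5]:
    # The factors approach f'(g(1)) = cos(1) and g'(1) = 3
    print(dx, factors(math.sin, cube, 1.0, dx))

# With a constant inner function, the first factor is never defined,
# even though f(g(x)) is constant and its derivative is plainly 0.
print(factors(math.sin, constant, 1.0, 1e-3))  # -> (None, 0.0)
```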
This explanation of the chain rule fails in the multi-dimensional case, because there <math>\Delta x</math> is a vector, as is <math>g(x + \Delta x) - g(x)</math>, and we can't divide by a vector.
However, there's another way to look at it.
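Suppose a function <math>F</math> is differentiable at <math>x</math>, and <math>\Delta x</math> is "small". Question: How much does <math>F</math> change when its input changes from <math>x</math> to <math>x + \Delta x</math>? (In other words, what is <math>F(x + \Delta x) - F(x)</math>?) Answer: approximately <math>F'(x) \Delta x</math>. This is true in the multi-dimensional case (where <math>F'(x)</math> is a matrix and <math>\Delta x</math> is a vector) as well as in the single-variable case.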
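"Approximately" here has a precise meaning. The error <math>F(x + \Delta x) - F(x) - F'(x) \Delta x</math> shrinks faster than <math>\Delta x</math> itself. The sketch below illustrates this for one hypothetical single-variable choice, <math>F(x) = e^x \cos x</math> at <math>x = 0.3</math>. It shows that the ratio of the error to <math>|\Delta x|</math> heads toward <math>0</math>.

```python
import math


def F(x):
    return math.exp(x) * math.cos(x)


def F_prime(x):
    # Product rule: (e^x cos x)' = e^x (cos x - sin x)
    return math.exp(x) * (math.cos(x) - math.sin(x))


x = 0.3
ratios = []
for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    actual = F(x + dx) - F(x)    # true change in F
    predicted = F_prime(x) * dx  # linear approximation F'(x) * dx
    ratios.append(abs(actual - predicted) / abs(dx))
    print(dx, actual, predicted)

# The error shrinks faster than dx: error / |dx| heads toward 0
print(ratios)
```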
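Well, suppose that (as above) <math>h(x) = f(g(x))</math>, and <math>\Delta x</math> is "small", and someone asks you how much <math>h</math> changes when its input changes from <math>x</math> to <math>x + \Delta x</math>. That is the ''same'' as asking how much <math>f</math> changes when its input changes from <math>g(x)</math> to <math>g(x + \Delta x)</math>. This, in turn, is the same as asking how much <math>f</math> changes when its input changes from <math>g(x)</math> to <math>g(x) + \Delta g</math>, where <math>\Delta g = g(x + \Delta x) - g(x)</math>. And what is the answer to this question? The answer is: approximately, <math>f'(g(x)) \cdot \Delta g</math>.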
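But what is <math>\Delta g</math>? In other words, how much does <math>g</math> change when its input changes from <math>x</math> to <math>x + \Delta x</math>? Answer: approximately <math>g'(x) \Delta x</math>.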
Therefore, the amount that <math>h</math> changes when its input changes from <math>x</math> to <math>x + \Delta x</math> is approximately <math>f'(g(x)) \cdot g'(x) \Delta x</math>.
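The two approximation steps in this argument can be followed numerically. The sketch below assumes the hypothetical choices <math>f(u) = \sqrt{u}</math>, <math>g(x) = 1 + x^2</math>, <math>x = 1.5</math>, and <math>\Delta x = 10^{-3}</math>. It computes the exact change <math>\Delta h</math>, the exact <math>\Delta g</math>, and the two successive approximations <math>f'(g(x)) \cdot \Delta g</math> and <math>f'(g(x)) \cdot g'(x) \Delta x</math>.

```python
import math


def g(x):
    return 1 + x ** 2


def g_prime(x):
    return 2 * x


def f(u):
    return math.sqrt(u)


def f_prime(u):
    return 0.5 / math.sqrt(u)


x, dx = 1.5, 1e-3
delta_h = f(g(x + dx)) - f(g(x))         # exact change in h = f o g
delta_g = g(x + dx) - g(x)               # exact change in g
step1 = f_prime(g(x)) * delta_g          # h changes by about f'(g(x)) * delta_g
step2 = f_prime(g(x)) * g_prime(x) * dx  # and delta_g is about g'(x) * dx
print(delta_h, step1, step2)
```

All three numbers agree to roughly three significant figures. Shrinking dx makes the agreement better.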
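We know that <math>h'(x)</math> is supposed to be a matrix (or number, in the single-variable case) such that <math>h'(x) \Delta x</math> is a good approximation to <math>h(x + \Delta x) - h(x)</math>. Thus, it seems that <math>f'(g(x)) \cdot g'(x)</math> is a good candidate for being the matrix (or number) that <math>h'(x)</math> is supposed to be.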
This can be made into a rigorous proof. The standard proof of the multi-dimensional chain rule can be thought of in this way.