Difference between revisions of "Chain Rule"
Revision as of 17:59, 20 June 2006
Statement
Single variable Chain Rule:
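Let each of <math>I</math>, <math>J</math>, and <math>K</math> be an open interval, and suppose <math>f: J \to K</math> and <math>g: I \to J</math>. Let <math>h: I \to K</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in I</math>. If <math>x_0 \in I</math>, <math>g</math> is differentiable at <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>h</math> is differentiable at <math>x_0</math>, and <math>h'(x_0) = f'(g(x_0)) \cdot g'(x_0)</math>.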
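As a quick numerical sanity check of the single-variable statement, the following minimal sketch compares the chain rule value with a finite-difference estimate of the derivative of the composite function. The particular functions (sine composed with squaring), the point 1.3, and the step size are illustrative assumptions, not part of the statement.

```python
import math

# Illustrative choices (assumptions): f(u) = sin(u), g(x) = x^2.
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x * x

def g_prime(x):
    return 2.0 * x

def h(x):
    # The composite function h(x) = f(g(x)).
    return f(g(x))

def central_difference(func, x, step=1e-6):
    # Symmetric finite-difference estimate of func'(x).
    return (func(x + step) - func(x - step)) / (2.0 * step)

x0 = 1.3
chain_rule_value = f_prime(g(x0)) * g_prime(x0)   # f'(g(x0)) * g'(x0)
numerical_value = central_difference(h, x0)       # estimate of h'(x0)

print("chain rule value:", chain_rule_value)
print("finite difference:", numerical_value)
```

The two printed numbers should agree to several decimal places; the small remaining gap comes from the finite-difference estimate, not from the chain rule.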
Multi-dimensional Chain Rule:
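Let <math>f: \mathbb{R}^m \to \mathbb{R}^p</math> and <math>g: \mathbb{R}^n \to \mathbb{R}^m</math>. (Here each of <math>m</math>, <math>n</math>, and <math>p</math> is a positive integer.) Let <math>h: \mathbb{R}^n \to \mathbb{R}^p</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in \mathbb{R}^n</math>. Let <math>x_0 \in \mathbb{R}^n</math>. If <math>g</math> is differentiable at <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>h</math> is differentiable at <math>x_0</math> and <math>h'(x_0) = f'(g(x_0)) \cdot g'(x_0)</math>. (Here, each of <math>h'(x_0)</math>, <math>f'(g(x_0))</math>, and <math>g'(x_0)</math> is a matrix.)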
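To see the matrix form in action, here is a minimal sketch that multiplies two hand-computed Jacobian matrices and compares the product with a finite-difference Jacobian of the composite map. The maps, their dimensions (<math>n = 2</math>, <math>m = 3</math>, <math>p = 2</math>), the point, and the helper names `mat_mul` and `numerical_jacobian` are all assumptions chosen for illustration.

```python
# Illustrative maps (assumptions): g maps R^2 to R^3, f maps R^3 to R^2.
def g(x):
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 * x1]

def g_jacobian(x):
    # 3 x 2 matrix of partial derivatives of g.
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0], [2.0 * x1, 0.0]]

def f(u):
    u1, u2, u3 = u
    return [u1 * u2, u2 + u3 * u3]

def f_jacobian(u):
    # 2 x 3 matrix of partial derivatives of f.
    u1, u2, u3 = u
    return [[u2, u1, 0.0], [0.0, 1.0, 2.0 * u3]]

def h(x):
    # The composite map h(x) = f(g(x)), from R^2 to R^2.
    return f(g(x))

def mat_mul(a, b):
    # Plain matrix product of two nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def numerical_jacobian(func, x, step=1e-6):
    # Central differences, one input coordinate (one column) at a time.
    columns = []
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += step
        backward[j] -= step
        plus = func(forward)
        minus = func(backward)
        columns.append([(p - m) / (2.0 * step) for p, m in zip(plus, minus)])
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

x0 = [0.7, -1.2]
chain_rule_jacobian = mat_mul(f_jacobian(g(x0)), g_jacobian(x0))  # (2x3)(3x2)
finite_difference_jacobian = numerical_jacobian(h, x0)

max_gap = max(abs(a - b)
              for row_a, row_b in zip(chain_rule_jacobian, finite_difference_jacobian)
              for a, b in zip(row_a, row_b))
print("largest entrywise gap:", max_gap)
```

Note that the order of the factors matters here: the <math>p \times m</math> matrix <math>f'(g(x_0))</math> comes first and the <math>m \times n</math> matrix <math>g'(x_0)</math> second, so the product is <math>p \times n</math>.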
Intuition
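The single-variable Chain Rule is often explained by pointing out that
<math>\frac{f(g(x)) - f(g(x_0))}{x - x_0} = \frac{f(g(x)) - f(g(x_0))}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{x - x_0}</math>.
The first factor on the right approaches <math>f'(g(x_0))</math>, and the second factor on the right approaches <math>g'(x_0)</math>, as <math>x</math> approaches <math>x_0</math>. This can be made into a rigorous proof. (But we do have to worry about the possibility that <math>g(x) - g(x_0) = 0</math>, in which case we would be dividing by <math>0</math>.)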
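The division-by-zero worry is easy to trigger. In the following minimal sketch, the inner function is constant (an extreme case chosen purely for illustration), so <math>g(x) - g(x_0) = 0</math> for every <math>x</math>. The factored form cannot even be evaluated, while the difference quotient of the composite function itself is perfectly well defined.

```python
# Illustrative choices (assumptions): f(u) = u^2 and a constant inner function g.
def f(u):
    return u * u

def g(x):
    return 3.0  # constant, so g(x) - g(x0) is zero for every x

def h(x):
    return f(g(x))

x0 = 1.0
x = 1.5

# The difference quotient of h itself causes no trouble.
direct_quotient = (h(x) - h(x0)) / (x - x0)

# The first factor of the factored form divides by g(x) - g(x0), which is zero.
try:
    first_factor = (f(g(x)) - f(g(x0))) / (g(x) - g(x0))
    factorization_failed = False
except ZeroDivisionError:
    factorization_failed = True

print("direct quotient:", direct_quotient)
print("factored form failed:", factorization_failed)
```

Here the Chain Rule still gives the right answer, since the derivative of a constant <math>g</math> is <math>0</math> and the direct quotient is <math>0</math> as well; it is only this particular way of explaining it that breaks down.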
This explanation of the chain rule fails in the multi-dimensional case, because in the multi-dimensional case <math>x - x_0</math> is a vector, as is <math>g(x) - g(x_0)</math>, and we can't divide by a vector.
However, there's another way to look at it.
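Suppose a function <math>F</math> is differentiable at <math>x</math>, and <math>\Delta x</math> is "small". Question: How much does <math>F</math> change when its input changes from <math>x</math> to <math>x + \Delta x</math>? (In other words, what is <math>F(x + \Delta x) - F(x)</math>?) Answer: approximately <math>F'(x) \cdot \Delta x</math>. This is true in the multi-dimensional case as well as in the single-variable case.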
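Well, suppose that (as above) <math>h(x) = f(g(x))</math>, and <math>\Delta x</math> is "small", and someone asks you how much <math>h</math> changes when its input changes from <math>x</math> to <math>x + \Delta x</math>. That is the same as asking how much <math>f</math> changes when its input changes from <math>g(x)</math> to <math>g(x + \Delta x)</math>. Which is the same as asking how much <math>f</math> changes when its input changes from <math>g(x)</math> to <math>g(x) + \Delta g</math>, where <math>\Delta g = g(x + \Delta x) - g(x)</math>. And what is the answer to this question? The answer is: approximately, <math>f'(g(x)) \cdot \Delta g</math>.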
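But what is <math>\Delta g</math>? In other words, how much does <math>g</math> change when its input changes from <math>x</math> to <math>x + \Delta x</math>? Answer: approximately <math>g'(x) \cdot \Delta x</math>.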
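Therefore, the amount that <math>h</math> changes when its input changes from <math>x</math> to <math>x + \Delta x</math> is approximately <math>f'(g(x)) \cdot g'(x) \cdot \Delta x</math>.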
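The word "approximately" can be made concrete with a small experiment. The sketch below compares the actual change in the composite function with the predicted change <math>f'(g(x)) \cdot g'(x) \cdot \Delta x</math> for a few shrinking values of <math>\Delta x</math>. The functions, the point 0.5, and the step sizes are assumptions chosen for illustration.

```python
import math

# Illustrative choices (assumptions): f(u) = exp(u), g(x) = sin(x).
def f(u):
    return math.exp(u)

def f_prime(u):
    return math.exp(u)

def g(x):
    return math.sin(x)

def g_prime(x):
    return math.cos(x)

def h(x):
    return f(g(x))

x = 0.5
slope = f_prime(g(x)) * g_prime(x)  # the candidate for h'(x)

errors = []
for delta_x in (1e-1, 1e-2, 1e-3):
    actual_change = h(x + delta_x) - h(x)
    predicted_change = slope * delta_x
    errors.append(abs(actual_change - predicted_change))
    print(delta_x, actual_change, predicted_change, errors[-1])
```

For smooth functions like these, the error should shrink much faster than <math>\Delta x</math> itself, which is what it means for the prediction to be a good approximation.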
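We know that <math>h'(x)</math> is supposed to be a matrix (or number, in the single-variable case) such that <math>h'(x) \cdot \Delta x</math> is a good approximation to <math>h(x + \Delta x) - h(x)</math>. Thus, it seems that <math>f'(g(x)) \cdot g'(x)</math> is a good candidate for being the matrix (or number) that <math>h'(x)</math> is supposed to be.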
This can be made into a rigorous proof. The standard proof of the multi-dimensional chain rule can be thought of in this way.