Let p(t) be a path function in n space. This is a continuous map from one variable, perhaps time, into many variables, perhaps the coordinates in n space, thus defining a path. In our example, p(t) might trace the path of an intrepid space ship as it flies through a star. We want to know how f changes with time as the ship moves along its path. It's obviously getting hotter as the ship moves towards the center of the star, and cooler as the ship flies back out again; we would like to quantify this.
Compute temperature as a function of time by composing the two functions: g(t) = f(p(t)). Now we're back to one input variable and one output variable. Can we compute the rate of change, the increase of temperature with time? Is g differentiable? It is, if f and p are also differentiable.
Start with the usual difference quotient: f(p(t+h))-f(p(t)) over h. For convenience, we'll assume t = 0, and p(t) is the origin, and f(p(t)) = 0. This merely relabels coordinates and simplifies notation; it does not change the proof. Now our difference quotient becomes f(p(h)) over h.
Replace the numerator, the change in f as we stray from the origin, with the linear approximation of f, and its error term.
∇f(0).p(h) + |p(h)|×e(p(h))
Divide by h and take the limit. We don't have a rigorous formula for |p(h)|/h, but we know that a vector's length is at most the sum of the absolute values of its components, which is at most n times the largest of those absolute values. If p1 is the component that is changing the fastest, then n times the absolute value of the derivative of p1 is an upper bound on the limit of |p(h)|/h. As long as we can put a bound on it, any bound, then multiplying by e(p(h)) produces a limit of 0. This is because p is continuous, so p(h) approaches the origin as h approaches 0, and e approaches 0 there.
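A small numerical sketch may make the bound concrete. The path below is a made-up example, not one from the text: it passes through the origin at time 0, and its derivative there is assumed to be (3, 1, 0). We compare |p(h)|/h against n times the largest component derivative as h shrinks.

```python
import math

def p(h):
    # Hypothetical path with p(0) = origin, chosen only for illustration.
    return [3.0 * h, math.sin(h), h * h]

# Derivative of each component at 0, worked out by hand for this path.
p_prime_0 = [3.0, 1.0, 0.0]
n = len(p_prime_0)

# The crude bound from the text: n times the fastest-changing component.
bound = n * max(abs(d) for d in p_prime_0)

def length_over_h(h):
    # |p(h)| / h, the quantity we need to keep bounded.
    return math.sqrt(sum(c * c for c in p(h))) / h

# Evaluate the ratio for h = 0.1, 0.01, ..., 0.000001.
ratios = [length_over_h(10.0 ** -k) for k in range(1, 7)]
for r in ratios:
    print(r, "<=", bound)
```

In this example the ratio settles near the length of the derivative vector, well below the bound. The bound is loose, but a loose bound is all the argument requires.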
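That leaves the first term, a dot product. This expands into a linear combination of the component functions p1 through pn, with the partial derivatives of f at the origin as constant coefficients. Since each pi(0) is 0, pi(h)/h approaches the derivative of pi. The derivative is thus the same linear combination of the individual derivatives, and we obtain the following formula.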
g′(t) = ∇f(p(t)).p′(t)
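The formula can be checked numerically. The sketch below assumes a hypothetical temperature field f that is hottest at the origin, and a hypothetical ship path p; neither comes from the text. It compares the dot product of the gradient and the velocity against a central difference quotient of g(t) = f(p(t)).

```python
import math

def f(x):
    # Hypothetical temperature field, hottest at the origin (the star's center).
    r2 = sum(c * c for c in x)
    return 1000.0 / (1.0 + r2)

def grad_f(x):
    # Gradient of f, computed by hand: each partial is -2000 * x_i / (1 + r^2)^2.
    r2 = sum(c * c for c in x)
    return [-2000.0 * c / (1.0 + r2) ** 2 for c in x]

def p(t):
    # Hypothetical path of the ship in 3 space.
    return [t - 1.0, 0.5 * t, math.sin(t)]

def p_prime(t):
    # Velocity: the derivative of each component of p.
    return [1.0, 0.5, math.cos(t)]

def g(t):
    # Temperature as a function of time.
    return f(p(t))

def g_prime_chain(t):
    # Chain rule: gradient of f at p(t), dotted with p'(t).
    return sum(a * b for a, b in zip(grad_f(p(t)), p_prime(t)))

def g_prime_numeric(t, h=1e-6):
    # Central difference quotient, used only as an independent check.
    return (g(t + h) - g(t - h)) / (2.0 * h)

t = 0.3
print(g_prime_chain(t), g_prime_numeric(t))
```

The two printed values should agree to several decimal places. The small discrepancy comes from the finite step h, not from the formula.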
If the number of dimensions is one, as though the space ship could only move forward and backward, this formula reproduces the original chain rule.
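As a quick sketch of the one dimensional case, take the made-up functions f(x) = sin(x) and p(t) = t squared. The gradient collapses to the ordinary derivative f′, the dot product collapses to an ordinary product, and the formula becomes f′(p(t)) times p′(t).

```python
import math

def f(x):
    # Example function of one variable, chosen only for illustration.
    return math.sin(x)

def f_prime(x):
    return math.cos(x)

def p(t):
    # One dimensional "path": position along a line.
    return t * t

def p_prime(t):
    return 2.0 * t

def g_prime_chain(t):
    # With n = 1 the dot product is a plain product.
    return f_prime(p(t)) * p_prime(t)

def g_prime_numeric(t, h=1e-6):
    # Central difference quotient of g(t) = f(p(t)), as a check.
    return (f(p(t + h)) - f(p(t - h))) / (2.0 * h)

print(g_prime_chain(0.7), g_prime_numeric(0.7))
```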