First, this theorem proceeds through all the ordinals, not just the finite ordinals. To quote Buzz Lightyear, we go "To infinity and beyond." You can exploit this principle by showing f(0) is true, and that f(U) holds if f(x) holds for every x below U. Then, by transfinite induction, f(0) → f(1) → f(2) → … → f(ω) → f(ω+1) → …, and so on through all the ordinals.
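For reference, the principle can be compressed into one schema (a restatement in notation chosen here, with φ in place of f and α, β ranging over ordinals):

```latex
% Transfinite induction in one schema; \varphi is any property of ordinals,
% and \alpha, \beta range over the ordinals.
\[
\Bigl(\forall \alpha\ \bigl[\,(\forall \beta < \alpha)\ \varphi(\beta)
  \;\Rightarrow\; \varphi(\alpha)\,\bigr]\Bigr)
\;\Longrightarrow\;
\forall \alpha\ \varphi(\alpha)
\]
% The base case \varphi(0) is the instance \alpha = 0, where the hypothesis
% (\forall \beta < 0)\ \varphi(\beta) holds vacuously.
```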
The second generalization concerns the values of f. They need not be boolean, true or false; f can be a function into any set. For example, define f(n) as n×f(n-1), and set f(0) = 1, and a new function springs to life, namely n factorial. If we had set f(n) = f(n-1)+f(n-2), and f(0) = f(1) = 1, we would create another function, the Fibonacci sequence. (These are not the best examples, as they do not extend into the infinite ordinals.)
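As a finite illustration only (a Python sketch; recurse and the two step rules are names introduced here, and of course it stops at finite ordinals), both examples fit the pattern where the next value is computed from the list of all earlier values:

```python
def recurse(n, g):
    """Build f(0), f(1), ..., f(n-1), where each value is determined by
    g applied to the list of all earlier values -- a finite analogue of
    the recursion described in this section."""
    values = []
    for _ in range(n):
        values.append(g(values))
    return values

# n factorial: f(0) = 1, f(k) = k * f(k-1)
factorial_step = lambda prev: 1 if not prev else len(prev) * prev[-1]

# Fibonacci: f(0) = f(1) = 1, f(k) = f(k-1) + f(k-2)
fibonacci_step = lambda prev: 1 if len(prev) < 2 else prev[-1] + prev[-2]

print(recurse(6, factorial_step))   # [1, 1, 2, 6, 24, 120]
print(recurse(6, fibonacci_step))   # [1, 1, 2, 3, 5, 8]
```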
Since we are defining a function recursively, this theorem is sometimes called the recursion theorem. Now let's prove it.
Let S be an ordinal and let R be the range set. Let W be the set of functions from S into R. Note that W is a product set, sometimes written R^S. If x is an ordinal in S, there is a set of functions from the members of x into R, written R^x. Let U be the union of these sets of functions, over all x in S. In other words, U contains every function mapping a proper initial segment of S into R.
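Here is a toy enumeration (a Python sketch with S and R chosen arbitrarily; functions_into is a name introduced here) listing W and U for a small finite case:

```python
from itertools import product

S = 3                 # the ordinal 3 = {0, 1, 2}, standing in for S
R = {'a', 'b'}        # an arbitrary two-element range set

def functions_into(x, R):
    """R^x: all functions from the ordinal x = {0, ..., x-1} into R,
    each represented as a dict from ordinals to values."""
    return [dict(enumerate(values)) for values in product(sorted(R), repeat=x)]

W = functions_into(S, R)   # the full product set R^S, here 2^3 = 8 functions

# U: the union of R^x over all x in S, i.e. every function from a
# proper initial segment of S into R.
U = [f for x in range(S) for f in functions_into(x, R)]

for f in U:
    print(f)
# {} (the empty function), then the 2 functions on {0}, then the 4 on {0, 1}
```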
Assume there is a function g with domain U and range R. If e is an element of U, then e is really a function from an initial segment of S into R; say the segment consists of the ordinals below x. Then g(e) provides the "next" value, the image of x. Given such a g, there is a unique function f from S into R such that f is determined by g at each step; that is, f(x) = g(f restricted to the ordinals below x) for every x in S.
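A finite sketch of this statement (Python; build_f and the example rule g are invented here, and only finite ordinals are reachable this way):

```python
def build_f(S, g):
    """Given a finite ordinal S and a rule g that sends any function on a
    proper initial segment of S to the 'next' value, construct the unique
    f on S with f(x) = g(f restricted to x) for every x in S."""
    f = {}
    for x in range(S):
        f[x] = g(dict(f))   # pass f restricted to {0, ..., x-1}
    return f

# Example rule: the next value is one more than the sum of all earlier values.
g = lambda e: 1 + sum(e.values())
print(build_f(5, g))   # {0: 1, 1: 2, 2: 4, 3: 8, 4: 16}
```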
To show uniqueness, suppose two such functions e and f differ at x, with x minimal. They cannot differ at x = 0, since both e(0) and f(0) equal g applied to the empty function. Restrict e and f to the initial segment below x; since they agree on everything below x, they become the same function; call it h. Now g(h) determines both e(x) and f(x), hence e(x) = f(x). This is a contradiction, hence the function determined by g, if it exists, is unique.
If there is an f, determined by g, on the members of x, we call f an x approximation. Suppose there is no g-compatible function mapping S into R; hence there is no S approximation. If there are lesser ordinals in S that also lack approximations, choose the least of these and replace S with it. Now there is no g-compatible map from S into R, although each ordinal x in S has its own x approximation.
If S is the successor of T, let f be the T approximation, mapping the members of T into R. Map T itself to g(f), thus extending f to the last ordinal T. If S is a limit ordinal, let f be the union of the T approximations, over all the ordinals T in S. This union is well defined, because the approximations always agree on their common domain; we proved this earlier, when we showed the g-compatible function on a given domain is unique. Each new ordinal T extends the function further, and when all the ordinals are taken together, the composite function maps all of S into R, and is compatible with g. In either case we have produced an S approximation, contradicting our assumption.
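The limit case leans on compatible approximations fitting together into one function. A small finite analogue of that union (Python; the function name and data are mine):

```python
def union_of_approximations(approximations):
    """Merge partial functions (dicts) that agree wherever their domains
    overlap, mirroring the union taken at a limit ordinal."""
    f = {}
    for approx in approximations:
        for x, value in approx.items():
            assert f.get(x, value) == value, "approximations must agree"
            f[x] = value
    return f

# Three compatible approximations, on the ordinals 1, 2, and 3.
pieces = [{0: 'a'}, {0: 'a', 1: 'b'}, {0: 'a', 1: 'b', 2: 'a'}]
print(union_of_approximations(pieces))   # {0: 'a', 1: 'b', 2: 'a'}
```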
This completes the construction of f from its recursive definition, at least through the members of S. The function exists, as a set and as a formula.
Note that S could be any well ordered set, since such a set is isomorphic to an ordinal.
In some cases we would like to extend induction or recursion up through all the ordinals. Since the class of ordinals is not a set, we must think of g as a formula, rather than a function. Given a map f from an ordinal S into some set R, the formula g(f,y) is satisfied by exactly one set y in our universe, and that should be the image of S. (Note that the new set y may not lie in R, but we can always bring y in and create a new range set by replacement.)
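In symbols, the class form of the theorem reads as follows (a hedged restatement in notation chosen here, with F for the class function and F|α for its restriction to the ordinals below α):

```latex
% The class form of transfinite recursion.  g is the formula supplying the
% recursive rule; f ranges over functions whose domain is an ordinal;
% F is the resulting class function; F|_\alpha is F restricted to the
% ordinals below \alpha.
\[
\bigl(\forall f\ \exists!\, y\ \; g(f, y)\bigr)
\;\Longrightarrow\;
\exists F\ \forall \alpha\ \; g\bigl(F|_{\alpha},\, F(\alpha)\bigr)
\]
% Formally this is a schema: F is given by a formula, not a set, since the
% class of ordinals is not a set.
```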
The proof is almost self-evident. The function exists for any ordinal S, as described above, and is unique on that ordinal. If we derive f for a larger ordinal T, and restrict it to S, we must get the same function on S back again. Thus f is well defined on all the ordinals, and agrees with g at each step.
If the range set R is well ordered, g need not be a function. Instead of forcing a unique y such that g(f,y) is true, assume there is at least one y in R such that g(f,y) is true. We can replace g with h, where h(f,y) asserts g(f,y), and y ≤ z whenever g(f,z). In other words, h selects the least y satisfying g, and h becomes a function that implements the recursive step. We can build f from h, and it will be compatible with g.
This is sometimes called recursion with choice, because the axiom of choice ensures a well ordering on any set, including R.
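A finite sketch of recursion with choice over a well ordered range (Python; acceptable, build_f_with_choice, and the bound of 100 standing in for a well ordered R are all choices made here):

```python
def build_f_with_choice(S, acceptable):
    """acceptable(e) returns the nonempty set of values allowed as the next
    step after the partial function e.  Choosing the least such value turns
    the relation into a function h, and f is then built from h as before."""
    f = {}
    for x in range(S):
        f[x] = min(acceptable(dict(f)))   # h: pick the least acceptable y
    return f

# Example relation: any value strictly larger than everything chosen so far.
acceptable = lambda e: {v for v in range(100) if all(v > w for w in e.values())}
print(build_f_with_choice(5, acceptable))   # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
```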