Remember that a unitary matrix implements a rigid rotation in space. So the phrase "unitarily similar" is just like turning your head: you're looking at the same linear transform from another direction. From this vantage point, the last output coordinate is simply the last input coordinate scaled, the second to last is a linear combination of the last two input coordinates, and so on, building an upper triangular matrix. Of course you could turn your head another way and the transform becomes a lower triangular matrix. We use upper triangular matrices as a matter of convention.
This theorem only holds over an algebraically closed field, such as the complex numbers. If the field is not algebraically closed, the theorem can fail.
To illustrate, consider a linear map in R2 that rotates the plane 90 degrees. This is implemented by the matrix [0,1|-1,0]. Subtract s from the main diagonal and get a characteristic polynomial of s^2+1, which has no roots, at least not in the reals. This is no surprise, since there are no real eigen vectors. Every vector is moved 90 degrees by the transform. No matter how hard you try, you cannot turn your head and see a transform with an eigen vector, for there are none to be found. And if there is no eigen vector, the map cannot be implemented by an upper triangular matrix, since an upper triangular matrix always has an eigen vector: its first column has just one nonzero entry, at the top, so the vector with a 1 in the first coordinate and zeros elsewhere is eigen.
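A quick numerical check of this example (a sketch in Python with NumPy, not part of the original argument): the eigen values of the rotation matrix come out purely imaginary.

```python
import numpy as np

# The 90 degree rotation matrix from the text. Its characteristic
# polynomial is s^2 + 1, so the eigen values are +/- i: purely
# imaginary, with no real eigen vectors at all.
M = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

print(np.linalg.eigvals(M))  # [0.+1.j 0.-1.j] (up to ordering)
```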
With that example behind us, let's assume the field is algebraically closed, e.g. the complex numbers. Every polynomial has a root, and every matrix has at least one eigen value and a corresponding eigen vector.
Proceed by induction on the number of dimensions.
A 1×1 matrix is automatically an upper triangular matrix, so consider an n×n matrix.
As mentioned earlier, there is an eigen value s and an eigen vector x. If we want a lower triangular matrix, let x be a row vector such that x*M = sx. For an upper triangular matrix let x be a column vector such that M*x = sx. We'll follow the latter convention.
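In NumPy terms the two conventions look like this (a small illustration, with a matrix of my own choosing): np.linalg.eig produces column eigen vectors, and row eigen vectors are just column eigen vectors of the transpose.

```python
import numpy as np

# Column (right) eigen vectors satisfy M @ x = s*x.
M = np.array([[2.0, 1.0],
              [0.0, 3.0]])
s, X = np.linalg.eig(M)
x = X[:, 0]
print(np.allclose(M @ x, s[0] * x))    # True

# Row (left) eigen vectors satisfy x @ M = s*x; they are the
# column eigen vectors of M transpose.
t, Y = np.linalg.eig(M.T)
y = Y[:, 0]
print(np.allclose(y @ M, t[0] * y))    # True
```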
Assume there is a well defined dot product, as is the case over the reals or the complex numbers. If |x| exists in the field of interest, and is nonzero, divide x by |x| to build a unit vector; then use the Gram Schmidt process to extend it to B, a unitary basis containing x. If you can't build a unitary basis, then let B be any basis containing x, and on we go.
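Here is a sketch of that construction in Python with NumPy, over the complex numbers. The function name is mine, not from the text; it seeds Gram Schmidt with x and then runs through the standard basis vectors.

```python
import numpy as np

def basis_containing(x):
    """Extend the nonzero vector x to a unitary basis via Gram Schmidt."""
    n = len(x)
    x = x / np.linalg.norm(x)            # divide x by |x| to get a unit vector
    cols = [x.astype(complex)]
    for e in np.eye(n, dtype=complex):   # candidate standard basis vectors
        v = e.copy()
        for u in cols:                   # subtract the projections onto
            v = v - np.vdot(u, v) * u    # the vectors collected so far
        if np.linalg.norm(v) > 1e-12:    # skip anything already in the span
            cols.append(v / np.linalg.norm(v))
        if len(cols) == n:
            break
    return np.column_stack(cols)         # x is the leftmost column

B = basis_containing(np.array([1.0, 1j, 0.0]))
print(np.allclose(B.conj().T @ B, np.eye(3)))   # True: B is unitary
```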
Let x be the first vector of B, in this case the leftmost column of the matrix that defines B.
Since x is eigen, M*x = sx. This means the leftmost column of M*B is sx.
Let C be the inverse of B. When B is unitary, C is simply the tranjugate of B, i.e. the transpose of the conjugate. Consider C*M*B. To find the first column, multiply the rows of C by the first column of M*B, which is sx. Since C*B is the identity, C carries x, the first column of B, to the first standard coordinate vector; hence C*(sx) has s in the first coordinate and zeros below. Thus C*M*B has s in the upper left, zeros down the rest of the first column, and begins to look like an upper triangular matrix.
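A self-contained numerical check (NumPy's QR factorization plays the role of Gram Schmidt here, producing a unitary B whose leftmost column is a unit multiple of x; the matrix M is my own example):

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0]])
vals, vecs = np.linalg.eig(M)
s, x = vals[0], vecs[:, 0]            # an eigen value and eigen vector

# Seed QR with x, pad with the standard basis; B is unitary and its
# first column is a scalar multiple of x (still an eigen vector).
B, _ = np.linalg.qr(np.column_stack([x, np.eye(3)]))
C = B.conj().T                        # the tranjugate of B, i.e. B inverse

T = C @ M @ B
print(np.isclose(T[0, 0], s))         # True: s in the upper left
print(np.allclose(T[1:, 0], 0))       # True: zeros down the first column
```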
Ignore the first row and column of C*M*B and consider the remaining n-1 dimensional submatrix. Find P and Q, where Q is P inverse, to make this submatrix similar to an upper triangular matrix. This can always be done (by induction). Now add an extra row of zeros above P and an extra column of zeros to its left, and do the same for Q. Place a 1 in the upper left of each augmented matrix. Note that P*Q is still the identity matrix.
Consider PCMBQ. We know that CMB has s in the upper left and zeros down the first column. Multiply by P on the left, and the top row and left column remain the same, while the submatrix is premultiplied by the submatrix of P. Then multiply by Q on the right. The upper left entry is still s, and the rest of the top row is, well, who knows. The left column is still zeros, except for s in the upper left. Finally the submatrix, with P's submatrix on the left and Q's on the right, becomes upper triangular. Therefore the entire product, PCMBQ, is upper triangular. The result is similar to M via the invertible matrix PC, since BQ is the inverse of PC. That proves the theorem.
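The whole induction fits in a short recursive sketch (Python with NumPy, complex arithmetic throughout; the function name is mine, not from the text). Each level peels off one eigen value, builds B and C, and recurses on the submatrix.

```python
import numpy as np

def triangularize(M):
    """Return V and T with T = V^H M V upper triangular and V unitary."""
    n = M.shape[0]
    if n == 1:                          # a 1x1 matrix is already triangular
        return np.eye(1, dtype=complex), M.astype(complex)
    _, vecs = np.linalg.eig(M)
    x = vecs[:, 0]                      # any eigen vector will do
    # unitary B with a multiple of x as its leftmost column
    B, _ = np.linalg.qr(np.column_stack([x, np.eye(n)]))
    C = B.conj().T
    T = C @ M @ B                       # s in the upper left, zeros below
    U, _ = triangularize(T[1:, 1:])     # induction on the submatrix
    P = np.eye(n, dtype=complex)        # augment, with a 1 in the upper left
    P[1:, 1:] = U.conj().T
    Q = P.conj().T                      # P*Q is still the identity
    return (P @ C).conj().T, P @ T @ Q

M = np.random.rand(4, 4)
V, T = triangularize(M)
print(np.allclose(np.tril(T, -1), 0))      # True: T is upper triangular
print(np.allclose(V @ T @ V.conj().T, M))  # True: T is similar to M
```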
Now let M be a real matrix with real, orthogonal eigen vectors. Let x be one of these vectors and build B as above. Note that B is built using Gram Schmidt as before, but now B is a basis for Rn, rather than Cn. The product matrix CMB has [1,0,0,0…] as a column eigen vector. Remember that the n orthogonal eigen vectors of M map, via similarity, to n orthogonal eigen vectors of CMB. Since B is real, the new orthogonal vectors are all real. We have the first one in hand: 1 in the first coordinate and all other coordinates 0. The others are all orthogonal to this one, hence they all lie in the subspace defined by the last n-1 coordinates. In other words, they all start with zero. The submatrix of CMB, without the first row and column, has n-1 real orthogonal eigen vectors. By induction it is similar to an upper triangular matrix. Let P and Q implement this similarity. Augment P and Q with zeros and ones, as we did before. Build PCMBQ, whence M is unitarily similar to an upper triangular matrix.
Since C is real, and P is real by induction, we know PC is real. The rotation is accomplished inside Rn, and the resulting matrix is upper triangular.
The eigen vectors of M might not be real, however, even when the eigen values are; so here is the argument again, more carefully. A matrix with one element is real, and upper triangular, so proceed by induction on n.
Let x be an eigen vector of M with real eigen value s. Remember, our convention places x on the right, as a column vector, whence M*x = sx.
To keep things simple, assume x is real. Use Gram Schmidt to build an orthonormal basis B, spanning real space, with x in the leftmost column. Let C be the tranjugate of B, which is the inverse of B. And since B is real, C is just the transpose of B.
The matrix CMB now has 1,0,0… as an eigen vector. This is the eigen vector that corresponds to x. The other eigen vectors of M are mapped to orthogonal eigen vectors of CMB, and the real eigen values are preserved. These eigen vectors start with 0, and live in n-1 space. The inductive criteria are met: the submatrix of CMB is similar to an upper triangular matrix via a real, orthonormal matrix P, with inverse Q. Augment P and Q as we did before, and PCMBQ becomes upper triangular. Since C and P are real, PC is also real.
The problem is, x might not be a real vector; it might have imaginary components. Let K be the eigen space containing x, i.e. the space spanned by all the eigen vectors with eigen value s. Since K contains x, it is at least one dimensional.
Subtract s from the main diagonal of M and find a real, singular matrix. This matrix has a kernel in real space. In other words, there are real vectors in the kernel of M-s, real eigen vectors with eigen value s, real vectors inside K. Let r be such a vector.
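To see this numerically (a sketch using SciPy's null_space on a small symmetric example of my own choosing):

```python
import numpy as np
from scipy.linalg import null_space

# M is real with real eigen values 3 and -1. Subtracting the eigen
# value s = 3 from the main diagonal leaves a real, singular matrix,
# whose kernel, computed in real arithmetic, is spanned by a real
# eigen vector r.
M = np.array([[1.0, 2.0],
              [2.0, 1.0]])
s = 3.0

r = null_space(M - s * np.eye(2))[:, 0]
print(r)                               # a real vector, here (1,1)/sqrt(2)
print(np.allclose(M @ r, s * r))       # True: r is eigen with eigen value s
```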
Now r could be a multiple of x. If x were pure imaginary, for instance, scale x by i and get r. This is really the same eigen vector, in complex space. So replace x with r; in effect x is real, and we can invoke the earlier paragraph.
If x is not real, and not a scalar multiple of a real vector, we have more work to do. To recap, the eigen space K contains the complex vector x and the real vector r, such that x and r are not multiples of each other. Thus K is at least two dimensional.
To keep things simple, let's say K contains 3 of our orthogonal eigen vectors, x, y, and z. Thus K is at least 3 dimensional. Let g be the set of n-3 orthogonal eigen vectors other than x, y, and z. The vectors in g have eigen values other than s. Remember, eigen vectors with different eigen values are linearly independent. Thus x, y, z, and r are all independent of the n-3 vectors in g. If x, y, z, and r were also independent of each other, we would have n+1 independent vectors in n dimensional space, which is impossible. Therefore r is spanned by x, y, and z.
Use Gram Schmidt to build 3 new vectors spanning K, the first being r. Since all of K is orthogonal to g, this new basis is orthogonal to g. We have a new set of orthogonal eigen vectors, with the same eigen values, and one of them is r. Use the real eigen vector r to complete the inductive step, as we did before. That completes the proof.
In summary, if M is real, with orthogonal eigen vectors and real eigen values, then M is similar to a real upper triangular matrix, via a real orthonormal matrix P.
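SciPy's real Schur decomposition makes this concrete (a numerical check, not part of the proof; the symmetric matrix below is my own example, and a real symmetric matrix automatically has real eigen values and orthogonal eigen vectors):

```python
import numpy as np
from scipy.linalg import schur

# A real symmetric matrix: real eigen values, orthogonal eigen vectors.
M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# output='real' keeps the whole computation inside real arithmetic.
# Since every eigen value is real, T is truly triangular (in fact
# diagonal here), with no 2x2 blocks.
T, P = schur(M, output='real')
print(np.allclose(np.tril(T, -1), 0))       # True: T is upper triangular
print(np.allclose(P @ T @ P.T, M))          # True: similar to M via real P
print(np.allclose(P.T @ P, np.eye(3)))      # True: P is orthonormal
```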