Let v be the vector [4,0], pointing along the x-axis, and let w = [1,1], pointing up and to the right. Now v and w form the bottom and left sides of a parallelogram. In fact, the plane can be tiled with these parallelograms. Integer combinations of v and w form a lattice in the plane, and if you connect the dots, you get an infinite pattern of parallelograms, like a checkerboard that has been stretched into rectangles and pushed over.
We would like to know the area of the parallelogram spanned by v and w, and the best way to find it is to push the parallelogram back into a rectangle. Subtract ¼ of v from w, so that w becomes [1,1] − [1,0] = [0,1]. This slides the top edge along its own line, parallel to v, so neither the base nor the height changes, and neither does the area. Now w points straight up, and the area is the length of v times the length of w, 4×1 = 4. This happens to be the determinant of the matrix whose rows are v and w. Remember that subtracting a multiple of one row from another doesn't change the determinant, so we could have taken the determinant using the original v and w. The determinant of [4,0|1,1] is indeed 4×1 − 0×1 = 4. Let's prove this in general.
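The arithmetic of this example can be checked in a few lines. This is a minimal sketch; the helper name det2 is our own, and Fraction is used only so that ¼ stays exact.

```python
from fractions import Fraction

def det2(a, b):
    """Determinant of the 2x2 matrix whose rows are a and b."""
    return a[0] * b[1] - a[1] * b[0]

v = [4, 0]
w = [1, 1]

# Subtract 1/4 of v from w: a shear that slides w parallel to v.
c = Fraction(1, 4)
w_sheared = [w[0] - c * v[0], w[1] - c * v[1]]

print(w_sheared)           # [Fraction(0, 1), Fraction(1, 1)]
print(det2(v, w))          # 4
print(det2(v, w_sheared))  # 4
```

The sheared w is [0,1], and the determinant is 4 both before and after the row subtraction.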
Let M be a square matrix whose n rows are vectors in Rⁿ. If the rows of M do not form a basis, they span a lower-dimensional subspace, and the pushed-over box they determine is squashed flat into that subspace. The box has no n-dimensional volume, and, as if in confirmation, det(M) = 0.
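A quick check of the degenerate case, as a sketch: the helper det is a hypothetical name that computes the determinant by cofactor expansion, which is fine for small matrices but far too slow for large ones. The matrix M is an arbitrary example whose third row is the sum of the first two.

```python
def det(rows):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(rows) == 1:
        return rows[0][0]
    return sum((-1) ** j * rows[0][j] * det([r[:j] + r[j + 1:] for r in rows[1:]])
               for j in range(len(rows)))

# The third row is the sum of the first two, so all three rows lie in one plane in R^3.
M = [[1, 2, 3],
     [4, 5, 6],
     [5, 7, 9]]
print(det(M))  # 0
```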
Now suppose the rows of M form a basis for Rⁿ. One by one, we can push the vectors of M around until they are orthogonal. This is the Gram-Schmidt process, without the final normalization. Each step subtracts from one row a linear combination of the rows before it, just as we subtracted ¼ of v from w above, so none of these steps changes the determinant of M.
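Here is a sketch of Gram-Schmidt written purely as row subtractions, assuming the rows form a basis. The names gram_schmidt, dot, and det are illustrative helpers, not from any library; exact Fraction arithmetic makes the determinant comparison exact. The example matrix is arbitrary.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def det(rows):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(rows) == 1:
        return rows[0][0]
    return sum((-1) ** j * rows[0][j] * det([r[:j] + r[j + 1:] for r in rows[1:]])
               for j in range(len(rows)))

def gram_schmidt(rows):
    """Orthogonalize the rows in order, without normalizing them.

    Each earlier output row q is a combination of earlier input rows, so
    subtracting c*q from the current row is a row subtraction.
    Assumes the rows are linearly independent (else dot(q, q) == 0).
    """
    out = []
    for row in rows:
        u = [Fraction(x) for x in row]
        for q in out:
            c = dot(u, q) / dot(q, q)  # coefficient of the projection onto q
            u = [a - c * b for a, b in zip(u, q)]
        out.append(u)
    return out

M = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
G = gram_schmidt(M)
print(det(M), det(G))  # 18 18
```

The rows of G are mutually orthogonal, yet the determinant is still 18.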
In any number of dimensions, the volume of a parallelotope is the (n−1)-dimensional area of its floor times the distance between floor and ceiling. This is an exercise in multi-dimensional integral calculus, and it's pretty obvious; it generalizes the base×height formula for the area of a parallelogram. When Gram-Schmidt subtracts a combination of earlier vectors from a vector, take the face spanned by all the other vectors as the floor. The vectors being subtracted lie in that floor, so the ceiling slides over, until the walls are perpendicular to the floor, but the ceiling does not rise or fall. Hence the volume does not change. Therefore Gram-Schmidt preserves both determinant and volume.
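To see the base×height argument concretely, here is a sketch restricted to R³, where the cross product gives both the floor's area and its normal. The function base_times_height is a hypothetical helper, and the vectors and the slide (adding 3a − 2b to c) are arbitrary choices.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def base_times_height(a, b, c):
    """Volume of the box spanned by a, b, c in R^3, as floor area times height.

    The floor is the parallelogram spanned by a and b; its area is |a x b|.
    The height is the component of c along the floor's unit normal.
    """
    n = cross(a, b)
    area = math.sqrt(dot(n, n))
    height = abs(dot(c, n)) / area
    return area * height

a, b, c = [2, 1, 0], [1, 3, 1], [0, 1, 4]
# Slide the ceiling: add a combination of floor vectors to c.
c_slid = [ci + 3 * ai - 2 * bi for ci, ai, bi in zip(c, a, b)]

print(round(base_times_height(a, b, c), 9))       # 18.0
print(round(base_times_height(a, b, c_slid), 9))  # 18.0
```

Adding multiples of a and b to c changes nothing along the normal a×b, so the height, and with it the volume, stays put.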
Now the rows of M form an orthogonal basis. Divide each row by its length. This divides the determinant by that length, and it divides the volume by the same factor. The result is an orthonormal basis that defines a unit hypercube in Rⁿ. A rigid motion, that is, a rotation possibly followed by a reflection, carries this cube onto the standard unit cube, whose volume is 1 by definition. Rigid motions do not change volume, so the volume spanned by an orthonormal basis is 1. Its determinant is ±1: if Q is its matrix, then QQᵀ = I, so det(Q)² = 1. Working back to our original shape, each rescaling multiplied the volume and the determinant by the same positive factor, so the volume of a parallelotope spanned by n vectors in Rⁿ is + or − the determinant of the matrix. Don't discard the sign; negative volume makes sense in certain applications.
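The last step can be sketched numerically: orthonormalize the rows, record the lengths divided out, and compare. This assumes the rows form a basis; orthonormalize, dot, and det are illustrative helpers using floating point, so the results are approximate. Swapping two rows acts like a reflection, which flips the sign of the determinant but not the volume.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def det(rows):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(rows) == 1:
        return rows[0][0]
    return sum((-1) ** j * rows[0][j] * det([r[:j] + r[j + 1:] for r in rows[1:]])
               for j in range(len(rows)))

def orthonormalize(rows):
    """Gram-Schmidt, dividing each row by its length as it is produced.

    Returns (Q, lengths): the orthonormal rows and the lengths divided out.
    Assumes the rows form a basis, so no length is zero.
    """
    Q, lengths = [], []
    for row in rows:
        u = [float(x) for x in row]
        for q in Q:
            c = dot(u, q)  # q already has length 1
            u = [a - c * b for a, b in zip(u, q)]
        length = math.sqrt(dot(u, u))
        Q.append([a / length for a in u])
        lengths.append(length)
    return Q, lengths

for M in ([[2, 1, 0], [1, 3, 1], [0, 1, 4]],
          [[1, 3, 1], [2, 1, 0], [0, 1, 4]]):  # same vectors, first two swapped
    Q, lengths = orthonormalize(M)
    volume = math.prod(lengths)  # volume of the box = product of orthogonal edge lengths
    print(round(det(Q)), round(volume, 9), det(M))
# 1 18.0 18
# -1 18.0 -18
```

In both cases det(M) equals det(Q) times the product of the lengths, with det(Q) = ±1 carrying the sign.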