Notice that each tterm, each product of 3 matrix elements, represents each row once and each column once. We don't go straight down or straight across, we multiply along diagonal lines. For the 3×3 case, there are 6 ways to do this, hence 6 terms in the formula. Choose any entry from the first row and mark that column as "used". Then choose an enttry from the second row, avoiding the used column. Mark that column used, and in the third row, there is only one entry available. That's 3×2×1 or 6 combinations.
If you write down the numbers of the columns, in the order they are "used" by the rows, you'll find a permutation on the numbers 1 through n. Each product is associated with one of these permutations. The product is negated when the permutation is odd. At least that's how it looks in the 3×3 case.
If we derive a formula for the determinant of a 4×4 matrix, it will have 24 terms, each term a product of 4 entries according to a permutation on 4 columns. Half the terms are negated, according to the parity of the permutations. A 5×5 matrix gives a formula with 120 terms, and so on. This quickly becomes impractical, but let's prove it anyways.
Let det2(M) be the sum of all permutation products, where a product is negated when its associated permutation is odd. This definition agrees with det1 for n = 1, 2, and 3. Proceed by induction on n.
Given an n×n matrix, delete the first row and jth column, and write the subdeterminant as a sum of permutation products. For each permutation, add 1 to each column number that is j or larger. We are pushing them over to make room for column j. Write this permutation as a series of swaps. (Remember that swaps, like transpositions, are odd.) The number of swaps gives the parity. Now place j in position, just after the first j-1 numbers. Each swap is still a swap, and still odd. Some of them pass through j, but that's ok. The result is the same permutation, with the same parity, but if has j in position j. However, j is suppose to be first in line. After all, det1 says to multiply the subdeterminant by the entry in row 1 column j, so j comes first. A series of j-1 transpositions moves j to the head of the line. Have we changed the parity of the permutation by putting j in front? Only if an odd number of transpositions are needed to move j from its home in position j to the front in position 1. This happens only when j is even. When j is 4, swap it with 3 2 and 1 to move it to the front, giving 3 transpositions. In the same way, det1(M) tells us to negate the product precisely when j is even. The two formulas agree.