Probability is used to describe or predict the outcome of an experiment when the internals of that experiment are well understood. A machine is analyzed, and the scientist tries to predict its behavior. Yet the machine has random elements, or too many elements, so the prediction is given in terms of probabilities. The machine is most likely to do this, and least likely to do that, and it cannot do the other thing at all.

Although a meteorologist attempts to predict the weather,
and a psychologist attempts to predict a person's behavior,
we usually begin with a machine that is *far* simpler.
A pair of dice, for instance.

When two dice are thrown, there are 36 possible outcomes, all equally likely. The sum of the upward faces is never 1, so a sum of 1 is impossible. We say the probability (informally, the odds) of throwing a 1 is 0. At the same time, the sum is always between 2 and 12. Thus the probability that the sum is less than 13 is 1. In the language of probability, 0 is impossible and 1 is guaranteed.

That's not very interesting, so consider the probability of a total of 7. There are 6 ways to do this, out of 36 possibilities, hence 7 is produced one sixth of the time. The probability of rolling 7 is 1/6, or about 0.1667.

But how do you know there are 6 ways to produce 7, and 36 outcomes altogether? The answer is combinatorics. In fact, probability is an extension of combinatorics. If you aren't familiar with the basic theorems of combinatorics, stop here and review those theorems first.
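For a machine as small as two dice, the counts quoted above can also be checked by brute force. The sketch below lists every ordered pair of faces and counts the pairs that satisfy a condition. The names `outcomes` and `probability` are made up for this illustration, and the code assumes both dice are fair, so that all pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair (first die, second die); each pair is equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Fraction of the equally likely outcomes for which event(a, b) is true."""
    favorable = sum(1 for a, b in outcomes if event(a, b))
    return Fraction(favorable, len(outcomes))

ways_to_seven = [(a, b) for a, b in outcomes if a + b == 7]

print(len(outcomes))                          # 36
print(ways_to_seven)                          # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(probability(lambda a, b: a + b == 7))   # 1/6
print(probability(lambda a, b: a + b == 1))   # 0
print(probability(lambda a, b: a + b < 13))   # 1
```

Enumeration stops being practical as the machine grows, which is where the counting theorems of combinatorics take over.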

Sometimes the real world gets in the way of our platonic models. A die that is tossed high in the air will produce a 6 slightly more than one sixth of the time. This is because the face with 1 pip is heaviest, while the face with 6 pips is lightest. After all, the latter is short five dots' worth of material. So there is a tiny bias, and we're declaring it negligible. You won't notice it when you're playing Monopoly. However, some dice are deliberately counterweighted, so that the center of mass is at the center of the cube. These are "fair" dice. Other dice are "loaded", with a hidden weight at one face. These are designed to make certain outcomes more likely than others. As you can see, you need to know your "machine" well in order to predict its behavior accurately.
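To see how much the internals matter, here is a rough sketch that computes the exact probability of a given sum when the faces are not equally likely. The loaded die is purely hypothetical: its weights (6 shows a quarter of the time, the other faces 3/20 each) are invented for illustration and are not measurements of any real die. The helper `prob_sum` is likewise just a name chosen here, and it assumes the two dice are identical and independent.

```python
from fractions import Fraction

def prob_sum(face_probs, target):
    """Exact probability that two independent, identical dice sum to target."""
    return sum(
        face_probs[a] * face_probs[b]
        for a in face_probs
        for b in face_probs
        if a + b == target
    )

fair = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical loaded die (assumed weights): 6 comes up a quarter of the
# time, and the other five faces share the remaining 3/4 equally.
loaded = {face: Fraction(3, 20) for face in range(1, 6)}
loaded[6] = Fraction(1, 4)

print(prob_sum(fair, 7))      # 1/6
print(prob_sum(loaded, 7))    # 33/200
print(prob_sum(fair, 12))     # 1/36
print(prob_sum(loaded, 12))   # 1/16
```

With these assumed weights the chance of a 7 barely moves, while the chance of a 12 more than doubles. A prediction is only as good as the model of the machine behind it.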

In contrast, statistics is a black-box analysis of a machine, often a complicated machine, such as a human being. Does the human recover faster when taking this drug? Some people do, some don't; a few even react badly to the drug. So the machine behaves in certain ways under certain conditions, and we use "reverse engineering" to deduce the underlying probabilities. Let's return to our earlier example.

Perhaps we don't know the machine is based on a pair of dice, but after a thousand trials we notice that 7 comes up 173 times. We might infer that 7 is produced with probability 1/6. And we might give a measure of our confidence in this assertion. This confidence level grows with more tests. Push the button another ten thousand times, and 7 comes up 1,698 times. Now we are even more sure: the probability of getting a 7 is 1/6. If a statistician is honest, he will give you his answer, in this case 1/6, and his confidence in that answer, e.g. 99%.
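One common way to attach a confidence level to such an estimate is a confidence interval for a proportion. The sketch below uses the simple normal-approximation (Wald) interval, which is only one of several possible methods and is chosen here for brevity. The function name `proportion_interval` is an assumption of this example, and the two batches of trials are treated separately.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, trials, confidence=0.99):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / trials
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin

# The two batches of trials from the text: (times 7 came up, trials).
for successes, trials in [(173, 1_000), (1_698, 10_000)]:
    low, high = proportion_interval(successes, trials)
    print(f"{successes}/{trials}: {low:.4f} to {high:.4f}")
```

Both intervals contain 1/6, and the interval from ten thousand trials is roughly a third as wide as the one from a thousand. That narrowing is the sense in which confidence grows with more tests.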

As you might imagine, statistics requires a solid grounding in probability. In fact, both branches of math use many of the same theorems. Some textbooks and courses combine the two, presenting a title like "Probability and Statistics", but I prefer to think of them as separate, albeit related, fields. So for now, let's proceed with probability theory.