I remember the first time the concept of joint probability distribution was introduced to me I found it completely unintuitive (like so many topics in probability), declared myself too stupid to get it, and considered giving up on statistics.

The problem was that they used these silly gambling examples to demonstrate the concept. Flip of the coin this, roll of the die that. Urrg. Once I reset it in the context that I could relate too, everything became easier. So here we go.

Suppose you are a B level celebrity in London. Then suppose that the probability of you having sex on any given day is ⅔ (it would be virtually 1, if you were an A level celebrity in LA, but that would not make for interesting example). Also, suppose that the probability of a cloudy day is ⅘. Here is how we write it in the language of probabilities.

X is a random variable tracking our sex patterns. In this case, X can take on the values of X = Sex and X = NoSex. Y is a random variable tracking weather in London during the day. In our case Y = Cloudy or Y = Sunny.

It should be obvious that P(X=Sex) or P(X=NoSex) = P(Y=Cloudy) or P(Y=Sunny) = 1. The probability of the whole sample space is 1 in both cases (you either have sex or not or it is either sunny or cloudy). So P(X=Sex) + P(X=NoSex) = ⅔ + ⅓ = 1. By the same token P(Y=Cloudy) + P(Y=Sunny) = ⅘ + ⅕ = 1.

In this analysis, we assume that the sex is independent of the weather, an elusive assumptions and the one that should not be confused with correlation. I will try to make the differences clear when I discuss conditional independence. OK, here is the punch line. When we say joint probability distribution, we mean the probability of the combination of the events in question. For finite and discrete random variables such as the ones we are talking about, we can summarize the joint distribution using a table.

Sum = 1 |
P(X=Sex) = 2/3 |
P(X=NoSex) = 1/3 |

P(Y=Sunny) = 1/5 |
P(X=Sex,Y=Sunny) = 2/15 | P(X=NoSex,Y=Sunny) = 1/15 |

P(Y=Cloudy) = 4/5 |
P(X=Sex,Y=Cloudy) = 8/15 | P(X=NoSex,Y=Cloudy) = 4/15 |

In the above table when I write P(X=Sex, Y=Sunny) I mean the probability of both sex AND cloudy weather. (Do not confuse this with a notation P(X | Y), which means that we are only interested in Sex given a particular Weather outcome had already occurred.) This is why it is called joint probability. The probabilities listed in the margins of the table, are called … marginal. Each value in the cells is the product of two marginals. For instance the probability of having sex while it is sunny is ⅔ * ⅕ = 2/15. The cool thing is that if you observe only the joint probabilities you can easily calculate the marginals by summing across rows or columns (the reverse does not work – you can’t get joints from marginals, but you can get them from a good dealer in the Bronx.)

So, if I observe you, the celebrity in question, for 150 days having sex during 20 sunny days and 80 cloudy days (you were not having sex during the other 50 days, sorry), I may conclude that the marginal probability of you having sex in London is 2/15 + 8/15 = 10/15 = ⅔ = P(X=Sex) = P(X=Sex,Y=Sunny)+ P(X=Sex,Y=Cloudy), which is in fact consistent with our assumptions. This can be formalized as follows:

This rule is quite general (it is called the sum rule of probabilities) and it says that for any joint distribution X,Y to get back the probability of X we have to sum across all possible values of Y.

Try summing across other rows columns and you will see that results are consistent there as well.