Basic Theoretical Probability

Intro to theoretical probability

Probability is simply how likely something is to happen. 

Whenever we’re unsure about the outcome of an event, we can talk about the probabilities of certain outcomes: how likely they are. The analysis of events governed by probability is called statistics.

Put another way, probability gives us a way of getting our hands around an event that’s fundamentally random.

We check for:

  • How many equally likely possibilities are out there?
  • And, of those equally likely possibilities, how many contain our event?

# of possibilities that meet our conditions / # of equally likely possibilities

Flipping a coin

In the case of flipping a fair coin: P(H) = 1/2 = 50%

Another way to think about (or conceptualize) the probability is to ask: if we were to run the experiment* of flipping a coin many times (the more the better), what percentage of those flips would give us what we care about?

* Here, an experiment is each time we run this random event.

The larger the number of flips we do, the more likely we are to get something close to 50%.
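To see this effect, here is a minimal simulation sketch. The function name heads_fraction and the fixed seed are my own choices for illustration, and Python's pseudo-random generator stands in for a real fair coin.

```python
import random

def heads_fraction(num_flips, seed=0):
    """Flip a simulated fair coin num_flips times and return the share of heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# The share of heads should drift toward 0.5 as the number of flips grows.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_fraction(n))
```

With only 10 flips the share can be far from 50%; with many flips it tends to settle close to it.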

Rolling a die

Or, in the case of rolling a (single) fair die:

P(1) = 1/6 
P(1 or 6) = 2/6 = 1/3 
P(2 and 3) = 0/6 = 0
P(even) = 3/6 = 1/2
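These values can be checked by counting outcomes directly. The sketch below assumes a fair six-sided die; the helper name prob is hypothetical.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]  # sample space of one fair die

def prob(condition):
    """Favorable outcomes divided by all equally likely outcomes."""
    favorable = [f for f in faces if condition(f)]
    return Fraction(len(favorable), len(faces))

p_one = prob(lambda f: f == 1)
p_one_or_six = prob(lambda f: f in (1, 6))
p_two_and_three = prob(lambda f: f == 2 and f == 3)  # a single roll can't be both
p_even = prob(lambda f: f % 2 == 0)
print(p_one, p_one_or_six, p_two_and_three, p_even)
```

Using Fraction keeps the results exact, so 2/6 is reduced to 1/3 automatically.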

Common ways of reporting probability are: 
  • fractions
  • decimals
  • percentages

We can convert between these different forms using simple arithmetic:

  • fraction to decimal -> divide the values in the fraction (ex: 1/2 = .5)
  • decimal to percentage -> multiply by 100 (ex: .5 * 100 = 50%)

Let’s talk about some terms too:

In the case of pulling a yellow marble from a bag with 3 yellow, 2 red, 2 green and 1 blue marble:

  • Event is picking a yellow marble
  • Sample space is all possible outcomes (here, the 8 marbles)
  • Trial is each pick out of the bag
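As a small sketch of these terms, we can model the bag as a list and count. This assumes every marble is equally likely to be picked; the names bag and p_color are just for illustration.

```python
from fractions import Fraction

# Sample space: one entry per marble in the bag (3 yellow, 2 red, 2 green, 1 blue).
bag = ["yellow"] * 3 + ["red"] * 2 + ["green"] * 2 + ["blue"]

def p_color(color):
    """Probability that a single trial (one pick) gives the requested color."""
    return Fraction(bag.count(color), len(bag))

print(p_color("yellow"))
```

The event "picking a yellow marble" covers 3 of the 8 outcomes, so its probability is 3/8.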

Basic Probability Rules

1) Possible values for probabilities range from 0 to 1

0 = impossible event
1 = certain event

2) Complement Rule – In a random experiment, the probabilities of all possible outcomes (the sample space) must total 1. This also means that some outcome must* occur on every trial. So the probability that an event A does not occur (its complement, Aᶜ) is:

P(Aᶜ) = 1 – P(A)

* In probability theory and logic, a set of events is jointly or collectively exhaustive if at least one of the events must occur. For example, when rolling a six-sided die, the events 1, 2, 3, 4, 5, and 6 are collectively exhaustive, because they encompass the entire range of possible outcomes.
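The complement rule is handy when "not A" is easier to count than A. As a sketch (my own example, assuming two rolls of a fair die), the chance of at least one six is 1 minus the chance of no six at all:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely (first, second) pairs

p_no_six = Fraction(sum(6 not in roll for roll in rolls), len(rolls))
p_at_least_one_six = 1 - p_no_six  # complement rule
p_direct = Fraction(sum(6 in roll for roll in rolls), len(rolls))  # counted directly

print(p_no_six, p_at_least_one_six, p_direct)
```

Both routes give the same value, which is what the rule promises.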

3) Addition Rule – the probability that one or both events occur

mutually exclusive events: P(A or B) = P(A) + P(B)
not mutually exclusive events: P(A or B) = P(A) + P(B) – P(A and B)
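Both forms of the addition rule can be checked on a single fair die. The events below (even, greater than 3, rolling a 2, odd) are my own picks for illustration.

```python
from fractions import Fraction

faces = set(range(1, 7))  # sample space of one fair die

def p(event):
    """Probability of an event given as a set of faces."""
    return Fraction(len(event), len(faces))

# Not mutually exclusive: 4 and 6 are both even and greater than 3.
even, greater_than_3 = {2, 4, 6}, {4, 5, 6}
p_union = p(even | greater_than_3)
p_rule = p(even) + p(greater_than_3) - p(even & greater_than_3)

# Mutually exclusive: a roll cannot be a 2 and odd at the same time.
two, odd = {2}, {1, 3, 5}
p_union_exclusive = p(two | odd)
p_rule_exclusive = p(two) + p(odd)

print(p_union, p_rule, p_union_exclusive, p_rule_exclusive)
```

Subtracting P(A and B) removes the outcomes that would otherwise be counted twice.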

4) Multiplication Rule – the probability that both events occur together

independent events: P(A and B) = P(A) * P(B)
dependent events: P(A and B) = P(A) * P(B|A)

5) Conditional Probability – the probability of an event happening given that another event has already happened

P(A|B) = P(A and B) / P(B)
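Rules 4 and 5 can be sketched together with a dependent case. As an assumed example, take a bag with 3 yellow, 2 red, 2 green and 1 blue marble and draw two marbles without putting the first one back. Here A is "first marble is yellow" and B is "second marble is yellow".

```python
from fractions import Fraction
from itertools import permutations

bag = ["y", "y", "y", "r", "r", "g", "g", "b"]
# All ordered pairs of two different marbles (by position in the bag): 8 * 7 = 56.
draws = list(permutations(range(len(bag)), 2))

def p(condition):
    """Share of the equally likely ordered draws that satisfy the condition."""
    return Fraction(sum(condition(i, j) for i, j in draws), len(draws))

p_a = p(lambda i, j: bag[i] == "y")                          # P(A)
p_a_and_b = p(lambda i, j: bag[i] == "y" and bag[j] == "y")  # P(A and B)
p_b_given_a = p_a_and_b / p_a                                # P(B|A) = P(A and B) / P(A)

print(p_a, p_b_given_a, p_a_and_b)
```

After one yellow marble is gone, 2 of the remaining 7 are yellow, so P(B|A) = 2/7 and P(A and B) = 3/8 * 2/7 = 3/28.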

I’ll write more about when these rules are helpful by providing some examples, but first, let’s bring another perspective to them.

Alper’s Notes:

– Seeing the terms like ‘addition’ or ‘multiplication’ makes it harder for me to grasp the idea behind these concepts, simply because it resonates with something abstract. In fact, they (+ or *) are the results of an idea; they are just a tool, not the concept itself. That’s why reading these rules as some certain types of events makes it easier for me to understand what’s going on.

So the addition rule is just a solution for the situation where one or both events occur, while the multiplication rule is for the situation where both events occur together.

– It’s also better for me to see the equations above as a whole, rather than different versions for different situations. For instance, rather than seeing the addition rule in two different ways for mutually and not mutually exclusive events, grasping the one that covers both is better for me, that is:

P(A or B) = P(A) + P(B) – P(A and B)

→ When events A and B are mutually exclusive P(A and B) would be zero, that’s it!

The same goes for the multiplication rule too. The one that covers both is:

P(A and B) = P(A) * P(B|A)

→ When events A and B are independent, P(B|A) is the same as P(B), that’s it!

In fact, even the conditional probability equation derives from the multiplication rule above! And it is also the starting point for how Bayes’ theorem is stated mathematically.
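As a short sketch of that derivation (written in LaTeX, and assuming P(A) > 0 and P(B) > 0 so that the divisions make sense): the multiplication rule can be written in two ways, because "A and B" is the same event as "B and A".

```latex
\begin{align*}
P(A \cap B) &= P(A)\,P(B \mid A) \\
P(A \cap B) &= P(B)\,P(A \mid B) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(A \cap B)}{P(B)}
  = \frac{P(B \mid A)\,P(A)}{P(B)}
\end{align*}
```

The first equality in the last line is the conditional probability equation, and the second one is the usual statement of Bayes’ theorem.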

Key Terminology

Mutually Exclusive – this indicates that two events cannot happen at the same time.

For example, consider the following two events: A) rolling a 2 and B) rolling an odd number. Since 2 is an even number, it’s not possible to roll a 2 and for that number to be odd. Therefore, these events are mutually exclusive.

Independent Events – the probability of one event does not change based on the outcome of the other event

Consider a basketball player shooting 2 free throws. If the player’s probability of making the second shot changes based on whether or not they make the first shot, then these events are dependent. If the probability does not change, then they would be independent.
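Free-throw data isn't something we can enumerate, so as a stand-in here is a sketch with two fair dice (my own example). It tests independence by checking whether P(A and B) equals P(A) * P(B).

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely (first, second) pairs

def p(condition):
    return Fraction(sum(condition(a, b) for a, b in rolls), len(rolls))

def independent(cond_a, cond_b):
    """True when P(A and B) == P(A) * P(B) on this sample space."""
    p_both = p(lambda a, b: cond_a(a, b) and cond_b(a, b))
    return p_both == p(cond_a) * p(cond_b)

first_even = lambda a, b: a % 2 == 0
sum_is_7 = lambda a, b: a + b == 7
sum_is_8 = lambda a, b: a + b == 8

print(independent(first_even, sum_is_7))  # knowing the first die is even doesn't change P(sum = 7)
print(independent(first_even, sum_is_8))  # but it does change P(sum = 8)
```

A sum of 7 is reachable from any first die, so it is independent of the first die being even; a sum of 8 is not reachable when the first die shows 1, which makes the two events dependent.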

Basic set operations

A sample space is the set of all possible outcomes of a statistical experiment, and it is usually denoted using set notation. The possible outcomes, or sample points, are listed as elements in the set. That’s why we’ll first get familiarized with the notion of a set and also perform some operations on sets.

Let’s start with ‘real-life’ examples to have a concrete understanding.

Probability using sample spaces

Case 1

Throwing a coin 3 times and having exactly 2 Heads.

In this case, our sample space is:
HHH
HHT
HTH
HTT
THH
THT
TTH
TTT

Solution: There are 8 possible outcomes, and 3 of them (HHT, HTH, THH) have exactly 2 heads, so P(exactly 2 H) = 3/8
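The same sample space can be generated instead of typed out. This is a minimal sketch using itertools.product, assuming a fair coin so that all sequences are equally likely.

```python
from fractions import Fraction
from itertools import product

# All sequences of three flips, e.g. "HHT".
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]
two_heads = [s for s in sample_space if s.count("H") == 2]
p_two_heads = Fraction(len(two_heads), len(sample_space))

print(sample_space)
print(two_heads, p_two_heads)
```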

Case 2

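Rolling two six-sided dice and having a double.

Sample space (rows: first die, columns: second die):

     1    2    3    4    5    6
1   1,1  1,2  1,3  1,4  1,5  1,6
2   2,1  2,2  2,3  2,4  2,5  2,6
3   3,1  3,2  3,3  3,4  3,5  3,6
4   4,1  4,2  4,3  4,4  4,5  4,6
5   5,1  5,2  5,3  5,4  5,5  5,6
6   6,1  6,2  6,3  6,4  6,5  6,6

Solution: 36 possible outcomes, 6 of which are doubles (the diagonal), so P(double) = 6/36 = 1/6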
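Counting the doubles can also be sketched in a few lines, assuming two fair dice:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))      # all (first, second) pairs
doubles = [(a, b) for a, b in rolls if a == b]    # (1, 1), (2, 2), ..., (6, 6)
p_double = Fraction(len(doubles), len(rolls))

print(len(rolls), len(doubles), p_double)
```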

When multiple events occur simultaneously or in succession (like above: rolling two dice), it’s called a compound event. We can represent its sample space in different ways too. Let’s see another case.

Case 3

Let’s say we have 3 flavors (Chocolate, Strawberry, Vanilla) and 3 sizes (small, medium, large). In such a case, we can look at it in different ways.

Approach 1

We can draw a tree diagram to think about the sample space.

|___Chocolate
|   |__s
|   |__m
|   |__l
|___Strawberry
|   |__s
|   |__m
|   |__l
|___Vanilla
    |__s
    |__m
    |__l

Approach 2

We can also use a grid (like we did above with dice). That would be another way to think about all of the possible outcomes.

          Chocolate   Strawberry   Vanilla
Small     s C         s S          s V
Medium    m C         m S          m V
Large     l C         l S          l V

Approach 3

A third way can be using a table.

Flavor   Size
C        s
C        m
C        l
S        s
S        m
S        l
V        s
V        m
V        l

A sample space doesn’t tell us whether the outcomes are equally likely or not. It tells us, if we’re going to run an experiment, what all the different possibilities for that experiment are.

When the outcomes are equally likely, it can be very useful, because we can easily answer, for instance, what’s the probability of getting something that is either small or chocolate, by looking at one of the diagrams above.
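For the "small or chocolate" question, a sketch could look like this. It assumes all 9 flavor and size combinations are equally likely, which the sample space alone doesn't guarantee.

```python
from fractions import Fraction
from itertools import product

flavors = ["Chocolate", "Strawberry", "Vanilla"]
sizes = ["small", "medium", "large"]
sample_space = list(product(flavors, sizes))  # 9 (flavor, size) pairs

# Keep every outcome that is chocolate, small, or both.
matches = [(f, s) for f, s in sample_space if f == "Chocolate" or s == "small"]
p_small_or_chocolate = Fraction(len(matches), len(sample_space))

print(matches)
print(p_small_or_chocolate)
```

There are 3 chocolate outcomes and 3 small outcomes, but "small chocolate" appears in both groups, so only 5 distinct outcomes match.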

Now that we know a little about sample spaces, let’s perform some operations on sets.

Intersection and union of sets

A set is a collection of distinct objects.

X = {3, 12, 5, 13}
Y = {14, 15, 6, 3}

Intersection (~and): X ∩ Y = {3}
Union (~or): X ∪ Y = {3, 12, 5, 13, 14, 15, 6}

One way to visualize intersections and unions is using a Venn diagram.
We don’t think about the order when we’re talking about a set.
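Python's built-in set type follows the same ideas, so the example can be sketched with the & and | operators:

```python
X = {3, 12, 5, 13}
Y = {14, 15, 6, 3}

intersection = X & Y  # elements in X and in Y
union = X | Y         # elements in X or in Y (or both)

print(intersection)
print(union)
print({3, 12} == {12, 3})  # order doesn't matter for sets
```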

Relative complement or difference between sets

A = {5, 3, 17, 12, 19}
B = {17, 19, 6}

A – B = {5, 3, 12}

This is “B subtracted from A”, or “Relative complement of B in A”.
We can denote this also as: A \ B

B \ A = {6}
A \ A = { }

This is called an “Empty set” or “Null set”.
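In Python the relative complement is the - operator on sets. A small sketch with the sets above:

```python
A = {5, 3, 17, 12, 19}
B = {17, 19, 6}

a_minus_b = A - B  # elements of A that are not in B
b_minus_a = B - A  # elements of B that are not in A
a_minus_a = A - A  # nothing is left: the empty set

print(a_minus_b, b_minus_a, a_minus_a)
```

Note that Python prints the empty set as set(), because {} is reserved for an empty dictionary.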

Universal set and absolute complement

A’ = U – A = U \ A

This is “Absolute Complement of A”, or “Set of all things in the universe that are not in A”.

If we have:
U = Z (integers)
C = {-5, 0, 7}

Then:
-5 ∈ C
0 ∈ C
-8 ∉ C
53 ∉ C

* ∈ means ‘is a member of’ (set membership). It is not epsilon.

C’ = U – C = U \ C

Also:
-5 ∉ C’
0 ∉ C’
-8 ∈ C’
53 ∈ C’

Subset, strict subset, and superset

A = {1, 3, 5, 7, 18}
B = {1, 7, 18}
C = {18, 7, 1, 19}

B ⊆ A: B is a subset of A. Every element in B is a member of A.
B ⊊ A: B is a strict (proper) subset of A. Same as above, but not everything that’s in A is a member of B.
A ⊇ B: A is a superset of B. A contains every element that is in B.
A ⊋ B: A is a strict superset of B. Same as above, and A is not equivalent to B.

We can write A ⊆ A, but cannot write A ⊊ A.
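Python's comparison operators on sets map onto these four relations, so we can sketch the checks directly (<= for subset, < for strict subset, >= and > for the superset versions):

```python
A = {1, 3, 5, 7, 18}
B = {1, 7, 18}
C = {18, 7, 1, 19}

print(B <= A)          # B is a subset of A
print(B < A)           # B is a strict subset of A
print(A >= B)          # A is a superset of B
print(A > B)           # A is a strict superset of B
print(A <= A, A < A)   # a set is a subset of itself, but never a strict one
print(C <= A)          # 19 is in C but not in A
```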

Set Operations

A = {3, 7, -5, 0, 13}
B = {0, 17, 3, Blue, ☆}
C = {Pink, ☆, 3, 17}

(A\(A∩(B\C)’))∪(B∩C) = ?

B\C = {0, Blue}
(B\C)’ = set of all things in the universe that are neither 0 nor Blue

A∩(B\C)’ = {3, 7, -5, 13}

A\(A∩(B\C)’) = {3, 7, -5, 0, 13} \ {3, 7, -5, 13} = {0}

B∩C = {17, 3, ☆}

(A\(A∩(B\C)’))∪(B∩C) = {0, 17, 3, ☆}
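The whole expression can be sketched with Python sets. One assumption is needed: Python has no universal set, so the absolute complement can't be built directly. Instead, A ∩ (B\C)’ is rewritten as A \ (B\C), which gives the same elements. Blue, Pink and ☆ are represented as strings.

```python
A = {3, 7, -5, 0, 13}
B = {0, 17, 3, "Blue", "☆"}
C = {"Pink", "☆", 3, 17}

b_minus_c = B - C                # B\C
a_and_not_bc = A - b_minus_c     # A ∩ (B\C)' expressed as A \ (B\C)
left = A - a_and_not_bc          # A \ (A ∩ (B\C)')
right = B & C                    # B ∩ C
result = left | right

print(b_minus_c, a_and_not_bc, left, right, result)
```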

Let’s finish with a popular problem in the statistical field.

The Monty Hall problem

You are on a TV show.
There are three doors, only one hides a reward, and you pick one.
The host opens one of the remaining two doors that is empty.
Then the host asks whether you want to change your initial choice.
Would you switch or stick to your choice?

We can think about two scenarios below.

When we don’t switch (when we always stick to our first guess):
P(W) = 1/3 
P(L) = 2/3 

When we always switch:
P(W) = 2/3
In this case (when we switch), the way we win is by picking wrong initially, and the chance of picking wrong initially is 2/3.
P(L) = 1/3
The way we lose is by picking right at first and then switching away, and the chance of picking right at first is 1/3.

Another way to think about it: when we make our initial pick, there’s a 1/3 chance that the reward is behind it, and a 2/3 chance that it’s behind one of the other two doors. The host then empties out one of those two. So when we switch, we are essentially capturing that 2/3 probability.
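If the argument still feels counterintuitive, a simulation can help. This is a minimal sketch under the usual assumptions: the host knows where the reward is, always opens an empty door that we didn't pick, and chooses at random when two such doors are available. The names play and win_rate are hypothetical.

```python
import random

def play(switch, rng):
    """Play one round and return True if we end up with the reward."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens an empty door that is not our pick.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the only door that is neither our pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)  # seeded so that a run can be repeated
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stick :", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Over many rounds, sticking should win about 1/3 of the time and switching about 2/3 of the time.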


Disclaimer: Like most of my posts, this content is intended solely for educational purposes and was created primarily for my personal reference. At times, I may rephrase original texts, and in some cases, I include materials such as graphs, equations, and datasets directly from their original sources.

I typically reference a variety of sources and update my posts whenever new or related information becomes available. For this particular post, the primary sources were: