Random Numbers and Probability

Based on DataCamp

Introduction

What are the chances?

Chance (also known as probability) is simply how likely something is to happen.

Measuring chance

The probability of an event is the number of ways the event can happen divided by the total number of possible outcomes.
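As a minimal sketch of measuring chance, assume a hypothetical list of four salespeople (the names are illustrative only). The chance of picking one particular name is the number of ways to pick it divided by the number of names.

```python
from fractions import Fraction

# Hypothetical list of salespeople (illustration only)
salespeople = ["Amir", "Brian", "Claire", "Damian"]

# P(event) = ways the event can happen / total possible outcomes
ways_to_pick_brian = salespeople.count("Brian")
p_brian = Fraction(ways_to_pick_brian, len(salespeople))

print(p_brian)         # 1/4
print(float(p_brian))  # 0.25
```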

Sampling from a data frame

# Sample one row at random (the result changes from call to call)
sales_counts.sample() # 1st Attempt. Out: Brian 128
sales_counts.sample() # 2nd Attempt. Out: Claire 75

To get the same result every time sample() is called, set the random seed first. With the same seed, the generator produces the same random values on each run.

import numpy as np

# Setting the same seed before each call reproduces the same sample
np.random.seed(10)
sales_counts.sample() # 1st Attempt. Out: Brian 128

np.random.seed(10)
sales_counts.sample() # 2nd Attempt. Out: Brian 128

np.random.seed(10)
sales_counts.sample() # 3rd Attempt. Out: Brian 128

# Sampling with replacement
sales_counts.sample(5, replace=True)
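The following sketch checks this reproducibility on a small stand-in table. The DataFrame contents are hypothetical (only the Brian and Claire figures appear in these notes), and `random_state` is shown as an alternative that seeds a single call without changing NumPy's global seed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sales_counts (some values are made up)
sales_counts = pd.DataFrame({
    "name": ["Amir", "Brian", "Claire", "Damian"],
    "n_sales": [178, 128, 75, 69],
})

# Same global seed before each call -> same sampled row
np.random.seed(10)
first = sales_counts.sample()
np.random.seed(10)
second = sales_counts.sample()
print(first.equals(second))  # True

# random_state seeds one call only, leaving the global seed untouched
third = sales_counts.sample(random_state=10)
fourth = sales_counts.sample(random_state=10)
print(third.equals(fourth))  # True
```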

Independent Event vs Dependent Event

| Independent Event | Dependent Event |
| --- | --- |
| The probability of the next event is not affected by the previous one | The probability of the next event is affected by the previous one |
| With Replacement | Without Replacement |
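To make the difference concrete, here is a small sketch assuming a hypothetical pool of four names where Brian was picked first. It compares the chance of picking Claire next when Brian is put back (independent) and when he is not (dependent).

```python
from fractions import Fraction

names = ["Amir", "Brian", "Claire", "Damian"]  # hypothetical pool

# With replacement: Brian goes back, so all 4 names are still available.
p_claire_with_replacement = Fraction(1, len(names))

# Without replacement: Brian is removed, so only 3 names remain.
remaining = [n for n in names if n != "Brian"]
p_claire_without_replacement = Fraction(1, len(remaining))

print(p_claire_with_replacement)     # 1/4
print(p_claire_without_replacement)  # 1/3
```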

Calculating probabilities

# Count the deals for each product
counts = amir_deals['product'].value_counts()

# Calculate probability of picking a deal with each product
probs = counts / len(amir_deals['product'])
print(probs)
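The same probabilities can also be obtained in one step with the `normalize=True` option of `value_counts()`. The sketch below uses a made-up stand-in for `amir_deals`, since the real data is not included in these notes.

```python
import pandas as pd

# Hypothetical stand-in for amir_deals (made-up rows)
amir_deals = pd.DataFrame({"product": ["A", "B", "A", "C", "A", "B"]})

# Two-step version: count, then divide by the number of deals
counts = amir_deals["product"].value_counts()
probs = counts / len(amir_deals["product"])

# One-step version: normalize=True returns relative frequencies directly
probs_direct = amir_deals["product"].value_counts(normalize=True)

print(probs_direct)
```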

Sampling deals

# Set random seed
np.random.seed(24)

# Sample 5 deals without replacement
sample_without_replacement = amir_deals.sample(5, replace=False)
print(sample_without_replacement)

# Sample 5 deals with replacement
sample_with_replacement = amir_deals.sample(5, replace = True)
print(sample_with_replacement)

Discrete Distributions

A discrete distribution describes a variable that can only take separate, countable values, such as the result of a die roll (1 to 6) or the number of people in a group.

Probability Distributions

Probability distributions describe the probability of each possible outcome in a scenario.

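Expected value: the mean of a probability distribution

The expected value is calculated by multiplying each outcome by its probability and summing the results.

We can visualize probability distributions using a bar plot, where each bar represents an outcome, and each bar's height represents the probability of its outcome.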
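As a worked example of an expected value, the sketch below assumes a fair six-sided die, where every outcome has probability 1/6. The die is an illustrative choice, not data from these notes.

```python
import numpy as np

# Outcomes of a fair six-sided die and their probabilities
outcomes = np.arange(1, 7)   # 1, 2, ..., 6
probs = np.full(6, 1 / 6)    # each outcome equally likely

# Expected value = sum of (outcome * probability)
expected_value = np.sum(outcomes * probs)
print(round(expected_value, 2))  # 3.5
```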

Law of large numbers

As the size of your sample increases, the sample mean approaches the expected value.

| Sample Size | Mean |
| --- | --- |
| 10 | 3.0 |
| 100 | 3.40 |
| 1000 | 3.48 |
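A quick simulation can illustrate this behaviour. The sketch assumes rolls of a fair die (expected value 3.5) and an arbitrarily chosen seed, so the exact means will differ from any table of results shown in the course.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily

means = {}
for n in (10, 100, 1000, 100_000):
    rolls = rng.integers(1, 7, size=n)  # fair die: integers 1..6
    means[n] = rolls.mean()
    print(n, means[n])

# The mean for the largest sample should sit close to 3.5
```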

Creating a probability distribution

# 1
import matplotlib.pyplot as plt

# Create a histogram of restaurant_groups and show plot
restaurant_groups['group_size'].hist(bins=[2,3,4,5,6])
plt.show()

# 2
# Create probability distribution: count of each group size / total number of groups
size_dist = restaurant_groups['group_size'].value_counts() / restaurant_groups.shape[0]

# Reset index and rename columns
size_dist = size_dist.reset_index()
size_dist.columns = ['group_size', 'prob']

print(size_dist)

# 3
# Expected value
expected_value = np.sum(size_dist['group_size'] * size_dist['prob'])
print(expected_value)

#4
# Subset groups of size 4 or more
groups_4_or_more = size_dist[size_dist['group_size'] >= 4]

# Sum the probabilities of groups_4_or_more
prob_4_or_more = np.sum(groups_4_or_more['prob'])
print(prob_4_or_more)

Continuous Distribution

Uniform distribution in Python

from scipy.stats import uniform
uniform.cdf(7, 0, 12) # P(wait_time <= 7)
1 - uniform.cdf(7, 0, 12) # P(wait_time >= 7)
uniform.cdf(7, 0, 12) - uniform.cdf(4, 0, 12) # P(4 <= wait_time <= 7)

Generating random numbers according to uniform distribution

from scipy.stats import uniform
uniform.rvs(0, 5, size=10)
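One detail worth checking: `scipy.stats.uniform` takes `loc` (the lower bound) and `scale` (the width of the interval), so it covers `[loc, loc + scale]`. The calls above work with the upper bound as the second argument only because the lower bound is 0. The bounds in this sketch are hypothetical.

```python
from scipy.stats import uniform

low, high = 2, 10  # hypothetical bounds, chosen so that low != 0

# scale is the width of the interval, not the upper bound
dist = uniform(loc=low, scale=high - low)

print(dist.cdf(6))   # 0.5, since 6 is the midpoint of [2, 10]
print(dist.mean())   # 6.0
```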

Data backups

#1
# Min and max wait times for back-up that happens every 30 min
min_time = 0
max_time = 30

#2
# Calculate probability of waiting less than 5 mins
prob_less_than_5 = uniform.cdf(5, min_time, max_time)
print(prob_less_than_5)

#3
# Calculate probability of waiting more than 5 mins
prob_greater_than_5 = 1 - uniform.cdf(5, min_time, max_time)
print(prob_greater_than_5)

#4
# Calculate probability of waiting 10-20 mins
prob_between_10_and_20 = uniform.cdf(20, min_time, max_time) - uniform.cdf(10, min_time, max_time)
print(prob_between_10_and_20)

Simulating wait times

#1
# Set random seed to 334
np.random.seed(334)

#2 
# Import uniform
from scipy.stats import uniform

#3
# Generate 1000 wait times between 0 and 30 mins
wait_times = uniform.rvs(0, 30, size=1000)

print(wait_times)

#4
# Create a histogram of simulated times and show plot
plt.hist(wait_times)
plt.show()
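As a rough sanity check, the share of simulated wait times under 5 minutes can be compared with the theoretical probability from `uniform.cdf`. This sketch passes the seed through `random_state` instead of the global seed, so the simulated values are not necessarily the same as those generated above.

```python
import numpy as np
from scipy.stats import uniform

# Simulate 1000 wait times between 0 and 30 minutes
wait_times = uniform.rvs(0, 30, size=1000, random_state=334)

# Fraction of simulated waits shorter than 5 minutes
empirical = np.mean(wait_times < 5)

# Theoretical probability of waiting less than 5 minutes
theoretical = uniform.cdf(5, 0, 30)

print(empirical, theoretical)
```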

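Binomial Distribution

The binomial distribution describes the probability of the number of successes in a sequence of independent trials.

A binary outcome has only two possible values, such as 0 and 1 (failure and success).

Expected value = n * p, where n is the number of trials and p is the probability of success.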

from scipy.stats import binom
binom.rvs(1, 0.5, size=1)
# binom.pmf(num heads, num trials, prob of heads)
binom.pmf(7, 10, 0.5) # P(heads=7)
binom.cdf(7, 10, 0.5) # P(heads <= 7)
1 - binom.cdf(7, 10, 0.5) # P(heads > 7)
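To see where the `binom.pmf` value comes from, this sketch recomputes P(heads = 7) by hand with the standard binomial formula, C(n, k) * p^k * (1 - p)^(n - k), and compares it with SciPy's result.

```python
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.5, 7  # 10 coin flips, fair coin, 7 heads

# Binomial formula: C(n, k) * p^k * (1 - p)^(n - k)
manual = comb(n, k) * p**k * (1 - p)**(n - k)
from_scipy = binom.pmf(k, n, p)

print(manual)      # 0.1171875
print(from_scipy)
```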

Simulating sales deals

#1
# Import binom from scipy.stats
from scipy.stats import binom

# Set random seed to 10
np.random.seed(10)

#2
# Simulate a single deal
print(binom.rvs(1, 0.3, size=1))

#3
# Simulate 1 week of 3 deals
print(binom.rvs(3, 0.3, size=1))

#4
# Simulate 52 weeks of 3 deals
deals = binom.rvs(3, 0.3, size=52)

# Print mean deals won per week
print(np.mean(deals))

Calculating binomial probabilities

#1
# Probability of closing 3 out of 3 deals
prob_3 = binom.pmf(3, 3, 0.3)

print(prob_3)

#2
# Probability of closing <= 1 deal out of 3 deals
prob_less_than_or_equal_1 = binom.cdf(1, 3, 0.3)

print(prob_less_than_or_equal_1)

#3
# Probability of closing > 1 deal out of 3 deals
prob_greater_than_1 = 1 - binom.cdf(1, 3, 0.3)

print(prob_greater_than_1)
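SciPy also provides a survival function, `binom.sf`, which returns P(X > k) directly and should agree with `1 - binom.cdf(...)`. The sketch below checks both against the value worked out by hand.

```python
from scipy.stats import binom

# P(more than 1 deal closed out of 3) computed two ways
prob_greater_than_1 = 1 - binom.cdf(1, 3, 0.3)
prob_sf = binom.sf(1, 3, 0.3)  # survival function: P(X > 1)

# By hand: P(2) + P(3) = 3 * 0.3**2 * 0.7 + 0.3**3 = 0.189 + 0.027 = 0.216
by_hand = 3 * 0.3**2 * 0.7 + 0.3**3

print(prob_greater_than_1, prob_sf, by_hand)
```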

How many sales will be won?

# Expected value = n * p

# Expected number won with 30% win rate
won_30pct = 3 * 0.3
print(won_30pct)

# Expected number won with 25% win rate
won_25pct = 3 * 0.25
print(won_25pct)

# Expected number won with 35% win rate
won_35pct = 3 * 0.35
print(won_35pct)
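The same expected values can be cross-checked with `binom.mean`, which returns the mean of a binomial distribution for a given number of trials and success probability. This is only a sketch of an alternative check, not part of the original exercise.

```python
from scipy.stats import binom

n_deals = 3  # deals per week

expected = {}
for win_rate in (0.25, 0.30, 0.35):
    # n * p by hand, and the same quantity from SciPy
    expected[win_rate] = (n_deals * win_rate, binom.mean(n_deals, win_rate))
    print(win_rate, expected[win_rate])
```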