Part 13: Simulation

Kerry Back, Rice University

Overview

  • Numpy arrays and operations
  • Random number generation
  • Rolling dice and flipping coins
  • Monte Carlo simulation
  • Business applications

Creating Numpy Arrays

import numpy as np

# create a 1-dimensional array
arr_1d = np.array([1.1, 1.4, 2.6])
print('1-Dimensional Array:', arr_1d)

# create a 2-dimensional array
arr_2d = np.array([
    [1.1, 1.4, 2.6],
    [-3.2, 5.603, 4]
])
print('\n2-Dimensional Array:')
print(arr_2d)
1-Dimensional Array: [1.1 1.4 2.6]

2-Dimensional Array:
[[ 1.1    1.4    2.6  ]
 [-3.2    5.603  4.   ]]

Practice: Creating Arrays

Exercise 1 (with Gemini): Ask Gemini to “create a 2D numpy array with 3 rows and 4 columns”

Exercise 2 (on your own): Type import numpy as np then arr = np.array([1, 2, 3]) then print(arr) and run it.

Slicing Arrays

# slice a 1-dimensional array (like slicing a list)
print('Extract a single element:', arr_1d[1])
print('Extract a range:', arr_1d[:2])

# slice a 2-dimensional array - index is [row, column]
print('\nExtract element from 2d array:', arr_2d[0, 1])
print('Extract range from 2d array:')
print(arr_2d[:, :2])
Extract a single element: 1.4
Extract a range: [1.1 1.4]

Extract element from 2d array: 1.4
Extract range from 2d array:
[[ 1.1    1.4  ]
 [-3.2    5.603]]

Practice: Slicing Arrays

Exercise 1 (with Gemini): Ask Gemini to “extract the first row from a 2D numpy array”

Exercise 2 (on your own): Type arr = np.array([[1, 2], [3, 4]]) then print(arr[0, 1]) and run it.

Array Operations

# array addition (sum element by element)
arr_1d_second = np.array([7, 4.3, 1.1])
print('Sum of arrays:', arr_1d + arr_1d_second)

# array multiplication (multiply element by element)
print('Product of arrays:', arr_1d * arr_1d_second)

# broadcasting: multiply by scalar
print('\nBroadcast (multiply by 2):')
print(2 * arr_2d)
Sum of arrays: [8.1 5.7 3.7]
Product of arrays: [7.7  6.02 2.86]

Broadcast (multiply by 2):
[[ 2.2    2.8    5.2  ]
 [-6.4   11.206  8.   ]]

Practice: Array Operations

Exercise 1 (with Gemini): Ask Gemini to “create two numpy arrays and add them together element-wise”

Exercise 2 (on your own): Type import numpy as np then arr = np.array([1, 2, 3]) then print(arr * 2) and run it.

Array Methods

# sum method
print('Array:', arr_1d)
print('Sum:', arr_1d.sum())

# standard deviation method
print('\nStandard deviation:', arr_1d.std())

# other useful methods: mean(), min(), max()
print('Mean:', arr_1d.mean())
Array: [1.1 1.4 2.6]
Sum: 5.1

Standard deviation: 0.6480740698407861
Mean: 1.7

Practice: Array Methods

Exercise 1 (with Gemini): Ask Gemini to “calculate the sum and mean of a numpy array”

Exercise 2 (on your own): Type arr = np.array([10, 20, 30]) then print(arr.max()) and run it.

Random Number Generation

import pandas as pd
import matplotlib.pyplot as plt

# Set random seed for reproducible results
np.random.seed(42)

# Roll a six-sided die 1000 times
num_rolls = 1000
dice_rolls = np.random.randint(1, 7, num_rolls)

print(f"First 10 rolls: {dice_rolls[:10]}")
print(f"Average: {dice_rolls.mean():.2f} (expected: 3.5)")
First 10 rolls: [4 5 3 5 5 2 3 3 3 5]
Average: 3.46 (expected: 3.5)

Practice: Random Dice Rolls

Exercise 1 (with Gemini): Ask Gemini to “simulate rolling two dice 100 times and calculate the average sum”

Exercise 2 (on your own): Type np.random.randint(1, 7, 10) and run it to simulate 10 dice rolls.

Visualizing Dice Rolls

plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.hist(dice_rolls, bins=range(1, 8), alpha=0.7,
         color='skyblue', edgecolor='black')
plt.title('Distribution of Dice Rolls')
plt.xlabel('Dice Value')
plt.ylabel('Frequency')
plt.xticks(range(1, 7))

plt.subplot(1, 2, 2)
running_average = np.cumsum(dice_rolls) / np.arange(1, num_rolls + 1)
plt.plot(running_average, color='red', linewidth=2)
plt.axhline(y=3.5, color='blue', linestyle='--',
            label='Expected (3.5)')
plt.title('Running Average')
plt.xlabel('Number of Rolls')
plt.ylabel('Average Value')
plt.legend()
plt.tight_layout()

Normal Distributions

# Monte Carlo simulation: monthly sales forecast

base_monthly_sales = 25000
monthly_variation = 0.20  # 20% variation

# Seasonal factors for each month
seasonal_factors = [
    0.9, 0.9, 1.0, 1.0, 1.1, 1.1,  # Jan-Jun
    1.0, 1.0, 1.0, 1.1, 1.2, 1.3   # Jul-Dec
]

def simulate_one_year():
    monthly_sales = []
    for month in range(12):
        expected_sales = base_monthly_sales * seasonal_factors[month]
        actual_sales = np.random.normal(expected_sales,
                                       expected_sales * monthly_variation)
        actual_sales = max(actual_sales, 0)
        monthly_sales.append(actual_sales)
    return monthly_sales

Monte Carlo Simulation

# Simulate 1,000 possible years
num_simulations = 1000
all_annual_totals = []

for sim in range(num_simulations):
    year_sales = simulate_one_year()
    all_annual_totals.append(sum(year_sales))

all_annual_totals = np.array(all_annual_totals)

print(f"Average annual sales: ${all_annual_totals.mean():,.0f}")
print(f"Minimum: ${all_annual_totals.min():,.0f}")
print(f"Maximum: ${all_annual_totals.max():,.0f}")
print(f"10th percentile: ${np.percentile(all_annual_totals, 10):,.0f}")
print(f"90th percentile: ${np.percentile(all_annual_totals, 90):,.0f}")
Average annual sales: $314,614
Minimum: $256,747
Maximum: $373,673
10th percentile: $292,487
90th percentile: $336,429

Practice: Monte Carlo Simulation

Exercise 1 (with Gemini): Ask Gemini to “run a Monte Carlo simulation to estimate the probability of getting heads at least 6 times in 10 coin flips”

Exercise 2 (on your own): Type results = [np.random.randint(0, 2, 10).sum() for _ in range(1000)] then print(np.mean(results)) and run it.

Visualizing Results

plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.hist(all_annual_totals, bins=50, alpha=0.7,
         color='lightblue', edgecolor='black')
plt.axvline(all_annual_totals.mean(), color='red',
            linestyle='--', label=f'Mean: ${all_annual_totals.mean():,.0f}')
plt.title('Distribution of Annual Sales')
plt.xlabel('Annual Sales ($)')
plt.ylabel('Frequency')
plt.legend()

plt.subplot(1, 2, 2)
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
example_year = simulate_one_year()
plt.bar(month_names, example_year, color='lightgreen', alpha=0.7)
plt.title('One Example Year')
plt.xlabel('Month')
plt.ylabel('Sales ($)')
plt.xticks(rotation=45)
plt.tight_layout()

Probability Analysis

# Business question: What's the probability of meeting target?
target_sales = 300000
prob_meeting_target = np.mean(all_annual_totals >= target_sales) * 100

print(f"Probability of meeting ${target_sales:,} target: {prob_meeting_target:.1f}%")

# Show different targets
for target in [280000, 300000, 320000]:
    prob = np.mean(all_annual_totals >= target) * 100
    print(f"Target ${target:,}: {prob:.1f}% probability")
Probability of meeting $300,000 target: 79.8%
Target $280,000: 97.2% probability
Target $300,000: 79.8% probability
Target $320,000: 37.0% probability

Summary

  • Numpy arrays: Efficient numerical operations
  • Random numbers: np.random.randint(), np.random.normal()
  • Monte Carlo: Simulate many scenarios to understand distributions
  • Applications: Risk analysis, forecasting, decision making
  • Key insight: Run thousands of simulations to estimate probabilities