In the last post we reviewed two simple methods to simulate samples from a pair of independent normal random variables: Box-Muller and Marsaglia-Bray. In this post we are going to compare the performance of our implementation of such methods.

## Introduction

First let’s revisit our Python functions:

```
import math
import numpy as np
from numpy import pi

def box_muller_sample(n=1):
    # Box-Muller: transform two independent uniforms into two independent normals
    u1 = np.random.rand(n)
    u2 = np.random.rand(n)
    r = np.sqrt(-2*np.log(u1))
    x = np.cos(2*pi*u2)
    y = np.sin(2*pi*u2)
    z1 = r*x
    z2 = r*y
    return z1, z2

def marsaglia_sample(n=1):
    def marsaglia():
        # accept-reject: keep only points inside the unit circle
        while True:
            w1 = (np.random.rand() * 2) - 1
            w2 = (np.random.rand() * 2) - 1
            s = w1 * w1 + w2 * w2
            if s < 1:
                t = math.sqrt((-2) * math.log(s)/s)
                z1 = w1 * t
                z2 = w2 * t
                return z1, z2
    z1 = np.empty(n)
    z2 = np.empty(n)
    for i in range(n):
        z1[i], z2[i] = marsaglia()
    return z1, z2
```
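Before timing anything, a quick sanity check (a sketch, not part of the original post; the Box-Muller sampler is restated so the snippet runs on its own) confirms the output looks standard normal; the same check applies to the Marsaglia sampler.

```python
import numpy as np
from numpy import pi

# restated from above so this check runs standalone
def box_muller_sample(n=1):
    u1 = np.random.rand(n)
    u2 = np.random.rand(n)
    r = np.sqrt(-2*np.log(u1))
    return r*np.cos(2*pi*u2), r*np.sin(2*pi*u2)

z1, z2 = box_muller_sample(100000)
# a standard normal sample of this size has mean ~0 and std ~1
print(round(z1.mean(), 2), round(z1.std(), 2))
```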

To compare their performance we generate 1000 samples of size $n=100000$ and look at the average time for this task.

```
import time

t = 0
for _ in range(1000):
    start = time.time()
    box_muller_sample(100000)
    end = time.time()
    t = t + (end - start)
print('The Box-Muller function simulates a sample of size n=100000 in an average of {:.6f} seconds'.format(t/1000))
```

The Box-Muller function simulates a sample of size n=100000 in an average of 0.002097 seconds

```
t = 0
for _ in range(1000):
    start = time.time()
    marsaglia_sample(100000)
    end = time.time()
    t = t + (end - start)
print('The Marsaglia function simulates a sample of size n=100000 in an average of {:.6f} seconds'.format(t/1000))
```

The Marsaglia function simulates a sample of size n=100000 in an average of 0.148888 seconds

Note that the Box-Muller function is much faster than Marsaglia's, despite using trigonometric functions.

## First Improvement – Marsaglia 1.1

We can try to improve our Marsaglia function by using NumPy's vectorised operations where possible.

```
def marsaglia_sample_vectorial(n=1):
    def marsaglia():
        # accept-reject only: the transform is deferred so it can be vectorised
        while True:
            w1 = (np.random.rand() * 2) - 1
            w2 = (np.random.rand() * 2) - 1
            if w1 * w1 + w2 * w2 < 1:
                return w1, w2
    w1 = np.empty(n)
    w2 = np.empty(n)
    for i in range(n):
        w1[i], w2[i] = marsaglia()
    s = w1*w1 + w2*w2
    t = np.sqrt(-2*np.divide(np.log(s), s))
    z1 = w1*t
    z2 = w2*t
    return z1, z2
```

The vectorial version of the Marsaglia function simulates a sample of size n=100000 in an average of 0.120829 seconds

The new Marsaglia function is faster than the original one but still much slower than Box-Muller. This is partly due to the accept-reject process that takes place within the **while loop**, where we discard the samples that fall outside the unit circle. Let's analyse the accept/reject process that happens in this loop.

### Understanding the Accept-Reject Method

First, observe that the acceptance probability is

$$ p = \mathbb{P}(w_1^2 + w_2^2 < 1) = \frac{\pi}{4} \approx 0.7854$$

since $(w_1, w_2)$ is uniformly distributed on a square of area 4, and the unit circle inside it has area $\pi$.
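A quick Monte Carlo check of this value (a sketch, independent of the sampler code above):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
w1 = rng.random(1_000_000) * 2 - 1
w2 = rng.random(1_000_000) * 2 - 1
# fraction of points falling inside the unit circle
p_hat = np.mean(w1*w1 + w2*w2 < 1)
print(p_hat)  # close to pi/4 ~ 0.7854
```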

Thus, to avoid the loop, we need to find the minimum number of points $N\geq n$ to simulate in order to guarantee, with high probability, that we will accept at least $n$ points. Formally, for a given number of simulations $N$, let $S$ be the number of accepted points, i.e.

$$S = \#\{ (w_1, w_2) : s = w_1^2 + w_2^2 < 1 \}.$$

Clearly, $S$ follows a Binomial distribution with parameters $(N,p)$ with $p=\pi/4$. So, the probability of accepting at least $n$ points is given by

$$ \mathbb{P}(S\geq n) = 1 - \mathbb{P}(S\leq n-1) = 1 - \sum_{k=0}^{n-1} \binom{N}{k}p^{k}(1-p)^{N-k}. $$

Using the Normal approximation we obtain

$$ \mathbb{P}(S\geq n) \approx \mathbb{P}(Z\geq n) = 1- \mathbb{P}(Z\leq n),$$

where $Z \sim \mathcal{N}\left(Np,\, Np(1-p)\right)$ is a normal random variable with mean $Np$ and variance $Np(1-p)$. We can write a Python function which calculates this probability for any given $n$ and $N$.

```
from scipy.stats import norm

def calculate_probability(n, N):
    p = pi/4
    mu = N*p
    sigma = np.sqrt(N*p*(1-p))
    prob = 1 - norm.cdf(n, mu, sigma)
    return prob
```
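As a cross-check (a sketch; it restates `calculate_probability` and assumes SciPy is available), the normal approximation can be compared against the exact binomial tail via `scipy.stats.binom.sf`:

```python
import numpy as np
from numpy import pi
from scipy.stats import norm, binom

# restated from above so this snippet runs standalone
def calculate_probability(n, N):
    p = pi/4
    mu = N*p
    sigma = np.sqrt(N*p*(1-p))
    return 1 - norm.cdf(n, mu, sigma)

n, N = 100, 140
approx = calculate_probability(n, N)
exact = binom.sf(n - 1, N, pi/4)  # P(S >= n) computed exactly
print(approx, exact)  # the two values should agree closely
```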

Let’s visualise this function for $n=100$ and varying the total number of simulations $N$ from 100 to 200:

Similarly, if $n=1000$ and $N$ varies from 1000 to 2000 we obtain:

These graphs show that the probability increases steadily at first and then more sharply, until it reaches almost one.

### Finding the Right N

We know the probability of acceptance is $p = \frac{\pi}{4} \approx 0.78$, or equivalently the probability of rejection is $1-p \approx 0.22$. Hence, we could think of simulating a sample which is $22\%$ bigger than our target number of accepted observations. That is, for any given $n$ we simulate $N = \lceil n (1.22) \rceil$.

Once again, using the fact that $Z \sim \mathcal{N}\left(Np,\, Np(1-p)\right)$, we can write a function that calculates the probability of accepting at least $n$ samples under this choice of $N$.

```
def calculate_probability_122(n):
    N = n*1.22
    p = pi/4
    mu = N*p
    sigma = np.sqrt(N*p*(1-p))
    prob = 1 - norm.cdf(n, mu, sigma)
    return prob
```

Let’s visualise this probability as a function of the desired number of accepted samples $n$.

The plot shows that the probability of accepting enough points decreases rapidly and is close to zero for $n\geq 1000$. So the approach of simulating $N = \lceil 1.22n\rceil$ is not a good idea after all.
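A few values of `calculate_probability_122` (restated here as a self-contained sketch) make this drop-off concrete:

```python
import numpy as np
from numpy import pi
from scipy.stats import norm

# restated from above so this snippet runs standalone
def calculate_probability_122(n):
    N = n*1.22
    p = pi/4
    mu = N*p
    sigma = np.sqrt(N*p*(1-p))
    return 1 - norm.cdf(n, mu, sigma)

for n in (100, 1000, 10000):
    print(n, calculate_probability_122(n))
# the probability of accepting enough points drops towards zero as n grows
```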

In order to find the right number of simulations $N$ we are going to exploit the properties of the normal distribution. In particular, observe that

$$\mathbb{P}(Z\geq \mu - 6\sigma ) \approx 0.999999999$$

where $\mu=Np$ and $\sigma = \sqrt{Np(1-p)}$. Then, we can set the equation

$$ Np - 6\sqrt{Np(1-p)} = n, $$

which can be solved for $N$ as

$$N= \left(\frac{6\sqrt{p(1-p)} + \sqrt{36p(1-p) + 4pn}}{2p}\right)^2.$$
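Translated into code (a sketch; the helper name `required_simulations` is introduced here purely for illustration):

```python
import math

def required_simulations(n):
    # smallest N satisfying Np - 6*sqrt(Np(1-p)) >= n,
    # obtained from the quadratic formula above
    p = math.pi/4
    aux = p*(1-p)
    x = (6*math.sqrt(aux) + math.sqrt(36*aux + 4*p*n)) / (2*p)
    return math.ceil(x*x)

print(required_simulations(100000))  # roughly 28% more than n
```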

This means that for any given $n$, we can find the exact number of simulations $N$ for which the probability of accepting at least $n$ samples is high (0.999999999)! A natural question is how much bigger is this $N$ with respect to $n$? The answer is given by the limit

$$\lim_{n\rightarrow \infty} \frac{N}{n} - 1 = \lim_{n\rightarrow \infty} \dfrac{1}{n}\left(\frac{6\sqrt{p(1-p)} + \sqrt{36p(1-p) + 4pn}}{2p}\right)^2 - 1 = \frac{4}{\pi} - 1\approx 0.2732$$
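We can also check this limit numerically (a short sketch):

```python
import math

p = math.pi/4
aux = p*(1-p)

def ratio_minus_one(n):
    # (N/n) - 1 for the N given by the formula above
    x = (6*math.sqrt(aux) + math.sqrt(36*aux + 4*p*n)) / (2*p)
    return x*x/n - 1

for n in (10**3, 10**5, 10**7):
    print(n, ratio_minus_one(n))
# decreases towards 4/pi - 1 ~ 0.2732 as n grows
```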

As usual, a plot helps us to illustrate this convergence

Note that the graph agrees with our earlier observation that simulating an extra 22% was not enough to make sure, with high probability, that we had enough accepted samples.

## Second Improvement – Marsaglia 1.2

Using our previous finding we can introduce a new version of our functions without the while loop.

```
p = pi/4
aux = p*(1-p)

def marsaglia_optimal_sample(n=1):
    # N is chosen so that at least n points are accepted
    # with probability ~0.999999999
    x = (3*math.sqrt(aux) + math.sqrt(9*aux + p*n))/p
    N = math.ceil(x*x)
    w1 = np.random.rand(N) * 2 - 1
    w2 = np.random.rand(N) * 2 - 1
    s = w1 * w1 + w2 * w2
    index = s < 1
    w1 = w1[index][:n]
    w2 = w2[index][:n]
    s = s[index][:n]
    t = np.sqrt(-2*np.divide(np.log(s), s))
    z1 = w1*t
    z2 = w2*t
    return z1, z2
```
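A quick correctness check (a sketch restating the function so it runs standalone) that the loop-free version really returns exactly $n$ standard-normal-looking samples:

```python
import math
import numpy as np
from numpy import pi

p = pi/4
aux = p*(1-p)

# restated from above so this check runs standalone
def marsaglia_optimal_sample(n=1):
    x = (3*math.sqrt(aux) + math.sqrt(9*aux + p*n))/p
    N = math.ceil(x*x)
    w1 = np.random.rand(N) * 2 - 1
    w2 = np.random.rand(N) * 2 - 1
    s = w1*w1 + w2*w2
    index = s < 1
    w1, w2, s = w1[index][:n], w2[index][:n], s[index][:n]
    t = np.sqrt(-2*np.log(s)/s)
    return w1*t, w2*t

z1, z2 = marsaglia_optimal_sample(100000)
print(len(z1), round(z1.mean(), 2), round(z1.std(), 2))
```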

Let’s test its performance:

```
t = 0
for _ in range(1000):
    start = time.time()
    marsaglia_optimal_sample(100000)
    end = time.time()
    t = t + (end - start)
print('The new version of the Marsaglia function simulates a sample of size n=100000 in an average of {:.6f} seconds'.format(t/1000))
```

The new version of the Marsaglia function simulates a sample of size n=100000 in an average of 0.004235 seconds

Although this function is still slower than Box-Muller, it is certainly much closer to it … and we had some fun doing some elementary maths!