## A Jolly Student’s Tea Party – a.k.

Last time we took a look at the chi-squared distribution which describes the behaviour of sums of squares of standard normally distributed random variables, having means of zero and standard deviations of one.
Tangentially related is Student's t-distribution which governs the deviation of means of sets of independent observations of a normally distributed random variable from its known true mean, which we shall examine in this post.

## Chi Chi Again – a.k.

Several years ago we saw that, under some relatively easily met assumptions, the averages of independent observations of a random variable tend toward the normal distribution. Derived from that is the chi-squared distribution which describes the behaviour of sums of squares of independent standard normal random variables, having means of zero and standard deviations of one.
In this post we shall see how it is related to the gamma distribution and implement its various functions in terms of those of the latter.

## Will They Blend? – a.k.

Last time we saw how we can create new random variables from sets of random variables with given probabilities of observation. To make an observation of such a random variable we randomly select one of its components, according to their probabilities, and make an observation of it. Furthermore, their associated probability density functions, or PDFs, cumulative distribution functions, or CDFs, and characteristic functions, or CFs, are simply sums of the component functions weighted by their probabilities of observation.
Now there is nothing about such distributions, known as mixture distributions, that requires that the components are univariate. Given that copulas are simply multivariate distributions with standard uniformly distributed marginals, being the distributions of each element considered independently of the others, we can use the same technique to create new copulas too.

## Mixing It Up – a.k.

Last year we took a look at basis function interpolation which fits a weighted sum of n independent functions, known as basis functions, through observations of an arbitrary function's values at a set of n points in order to approximate it at unobserved points. In particular, we saw that symmetric probability density functions, or PDFs, make reasonable basis functions for approximating both univariate and multivariate functions.
It is quite tempting, therefore, to use weighted sums of PDFs to construct new PDFs and in this post we shall see how we can use a simple probabilistic argument to do so.

## What’s The Lucky Number? – a.k.

Over the last few months we have been looking at Bernoulli processes which are sequences of Bernoulli trails, being observations of a Bernoulli distributed random variable with a success probability of p. We have seen that the number of failures before the first success follows the geometric distribution and the number of failures before the rth success follows the negative binomial distribution, which are the discrete analogues of the exponential and gamma distributions respectively.
This time we shall take a look at the binomial distribution which governs the number of successes out of n trials and is the discrete version of the Poisson distribution.

## Bad Luck Comes In Ks – a.k.

Lately we have been looking at Bernoulli processes which are sequences of independent experiments, known as Bernoulli trials, whose successes or failures are given by observations of a Bernoulli distributed random variable. Last time we saw that the number of failures before the first success was governed by the geometric distribution which is the discrete analogue of the exponential distribution and, like it, is a memoryless waiting time distribution in the sense that the distribution for the number of failures before the next success is identical no matter how many failures have already occurred whilst we've been waiting.
This time we shall take a look at the distribution of the number of failures before a given number of successes, which is a discrete version of the gamma distribution which defines the probabilities of how long we must wait for multiple exponentially distributed events to occur.

## If At First You Don’t Succeed – a.k.

Last time we took a first look at Bernoulli processes which are formed from a sequence of independent experiments, known as Bernoulli trials, each of which is governed by the Bernoulli distribution with a probability p of success. Since the outcome of one trial has no effect upon the next, such processes are memoryless meaning that the number of trials that we need to perform before getting a success is independent of how many we have already performed whilst waiting for one.
We have already seen that if waiting times for memoryless events with fixed average arrival rates are continuous then they must be exponentially distributed and in this post we shall be looking at the discrete analogue.

## One Thing Or Another – a.k.

Several years ago we took a look at memoryless processes in which the probability that we should wait for any given length of time for an event to occur is independent of how long we have already been waiting. We found that this implied that the waiting time must be exponentially distributed, that the waiting time for several events must be gamma distributed and that the number of events occuring in a unit of time must be Poisson distributed.
These govern continuous memoryless processes in which events can occur at any time but not those in which events can only occur at specified times, such as the roll of a die coming up six, known as Bernoulli processes. Observations of such processes are known as Bernoulli trials and their successes and failures are governed by the Bernoulli distribution, which we shall take a look at in this post.

## Slashing The Odds – a.k.

In the previous post we explored the Cauchy distribution, which, having undefined means and standard deviations, is an example of a pathological distribution. We saw that this is because it has a relatively high probability of generating extremely large values which we concluded was a consequence of its standard random variable being equal to the ratio of two independent standard normally distributed random variables, so that the magnitudes of observations of it can be significantly increased by the not particularly unlikely event that observations of the denominator are close to zero.
Whilst we didn't originally derive the Cauchy distribution in this way, there are others, known as ratio distributions, that are explicitly constructed in this manner and in this post we shall take a look at one of them.

## Moments Of Pathological Behaviour – a.k.

Last time we took a look at basis function interpolation with which we approximate functions from their values at given sets of arguments, known as nodes, using weighted sums of distinct functions, known as basis functions. We began by constructing approximations using polynomials before moving on to using bell shaped curves, such as the normal probability density function, centred at the nodes. The latter are particularly useful for approximating multi-dimensional functions, as we saw by using multivariate normal PDFs.
An easy way to create rotationally symmetric functions, known as radial basis functions, is to apply univariate functions that are symmetric about zero to the distance between the interpolation's argument and their associated nodes. PDFs are a rich source of such functions and, in fact, the second bell shaped curve that we considered is related to that of the Cauchy distribution, which has some rather interesting properties.

## Calculating statement execution likelihood

In the following code, how often will the variable `b` be incremented, compared to `a`?

If we assume that the variables `x` and `y` have values drawn from the same distribution, then the condition `(x < y)` will be true 50% of the time (ignoring the situation where both values are equal), i.e., `b` will be incremented half as often as `a`.

```a++;
if (x < y)
{
b++;
if (x < z)
{
c++;
}
}
```

If the value of `z` is drawn from the same distribution as `x` and `y`, how often will `c `be incremented compared to `a`?

The test `(x < y)` reduces the possible values that `x` can take, which means that in the comparison `(x < z)`, the value of `x` is no longer drawn from the same distribution as `z`.

Since we are assuming that `z` and `y` are drawn from the same distribution, there is a 50% chance that `(z < y)`.

If we assume that `(z < y)`, then the values of `x` and `z` are drawn from the same distribution, and in this case there is a 50% change that `(x < z)` is true.

Combining these two cases, we deduce that, given the statement `a++;` is executed, there is a 25% probability that the statement `c++;` is executed.

If the condition `(x < z)` is replaced by `(x > z)`, the expected probability remains unchanged.

If the values of `x`, `y`, and `z` are not drawn from the same distribution, things get complicated.

Let's assume that the probability of particular values of `x` and `y` occurring are and , respectively. The constants and are needed to ensure that both probabilities sum to one; the exponents and control the distribution of values. What is the probability that `(x < y)` is true?

Probability theory tells us that , where: is the probability distribution function for (in this case: ), and the the cumulative probability distribution for (in this case: ).

Doing the maths gives the probability of `(x < y)` being true as: .

The `(x < z)` case can be similarly derived, and combining everything is just a matter of getting the algebra right; it is left as an exercise to the reader :-)

## Archimedean Crew – a.k.

We have recently seen how we can define dependencies between random variables with Archimedean copulas which calculate the probability that they each fall below given values by applying a generator function φ to the results of their cumulative distribution functions, or CDFs, for those values, and applying its inverse to their sum.
Like all copulas they are effectively the CDFs of vector valued random variables whose elements are uniformly distributed when considered independently. Whilst those Archimedean CDFs were relatively trivial to implement, we found that their probability density functions, or PDFs, were somewhat more difficult and that the random variables themselves required some not at all obvious mathematical manipulation to get right.
Having done all the hard work implementing the `ak.archimedeanCopula`, `ak.archimedeanCopulaDensity` and `ak.archimedeanCopulaRnd` functions we shall now use them to implement some specific families of Archimedean copulas.

## Archimedean Review – a.k.

In the last couple of posts we've been taking a look at Archimedean copulas which define the dependency between the elements of vector values of a multivariate random variable by applying a generator function φ to the values of the cumulative distribution functions, or CDFs, of their distributions when considered independently, known as their marginal distributions, and applying the inverse of the generator to the sum of the results to yield the value of the multivariate CDF.
We have seen that the densities of Archimedean copulas are rather trickier to calculate and that making random observations of them is trickier still. Last time we found an algorithm for the latter, albeit with an implementation that had troubling performance and numerical stability issues, and in this post we shall add an improved version to the `ak` library that addresses those issues.

## Archimedean View – a.k.

Last time we took a look at how we could define copulas to represent the dependency between random variables by summing the results of a generator function φ applied to the results of their cumulative distribution functions, or CDFs, and then applying the inverse of that function φ-1 to that sum.
These are known as Archimedean copulas and are valid whenever φ is strictly decreasing over the interval [0,1], equal to zero when its argument equals one and have nth derivatives that are non-negative over that interval when n is even and non-positive when it is odd, for n up to the number of random variables.
Whilst such copulas are relatively easy to implement we saw that their densities are a rather trickier job, in contrast to Gaussian copulas where the reverse is true. In this post we shall see how to draw random vectors from Archimedean copulas which is also much more difficult than doing so from Gaussian copulas.

## Archimedean Skew – a.k.

About a year and a half ago we saw how we could use Gaussian copulas to define dependencies between the elements of a vector valued multivariate random variable whose elements, when considered in isolation, were governed by arbitrary cumulative distribution functions, known as marginals. Whilst Gaussian copulas are quite flexible, they can't represent every possible dependency between those elements and in this post we shall take a look at some others defined by the Archimedean family of copulas.

## A Measure Of Borel Weight – a.k.

In the last few posts we have implemented a type to represent Borel sets of the real numbers, which are the subsets of them that can be created with countable unions of intervals with closed or open lower and upper bounds. Whilst I would argue that doing so was a worthwhile exercise in its own right, you may be forgiven for wondering what Borel sets are actually for and so in this post I shall try to justify the effort that we have spent on them.

## A Borel Universe – a.k.

Last time we took a look at Borel sets of real numbers, which are subsets of the real numbers that can be represented as unions of countable sets of intervals Ii. We got as far as implementing the `ak.borelInterval` type to represent an interval as a pair of `ak.borelBound` objects holding its lower and upper bounds.
With these in place we're ready to implement a type to represent Borel sets and we shall do exactly that in this post.

## A Decent Borel Code – a.k.

A few posts ago we took a look at how we might implement various operations on sets represented as sorted arrays, such as the union, being the set of every element that is in either of two sets, and the intersection, being the set of every element that is in both of them, which we implemented with `ak.setUnion` and `ak.setIntersection` respectively.
Such arrays are necessarily both finite and discrete and so cannot represent continuous subsets of the real numbers such as intervals, which contain every real number within a given range. Of particular interest are unions of countable sets of intervals Ii, known as Borel sets, and so it's worth adding a type to the `ak` library to represent them.

## Copulating Normally – a.k.

Last year we took a look at multivariate uniformly distributed random variables, which generalise uniform random variables to multiple dimensions with random vectors whose elements are independently uniformly distributed. We have now seen how we can similarly generalise normally distributed random variables with the added property that the normally distributed elements of their vectors may be dependent upon each other; specifically that they may be correlated.
As it turns out, we can generalise this dependence to arbitrary sets of random variables with a fairly simple observation.

## Copulating Normally – a.k.

Last year we took a look at multivariate uniformly distributed random variables, which generalise uniform random variables to multiple dimensions with random vectors whose elements are independently uniformly distributed. We have now seen how we can similarly generalise normally distributed random variables with the added property that the normally distributed elements of their vectors may be dependent upon each other; specifically that they may be correlated.
As it turns out, we can generalise this dependence to arbitrary sets of random variables with a fairly simple observation.

## The Cumulative Distribution Unction – a.k.

We have previously seen how we can generalise normally distributed random variables to multiple dimensions by defining vectors with elements that are linear functions of independent standard normally distributed random variables, having means of zero and standard deviations of one, with

Z' = L × Z + μ

where L is a constant matrix, Z is a vector whose elements are the independent standard normally distributed random variables and μ is a constant vector.
So far we have derived and implemented the probability density function and the characteristic function of the multivariate normal distribution that governs such random vectors but have yet to do the same for its cumulative distribution function since it's a rather more difficult task and thus requires a dedicated treatment, which we shall have in this post.

## The Cumulative Distribution Unction – a.k.

We have previously seen how we can generalise normally distributed random variables to multiple dimensions by defining vectors with elements that are linear functions of independent standard normally distributed random variables, having means of zero and standard deviations of one, with

Z' = L × Z + μ

where L is a constant matrix, Z is a vector whose elements are the independent standard normally distributed random variables and μ is a constant vector.
So far we have derived and implemented the probability density function and the characteristic function of the multivariate normal distribution that governs such random vectors but have yet to do the same for its cumulative distribution function since it's a rather more difficult task and thus requires a dedicated treatment, which we shall have in this post.

## Multiple Multiply Normal Functions – a.k.

Last time we took a look at how we could define multivariate normally distributed random variables with linear functions of multiple independent standard univariate normal random variables.
Specifically, given a Z whose elements are independent standard univariate normal random variables, a constant vector μ and a constant matrix L

Z' = L × Z + μ

has linearly dependent normally distributed elements, a mean vector of μ and a covariance matrix of

Σ' = L × LT

where LT is the transpose of L in which the rows and columns are switched.
We got as far as deducing the characteristic function and the probability density function of the multivariate normal distribution, leaving its cumulative distribution function and its complement aside until we'd implemented both them and the random variable itself, which we shall do in this post.

## Multiple Multiply Normal Functions – a.k.

Last time we took a look at how we could define multivariate normally distributed random variables with linear functions of multiple independent standard univariate normal random variables.
Specifically, given a Z whose elements are independent standard univariate normal random variables, a constant vector μ and a constant matrix L

Z' = L × Z + μ

has linearly dependent normally distributed elements, a mean vector of μ and a covariance matrix of

Σ' = L × LT

where LT is the transpose of L in which the rows and columns are switched.
We got as far as deducing the characteristic function and the probability density function of the multivariate normal distribution, leaving its cumulative distribution function and its complement aside until we'd implemented both them and the random variable itself, which we shall do in this post.

## Every Which Way Is Normal – a.k.

A few months ago we saw how we could generalise the concept of a random variable to multiple dimensions by generating random vectors rather than numbers. Specifically we took a look at the multivariate uniform distribution which governs random vectors whose elements are independently uniformly distributed.
Whilst it demonstrated that we can find multivariate versions of distribution functions such as the probability density function, the cumulative distribution function and the characteristic function, the uniform distribution is fairly trivial and so, for a more interesting example, this time we shall look at generalising the normal distribution to multiple dimensions.

## Every Which Way Is Normal – a.k.

A few months ago we saw how we could generalise the concept of a random variable to multiple dimensions by generating random vectors rather than numbers. Specifically we took a look at the multivariate uniform distribution which governs random vectors whose elements are independently uniformly distributed.
Whilst it demonstrated that we can find multivariate versions of distribution functions such as the probability density function, the cumulative distribution function and the characteristic function, the uniform distribution is fairly trivial and so, for a more interesting example, this time we shall look at generalising the normal distribution to multiple dimensions.