In the last few posts we've been looking at the BFGS quasi-Newton algorithm for minimising multivariate functions. This uses iteratively updated approximations of the Hessian matrix of second partial derivatives in order to choose directions in which to search for univariate minima, saving the expense of calculating it explicitly. A particularly useful property of the algorithm is that if the line search satisfies the Wolfe conditions then the positive definiteness of the Hessian is preserved, meaning that the implied locally quadratic approximation of the function must have a minimum.
Unfortunately for large numbers of dimension the calculation of the approximation will still be relatively expensive and will require a significant amount of memory to store and so in this post we shall take a look at an algorithm that only uses the vector of first partial derivatives.
Last month we took a look at quasi-Newton multivariate function minimisation algorithms which use approximations of the Hessian matrix of second partial derivatives to choose line search directions. We demonstrated that the BFGS rule for updating the Hessian after each line search maintains its positive definiteness if they conform to the Wolfe conditions, ensuring that the locally quadratic approximation of the function defined by its value, the vector of first partial derivatives and the Hessian has a minimum.
Now that we've got the theoretical details out of the way it's time to get on with the implementation.
In the previous post we saw how we could perform a univariate line search for a point that satisfies the Wolfe conditions meaning that it is reasonably close to a minimum and takes a lot less work to find than the minimum itself. Line searches are used in a class of multivariate minimisation algorithms which iteratively choose directions in which to proceed, in particular those that use approximations of the Hessian matrix of second partial derivatives of a function to do so, similarly to how the Levenberg-Marquardt multivariate inversion algorithm uses a diagonal matrix in place of the sum of the products of its Hessian matrices for each element and the error in that element's current value, and in this post we shall take a look at one of them.
Last time we saw how we could efficiently invert a vector valued multivariate function with the Levenberg-Marquardt algorithm which replaces the sum of its second derivatives with respect to each element in its result multiplied by the difference from those of its target value with a diagonal matrix. Similarly there are minimisation algorithms that use approximations of the Hessian matrix of second partial derivatives to estimate directions in which the value of the function will decrease.
Before we take a look at them, however, we'll need a way to step toward minima in such directions, known as a line search, and in this post we shall see how we might reasonably do so.
Last time we took a look at linear regression which finds the linear function that minimises the differences between its results and values at a set of points that are presumed, possibly after applying some specified transformation, to be random deviations from a straight line or, in multiple dimensions, a flat plane. The purpose was to reveal the underlying relationship between the independent variable represented by the points and the dependent variable represented by the values at them.
This time we shall see how we can approximate the function that defines the relationship between them without actually revealing what it is.
Several months ago we saw how we could use basis functions to interpolate between points upon arbitrary curves or surfaces to approximate the values between them. Related to that is linear regression which fits a straight line, or a flat plane, though points that have values that are assumed to be the results of a linear function with independent random errors, having means of zero and equal standard deviations, in order to reveal the underlying relationship between them. Specifically, we want to find the linear function that minimises the differences between its results and the values at those points.
A few years ago we saw how we could approximate a function f between pairs of points (xi, f(xi)) and (xi+1, f(xi+1)) by linear and cubic spline interpolation which connect them with straight lines and cubic polynomials respectively, the latter of which yield smooth curves at the cost of somewhat arbitrary choices about their exact shapes.
An alternative approach is to construct a single function that passes through all of the points and, given that nth order polynomials are uniquely defined by n+1 values at distinct xi, it's tempting to use them.
We have spent a few months looking at how we might interpolate between sets of points (xi, yi), where the xi are known as nodes and the yi as values, to approximate values of y for values of x between the nodes, either by connecting them with straight lines or with cubic curves.
Last time, in preparation for interpolating between multidimensional vector nodes, we implemented the
ak.grid type to store ticks on a set of axes and map their intersections to
ak.vector objects to represent such nodes arranged at the corners of hyperdimensional rectangular cuboids.
With this in place we're ready to take a look at one of the simplest multidimensional interpolation schemes; multilinear interpolation.
We have seen how it is possible to smoothly interpolate between a set of points (xi, yi), with the xi known as nodes and the yi as values, by specifying the gradients gi at the nodes and calculating values between adjacent pairs using the uniquely defined cubic polynomials that match the values and gradients at them.
We have also seen how extrapolating such polynomials beyond the first and last nodes can yield less than satisfactory results, which we fixed by specifying the first and last gradients and then adding new first and last nodes to ensure that the first and last polynomials would represent straight lines.
Now we shall see how cubic spline interpolation can break down rather more dramatically and how we might fix it.
Last time we took a look at how we can use linear interpolation to approximate a function from a set of points on its graph by connecting them with straight lines. As a consequence the result isn't smooth, meaning that its derivative isn't continuous and is undefined at the x values of the points, known as the nodes of the interpolation.
In this post we shall see how we can define a smooth interpolation by connecting the points with curves rather than straight lines.
Given a set of points (xi,yi), a common problem in numerical analysis is trying to estimate values of y for values of x that aren't in the set. The simplest scheme is linear interpolation, which connects points with consecutive values of x with straight lines and then uses them to calculate values of y for values of x that lie between those of their endpoints.
On the face of it implementing this would seem to be a pretty trivial business, but doing so both accurately and efficiently is a surprisingly tricky affair, as we shall see in this post.