Fitting a mixture of von Mises distributions
In this post we’ll go through the von-mises-mixture script, and see how to fit a mixture of von Mises distributions with Goal. I have developed the Goal libraries primarily to do type-safe statistical modelling, and this post will touch on many of the features of Goal that make it worth using.
```haskell
import Goal.Core
import Goal.Geometry
import Goal.Probability
import Goal.Graphical

import qualified Goal.Core.Vector.Storable as S
```
These imports are standard for a Goal script. The four unqualified imports pull in all the Goal libraries, and the qualified import of
Goal.Core.Vector.Storable provides the tools for working with Vectors that have not been wrapped up as Points on a Manifold.
Goal treats the task of fitting a model to data as that of finding a
Point on a
Manifold. A statistical manifold is a manifold whose points are probability distributions. In this case we consider manifolds of von Mises distributions, which are approximately like normal distributions on the unit circle. The von Mises manifold in Goal is indicated by the
VonMises type, and is typically parameterized by two coordinates: the mean and concentration (approximately the inverse variance). In Goal we use the so-called
Source coordinates to describe a model in its “Standard” coordinate system, e.g. for normal distributions this is the mean and variance. For von Mises distributions this is the mean and concentration, and we can create a von Mises distribution in source coordinates by writing
```haskell
vm :: Source # VonMises
vm = fromTuple (0,1)
```
vm is a von Mises distribution with mean 0 and concentration 1.
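For reference (standard background, not part of the script), the density that these coordinates parameterize is

```latex
f(x \mid \mu, \kappa) = \frac{\exp\left(\kappa \cos(x - \mu)\right)}{2\pi I_0(\kappa)},
```

where $I_0$ is the modified Bessel function of the first kind of order zero. As $\kappa$ grows, this density is increasingly well approximated by a normal density with variance $1/\kappa$, which is why the concentration behaves like an inverse variance.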
In this tutorial we’re not simply mixing von Mises distributions, but rather products of von Mises distributions, so that we can model distributions on a bounded plane. Such a product manifold is defined by the type
(VonMises,VonMises), and we create three such product von Mises distributions with the lines
```haskell
vm1,vm2,vm3 :: Source # (VonMises,VonMises)
vm1 = fromTuple (1, 1.5, 4.5, 2)
vm2 = fromTuple (3, 2, 3.5, 3)
vm3 = fromTuple (4.5, 4, 1.5, 2)
```
To create a mixture distribution we also need to define weights for the component distributions, which in Goal is represented by the
Categorical distribution. We create a Categorical distribution over three categories with the following lines
```haskell
wghts :: Source # Categorical 2
wghts = fromTuple (0.33,0.33)
```
where the probability of the first category is given by
1 - sum (listCoordinates wghts).
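To make the implicit first category concrete, here is a tiny Goal-free sketch in plain Haskell (the names here are my own, not Goal's):

```haskell
-- Toy sketch: a Categorical distribution over n+1 categories is
-- parameterized by only n probabilities; the first category's
-- probability is whatever mass remains.
categoricalProbs :: [Double] -> [Double]
categoricalProbs ps = (1 - sum ps) : ps
```

With `ps = [0.33, 0.33]` this yields weights of roughly 0.34, 0.33, and 0.33, matching the mixture weights above.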
Finally, we may create a mixture distribution out of these previously defined distributions with the lines
```haskell
strumxmdl :: Source # Mixture (VonMises,VonMises) 2
strumxmdl = joinSourceMixture (S.fromTuple (vm1,vm2,vm3)) wghts
```
In Goal, I attempt to define the
Source coordinates to match conventional definitions provided in textbooks and on Wikipedia. Typically, when creating example or initial distributions, it is most intuitive to do so in
Source coordinates. Nevertheless, most algorithms and implementations in Goal are based on exponential family formulations of statistical models, and thus the workhorse coordinate system of Goal is the Natural coordinates.
To convert our mixture model from
Source coordinates to
Natural coordinates, we simply use the transition function
```haskell
trumxmdl :: Natural # Mixture (VonMises,VonMises) 2
trumxmdl = transition strumxmdl
```
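As a standard bit of exponential family background (not from the script itself), the von Mises density can be written as

```latex
f(x \mid \theta) \propto \exp\left(\theta_1 \cos x + \theta_2 \sin x\right),
\qquad (\theta_1, \theta_2) = (\kappa \cos \mu,\ \kappa \sin \mu),
```

so for a single von Mises distribution the Natural coordinates are $(\kappa\cos\mu, \kappa\sin\mu)$. For a mixture the weights are reparameterized as well, but the principle is the same.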
Having created a mixture distribution, our next task will be to generate some data from this “ground-truth” distribution, and see if we can recover the ground-truth distribution through observation with the power of statistics. To do so we’re going to define a few variables to initialize and train our model.
Firstly, we’re going to want to randomly generate an initial point in our statistical model that we will then optimize. We will choose this point by setting each of its coordinates randomly according to the normal distribution
```haskell
mxmdlint :: Source # Normal
mxmdlint = fromTuple (0,0.1)
```
Secondly, there are a handful of variables pertaining to training that we need to define, namely
```haskell
nsmps :: Int
nsmps = 100

eps :: Double
eps = 0.05

bnd :: Double
bnd = 1e-5

admmlt :: Int
admmlt = 10

-- EM
nepchs :: Int
nepchs = 200
```
but I’ll explain them as we review the training procedures that we’ll be using.
Now, the standard framework for fitting a mixture model is expectation-maximization (EM), and Goal provides the
expectationMaximization function, which works for a wide class of models. Nevertheless, it requires that the model in question satisfies certain analytic properties, which are not satisfied by von Mises mixtures. For such cases Goal provides the
expectationMaximizationAscent function, which approximates the maximization step of EM with gradient descent, and has looser restrictions on the model. We use this function to construct the EM iterator
```haskell
vonMisesEM
    :: Sample (VonMises,VonMises) -- ^ Observations
    -> Natural # Mixture (VonMises,VonMises) 2
    -> Natural # Mixture (VonMises,VonMises) 2
vonMisesEM zs nmxmdl = cauchyLimit euclideanDistance bnd
    $ expectationMaximizationAscent eps defaultAdamPursuit zs nmxmdl
```
bnd indicates the threshold for stopping the gradient ascent, and
eps is the learning rate.
An alternative procedure for fitting latent variable models is to simply perform gradient ascent on the marginal log-likelihood of the mixture given data. This procedure fails when applied to Gaussian mixture models, but we will test if it works for von Mises mixtures.
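To make the idea concrete, here is a toy, Goal-free sketch of gradient ascent on the marginal log-likelihood of a two-component, one-dimensional von Mises mixture, using finite-difference gradients instead of Goal's exact ones (all names and numbers here are illustrative, not from the script):

```haskell
-- Modified Bessel function I_0 via its power series (adequate for
-- moderate kappa; we clamp kappa below 10 to stay in its safe range).
besselI0 :: Double -> Double
besselI0 x = sum [ ((x / 2) ^ (2 * k)) / fromIntegral (factorial k ^ 2)
                 | k <- [0 .. 20 :: Int] ]
  where factorial n = product [1 .. toInteger n]

-- von Mises density with mean mu and concentration kappa
vonMisesPdf :: Double -> Double -> Double -> Double
vonMisesPdf mu kappa x = exp (kappa * cos (x - mu)) / (2 * pi * besselI0 kappa)

-- Marginal log-likelihood of a two-component mixture with parameters
-- [mu1, kappa1, mu2, kappa2, w], where w weights the first component.
mixLogLik :: [Double] -> [Double] -> Double
mixLogLik xs [mu1, k1, mu2, k2, w] =
  sum [ log (w * vonMisesPdf mu1 k1 x + (1 - w) * vonMisesPdf mu2 k2 x)
      | x <- xs ]
mixLogLik _ _ = error "expected five parameters"

-- Central finite-difference approximation of the gradient.
numGrad :: ([Double] -> Double) -> [Double] -> [Double]
numGrad f ps = [ (f (bump i h) - f (bump i (-h))) / (2 * h)
               | i <- [0 .. length ps - 1] ]
  where
    h = 1e-5
    bump i d = [ if j == i then p + d else p | (j, p) <- zip [0 ..] ps ]

-- Plain gradient ascent for n steps, clamping the concentrations and
-- the weight so the parameters stay valid.
ascend :: Double -> Int -> [Double] -> [Double] -> [Double]
ascend eta n xs = (!! n) . iterate step
  where
    step ps = clamp (zipWith (\p g -> p + eta * g) ps (numGrad (mixLogLik xs) ps))
    clamp [m1, a, m2, b, w] =
      [m1, bound 0.01 10 a, m2, bound 0.01 10 b, bound 0.01 0.99 w]
    clamp ps = ps
    bound lo hi = max lo . min hi
```

Starting from e.g. `[0.5, 1, 2.5, 1, 0.5]` on data clustered around 0 and 3, a few hundred steps with a small learning rate increases the log-likelihood; Goal's `vanillaGradientSequence` plays the same role below, but with exact gradients and Adam updates.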
To continue, let us generate
nsmps samples from our ground-truth mixture distribution
trumxmdl, initialize our model
mxmdl0, and collect our observations
```haskell
main :: IO ()
main = do

    cxys <- realize $ sample nsmps trumxmdl
    mxmdl0 <- realize $ initialize mxmdlint
    let xys = fst <$> cxys
```
We will then run our two procedures for
nepchs iterations
```haskell
    let emmxmdls = take nepchs $ iterate (vonMisesEM xys) mxmdl0

    let admmxmdls = take nepchs . takeEvery admmlt
            $ vanillaGradientSequence (logLikelihoodDifferential xys) eps defaultAdamPursuit mxmdl0
```
In the case of ascending the
logLikelihoodDifferential, we multiply the number of steps taken by
admmlt, since each iteration of the EM algorithm requires a number of sub-steps, and we want a fair comparison of the two algorithms.
```haskell
    let emnlls = logLikelihood xys <$> emmxmdls
        admnlls = logLikelihood xys <$> admmxmdls
```
We plot the results of our simulations in the above figure. On the left we have the model performance (log-likelihood) as a function of the number of training iterations, where an “iteration” is one step of EM, or ten steps of gradient pursuit. In contrast with mixtures of normal distributions, mixtures of von Mises distributions can be trained effectively with gradient pursuit.
On the right side are representations of the ground-truth mixture distribution, the 100 sampled data points from the ground-truth distribution, and the models learned by EM and gradient pursuit. Indeed, we see that they both learn more-or-less the same result. I’ll note though that the results aren’t always so similar; try running the von-mises-mixture script yourself to see.