Introduction to gradient pursuit on manifolds

In this post we’ll go through the gradient-descent script, and see how to find the minimum of a function with gradient descent in Goal. This post won’t demonstrate why Goal is worth using, but it will introduce many of the basic types and functions that constitute the Goal libraries. Moreover, we’ll see that relatively straightforward things remain relatively straightforward in Goal.

Let us start with the pragmas

{-# LANGUAGE DataKinds, TypeOperators #-}

and imports

import Goal.Core
import Goal.Geometry

import qualified Goal.Core.Vector.Boxed as B

at the beginning of gradient-descent.hs. This structure for the imports is common to nearly every script and module written with Goal. Firstly, Goal.Core and Goal.Geometry import the essential definitions of the goal-core and goal-geometry packages. In general, each package in the Goal libraries is fully imported with a single import statement, and is designed to be imported unqualified. The only exceptions to this are the vector modules in goal-core. In this case we do a qualified import of the Goal.Core.Vector.Boxed module, which re-exports boxed, statically-sized vectors (from the vector-sized package), with a few extra definitions for the numeric applications of Goal.

For the sake of efficiency, most gradients in Goal are calculated through a combination of hand-written derivatives, combinators, and class instances. Nevertheless, when speed isn’t critical, Goal also provides a simple interface for automatic differentiation, as provided by ad. It’s a little ad-hoc, but it does the trick.
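
To see concretely what ad gives us, here is a standalone example in GHCi (using ad’s Numeric.AD module directly, rather than Goal’s wrapper around it), computing the gradient of the objective we are about to define at the point (-4,2):

>>> import Numeric.AD (grad)
>>> grad (\[x, y] -> x^2 + y^2 + (x - y)^2) [-4, 2 :: Double]
[-20.0,16.0]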

So to begin, we want to descend the gradient of the function f

f :: Floating x => B.Vector 2 x -> x
f xs =
    let (x,y) = B.toPair xs
     in square x + square y + square (x-y)
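
As a quick sanity check, we can evaluate f in GHCi. Here I assume that Goal.Core.Vector.Boxed re-exports vector-sized’s fromList, which returns a Maybe because the list length must be checked against the static size:

>>> import Data.Maybe (fromJust)
>>> f (fromJust $ B.fromList [-4, 2])
56.0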

which takes a two-dimensional vector as input. In Goal we evaluate functions on manifolds, but in this case, since we don’t have any particular geometric structure in mind, we assume that our space of interest is flat and Euclidean. For such cases Goal provides Euclidean manifolds in Cartesian coordinates. To implement gradient descent of f on a Euclidean manifold, we first create the initial point

p0 :: Cartesian # Euclidean 2
p0 = fromTuple (-4,2)

The type operator # is an infix synonym for Point, so that p0 is the point (-4,2) in Cartesian coordinates on the Euclidean plane.
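
To make the # syntax less mysterious, the essential definitions can be sketched like so (a paraphrase for illustration, not the verbatim goal-geometry source):

-- Additional requirements beyond the script above:
-- {-# LANGUAGE TypeFamilies #-}
-- import GHC.TypeLits (Nat, KnownNat)

-- A Manifold knows its dimension at the type level.
class KnownNat (Dimension x) => Manifold x where
    type Dimension x :: Nat

-- A Point is a static vector of coordinates, tagged with phantom types
-- for the coordinate system c and the manifold x.
newtype Point c x = Point (B.Vector (Dimension x) Double)

-- The real synonym is declared as an infix operator.
type (#) c x = Point c x

Since the coordinate system c is a phantom type, it exists purely to keep track of the interpretation of the coordinates at compile time.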

Gradient descent is an iterative optimization algorithm, and we can specify the number of iterations directly, or through a stopping condition. In this example we define the cauchify function, which takes elements from a given list until the distance between successive iterates falls below bnd.

bnd :: Double
bnd = 0.0001

cauchify :: [Cartesian # Euclidean 2] -> [Cartesian # Euclidean 2]
cauchify = cauchySequence euclideanDistance bnd
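
For intuition, the behaviour of cauchySequence can be sketched as follows (a hypothetical re-implementation for illustration, not the goal-core definition):

-- Keep taking elements until two successive elements are within the
-- given bound of each other, and include the pair that converged.
cauchySketch :: (x -> x -> Double) -> Double -> [x] -> [x]
cauchySketch dst bnd' (x1:x2:xs)
    | dst x1 x2 < bnd' = [x1, x2]
    | otherwise = x1 : cauchySketch dst bnd' (x2:xs)
cauchySketch _ _ xs = xs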

We then create a gradient pursuit path by applying the cauchify function to the gradientSequence function, which takes the differential of f, a step size eps, a gradient pursuit algorithm gp, and an initial point p0, and returns an infinite list of gradient pursuit steps.

eps :: Double
eps = -0.05

path :: GradientPursuit -> [Cartesian # Euclidean 2]
path gp = cauchify $ gradientSequence (differential f) eps gp p0
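
Note that eps is negative: gradientSequence steps along the given differential, so a positive step size climbs the objective while a negative one descends it. On plain vectors, a single classic step amounts to the following (a hypothetical helper for illustration, not Goal’s implementation):

-- Move from xs in the direction of the gradient, scaled by the step size.
classicStep :: Double -> (B.Vector 2 Double -> B.Vector 2 Double) -> B.Vector 2 Double -> B.Vector 2 Double
classicStep eps' grd xs = B.zipWith (+) xs (B.map (* eps') (grd xs))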

In the definition of path, we’ve already defined all the arguments to the gradientSequence function except for the desired GradientPursuit algorithm gp. In Goal I’ve implemented three forms of gradient pursuit: classic gradient pursuit, gradient pursuit with momentum, and finally the Adam optimizer.

mtm :: Double
mtm = 0.9

grds,mtms,adms :: [Cartesian # Euclidean 2]
grds = path Classic
mtms = path $ defaultMomentumPursuit mtm
adms = path defaultAdamPursuit
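
For intuition about the latter two: momentum smooths the trajectory by adding a decaying sum of previous steps to each update, and Adam additionally rescales each coordinate by a running estimate of the gradient’s second moment. The momentum update on plain vectors can be sketched as follows (again a hypothetical helper, not the MomentumPursuit implementation):

-- Carry a velocity v alongside the point p: the new velocity is the old
-- one decayed by mu plus the scaled gradient, and the point moves by it.
momentumStep
    :: Double -> Double
    -> (B.Vector 2 Double -> B.Vector 2 Double)
    -> (B.Vector 2 Double, B.Vector 2 Double)
    -> (B.Vector 2 Double, B.Vector 2 Double)
momentumStep eps' mu grd (p, v) =
    let v' = B.zipWith (+) (B.map (* mu) v) (B.map (* eps') (grd p))
     in (B.zipWith (+) p v', v')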

[Figure: Simple Gradient Descent. Paths generated by the classic, momentum, and Adam gradient pursuit algorithms.]

This is the core of the gradient-descent script; the rest consists of definitions for generating plotting data and feeding the results to gnuplot. In the figure we see the paths generated by the three gradient pursuit algorithms. Interestingly, although the momentum algorithm is faster than classic gradient pursuit, the Adam optimizer appears quite a bit slower, in spite of being one of the most widely applied gradient optimization algorithms. Nevertheless, in most non-trivial applications Adam is much more efficient than either classic or momentum-based gradient descent, and I personally use it as my standard gradient pursuit algorithm.
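
To compare convergence speed numerically, one can simply count the iterations each algorithm needs, since cauchify renders each list finite. A minimal main along these lines (the actual script instead generates data for gnuplot):

main :: IO ()
main = do
    -- length counts the steps taken before successive points were
    -- within bnd of each other.
    putStrLn $ "Classic steps:  " ++ show (length grds)
    putStrLn $ "Momentum steps: " ++ show (length mtms)
    putStrLn $ "Adam steps:     " ++ show (length adms)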