In this post we’ll explore how to couple Tensorflow and CasADi. Thanks to Jonas Koch (student @ Applied Mathematics WWU Muenster) for delivering inspiration and example code.

# One-dimensional regression with GPflow

An important part of machine learning is about regression: fitting a (non-)linear model through sparse data. This is an unconstrained optimization problem for which dedicated algorithms and software are readily available.

Let’s create some datapoints to fit, a perturbed sine.

data = np.linspace(0,10,N).reshape((N,1))
value = np.sin(data)+np.random.normal(0,0.1,(N,1))


We make use of GPflow, software which is built on top of Tensorflow, to perform a Gaussian process regression:

model = gpflow.models.GPR(data, value, gpflow.kernels.Constant(1) + gpflow.kernels.Linear(1) + gpflow.kernels.White(1) + gpflow.kernels.RBF(1))
gpflow.train.ScipyOptimizer().minimize(model)


The trained model has a predict_y method to deliver us the mean and variance of the fit at any location(s):

xf = np.linspace(0,10,10*N).reshape((-1,1))
[mean,variance] = model.predict_y(xf)


# Embedding the regression model in a CasADi graph

The mechanism for embedding foreign code in CasADi is through the use of a Callback:

class GPR(Callback):
def __init__(self, name,  opts={}):
Callback.__init__(self)
self.construct(name, opts)

def eval(self, arg):
[mean,_] = model.predict_y(np.array(arg[0]))
return [mean]


After instantiating the Callback, we end up with a plain-old CasADi function object which we can evaluate numerically or symbolically. One caveat, the user is responsible of keeping at least one reference to this Function.

gpr = GPR('GPR', {"enable_fd":True})


As an example of embedding, we can ask Ipopt to compute the minimum of this function. Finite differences will be used to differentiate the Callback:

x = MX.sym("x")
solver = nlpsol("solver","ipopt",{"x":x,"f":gpr(x)})
res = solver(x0=5)


# Optimal control example (slow)

The GPR model from the previous paragraph was a mapping from $R$ to $R$. To make things more interesting, we will increase the dimension to a mapping from $R^\textrm{nd}$ to $R$.

I didn’t have much inspiration to come up with high-dimensional data to fit, so these random arrays will have to do. Of course, it’s rather silly to fit a model through purely random data..

data = np.random.normal(loc=0.5,scale=1,size=(N,nd))
value = np.random.random((N,1))
model = gpflow.models.GPR(data, value, gpflow.kernels.Constant(nd) + gpflow.kernels.Linear(nd) + gpflow.kernels.White(nd) + gpflow.kernels.RBF(nd))


Our callback will need to override the default-scalar sparsity for its input:

class GPR(Callback):
def __init__(self, name,  opts={}):
...

def get_sparsity_in(self,i):
return Sparsity.dense(nd,1)


For the remainder, we will just use the code of CasADi’s direct_multiple_shooting example, but the solver construction is modified as follows. The objective is replace by our nd-dimensional model evaluated on the concatenation of all x-states over the optimal control horizon. We use a limited-memory Hessian approximation here to avoid the cost of obtaining second-order sensitivities.

w = vertcat(*w)

prob = {'f': gpr(w[0::3]), 'x': w , 'g': vertcat(*g)}
options = {"ipopt": {"hessian_approximation": "limited-memory"}}
solver = nlpsol('solver', 'ipopt', prob,options);


The problem takes quite some time to solve. If you look at the timings printout, you’ll see that in particular the cost of computing the gradient of the objective (nlp_grad_f) is excessive:

               t_proc [s]   t_wall [s]    n_eval
nlp_f        0.291        0.261        31
nlp_g       0.0079      0.00789        31
nlp_jac_g       0.0496       0.0502        32
solver         26.2         23.6         1


If you insert a print statement to inspect the arguments at our Callback’s eval method, you’ll see the finite differences happening live in your terminal: each of the nd inputs to gpr is individually perturbed, making the cost to compute the gradient of the objective nd times more expensive than the cost of just the objective.

The solution requires 2783 calls to our Callback, and the Callback ran for a total of 21.7 seconds.

# Optimal control example (fast)

We’ll speed up our code on two fronts:

First, we notice that predict_y incurs quite some overhead. Instead of calling it repeatedly, let’s work with the underlying Tensorflow graphs:

X = tf.placeholder(shape=(1,nd),dtype=np.float64)
[mean,_] = model._build_predict(X)


Running this mean graph in a tensorflow session has much less overhead:

import tensorflow as tf
with tf.Session() as session:
session.run(mean, feed_dict(X: np.array(...)))


Second, we notice that tensorflow supports algorithmic differentation. If we add a reverse mode implementation to our Callback, we can get the gradient for (a small multiple of) the cost of the objective. Note that (vanilla) Tensorflow offers only the reverse mode of AD, not the forward mode. This is logical, since it focusses on unconstrained optimization.

It’s quite easy to create a general purpose TensorFlowEvaluator Callback (tensorflow_casadi.py for full code):

class TensorFlowEvaluator(casadi.Callback):
def __init__(self,t_in,t_out,session, opts={}):
....

def eval(self,arg):
# Associate each tensorflow input with the numerical argument passed by CasADi
d = dict((v,arg[i].toarray()) for i,v in enumerate(self.t_in))
# Evaluate the tensorflow expressions
ret = self.session.run(self.t_out,feed_dict=d)
return ret

# Construct tensorflow placeholders for the reverse seeds
adj_seed = [tf.placeholder(shape=self.sparsity_out(i).shape,dtype=tf.float64) for i in range(self.n_out())]
# Construct the reverse tensorflow graph through 'gradients'
# Create another TensorFlowEvaluator object