Getting Started

This tutorial introduces the basics of MXNet. We will train a multi-layer perceptron (MLP) on the MNIST handwritten digit dataset to show how MXNet is used.

Train MLP on MNIST

In MNIST, each example consists of a 28 x 28 grayscale image of a handwritten digit and its label, which is an integer between 0 and 9. Denote by x the 784-length vector of image pixels and by y the label.

An MLP with no hidden layer predicts the label probabilities by

\[\textrm{softmax}(W x + b)\]

Here W is a 10-by-784 weight matrix and b is a 10-length bias vector. The softmax normalizes a vector into a probability distribution, namely

\[\textrm{softmax}(x) = \left[ \ldots, \frac{\exp(x_i)}{\sum_j \exp(x_j)}\ldots \right]\]
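The softmax formula above can be sketched in a few lines of plain Python. This is an illustrative sketch, not MXNet's implementation; subtracting the maximum before exponentiating is an added numerical-stability step that does not change the result.

```python
import math

def softmax(x):
    """Normalize a list of scores into a probability distribution."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)  # three positive values that sum to 1, largest for the largest score
```

Because of the maximum shift, even very large scores such as `softmax([1000.0, 1000.0])` are handled without overflow.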

We can stack layers one by one to get an MLP with multiple hidden layers. Let x_0 = x be the output of layer 0. Then layer i outputs an n_i-length vector

\[x_i = \sigma_i (W_i x_{i-1} + b_i)\]

where σ_i is an activation function such as tanh, and W_i is of size n_i-by-n_{i-1}. Next we apply softmax to the output of the last layer to obtain the prediction.
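The layer recursion can be sketched as a forward pass in plain Python. The toy layer sizes and weight values below are hypothetical, and the sketch assumes that the last layer's affine output is fed directly to softmax (with tanh on the hidden layers only), which is one reasonable reading of the description.

```python
import math

def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def affine(W, x, b):
    """Compute W x + b, where W is a list of rows, each as long as x."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def mlp_forward(x, layers):
    """layers is a list of (W, b) pairs: tanh on hidden layers, softmax on the last."""
    for W, b in layers[:-1]:
        x = [math.tanh(z) for z in affine(W, x, b)]
    W, b = layers[-1]
    return softmax(affine(W, x, b))

# Hypothetical toy sizes: 3 inputs -> 2 hidden units -> 2 classes.
layers = [
    ([[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]], [0.0, 0.1]),
    ([[1.0, -1.0], [-0.5, 0.5]], [0.0, 0.0]),
]
probs = mlp_forward([1.0, 2.0, 3.0], layers)
print(probs)  # two class probabilities that sum to 1
```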

The goal of training is to find the weights and biases of each layer that minimize the difference between the predicted labels and the true labels on the training data.
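To make "minimize the difference" concrete, the sketch below measures the difference with cross-entropy (the negative log-probability of the true label) and reduces it with plain gradient descent on the model with no hidden layer. The choice of cross-entropy, the step size, and the toy data are assumptions made for illustration; the text above does not name a specific loss.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    """softmax(W x + b) for a model with no hidden layer."""
    return softmax([sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)])

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(probs[label])

def sgd_step(W, b, x, label, lr):
    """One gradient step on a single example under cross-entropy loss."""
    probs = predict(W, b, x)
    # Gradient of the loss w.r.t. the pre-softmax scores is probs - one_hot(label).
    grad = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    W = [[w - lr * g * v for w, v in zip(row, x)] for row, g in zip(W, grad)]
    b = [bias - lr * g for bias, g in zip(b, grad)]
    return W, b

# Hypothetical toy problem: 2 input features, 2 classes, one training example.
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
x, label = [1.0, -1.0], 0
before = cross_entropy(predict(W, b, x), label)
for _ in range(10):
    W, b = sgd_step(W, b, x, label, lr=0.1)
after = cross_entropy(predict(W, b, x), label)
print(before, after)  # the loss after training is smaller than before
```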

In the following sections we show how to implement the training program in MXNet using different languages.

Python

We first import MXNet

import mxnet as mx

Then we declare the data iterators to the training and validation datasets

train = mx.io.MNISTIter(
    image      = "mnist/train-images-idx3-ubyte",
    label      = "mnist/train-labels-idx1-ubyte",
    batch_size = 128,
    data_shape = (784, ))
val   = mx.io.MNISTIter(...)

and declare an MLP with two hidden layers

data = mx.symbol.Variable('data')
fc1  = mx.symbol.FullyConnected(data = data, num_hidden=128)
act1 = mx.symbol.Activation(data = fc1, act_type="relu")
fc2  = mx.symbol.FullyConnected(data = act1, num_hidden = 64)
act2 = mx.symbol.Activation(data = fc2, act_type="relu")
fc3  = mx.symbol.FullyConnected(data = act2, num_hidden=10)
mlp  = mx.symbol.SoftmaxOutput(data = fc3)

Next we train a model on the data

model = mx.model.FeedForward(
    symbol = mlp,
    num_epoch = 20,
    learning_rate = .1)
model.fit(X = train, eval_data = val)

Finally we can predict by

test = mx.io.MNISTIter(...)
model.predict(X = test)
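Assuming the prediction holds one row of class probabilities per example, digit labels can be recovered by taking the index of the largest probability in each row. The sketch below uses plain Python lists with made-up probabilities rather than real output from `model.predict`, so it runs without MXNet.

```python
def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=lambda k: row[k])

# Hypothetical probabilities for two examples over three classes
# (MNIST would have ten classes per row).
probs = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
]
labels = [argmax(row) for row in probs]
print(labels)  # [1, 0]
```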

R

Scala

We first import MXNet

import ml.dmlc.mxnet._

Then we declare the data iterators to the training and validation datasets

val trainDataIter = IO.MNISTIter(Map(
  "image" -> "data/train-images-idx3-ubyte",
  "label" -> "data/train-labels-idx1-ubyte",
  "data_shape" -> "(1, 28, 28)",
  "label_name" -> "sm_label",
  "batch_size" -> batchSize.toString,
  "shuffle" -> "1",
  "flat" -> "0",
  "silent" -> "0",
  "seed" -> "10"))

val valDataIter = IO.MNISTIter(Map(
  "image" -> "data/t10k-images-idx3-ubyte",
  "label" -> "data/t10k-labels-idx1-ubyte",
  "data_shape" -> "(1, 28, 28)",
  "label_name" -> "sm_label",
  "batch_size" -> batchSize.toString,
  "shuffle" -> "1",
  "flat" -> "0", "silent" -> "0"))

and declare an MLP with two hidden layers

val data = Symbol.Variable("data")
val fc1 = Symbol.FullyConnected(name = "fc1")(Map("data" -> data, "num_hidden" -> 128))
val act1 = Symbol.Activation(name = "relu1")(Map("data" -> fc1, "act_type" -> "relu"))
val fc2 = Symbol.FullyConnected(name = "fc2")(Map("data" -> act1, "num_hidden" -> 64))
val act2 = Symbol.Activation(name = "relu2")(Map("data" -> fc2, "act_type" -> "relu"))
val fc3 = Symbol.FullyConnected(name = "fc3")(Map("data" -> act2, "num_hidden" -> 10))
val mlp = Symbol.SoftmaxOutput(name = "sm")(Map("data" -> fc3))

Next we train a model on the data

import ml.dmlc.mxnet.optimizer.SGD
// setup model and fit the training set
val model = FeedForward.newBuilder(mlp)
      .setContext(Context.cpu())
      .setNumEpoch(10)
      .setOptimizer(new SGD(learningRate = 0.1f, momentum = 0.9f, wd = 0.0001f))
      .setTrainData(trainDataIter)
      .setEvalData(valDataIter)
      .build()

Finally we can predict by

val probArrays = model.predict(valDataIter)
// in this case, we do not have multiple outputs
require(probArrays.length == 1)
val prob = probArrays(0)
// get predicted labels
val py = NDArray.argmaxChannel(prob)
// deal with predicted labels 'py'

Julia

We first import MXNet

using MXNet

Then load data

batch_size = 100
include("mnist-data.jl")
train_provider, eval_provider = get_mnist_providers(batch_size)

and define the MLP

mlp = @mx.chain mx.Variable(:data)  =>
  mx.FullyConnected(num_hidden=128) =>
  mx.Activation(act_type=:relu)     =>
  mx.FullyConnected(num_hidden=64)  =>
  mx.Activation(act_type=:relu)     =>
  mx.FullyConnected(num_hidden=10)  =>
  mx.SoftmaxOutput()

The model can be trained by

model = mx.FeedForward(mlp, context=mx.cpu())
optimizer = mx.SGD(lr=0.1, momentum=0.9, weight_decay=0.00001)
mx.fit(model, optimizer, train_provider, n_epoch=20, eval_data=eval_provider)

and finally predict by

probs = mx.predict(model, test_provider)

Tensor Computation

Next we briefly introduce the tensor computation interface, which is often more flexible than the symbolic interface used above. It is often used to implement layers, define weight update rules, and debug.

Python

The Python interface is similar to numpy.ndarray.

>>> import mxnet as mx
>>> a = mx.nd.ones((2, 3),
... mx.gpu())
>>> print((a * 2).asnumpy())
[[ 2.  2.  2.]
 [ 2.  2.  2.]]

R

Scala

You can do tensor/matrix computation in pure Scala.

scala> import ml.dmlc.mxnet._
import ml.dmlc.mxnet._

scala> val arr = NDArray.ones(2, 3)
arr: ml.dmlc.mxnet.NDArray = ml.dmlc.mxnet.NDArray@f5e74790

scala> arr.shape
res0: ml.dmlc.mxnet.Shape = (2,3)

scala> (arr * 2).toArray
res2: Array[Float] = Array(2.0, 2.0, 2.0, 2.0, 2.0, 2.0)

scala> (arr * 2).shape
res3: ml.dmlc.mxnet.Shape = (2,3)

Julia