# Getting Started

This is a getting-started tutorial for MXNet. We will train a multi-layer perceptron (MLP) on the MNIST handwritten digit dataset to get a basic idea of how to use MXNet.

## Links to Other Resources

Here are some other resources that may also be helpful:

- See *Installation Guide* for how to install MXNet.
- See *How To pages* for various tips on using MXNet.
- See *Tutorials* for tutorials on specific tasks.

## Train MLP on MNIST

On MNIST, each example consists of a 28 × 28 grayscale image of a handwritten digit and its label, which is an integer between 0 and 9. Denote by *x* and *y* the 784-length vector of the image pixels and the label, respectively.

An MLP with no hidden layer predicts the label probabilities by

$$\hat{y} = \operatorname{softmax}(W x + b),$$

where *W* is a 10-by-784 weight matrix and *b* is a 10-length bias vector. The *softmax* function normalizes a vector into a probability distribution, namely

$$\operatorname{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{j} \exp(z_j)}.$$

We can stack such layers one by one to get an MLP with multiple hidden layers. Let $x_0 = x$ be the output of layer 0. Then layer $i$ outputs an $n_i$-length vector

$$x_i = \sigma_i\left(W_i x_{i-1} + b_i\right),$$

where $\sigma_i$ is an activation function such as $\tanh$, and $W_i$ is of size $n_i$-by-$n_{i-1}$. Next we apply *softmax* to the last layer to obtain the prediction.
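To make the algebra concrete, below is a minimal numpy sketch of this forward pass for the 784-128-64-10 architecture used in the code examples later in this tutorial. The random weights and the relu activation are placeholder assumptions, not trained values:

```
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def forward(x, layers):
    # layers: list of (W_i, b_i, sigma_i); computes x_i = sigma_i(W_i x_{i-1} + b_i)
    for W, b, act in layers:
        x = act(W.dot(x) + b)
    return x

rng = np.random.RandomState(0)
relu = lambda z: np.maximum(z, 0)
sizes = [784, 128, 64, 10]
acts = [relu, relu, softmax]  # softmax on the last layer gives the prediction
layers = [(0.01 * rng.randn(n_out, n_in), np.zeros(n_out), act)
          for n_in, n_out, act in zip(sizes, sizes[1:], acts)]

x = rng.rand(784)              # stand-in for a flattened 28 x 28 image
y_hat = forward(x, layers)
print(y_hat.sum())             # the 10 class probabilities sum to 1
```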

The goal of training is to learn the weights and biases of each layer so as to minimize the difference between the predicted labels and the true labels on the training data.
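A standard way to measure that difference, and the one minimized by the `SoftmaxOutput` layers in the code below, is the cross-entropy loss

$$\ell = -\sum_{(x, y)} \log \hat{y}_y(x),$$

where $\hat{y}_y(x)$ is the probability the network assigns to the true label $y$ of example $x$.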

The following sections show how to implement this training program in the different languages MXNet supports.

### Python

We first import MXNet

```
import mxnet as mx
```

Then we declare data iterators for the training and validation datasets

```
train = mx.io.MNISTIter(
    image      = "mnist/train-images-idx3-ubyte",
    label      = "mnist/train-labels-idx1-ubyte",
    batch_size = 128,
    data_shape = (784, ))
val = mx.io.MNISTIter(...)
```
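Each iterator yields mini-batches of 128 examples as `DataBatch` objects. As a quick sanity check, here is a sketch against the old `mx.io` iterator API, which exposes lists of NDArrays via `batch.data` and `batch.label`:

```
train.reset()                 # rewind to the beginning of the epoch
for batch in train:
    print(batch.data[0].shape, batch.label[0].shape)  # (128, 784) (128,)
    break                     # only peek at the first batch
```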

and declare an MLP with two hidden layers

```
data = mx.symbol.Variable('data')
fc1  = mx.symbol.FullyConnected(data=data, num_hidden=128)
act1 = mx.symbol.Activation(data=fc1, act_type="relu")
fc2  = mx.symbol.FullyConnected(data=act1, num_hidden=64)
act2 = mx.symbol.Activation(data=fc2, act_type="relu")
fc3  = mx.symbol.FullyConnected(data=act2, num_hidden=10)
# name the output 'softmax' so its label binds to the iterator's 'softmax_label'
mlp  = mx.symbol.SoftmaxOutput(data=fc3, name='softmax')
```
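A symbol is only a declaration of the computation; no weights are allocated yet. To check the network's learnable arguments you can list them (the auto-generated names below are illustrative):

```
print(mlp.list_arguments())
# e.g. ['data', 'fullyconnected0_weight', 'fullyconnected0_bias', ...,
#       'softmax_label']
```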

Next we train a model on the data

```
model = mx.model.FeedForward(
    symbol        = mlp,
    num_epoch     = 20,
    learning_rate = .1)
model.fit(X = train, eval_data = val)
```
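To reuse the result later, the `FeedForward` model can be checkpointed to disk. A sketch, where the prefix `'mnist-mlp'` is just an example path:

```
model.save('mnist-mlp')  # writes mnist-mlp-symbol.json and an epoch-numbered .params file
model = mx.model.FeedForward.load('mnist-mlp', epoch=20)  # reload the epoch-20 checkpoint
```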

Finally we can predict by

```
test = mx.io.MNISTIter(...)
model.predict(X = test)
```
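`predict` returns the softmax probabilities as a numpy array with one row per example; take a row-wise argmax for hard labels, or use the model's `score` helper to compute accuracy directly:

```
probs = model.predict(X = test)
print(probs.argmax(axis=1))   # predicted digit for each test example
print(model.score(X = test))  # accuracy under the default 'acc' metric
```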

### R

### Scala

We first import MXNet

```
import ml.dmlc.mxnet._
```

Then we declare data iterators for the training and validation datasets

```
val batchSize = 100  // mini-batch size; shared by both iterators

val trainDataIter = IO.MNISTIter(Map(
  "image" -> "data/train-images-idx3-ubyte",
  "label" -> "data/train-labels-idx1-ubyte",
  "data_shape" -> "(1, 28, 28)",
  "label_name" -> "sm_label",
  "batch_size" -> batchSize.toString,
  "shuffle" -> "1",
  "flat" -> "0",
  "silent" -> "0",
  "seed" -> "10"))

val valDataIter = IO.MNISTIter(Map(
  "image" -> "data/t10k-images-idx3-ubyte",
  "label" -> "data/t10k-labels-idx1-ubyte",
  "data_shape" -> "(1, 28, 28)",
  "label_name" -> "sm_label",
  "batch_size" -> batchSize.toString,
  "shuffle" -> "1",
  "flat" -> "0",
  "silent" -> "0"))
```

and declare an MLP with two hidden layers

```
val data = Symbol.Variable("data")
val fc1 = Symbol.FullyConnected(name = "fc1")(Map("data" -> data, "num_hidden" -> 128))
val act1 = Symbol.Activation(name = "relu1")(Map("data" -> fc1, "act_type" -> "relu"))
val fc2 = Symbol.FullyConnected(name = "fc2")(Map("data" -> act1, "num_hidden" -> 64))
val act2 = Symbol.Activation(name = "relu2")(Map("data" -> fc2, "act_type" -> "relu"))
val fc3 = Symbol.FullyConnected(name = "fc3")(Map("data" -> act2, "num_hidden" -> 10))
val mlp = Symbol.SoftmaxOutput(name = "sm")(Map("data" -> fc3))
```

Next we train a model on the data

```
import ml.dmlc.mxnet.optimizer.SGD

// set up the model and fit the training set
val model = FeedForward.newBuilder(mlp)
  .setContext(Context.cpu())
  .setNumEpoch(10)
  .setOptimizer(new SGD(learningRate = 0.1f, momentum = 0.9f, wd = 0.0001f))
  .setTrainData(trainDataIter)
  .setEvalData(valDataIter)
  .build()
```

Finally we can predict by

```
val probArrays = model.predict(valDataIter)
// in this case, we do not have multiple outputs
require(probArrays.length == 1)
val prob = probArrays(0)
// get predicted labels
val py = NDArray.argmaxChannel(prob)
// deal with predicted labels 'py'
```

### Julia

We first import MXNet

```
using MXNet
```

Then we load the data

```
batch_size = 100
include("mnist-data.jl")
train_provider, eval_provider = get_mnist_providers(batch_size)
```

and define the MLP

```
mlp = @mx.chain mx.Variable(:data) =>
        mx.FullyConnected(num_hidden=128) =>
        mx.Activation(act_type=:relu) =>
        mx.FullyConnected(num_hidden=64) =>
        mx.Activation(act_type=:relu) =>
        mx.FullyConnected(num_hidden=10) =>
        mx.SoftmaxOutput()
```

The model can be trained by

```
model = mx.FeedForward(mlp, context=mx.cpu())
optimizer = mx.SGD(lr=0.1, momentum=0.9, weight_decay=0.00001)
mx.fit(model, optimizer, train_provider, n_epoch=20, eval_data=eval_provider)
```

and finally predict (here on the validation provider, since this example defines no separate test set) by

```
probs = mx.predict(model, eval_provider)
```

## Tensor Computation

Next we briefly introduce the tensor computation interface, which is often more flexible to use than the symbolic interface above. It is often used to implement layers, define weight-update rules, and debug.

### Python

The Python interface is similar to `numpy.ndarray`.

```
>>> import mxnet as mx
>>> a = mx.nd.ones((2, 3),
...                mx.gpu())
>>> print((a * 2).asnumpy())
[[ 2.  2.  2.]
 [ 2.  2.  2.]]
```
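NDArrays also convert to and from numpy, which makes debugging easy. A minimal CPU-only sketch:

```
import numpy as np
import mxnet as mx

a = mx.nd.array(np.arange(6).reshape(2, 3))  # build an NDArray from numpy (CPU by default)
b = a * 2 + 1                                # arithmetic runs on the MXNet engine
print(b.asnumpy())                           # copy back to numpy, blocking until ready
# b.copyto(mx.gpu(0))                        # move to a GPU on a CUDA-enabled build
```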

### R

### Scala

You can do tensor/matrix computation in pure Scala.

```
scala> import ml.dmlc.mxnet._
import ml.dmlc.mxnet._
scala> val arr = NDArray.ones(2, 3)
arr: ml.dmlc.mxnet.NDArray = ml.dmlc.mxnet.NDArray@f5e74790
scala> arr.shape
res0: ml.dmlc.mxnet.Shape = (2,3)
scala> (arr * 2).toArray
res2: Array[Float] = Array(2.0, 2.0, 2.0, 2.0, 2.0, 2.0)
scala> (arr * 2).shape
res3: ml.dmlc.mxnet.Shape = (2,3)
```