🧠 First Steps with Swift for TensorFlow

We just finished presenting at the inaugural TensorFlow World conference, in Santa Clara, California. Mars, Tim, and Paris presented what might be the first 3-hour tutorial session on the brand new Swift for TensorFlow machine learning platform.

This post serves as both a follow-up to that session (which was recorded, and will be posted soon — we’ll update this post when that happens) and a standalone guide and tutorial to get started with Swift for TensorFlow.

We’ll be posting follow-up tutorials, which will get more advanced, over the coming weeks. (In the meantime, check out our new book on Practical Artificial Intelligence with Swift!)

Getting Swift for TensorFlow

There are two ways to get Swift for TensorFlow that we’d recommend right now. The first is to use Google’s Colaboratory (Colab), an online data science and experimentation platform that you use through a browser, in a Jupyter Notebook-like environment.

The second is to install it locally, using Docker.

If you use Windows, we recommend using Google Colab. If you use Linux or macOS, we recommend installing using the Docker image (it’s much easier than Docker’s reputation might suggest!).

Installing Swift for TensorFlow with Docker

➡️ First, make a folder on your local system in which to store your Swift Jupyter notebooks. For example, mine is located at /Users/parisba/S4TF/notebooks. You don’t need to put anything in there, just make sure you’ve created it.

➡️ Download and install Docker: https://hub.docker.com/

We’re not going to explain this process much, because once it’s done you don’t need to think about Docker or any of this process again. If you want to learn how Docker works, there are plenty of sources online.

➡️ Now, clone the following git repository:

git clone https://github.com/google/swift-jupyter.git

➡️ Then, change directory into the cloned repository, and execute the following command:

docker build -f docker/Dockerfile -t swift-jupyter .

➡️ Then, to launch the Docker container and Jupyter notebooks, execute the following command:

docker run -p 8888:8888 --cap-add SYS_PTRACE -v /path/to/books:/notebooks swift-jupyter

⚠️ Note that you will need to replace /path/to/books in the above command with the path to the folder on your local system that you created earlier.

➡️ Open the URL that is displayed in your terminal, similar to the following:

Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:

➡️ You should see something that looks like the following screenshot:

➡️ You’re ready to go!

Using Google Colaboratory

You don’t need to do much to use Google Colaboratory!

➡️ Make sure you have a Google Account, and then head to Google Colab’s blank Swift notebook.

➡️ That’s it! You’re done.

Training a Model

In this example, we assemble a multilayer perceptron network that can perform XOR.

It’s not very useful, but it showcases how you build up a model using layers, and how to execute training with that model. XOR was one of the first stumbling blocks of early work with artificial neural networks, which makes it a great example for the power of modern machine learning frameworks.

It’s simple enough that you know whether it’s correct… which is why we’re doing it!
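
If you haven’t met XOR before, here’s a quick sketch in plain Swift (no TensorFlow required) of the truth table our model has to learn. The names inputs and xorOutputs are ours, purely for illustration.

```swift
// Plain Swift: the four possible input pairs for XOR.
let inputs: [(Bool, Bool)] = [(false, false), (false, true), (true, false), (true, true)]

// For Bool operands, `!=` behaves exactly like exclusive-or.
let xorOutputs = inputs.map { $0.0 != $0.1 }

for (pair, result) in zip(inputs, xorOutputs) {
    print("\(pair.0 ? 1 : 0) XOR \(pair.1 ? 1 : 0) = \(result ? 1 : 0)")
}
```

The output is 1 only when exactly one of the two inputs is 1, which is the pattern the training labels later in this post encode.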

➡️ Create a new notebook, and import the TensorFlow framework:

import TensorFlow

To represent our XOR neural network model, we need to create a struct that conforms to the Layer protocol (which is part of Swift for TensorFlow’s API). Ours is called XORModel.

Inside the model, we want three layers:

  • an input layer, to take the input
  • a hidden layer
  • an output layer, to provide the output

All three layers should be a Dense layer (a densely-connected layer) that takes an inputSize and an outputSize.

The inputSize specifies how many values the layer takes as input. Likewise, outputSize specifies how many values the layer produces as output.
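
To make the two sizes concrete, here’s a rough sketch in plain Swift of what a densely-connected layer computes before its activation is applied. PlainDense and weightedSums are hypothetical names of ours, not part of the Swift for TensorFlow API, and the fixed 0.5 weights are placeholders (a real Dense layer starts with random weights and learns them).

```swift
// Hypothetical stand-in for a dense layer, to show what the two sizes mean.
struct PlainDense {
    var weights: [[Double]]  // inputSize rows, outputSize columns
    var bias: [Double]       // outputSize values

    init(inputSize: Int, outputSize: Int) {
        // Placeholder weights keep the example deterministic.
        weights = Array(repeating: Array(repeating: 0.5, count: outputSize), count: inputSize)
        bias = Array(repeating: 0.0, count: outputSize)
    }

    // Weighted sum of the inputs plus the bias, before any activation.
    func weightedSums(_ input: [Double]) -> [Double] {
        precondition(input.count == weights.count, "input must have inputSize values")
        var output = bias
        for (i, value) in input.enumerated() {
            for j in output.indices {
                output[j] += value * weights[i][j]
            }
        }
        return output
    }
}

let layer = PlainDense(inputSize: 2, outputSize: 1)
let sums = layer.weightedSums([1.0, 0.0])
print(sums)  // one value, because outputSize is 1
```

Two values go in and one comes out, which matches the shape of the output layer we define in XORModel.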

Each layer also has an activation function, which determines the output of each node in the layer. There are many available activations, but ReLU and sigmoid are common.

For our three layers, we’ll use sigmoid.
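
If you’re curious what those two activations actually do to a number, here’s a small plain Swift sketch. plainSigmoid and plainReLU are hypothetical helper names of ours; in the model itself we use the sigmoid that ships with Swift for TensorFlow.

```swift
import Foundation

// Sigmoid squashes any input into the range (0, 1).
func plainSigmoid(_ x: Double) -> Double {
    return 1.0 / (1.0 + exp(-x))
}

// ReLU passes positive inputs through unchanged and clamps negatives to 0.
func plainReLU(_ x: Double) -> Double {
    return max(0.0, x)
}

for x in [-2.0, 0.0, 2.0] {
    print("x: \(x) sigmoid: \(plainSigmoid(x)) relu: \(plainReLU(x))")
}
```

Because sigmoid always lands between 0 and 1, it’s a comfortable fit for XOR, where the answers we want are 0 or 1.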

We’ll also need to provide a definition of our @differentiable func, callAsFunction(). In this case, we want it to return the input sequenced through (passed through) the three layers.

Helpfully, the Differentiable protocol that comes with Swift for TensorFlow has a method, sequenced(through:), that makes this trivial.

➡️ To do this, add the following code:

struct XORModel: Layer {
  var inputLayer = Dense<Float>(inputSize: 2, outputSize: 2, activation: sigmoid)
  var hiddenLayer = Dense<Float>(inputSize: 2, outputSize: 2, activation: sigmoid)
  var outputLayer = Dense<Float>(inputSize: 2, outputSize: 1, activation: sigmoid)

  // Pass the input through the three layers, in order.
  @differentiable func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
    return input.sequenced(through: inputLayer, hiddenLayer, outputLayer)
  }
}
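
Conceptually, sequencing an input through layers just means feeding the output of each layer into the next. Here’s a minimal plain Swift sketch of that idea, using ordinary functions in place of layers. passThrough and the three closures are hypothetical stand-ins of ours, not the library’s implementation.

```swift
// Hypothetical stand-ins for layers: each one is just a function on a Double.
let addOne: (Double) -> Double = { $0 + 1 }
let timesTwo: (Double) -> Double = { $0 * 2 }
let minusOne: (Double) -> Double = { $0 - 1 }

// Feed the input to the first function, its result to the second, and so on.
func passThrough(_ input: Double, _ layers: [(Double) -> Double]) -> Double {
    return layers.reduce(input) { value, layer in layer(value) }
}

let result = passThrough(1.0, [addOne, timesTwo, minusOne])
print(result)  // 3.0, because ((1 + 1) * 2) - 1 = 3
```

The order matters: swap two of the functions and you get a different answer, just as swapping layers would give you a different model.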

➡️ Then we need to create an instance of our XORModel struct, which we defined above. This will be our model:

var model = XORModel()

Next, we need an optimiser. In this case we’re going to use a stochastic gradient descent (SGD) optimiser, which we can get from the Swift for TensorFlow library.

➡️ Our optimiser is, obviously, for the model instance we defined a moment ago, and wants a learning rate of about 0.02:

let optimiser = SGD(for: model, learningRate: 0.02)

➡️ Now we need a Tensor to hold our training data ([0, 0], [0, 1], [1, 0], [1, 1]):

let trainingData: Tensor<Float> = [[0, 0], [0, 1], [1, 0], [1, 1]]

➡️ And we need to label the training data so that we know the correct outputs:

let trainingLabels: Tensor<Float> = [[0], [1], [1], [0]]

➡️ To train, we’ll need a hyperparameter for epochs:

let epochs = 100_000

Then we need a training loop. We train the model by iterating through our epochs, and each time computing the gradient of the loss with respect to the model (the 𝛁 symbol, nabla, is often used to represent gradient). Our gradient is of type TangentVector, and represents a differentiable value’s derivatives.

Each epoch, we set the predicted value (ŷ) to be the model’s output for our training data, and the expected value to be our training labels, and calculate the loss using meanSquaredError().
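
To see what mean squared error measures, here’s a plain Swift sketch of the arithmetic. plainMeanSquaredError is a hypothetical helper of ours (the tutorial code uses the library’s meanSquaredError()), and the untrainedGuess values are made up to represent a model that answers 0.5 for everything.

```swift
// Average of the squared differences between predictions and expected values.
func plainMeanSquaredError(predicted: [Double], expected: [Double]) -> Double {
    precondition(predicted.count == expected.count, "arrays must be the same length")
    let squaredErrors = zip(predicted, expected).map { ($0 - $1) * ($0 - $1) }
    return squaredErrors.reduce(0.0, +) / Double(squaredErrors.count)
}

let untrainedGuess = [0.5, 0.5, 0.5, 0.5]  // made-up predictions
let xorLabels = [0.0, 1.0, 1.0, 0.0]       // the XOR answers
let guessLoss = plainMeanSquaredError(predicted: untrainedGuess, expected: xorLabels)
print(guessLoss)  // 0.25
```

Every guess is off by 0.5, so every squared error is 0.25, and so is their average. A loss near 0.25 therefore means the model is doing no better than sitting on the fence.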

Every so often we also want to print out the epoch we’re in, and the current loss, so we can watch the training. We also need to return the loss from the closure.

Finally, we need to use our optimiser to update the model’s differentiable variables, along the gradient.

➡️ To do this, add the following code:

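for epoch in 0..<epochs {
    // Compute the gradient of the loss with respect to the model.
    let 𝛁model = model.gradient { model -> Tensor<Float> in
        let ŷ = model(trainingData)
        let loss = meanSquaredError(predicted: ŷ, expected: trainingLabels)

        // Report progress every 5000 epochs.
        if epoch % 5000 == 0 {
            print("epoch: \(epoch) loss: \(loss)")
        }
        return loss
    }

    // Nudge the model's parameters along the gradient.
    optimiser.update(&model, along: 𝛁model)
}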
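
If the optimiser step feels like magic, here’s a tiny plain Swift sketch of the underlying idea: repeatedly step a parameter against its gradient, scaled by the learning rate. This is ordinary gradient descent on a made-up one-parameter problem (minimising (w - 3)^2), not the library’s SGD implementation, and all the names are ours.

```swift
// Hypothetical one-parameter problem: find the w that minimises (w - 3)^2.
var w = 0.0
let learningRate = 0.02

for step in 0..<1000 {
    let gradient = 2.0 * (w - 3.0)   // derivative of (w - 3)^2 with respect to w
    w -= learningRate * gradient     // step against the gradient
    if step % 250 == 0 {
        print("step: \(step) w: \(w)")
    }
}
print("final w: \(w)")
```

Each step closes 4% of the remaining gap to 3, so w creeps towards the minimum. Our XOR model does the same thing, just with more parameters and a gradient that Swift for TensorFlow works out for us.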

➡️ Run the notebook! You should see something resembling the following output:

epoch: 0 loss: 0.25470454
epoch: 5000 loss: 0.24981761
epoch: 10000 loss: 0.2496698
epoch: 95000 loss: 0.16970003

➡️ Test your (incredibly useful) XOR model by adding a cell to your notebook with the following code:

print(round(model.inferring(from: [[0, 0], [0, 1], [1, 0], [1, 1]])))

➡️ The output should be as follows:


➡️ Congratulations! You just trained a machine learning model that can, badly, perform XOR.

We’ll be posting more Swift for TensorFlow material in the coming weeks! 🚀

For more Swift AI content, check out our latest book, Practical Artificial Intelligence with Swift! It covers using Swift for AI in iOS applications, using Apple’s CreateML, CoreML, and Turi Create. If you like filling your brain with words, why not fill it with ours?

If you want to learn a little more about Swift for TensorFlow, we recommend this session from TensorFlow World as a great starting point: