Automatically deploy a fully configured GPU instance on AWS incl. PyTorch 1.0 and fast.ai 1.0

A common workflow for deep learning practitioners is to develop a project locally in JupyterLab (or Jupyter Notebook) and then move the local files to a powerful GPU cloud instance to train the neural network. After the compute-intensive part, the cloud instance is shut down again to save resources and reduce fees.

The following Terraform configuration fully automates this process using an AWS GPU compute instance.

Terraform is an open source tool for defining infrastructure as code. For more information, check out the website at https://www.terraform.io/.

This Terraform configuration sets up a new p2.xlarge (alternatively p3.2xlarge) spot instance with an Ubuntu 18.04 image. It then installs the NVIDIA drivers, Anaconda with Python 3.7.1, CUDA, cuDNN, PyTorch 1.0 and fast.ai 1.0. In addition to configuring the instance and installing the software, the Terraform configuration also uploads local files (from “shared_files”) to the cloud server. Once the GPU-intensive part of the workflow is complete, you can have Terraform automatically delete the instance.

The example configuration uses the us-west-2 region (Oregon). Make sure you already have your access keys set up (https://console.aws.amazon.com/iam/home?region=us-west-2#/users$new?step=details). The Terraform configuration needs the access keys to authenticate against AWS.
You will also need an SSH key pair configured with AWS (https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#KeyPairs:sort=keyName). The SSH key pair is used to upload any needed files from your local computer, and you will need it to log into your server.
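
One common way to supply the access keys is through environment variables, which the Terraform AWS provider reads when no credentials are set in the configuration itself. The following is only a sketch: it assumes the repository's configuration does not define the credentials in some other way, and both values are placeholders that you need to replace with your own keys.

```shell
# Placeholders (assumptions): substitute the access key pair created in the IAM console.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
```

Variables exported this way only live in the current shell session, so they have to be set again in a new terminal.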

To get started, install Terraform on your local computer (https://learn.hashicorp.com/terraform/getting-started/install.html) and then clone the GitHub repository for the configuration (https://github.com/OliverMaerz/dl-terraform) with:

git clone https://github.com/OliverMaerz/dl-terraform.git

Review the variables.tf file and change it to match your setup. In particular, change the default values for ‘ssh_key_pair’ and ‘private_ssh_key’. With ‘spot_price’ you can set the maximum price you are willing to pay for the spot instance. You can check the current spot prices at https://aws.amazon.com/ec2/spot/pricing/ (make sure to select “US West (Oregon)” in the Region drop-down).
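
As an alternative to editing the defaults in variables.tf, Terraform automatically loads a file named terraform.tfvars from the working directory and uses its values to override the defaults. The sketch below writes such a file. The three variable names come from the configuration, but every value is a made-up placeholder, and it assumes that ‘private_ssh_key’ expects the path to the private key file.

```shell
# Run this inside the cloned dl-terraform directory.
# All values are placeholders (assumptions), not the repository's defaults.
cat > terraform.tfvars <<EOF
ssh_key_pair    = "my-key-pair"
private_ssh_key = "$HOME/.ssh/my-key-pair.pem"
spot_price      = "0.50"
EOF
```

The unquoted heredoc lets the shell expand $HOME, so the file ends up containing an absolute path.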

Copy the files (notebooks, scripts, etc.) that you need on the cloud instance into the ‘shared_files’ folder.

Once everything is ready, initialize the Terraform configuration and then deploy it with just a couple of commands:

terraform init
terraform apply

That's it. After about 15 minutes your instance will be ready for you to log in and run your Jupyter Notebooks.
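
Logging in and later removing the instance could look roughly like the two shell helpers below. The function names gpu_login and gpu_teardown are hypothetical and not part of the repository. The sketch assumes the default ‘ubuntu’ user of the official Ubuntu images and Jupyter's default port 8888, and you have to look up the public IP address of the instance yourself, for example in the EC2 console.

```shell
# Hypothetical helper: open an SSH session and forward Jupyter's default port,
# so a notebook server on the instance is reachable at http://localhost:8888.
# Usage: gpu_login <path-to-private-key> <instance-public-ip>
gpu_login() {
    ssh -i "$1" -L 8888:localhost:8888 "ubuntu@$2"
}

# Hypothetical helper: remove everything the configuration created.
# terraform destroy asks for confirmation before deleting anything.
gpu_teardown() {
    terraform destroy
}
```

After defining the functions in your shell, a session would start with something like gpu_login ~/.ssh/my-key-pair.pem 203.0.113.10 (both arguments are placeholders) and end with gpu_teardown, run from the dl-terraform directory.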

For more details see also the instructions on GitHub: https://github.com/OliverMaerz/dl-terraform/blob/master/README.md.
