
Azure Data Science Virtual Machine

The Azure Data Science Virtual Machines (DSVMs) are virtual machines with popular data science tools preinstalled and configured. A DSVM can also include one or more graphics processing units (GPUs) for deep learning.


All DSVMs have the following programming languages preinstalled:

  • Python
  • R
  • Julia

The Linux DSVM runs Ubuntu 18.04 and includes the following deep learning and machine learning tools:

  • Caffe and Caffe2
  • Chainer 5.2
  • Microsoft Cognitive Toolkit (CNTK) 2.5.1
  • MXNet 1.3.0
  • Keras 2.2.4
  • PyTorch 1.2.0
  • TensorFlow 1.13
  • Theano 1.0.3
  • Horovod 0.16.1
  • XGBoost 0.80
  • Vowpal Wabbit 8.1
  • Weka 3.8.0
  • H2O
  • LightGBM
  • Rattle
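The versions above apply to one specific DSVM image, so it can be worth confirming what is actually installed. The sketch below queries the active Python environment with the standard-library `importlib.metadata` module (Python 3.8 or later). The PyPI distribution names in `FRAMEWORKS` are our assumption, and the frameworks may live in different conda environments, so run it inside the environment you intend to use.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for some of the frameworks listed above;
# packages installed by conda may register under different names.
FRAMEWORKS = ["torch", "tensorflow", "keras", "mxnet", "chainer", "Theano",
              "horovod", "xgboost", "lightgbm", "h2o"]


def installed_versions(packages):
    """Return {package: version string or None} for the active environment."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(FRAMEWORKS).items():
        print(f"{name:12} {ver or 'not found'}")
```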

The DSVM is also configured with CUDA, cuDNN, the NVIDIA driver, nvidia-smi, Docker, Intel MKL, and NCCL. Additional libraries and environments can be installed with the Conda package manager.
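Because nvidia-smi is preinstalled, a quick way to confirm which GPUs a DSVM exposes is to call it from Python. This is a minimal sketch: it uses nvidia-smi's `--query-gpu` and `--format=csv,noheader` options and returns an empty list when nvidia-smi is not on the PATH (for example on a CPU-only size). The function name `list_gpus` is our own.

```python
import shutil
import subprocess


def list_gpus():
    """Return a list of (name, total_memory) tuples reported by nvidia-smi.

    Returns [] when nvidia-smi is not on PATH. If nvidia-smi exists but the
    driver is not working, subprocess raises CalledProcessError (check=True).
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    gpus = []
    for line in result.stdout.strip().splitlines():
        # Split on the last comma so the memory field is always separated out.
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, memory))
    return gpus


if __name__ == "__main__":
    print(list_gpus() or "No NVIDIA GPU detected")
```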

For more information about the tools installed on the Linux Data Science Virtual Machine, see the Microsoft documentation.

DSVMs can also be configured with an additional network share, which lets all users in the same Data Science Lab share documents and files.

Interfaces

  • SSH: command-line (CLI) tools can connect to the DSVM over SSH.
  • JupyterHub: JupyterHub is an open-source, multi-user web application for launching Jupyter notebooks. Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. JupyterHub lets you host multiple instances of the single-user Jupyter Notebook server. For more information about JupyterHub, see https://jupyterhub.readthedocs.io.
  • RStudio: RStudio is an integrated development environment (IDE) for R, a programming language for statistical computing and graphics. For more information about RStudio, see https://www.rstudio.com/.
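The sketch below builds an OpenSSH command line for reaching a DSVM, either to run a CLI command remotely or to forward a local port to a web interface on the VM. It only uses standard `ssh` options (`-p` port, `-i` identity file, `-L` local forwarding, `-N` no remote command). The function name, user name, and host name are hypothetical, and forwarding port 8000 for JupyterHub assumes the VM uses JupyterHub's default port, which may differ on your VM.

```python
import shlex


def ssh_command(user, host, remote_command=None, key_file=None, port=22,
                forward_port=None):
    """Build an OpenSSH argument list for reaching a DSVM.

    If forward_port is set, that local port is forwarded to the same port on
    the VM's loopback interface and no remote command is run (-N). This is one
    way to reach a web interface such as JupyterHub through the SSH connection.
    """
    if forward_port is not None and remote_command is not None:
        raise ValueError("use either remote_command or forward_port, not both")
    args = ["ssh", "-p", str(port)]
    if key_file:
        args += ["-i", key_file]  # private key used for authentication
    if forward_port is not None:
        args += ["-N", "-L", f"{forward_port}:localhost:{forward_port}"]
    args.append(f"{user}@{host}")
    if remote_command is not None:
        args.append(remote_command)
    return args


if __name__ == "__main__":
    # Hypothetical user and host name; replace them with your DSVM's values.
    print(shlex.join(ssh_command("azureuser", "my-dsvm.example.com", "nvidia-smi")))
    # ssh -p 22 azureuser@my-dsvm.example.com nvidia-smi
```

To actually connect, pass the list to `subprocess.run`. With `forward_port=8000`, the tunnel stays open until interrupted, and the forwarded interface is then reachable on `localhost:8000` from your own machine.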

Hardware configuration

The recommended hardware configurations are listed below. VM sizing will also be discussed during the technical workshops, and other VM types can be reviewed when they suit the envisioned use cases.

Data Science: General Purpose virtual machines

Dv3-series VMs run on the Intel® Xeon® 8171M 2.1 GHz (Skylake), Intel® Xeon® E5-2673 v4 2.3 GHz (Broadwell), or Intel® Xeon® E5-2673 v3 2.4 GHz (Haswell) processors with Intel Turbo Boost Technology 2.0.

| Instance size | vCPU | Memory (GiB) | Temp storage SSD (GiB) | Max temp storage throughput (IOPS / Read MBps / Write MBps) | Expected network bandwidth (Mbps) |
|---|---|---|---|---|---|
| Standard_D2_v3 | 2 | 8 | 50 | 3000 / 46 / 23 | 1000 |
| Standard_D4_v3 | 4 | 16 | 100 | 6000 / 93 / 46 | 2000 |
| Standard_D8_v3 | 8 | 32 | 200 | 12000 / 187 / 93 | 4000 |
| Standard_D16_v3 | 16 | 64 | 400 | 24000 / 375 / 187 | 8000 |
| Standard_D32_v3 | 32 | 128 | 800 | 48000 / 750 / 375 | 16000 |
| Standard_D48_v3 | 48 | 192 | 1200 | 96000 / 1000 / 500 | 24000 |
| Standard_D64_v3 | 64 | 256 | 1600 | 96000 / 1000 / 500 | 30000 |

For an updated list, please visit the Microsoft documentation.
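Choosing a Dv3 size from the table mostly comes down to the vCPU and memory a workload needs (every size here has 4 GiB of memory per vCPU). The sketch below encodes the table and returns the size with the fewest vCPUs that meets both minimums. Fewer vCPUs is used only as a stand-in for "smaller"; the sketch does not model pricing, and the names `VmSize`, `DV3_SIZES`, and `smallest_size` are our own.

```python
from typing import NamedTuple


class VmSize(NamedTuple):
    name: str
    vcpu: int
    memory_gib: int
    temp_ssd_gib: int


# Dv3-series sizes copied from the table above; Azure may have changed them since.
DV3_SIZES = [
    VmSize("Standard_D2_v3", 2, 8, 50),
    VmSize("Standard_D4_v3", 4, 16, 100),
    VmSize("Standard_D8_v3", 8, 32, 200),
    VmSize("Standard_D16_v3", 16, 64, 400),
    VmSize("Standard_D32_v3", 32, 128, 800),
    VmSize("Standard_D48_v3", 48, 192, 1200),
    VmSize("Standard_D64_v3", 64, 256, 1600),
]


def smallest_size(sizes, min_vcpu, min_memory_gib):
    """Return the size with the fewest vCPUs that meets both minimums, or None."""
    candidates = [s for s in sizes
                  if s.vcpu >= min_vcpu and s.memory_gib >= min_memory_gib]
    # Ties on vCPU count are broken by memory.
    return min(candidates, key=lambda s: (s.vcpu, s.memory_gib), default=None)


if __name__ == "__main__":
    # Hypothetical workload: at least 6 vCPUs and 20 GiB of RAM.
    print(smallest_size(DV3_SIZES, min_vcpu=6, min_memory_gib=20).name)  # Standard_D8_v3
```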

Deep learning: GPU virtual machines

GPU VMs are intended for compute-intensive graphics rendering and video editing, as well as deep learning model training and inference (ND series). NC-series VMs are powered by the NVIDIA Tesla K80 card and the Intel Xeon E5-2690 v3 (Haswell) processor.

| Instance | GPUs* | vCPU | Memory (GiB) | GPU memory (GiB) | Temp storage SSD (GiB) |
|---|---|---|---|---|---|
| Standard_NC6 | 1 | 6 | 56 | 12 | 340 |
| Standard_NC12 | 2 | 12 | 112 | 24 | 680 |
| Standard_NC24 | 4 | 24 | 224 | 48 | 1440 |

* 1 GPU = one-half K80 card

For an updated list, please visit the Microsoft documentation.
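In the NC table, GPU memory is 12 GiB per GPU (one-half of a K80 card), so it scales with the GPU count. GPU memory is not pooled across devices, so the sketch below checks two separate needs: what a single GPU must hold (for example one model replica) and the total across all GPUs (for example four 10 GiB replicas in data-parallel training need 40 GiB in total but only 10 GiB per GPU). The names `GpuVmSize`, `NC_SIZES`, and `smallest_gpu_size` are our own, and the data is copied from the table above.

```python
from typing import NamedTuple

# Per the table footnote, 1 GPU = one-half of a Tesla K80 card, i.e. 12 GiB.
GPU_MEMORY_PER_GPU_GIB = 12


class GpuVmSize(NamedTuple):
    name: str
    gpus: int
    vcpu: int
    memory_gib: int
    temp_ssd_gib: int

    @property
    def gpu_memory_gib(self):
        """Total GPU memory across all GPUs of this size."""
        return self.gpus * GPU_MEMORY_PER_GPU_GIB


# NC-series sizes copied from the table above; check Microsoft's docs for current data.
NC_SIZES = [
    GpuVmSize("Standard_NC6", 1, 6, 56, 340),
    GpuVmSize("Standard_NC12", 2, 12, 112, 680),
    GpuVmSize("Standard_NC24", 4, 24, 224, 1440),
]


def smallest_gpu_size(sizes, total_gpu_memory_gib, per_gpu_memory_gib=0):
    """Return the size with the fewest GPUs meeting both memory needs, or None."""
    if per_gpu_memory_gib > GPU_MEMORY_PER_GPU_GIB:
        return None  # no NC size has a single GPU this large
    candidates = [s for s in sizes if s.gpu_memory_gib >= total_gpu_memory_gib]
    return min(candidates, key=lambda s: s.gpus, default=None)


if __name__ == "__main__":
    # Hypothetical need: 20 GiB of GPU memory in total, 10 GiB per GPU.
    print(smallest_gpu_size(NC_SIZES, 20, per_gpu_memory_gib=10).name)  # Standard_NC12
```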