Azure Data Science Virtual Machine#
The Azure Data Science Virtual Machines (DSVMs) are virtual machines with popular data science tools preinstalled and configured. A DSVM can also include one or more graphics processing units (GPUs) for deep learning.
All DSVMs have the following programming languages pre-installed:
- Python
- R
- Julia
The Linux DSVM runs Ubuntu 18.04 and comes with the following deep learning and machine learning tools:
- Caffe and Caffe2
- Chainer 5.2
- Microsoft Cognitive Toolkit (CNTK) 2.5.1
- MXNet 1.3.0
- Keras 2.2.4
- PyTorch 1.2.0
- TensorFlow 1.13
- Theano 1.0.3
- Horovod 0.16.1
- XGBoost 0.80
- Vowpal Wabbit 8.1
- Weka 3.8.0
- H2O
- LightGBM
- Rattle
The DSVM is also configured with CUDA, cuDNN, the NVIDIA driver, nvidia-smi, Docker, Intel MKL, and NCCL. Additional libraries and environments can be installed with the Conda package manager.
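To see which of the frameworks listed above are actually present in a given Python environment, a short check along these lines may help. This is a minimal sketch, not part of the DSVM tooling: the distribution names in `FRAMEWORKS` are assumptions and may differ from the names used on the image, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Assumed distribution names; adjust them to match the environment in use.
FRAMEWORKS = ["torch", "tensorflow", "keras", "mxnet", "xgboost", "lightgbm"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(FRAMEWORKS).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside each Conda environment separately, since every environment has its own set of installed packages.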
More information about all tools installed on the Linux Data Science Virtual Machine can be found here.
DSVMs can also be configured with a network share, which lets all users in the same Data Science Labs share documents and files.
Interfaces#
- CLI tools can connect to the DSVM via SSH
- JupyterHub: JupyterHub is an open-source multi-user web application for launching Jupyter Notebooks. Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. JupyterHub lets you host multiple instances of the single-user Jupyter Notebook server. For more information about JupyterHub, see https://jupyterhub.readthedocs.io.
- RStudio: RStudio is an integrated development environment (IDE) for R, a programming language for statistical computing and graphics. For more information about RStudio, see https://www.rstudio.com/.
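As an illustration of the SSH interface, the sketch below builds an OpenSSH client command that could run a tool such as `nvidia-smi` on the DSVM. The user name, host name, and the `build_ssh_command` helper are hypothetical placeholders; the actual connection details depend on how the DSVM is provisioned.

```python
import shlex


def build_ssh_command(user, host, remote_command=None, port=22):
    """Return the argument list for an OpenSSH client invocation."""
    args = ["ssh", "-p", str(port), f"{user}@{host}"]
    if remote_command:
        # Without a remote command, ssh opens an interactive shell instead.
        args.append(remote_command)
    return args


if __name__ == "__main__":
    # Hypothetical user and host; replace with the values for your DSVM.
    cmd = build_ssh_command("datascientist", "dsvm.example.com", "nvidia-smi")
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute the command, provided an SSH client is installed and the DSVM accepts the connection.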
Hardware configuration#
The recommended hardware configurations are listed below. VM sizing will also be discussed during the technical workshops. Other VM types can be reviewed where the envisioned use cases call for them.
Data Science: General Purpose virtual machines#
Dv3-series VMs run on the Intel® Xeon® 8171M 2.1 GHz (Skylake), Intel® Xeon® E5-2673 v4 2.3 GHz (Broadwell), or Intel® Xeon® E5-2673 v3 2.4 GHz (Haswell) processors with Intel Turbo Boost Technology 2.0.
Instance Size | vCPU | Memory (GiB) | Temp Storage SSD (GiB) | Max temp storage throughput (IOPS / Read MBps / Write MBps) | Expected Network Bandwidth (Mbps) |
---|---|---|---|---|---|
Standard_D2_v3 | 2 | 8 | 50 | 3000/46/23 | 1000 |
Standard_D4_v3 | 4 | 16 | 100 | 6000/93/46 | 2000 |
Standard_D8_v3 | 8 | 32 | 200 | 12000/187/93 | 4000 |
Standard_D16_v3 | 16 | 64 | 400 | 24000/375/187 | 8000 |
Standard_D32_v3 | 32 | 128 | 800 | 48000/750/375 | 16000 |
Standard_D48_v3 | 48 | 192 | 1200 | 96000/1000/500 | 24000 |
Standard_D64_v3 | 64 | 256 | 1600 | 96000/1000/500 | 30000 |
For an updated list, please visit the Microsoft documentation.
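When sizing a VM, the Dv3 table above can be treated as a lookup: pick the first size that meets both the vCPU and the memory requirement. The sketch below encodes the vCPU and memory columns as they appear in the table; `smallest_dv3` is a hypothetical helper, and it ignores storage, network, and cost considerations.

```python
# (instance size, vCPU, memory in GiB), copied from the Dv3 table above
# and ordered from smallest to largest.
DV3_SIZES = [
    ("Standard_D2_v3", 2, 8),
    ("Standard_D4_v3", 4, 16),
    ("Standard_D8_v3", 8, 32),
    ("Standard_D16_v3", 16, 64),
    ("Standard_D32_v3", 32, 128),
    ("Standard_D48_v3", 48, 192),
    ("Standard_D64_v3", 64, 256),
]


def smallest_dv3(min_vcpu, min_memory_gib):
    """Return the smallest listed size meeting both requirements, or None."""
    for name, vcpu, memory_gib in DV3_SIZES:
        if vcpu >= min_vcpu and memory_gib >= min_memory_gib:
            return name
    return None


if __name__ == "__main__":
    # A workload needing 6 vCPUs and 20 GiB of memory.
    print(smallest_dv3(6, 20))
```

A return value of `None` means no Dv3 size in the table satisfies the request, which is a signal to review other VM types.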
Deep learning: GPU virtual machines#
The GPU VMs are intended for heavy graphics rendering and video editing, as well as deep learning model training and inference (ND-series). NC-series VMs are powered by the NVIDIA Tesla K80 card and the Intel Xeon E5-2690 v3 (Haswell) processor.
Instance | GPUs* | vCPU | Mem (GiB) | GPU Mem (GiB) | Temp storage SSD (GiB) |
---|---|---|---|---|---|
Standard_NC6 | 1 | 6 | 56 | 12 | 340 |
Standard_NC12 | 2 | 12 | 112 | 24 | 680 |
Standard_NC24 | 4 | 24 | 224 | 48 | 1440 |
* 1 GPU = one-half K80 card
For an updated list, please visit the Microsoft documentation.
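The footnote to the NC-series table (1 GPU = one-half K80 card) implies some simple arithmetic, sketched below using the GPU count and GPU memory columns from that table. The helper names `k80_cards` and `memory_per_gpu` are hypothetical.

```python
# instance size -> (GPU count, total GPU memory in GiB), from the NC table.
NC_SIZES = {
    "Standard_NC6": (1, 12),
    "Standard_NC12": (2, 24),
    "Standard_NC24": (4, 48),
}


def k80_cards(gpus):
    """Convert a GPU count to physical K80 cards (1 GPU = one-half card)."""
    return gpus / 2


def memory_per_gpu(size):
    """Return the GPU memory in GiB available to each GPU of a size."""
    gpus, total_memory_gib = NC_SIZES[size]
    return total_memory_gib / gpus


if __name__ == "__main__":
    for size, (gpus, _) in NC_SIZES.items():
        print(size, k80_cards(gpus), memory_per_gpu(size))
```

By this arithmetic, every listed NC size provides the same amount of memory per GPU, so moving to a larger size adds GPUs rather than memory per GPU.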