Amazon SageMaker

The EC Data Platform Amazon SageMaker building block is a fully managed machine learning service. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers. It also provides common machine learning algorithms that are optimized to run efficiently against extremely large data in a distributed environment.

With native support for bring-your-own-algorithms and frameworks, SageMaker offers flexible distributed training options that adjust to your specific workflows. Deploy a model into a secure and scalable environment by launching it with a few clicks from SageMaker Studio or the SageMaker console. Training and hosting are billed by minutes of usage.
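As a rough illustration of usage-based billing, the sketch below converts minutes of usage into an estimated cost. The function name `estimate_cost` and the hourly rate used in the example are hypothetical; actual rates depend on the instance type and region and are listed on the Amazon SageMaker pricing website.

```python
def estimate_cost(minutes_used, hourly_rate):
    """Estimate the cost of a training or hosting run.

    minutes_used: billed minutes of usage.
    hourly_rate:  price per instance-hour (hypothetical value, look up
                  the real rate for your instance type and region).
    """
    if minutes_used < 0 or hourly_rate < 0:
        raise ValueError("minutes_used and hourly_rate must be non-negative")
    # Convert the hourly rate to a per-minute rate and scale by usage.
    return minutes_used * hourly_rate / 60


# Example: a 90-minute training job on an instance assumed to cost 2.00 per hour.
cost = estimate_cost(90, 2.0)
print(f"Estimated cost: {cost:.2f}")
```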

In EC Data Platform, you can use Amazon SageMaker to import data from the Amazon storage solutions that Data Platform provides and use that data to train advanced machine learning models.
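A minimal sketch of locating training data, assuming the storage solution in question is an Amazon S3 bucket (the text above does not name it). The helper `list_training_objects`, the bucket name, and the prefix are hypothetical; the `s3_client` argument is expected to behave like `boto3.client("s3")`.

```python
def list_training_objects(s3_client, bucket, prefix):
    """Return the s3:// URIs of all objects under a prefix.

    s3_client is assumed to be a boto3 S3 client (or any object exposing
    the same get_paginator("list_objects_v2") interface).
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    uris = []
    # The paginator transparently follows continuation tokens, so
    # prefixes holding more than 1000 objects are handled as well.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            uris.append(f"s3://{bucket}/{obj['Key']}")
    return uris
```

In a notebook with `boto3` available, this could be called as `list_training_objects(boto3.client("s3"), "my-project-bucket", "training/")`, where both the bucket and the prefix are placeholders for your own values.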

Interfaces

The interface that the Amazon SageMaker service provides is called Amazon SageMaker Studio. It is a separate environment that can be accessed from the AWS Console.


From within Studio, there are multiple ways to create, train, and deploy your ML models, using either SageMaker's built-in models or popular machine learning frameworks (scikit-learn, TensorFlow, PyTorch, etc.). Notebooks come with a preconfigured Python environment. Additionally, you can create models with SageMaker Autopilot (the AutoML feature), which automatically pre-processes your data and trains popular machine learning models on it. Finally, you can use the large library of open-source SageMaker notebook tutorials, available in Studio and on GitHub, to train a machine learning model without starting from scratch.

Hardware configuration

SageMaker Studio notebooks need to be connected to a compute instance. A managed EC2 instance that executes your submitted code is started in the background. You can choose among all the compute instance types that SageMaker Studio offers, except for the following expensive instances:

Instance Type	CPU	Memory
ml.m5.24xlarge	96 vCPU	384 GiB
ml.m5.16xlarge	64 vCPU	256 GiB
ml.m5.12xlarge	48 vCPU	192 GiB
ml.c5.24xlarge	96 vCPU	192 GiB
ml.c5.18xlarge	72 vCPU	144 GiB
ml.g4dn.16xlarge	64 vCPU	256 GiB
ml.g4dn.12xlarge	48 vCPU	192 GiB
ml.p3.16xlarge	64 vCPU	488 GiB
ml.p3.8xlarge	32 vCPU	244 GiB
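The exclusion list above can be encoded as a simple lookup, for example to validate an instance type in a script before requesting it. This is only a sketch: the names `EXCLUDED_INSTANCE_TYPES` and `is_allowed` are hypothetical, and the check only covers the exclusions listed in the table, not whether SageMaker Studio actually offers the given type.

```python
# Instance types excluded on the platform, copied from the table above.
EXCLUDED_INSTANCE_TYPES = frozenset({
    "ml.m5.24xlarge",
    "ml.m5.16xlarge",
    "ml.m5.12xlarge",
    "ml.c5.24xlarge",
    "ml.c5.18xlarge",
    "ml.g4dn.16xlarge",
    "ml.g4dn.12xlarge",
    "ml.p3.16xlarge",
    "ml.p3.8xlarge",
})


def is_allowed(instance_type):
    """Return True if the instance type is not on the exclusion list."""
    # Normalize so that stray whitespace or capitals do not bypass the check.
    return instance_type.strip().lower() not in EXCLUDED_INSTANCE_TYPES


for candidate in ("ml.t3.medium", "ml.p3.16xlarge"):
    status = "allowed" if is_allowed(candidate) else "excluded"
    print(f"{candidate}: {status}")
```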

Users are responsible for shutting down their notebook instances themselves to avoid unnecessary costs. For an up-to-date list of all available instance types, please visit the Amazon SageMaker pricing website.
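Shutting down can also be scripted. The sketch below assumes that the compute instances behind Studio notebooks are exposed as `KernelGateway` apps, and that `client` behaves like `boto3.client("sagemaker")`, whose `list_apps` and `delete_app` operations it calls. The function name `shut_down_running_apps` and the domain and user profile values are hypothetical.

```python
def shut_down_running_apps(client, domain_id, user_profile_name):
    """Request deletion of all in-service KernelGateway apps of one user.

    Assumption: notebook compute runs as KernelGateway apps. Returns the
    names of the apps for which deletion was requested.
    """
    deleted = []
    next_token = None
    while True:
        kwargs = {
            "DomainIdEquals": domain_id,
            "UserProfileNameEquals": user_profile_name,
        }
        if next_token:
            kwargs["NextToken"] = next_token
        response = client.list_apps(**kwargs)

        for app in response.get("Apps", []):
            # Skip apps that are already deleted, deleting, pending, or failed.
            if app["AppType"] == "KernelGateway" and app["Status"] == "InService":
                client.delete_app(
                    DomainId=domain_id,
                    UserProfileName=user_profile_name,
                    AppType=app["AppType"],
                    AppName=app["AppName"],
                )
                deleted.append(app["AppName"])

        # Keep paging until the service stops returning a token.
        next_token = response.get("NextToken")
        if not next_token:
            break
    return deleted
```

With `boto3` installed and credentials configured, a call could look like `shut_down_running_apps(boto3.client("sagemaker"), "d-example", "my-user")`, where the domain ID and user profile name are placeholders. Deleting an app stops the underlying instance; files stored in the user's home directory are expected to persist, but verify this for your own setup before relying on it.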