
Amazon SageMaker#

1. SageMaker Capabilities#

1.1. General Overview#

Amazon SageMaker is a fully managed machine learning service. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then deploy them directly into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers. It also provides common machine learning algorithms that are optimized to run efficiently against extremely large data in a distributed environment. With native support for bring-your-own algorithms and frameworks, SageMaker offers flexible distributed training options that adjust to your specific workflows. SageMaker is made available in the EC Data Platform through the federated login to the AWS Management Console. Accessing SageMaker and SageMaker Studio via this workflow is easy and secure, and does not require a WorkSpace or Bastion Host.

alt-text

To access Amazon SageMaker via the AWS Management Console, please use the link:

  • myapplications.microsoft.com

then click on Amazon Web Services (AWS).

alt-text

1.2. SageMaker Studio#

1.2.1. General Overview#

Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning that lets you build, train, debug, deploy, and monitor your machine learning models. Studio provides all the tools you need to take your models from experimentation to production while boosting your productivity. In a single unified visual interface, customers can perform the following tasks:

• Write and execute code in Jupyter notebooks.

• Build and train machine learning models.

• Deploy the models and monitor the performance of their predictions.

• Track and debug the machine learning experiments.

1.2.2. Additional features in Studio#

Several services can be used within Studio to enhance the overall ML experience, such as:

• SageMaker Pipelines to automate and manage ML workflows.

• SageMaker Autopilot to automatically create ML models with full visibility.

• SageMaker Experiments to organize and track your training jobs and versions.

• SageMaker Debugger to debug anomalies during training.

• SageMaker Model Monitor to maintain high quality models.

• SageMaker JumpStart to easily deploy ML solutions for many use cases.

1.2.3. Benefits of using Studio Notebooks#

In the EC Data Platform you only have the option to start a new notebook in the Studio and not in the regular SageMaker console. The reason for this is that Studio notebooks have many benefits compared to regular Notebook Instances, including the following:

• Starting a Studio notebook is faster than launching an instance-based notebook. Typically, it is 5-10 times faster than instance-based notebooks.

• Notebook sharing is an integrated feature in SageMaker Studio. Users can generate a shareable link that reproduces the notebook code and also the SageMaker image required to execute it, in just a few clicks.

• SageMaker Studio notebooks come pre-installed with the latest Amazon SageMaker Python SDK.

• SageMaker Studio notebooks are accessed from within Studio. This enables you to build, train, debug, track, and monitor your models without leaving Studio.

• Each member of a Studio team gets their own home directory to store their notebooks and other files. The directory is automatically mounted onto all instances and kernels as they're started, so their notebooks and other files are always available. The home directories are stored in Amazon Elastic File System (Amazon EFS) so that you can access them from other services.

• Studio notebooks are equipped with a set of predefined SageMaker image settings to get you started faster.

Note: Studio can't run the training and processing jobs itself. By means of APIs and other kernel gateways, Studio calls upon these services and SageMaker runs them on its behalf. Studio provides a better visual representation of these features and also includes lists of pipelines & experiments for a better overview, but it still relies on the underlying SageMaker features. Consequently, you do not have access to the machine, and the gateways only run for the duration of the process. This is visualised in the image below.

alt-text

1.2.4. Difference between Data Science Studio & SageMaker Studio#

Currently, the common way to do ML in the EC Data Platform is to set up an EC2 instance with JupyterHub installed on the VM. SageMaker is a potential way to change this workflow and is a service dedicated to making the life of data scientists a lot easier. The main advantages of using SageMaker compared to VMs are the following:

• The user accesses SageMaker directly from the console, so there is no need for a Bastion Host or a WorkSpace, which makes it more user-friendly.

• Starting and stopping the instance can be done directly from the Management Console. This is beneficial for the costs that will be charged for the DSL.

• The underlying infrastructure of SageMaker Studio notebooks is flexible.

1.3. Pricing#

SageMaker follows a strict pay-for-what-you-use policy. When building, training and deploying models you will be billed by the second, with no upfront commitments and no minimum fees. The instance type used determines the price you'll pay. The moment your instance, or application as it's called in the Studio environment, is running and in the "ready" state, billing starts. Users are able to spin up the resources they want, but also hold the responsibility for the costs this incurs. So, users have to proceed with caution when choosing the size and number of instance types they use.

In Studio, when the environment is first deployed, an Elastic File System (EFS) Volume is created for the whole team. When a member of the team opens their Studio Environment a home directory is deployed for the given user. This directory will have storage costs attached to it. Subsequently, additional storage charges are incurred for the notebooks and data stored in the respective directory.

Note: If you don't want to run the risk of incurring costs when you create or open a notebook, open the notebook and choose "No Kernel" when asked to choose a kernel. This way you can edit or read a notebook, but you can't run cells.

2. SageMaker Studio Architecture#

2.1. Studio Notebooks: Key Building Blocks#

The Studio Notebooks consist of 4 key building blocks: Applications, Images, Kernels and KernelGateway Apps. These building blocks can all be adapted in the notebook. Each of these is described below.

2.1.1. Application#

The Application, which is an EC2 instance, is the hardware configuration on which your notebook runs. AWS provides a great number of different instance types to ensure that there is a fit for every use case. These types offer many different combinations of memory, CPU, storage and networking capacity, and give you the flexibility to choose the best-suited mix of resources. Each instance type comes in one or more instance sizes, so these instances are completely scalable to the requirements of your workload. The instance types used in SageMaker are particularly suited to machine learning use cases.

2.1.2. Images#

SageMaker Studio Notebooks provide a number of built-in images (TensorFlow, PyTorch, Conda, ...) for data science and ML frameworks. These images are container images that are compatible with the Studio. They consist of kernels (described in the next section), language packages and other files required to run notebooks. It is possible to have multiple images on one instance. By default, the built-in SageMaker images come with the SageMaker Python SDK and the latest version of the backend runtime process pre-installed.

2.1.3. Kernel#

In this context, a kernel is the process that runs the code in your notebook and returns the results. A kernel is defined by a kernel spec in the image. There can be multiple kernels in an image.

For a list of available Amazon SageMaker kernels, please visit this link.

2.1.4. KernelGateway Apps#

A SageMaker image runs as a KernelGateway app. The app provides access to the kernels in the image. There is a one-to-one correspondence between a SageMaker image and a SageMaker app.
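For illustration, a minimal sketch of listing these apps with the boto3 SageMaker client; the domain ID and user profile name below are placeholder assumptions:

import boto3

sm_client = boto3.client('sagemaker')

# List the apps (JupyterServer and KernelGateway) of one user profile;
# 'd-xxxxxxxxxxxx' and 'my-user' are illustrative placeholders.
response = sm_client.list_apps(DomainIdEquals='d-xxxxxxxxxxxx',
                               UserProfileNameEquals='my-user')

for app in response['Apps']:
    print(app['AppType'], app['AppName'], app['Status'])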

2.2. Networking#

By default, SageMaker Studio allows direct internet access. You can choose to restrict which traffic can access the internet by launching Studio in a VPC of your preference. This gives you complete control over the network access and internet connectivity of your SageMaker Studio. Direct internet access can be disabled on request to provide more security.

The Studio comes with a network interface that allows communication with the internet through a VPC managed by SageMaker. Traffic to AWS services like Amazon S3 and CloudWatch goes through an internet gateway, as does traffic that accesses the SageMaker API and SageMaker runtime. Traffic between the domain and your Amazon EFS volume goes through the VPC that you specified when you onboarded to Studio or called the CreateDomain API.

There are two options regarding networking in the EC Data Platform (a minimal API sketch follows the list):

  1. A VPC is attached with no internet access.

  2. A VPC is attached but with SageMaker provided internet access.
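For reference, these options roughly correspond to the AppNetworkAccessType parameter of the CreateDomain API. A minimal sketch, assuming placeholder subnet, VPC and role values:

import boto3

sm_client = boto3.client('sagemaker')

# Option 1 (no internet access): AppNetworkAccessType='VpcOnly'.
# Option 2 (SageMaker provided internet access) would use 'PublicInternetOnly'.
sm_client.create_domain(
    DomainName='my-domain',
    AuthMode='IAM',
    DefaultUserSettings={'ExecutionRole': 'arn:aws:iam::111122223333:role/my-execution-role'},
    SubnetIds=['subnet-xxxxxxxx'],
    VpcId='vpc-xxxxxxxx',
    AppNetworkAccessType='VpcOnly')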

2.3. User Management#

2.3.1. User Profiles in the Studio#

When opening the SageMaker Studio console you will notice that you can create different user profiles in your domain, which represent the different members of your team. These users all have their own home directory where they can store their files and share them with the other users. Each user also has an execution role that contains the permissions defining what a particular member of the team can and can't do in their Studio Notebook. When going deeper into this user profile, you can see which exact execution role this member has and the number, size and status of the applications the user has deployed and used. (These applications can be managed both from the user's Studio and from the outside in the general SageMaker Dashboard.) An example of this is shown below.

alt-text

In the EC Data Platform we have one execution role that the user assumes in the Studio environment and two IAM roles that users assume in the SageMaker console: an admin user and a default user. These are described in more detail in the following section.

2.3.2. Roles in EC Data Platform#

  1. Default Role: Users who have this IAM role will only be able to access the user environments of the user profiles in the Control Panel of the Studio.

  2. Admin Role: Users who have this IAM role will be able to access the user environments of the user profiles in the Control Panel of the Studio. Additionally, the admin has the permissions to create new user profiles.

3. Execution role: This is the role all users assume within their Studio environment. This role has no particular restrictions on the number of notebooks they want to deploy, so the user will be able to spin up images and kernels freely inside their own domain. However, there is a restriction on the instance types the users will be able to deploy.
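As an illustration of the admin permission to create user profiles, a hedged sketch with the boto3 client; all identifiers below are placeholders:

import boto3

sm_client = boto3.client('sagemaker')

# Create a new user profile in the Studio domain and attach the shared
# execution role; only the Admin Role is permitted to do this.
sm_client.create_user_profile(
    DomainId='d-xxxxxxxxxxxx',
    UserProfileName='new-team-member',
    UserSettings={'ExecutionRole': 'arn:aws:iam::111122223333:role/studio-execution-role'})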

3. Getting Started#

3.1. Console Panel#

When first getting into the console panel of the Studio you'll be able to see all the different user profiles created for your domain. You can open the respective home directory from here. Additionally, an overview of the Studio domain is given with the general execution role for the Studio (which is predefined), the instance id and the current status. The images below show this general overview.

alt-text

alt-text

When you want a more in-depth view of each user, you can click on the user to see their execution role and the apps that they currently have, with their respective status. The first thing you'll notice when taking a look at the user profile is that these users all have a default app running. This is a basic instance that hosts their Studio instance and is part of the AWS Free Tier, so it doesn't incur any costs. This 'default' app shouldn't be deleted.

As stated before, every user profile has its own home directory, which is stored on the EFS of the Studio. The users are able to go through this directory and access CodeCommit repositories to share code with the other user profiles in the Studio Domain.

3.2. SageMaker Studio UI#

Once you have entered the Studio environment, you will see that the Studio consists of three main components: the left sidebar, the file and resource browser, and the main working area.

alt-text

3.2.1. Left Sidebar#

The left sidebar consists of a number of icons/features, described below in order from top to bottom.

• File Browser

• Git: Where you can connect to a Git repository and then access a full range of Git tools and operations.

• Running Terminals & Instances

• Commands: The majority of the menu commands are available here.

• Notebook Tools: You can access a notebook's metadata through the Advanced Tools section. You can only see this icon when you've opened a notebook.

• SageMaker Jumpstart: Provides a list of solutions, model endpoints, or training jobs created with SageMaker Jumpstart.

• Open Tabs

• Components & Registries: Provides a list of projects, data wrangler flows, pipelines, experiments, trials, models, or endpoints, or access to the feature store.

3.2.2. File & Resource Browser#

The file and resource browser displays lists of your notebooks, experiments, trials, trial components, and endpoints. On the menu at the top of the file browser, choose the plus (+) sign to open the Studio Launcher. The Launcher allows you to create a notebook, launch a Python interactive shell, or open a terminal.

3.2.3. Main Working Area#

The main work area consists of multiple tabs that contain your open notebooks and terminals, and detailed information about your experiments and endpoints. One commonly used tab is the Trial Component List. This list is referred to as the Leaderboard because it's where you can compare experiments and trials.

3.3. SageMaker Studio Launcher#

The Launcher consists of following sections:

• Get started: Provides material to get started using SageMaker Studio, such as videos and tutorials, and one-click solutions for machine learning problems.

• ML tasks & components: Create machine learning tasks and components, such as new feature groups, data flows, and projects.

• Notebooks & compute resources: Create a notebook, open an image terminal, or open a Python console. A more thorough explanation is given in the Notebooks section.

• Utilities & files: A more in-depth explanation is given in the next sub-section.

An image of the launcher is shown below.

alt-text

3.3.1. Utilities & Files#

Show contextual help from a notebook, create files, or open a system terminal. The following items are available:

• Show Contextual Help: Opens a new tab that displays contextual help for functions in a Studio notebook. To display the help, choose a function in an active notebook. To make it easier to see the help in context, drag the help tab so that it's adjacent to the notebook tab.

• System terminal: Opens a bash shell in the root folder for the user.

• Text File and Markdown File: Creates a file of the associated type in the folder that you have currently selected in the file browser. To view the file browser, in the left sidebar, choose the File Browser.

3.4. Creating a New Notebook#

When you create a notebook in Amazon SageMaker Studio, you have to select a SageMaker image and kernel for the notebook. SageMaker launches the notebook on a default instance type based on the chosen SageMaker image. For CPU-based images, the default instance type is ml.t3.medium. For GPU-based images, the default instance type is ml.g4dn.xlarge. The steps to create a notebook are described below.

Step 1: Go to the File tab in the upper-left corner of your screen. Then click on the 'New' option and select the Notebook option.

alt-text

Step 2: Select the desired kernel you want to use for your notebook. There is a variety of choices (PyFlow, PyTorch, TensorFlow, ...), or you can even select no kernel if you just want a read-only notebook.

alt-text

alt-text

Step 3: When the kernel is selected, you can specify which instance type you desire for your notebook. This is done by clicking on the 'unknown' tab of the Main Working Space. Here you have a variety of options that you can use in the EC Data Platform.

alt-text

alt-text

Once you have chosen your kernel and instance type, you are able to start building your machine learning model.

4. SageMaker Features#

In this section, several supported features of AWS SageMaker are introduced:

  • Git integration
  • Importing data
  • Training
  • Model registry
  • Model hosting
  • Sharing a Notebook
  • Autopilot (AutoML)
  • JumpStart

For a first-hand demo of working with these features, AWS provides an abundance of AWS SageMaker example Jupyter Notebooks in this GitHub repository.

Accessing SageMaker features requires the SageMaker Python SDK. Using the SDK requires the definition of a session, a client and an execution role in a notebook:

import boto3
import sagemaker

# A boto3 session for the current credentials and region
sess = boto3.Session()
# A low-level SageMaker client for API calls (training jobs, model registry, ...)
sm = sess.client('sagemaker')
# The IAM execution role attached to the Studio user profile
my_role = sagemaker.get_execution_role()

Please define these variables before working with the Sagemaker Python SDK. For more info, consult the SageMaker Python SDK docs.

4.1 Git integration#

SageMaker Studio offers full integration with Git to source control your projects. To clone a repository in SageMaker Studio,

Step 1. Click on the Git tab in the Sagemaker Studio.

Step 2. Click on "Clone a Repository".

alt-text

Step 3. For example, you could clone the AWS SageMaker example Jupyter Notebooks using the link: https://github.com/aws/amazon-sagemaker-examples.git. Click on "Clone".

alt-text

Step 4. Click on the folder tab in the Sagemaker Studio to find your cloned repository folder.

alt-text

For more info on Git integration in SageMaker Studio, consult its Developer guide.

4.2 Importing data#

You can import datasets into SageMaker Studio

  • from a local file
  • from an S3 bucket.

We will demonstrate both options. For more info on importing datasets into SageMaker Studio, go through the AWS SageMaker example: Amazon SageMaker Studio Walkthrough.

4.2.1 From local file#

You can import a local dataset from your computer into SageMaker Studio. This dataset will live in your home directory on the Studio's EFS volume.

This can be done by,

Step 1. Click on the folder tab in the Sagemaker Studio.

Step 2. Click on the upload symbol in the folder tab.

alt-text

Step 3. Select the dataset you want to upload from your local computer and click on "Open".

alt-text

If everything went well, your dataset can be found in the current folder in the Sagemaker Studio.

alt-text

4.2.2 From S3 bucket#

AWS SageMaker is integrated with Amazon S3. You can therefore import a dataset when you have access to an S3 bucket. It is good practice to upload your data to an available S3 bucket and reference it using a pointer when using it for training. This can be done by,

from sagemaker.s3 import S3Uploader

# Derive the account id and build the name of the Studio bucket
account_id = sess.client('sts', region_name=sess.region_name).get_caller_identity()["Account"]
bucket = 'sagemaker-studio-{}-{}'.format(sess.region_name, account_id)
prefix = 'folder-containing-data'

# Upload the local CSV files to s3://<bucket>/<prefix>/train and .../validation
S3Uploader.upload('data/train.csv', 's3://{}/{}/{}'.format(bucket, prefix, 'train'))
S3Uploader.upload('data/validation.csv', 's3://{}/{}/{}'.format(bucket, prefix, 'validation'))

where you upload train.csv and validation.csv from your local data folder. Your datasets can then be referenced by using

s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')
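Note that sagemaker.s3_input is SageMaker Python SDK v1 syntax. If your image ships SDK v2, the equivalent is, to the best of our knowledge, TrainingInput:

from sagemaker.inputs import TrainingInput

# SDK v2 equivalent of the sagemaker.s3_input calls above
s3_input_train = TrainingInput(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
s3_input_validation = TrainingInput(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')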

4.3 Training#

You have the following options for a training algorithm:

  • Use an algorithm provided by SageMaker — SageMaker provides training algorithms.

  • Submit custom code to train with deep learning frameworks — You can submit custom Python code that uses TensorFlow, PyTorch, or Apache MXNet for model training.

  • Use your own custom algorithms — Put your code together as a Docker image and specify the registry path of the image in a SageMaker CreateTrainingJob API call.

Note: When conducting machine learning training, you incur costs in a pay-as-you-go manner. This means that billing is incurred for every process you run in SageMaker Studio. The amount depends on the selected machines and the time per training job.

We shall shortly demonstrate how you can use an algorithm provided by SageMaker for model training (option 1). When training a model provided by SageMaker, the following steps are used:

• Importing the SageMaker container model
  • Defining an experiment (a folder structure to organize trials)
  • Defining a trial (an attempt to train the model)
  • Defining hyperparameters (what you need to provide to a model before training)
  • Defining an estimator (what encapsulates the hyperparameters and the untrained model)
  • Submitting an estimator (submitting the estimator specifying a trial and an experiment)

The following subsections demonstrate how to execute each of these steps.

For more info on training in SageMaker Studio, go through the AWS SageMaker examples, consult the SageMaker Python SDK docs, or consult its developer guide.

4.3.1 Importing a SageMaker container model#

For example, you can import the XGBoost algorithm container of SageMaker using

# Resolve the registry path of the built-in XGBoost container (SDK v1 syntax)
from sagemaker.amazon.amazon_estimator import get_image_uri
docker_image_name = get_image_uri(boto3.Session().region_name, 'xgboost', repo_version='1.0-1')
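get_image_uri is likewise SDK v1 syntax; on SDK v2 the same container can, as far as we know, be resolved with image_uris.retrieve:

# SDK v2 equivalent of get_image_uri above
from sagemaker import image_uris
docker_image_name = image_uris.retrieve('xgboost', boto3.Session().region_name, version='1.0-1')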

4.3.2 Defining an experiment#

To run a trial, you first have to create an experiment. An experiment is a way to organize the many training trials that you might conduct when optimizing your model. You can do this by using

from smexperiments.experiment import Experiment

# A SageMaker session object; note that this is also used later by the estimator
sess = sagemaker.session.Session()

my_exp = Experiment.create(experiment_name="my_experiment",
                           description="This is my first experiment.",
                           sagemaker_boto_client=boto3.client('sagemaker'))

4.3.3 Defining a trial#

An experiment contains training trials. You can define a trial by using

from smexperiments.trial import Trial

my_tr = Trial.create(trial_name="my_trial",
                     experiment_name=my_exp.experiment_name,
                     sagemaker_boto_client=boto3.client('sagemaker'))

4.3.4 Defining hyperparameters#

Some machine learning models (like our XGBoost example) require you to provide hyperparameters before training. You can fix these hyperparameters using a Python dictionary, like this,

my_hp = {"max_depth":5,
               "subsample":0.8,
               "num_round":600,
               "eta":0.2,
               "gamma":4,
               "min_child_weight":6,
               "silent":0,
               "objective":'binary:logistic'}

4.3.5 Defining an estimator#

You need to connect the container model and hyperparameters using an estimator.

xgb_est = sagemaker.estimator.Estimator(image_name=docker_image_name,
                                        role=my_role,
                                        hyperparameters=my_hp,
                                        train_instance_count=1,
                                        train_instance_type='ml.m4.xlarge',
                                        output_path='s3://{}/{}/output'.format(bucket, prefix),
                                        base_job_name="my-first-training-job",  # job names may not contain spaces or periods
                                        sagemaker_session=sess)

4.3.6 Submitting an estimator#

Finally you can submit your training job for execution. When you have constructed an estimator, you need to specify the experiment and trial, like this,

xgb_est.fit({'train': s3_input_train,
             'validation': s3_input_validation},
            experiment_config={
                "ExperimentName": my_exp.experiment_name,
                "TrialName": my_tr.trial_name,
                "TrialComponentDisplayName": "Training",
            })

Note: When you run this code snippet in a SageMaker Studio notebook, you will incur costs in the pay-as-you-go manner described above; the amount depends on the selected machines and the time per training job.

After training the model, you could

• Download the associated files from the output path in S3 for further analysis of the performance of the trained model.

  • Register the model in the model registry for model versioning.

• Deploy the model behind a SageMaker real-time hosted endpoint for dynamic inferencing in a production environment.

The first path is out of scope in this user documentation. The last two paths are explained in separate sections.

4.4 Model registry#

With the SageMaker model registry you can do the following:

  • Catalog models for production.
  • Manage model versions.
  • Associate metadata, such as training metrics, with a model.
  • Manage the approval status of a model.
  • Deploy models to production.
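The approval status, for example, can also be managed programmatically. A minimal sketch using the sm client defined earlier; the model package ARN is a placeholder:

# Approve a registered model version; the ARN below is an illustrative placeholder.
sm.update_model_package(
    ModelPackageArn='arn:aws:sagemaker:eu-west-1:111122223333:model-package/my-model/1',
    ModelApprovalStatus='Approved')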

We will describe how you can

  • Create a model group
  • Register a model version
  • View model version

For more info on Model Registry, please consult its Developer guide.

4.4.1 Create a model group#

To create a model group,

Step 1. Click on the "Sagemaker Components and registries" tab.

Step 2. In the drop-down list, click on "Model registry".

alt-text

Step 3. Click on "Create model group".

alt-text

Step 4. Provide a model group name. Additionally, you can provide a description, tags and a project name (the project feature is out of scope in this user documentation).

Step 5. Click on "Create model group".

alt-text

Alternatively, you can create a model group from the SageMaker notebook, like this,

model_package_group_name = "my-model"
model_package_group_input_dict = {
 "ModelPackageGroupName" : model_package_group_name,
 "ModelPackageGroupDescription" : "This is my first model group."
}

create_model_package_group_response = sm.create_model_package_group(**model_package_group_input_dict)
print('ModelPackageGroup Arn : {}'.format(create_model_package_group_response['ModelPackageGroupArn']))

4.4.2 Register a model version#

We shall demonstrate how you can register a trained model provided by SageMaker. To do this, you need to specify the model group to which it belongs, its model artifacts (the output of your training job) and the inference code for the model.

Step 1. Create an inference specification. This includes information about the model Docker image and the model artifacts. The location of the model artifacts of your trained model can be found in the trial of your experiment.

from sagemaker.amazon.amazon_estimator import get_image_uri
docker_image_name = get_image_uri(boto3.Session().region_name, 'xgboost', repo_version='1.0-1')

modelpackage_inference_specification = {
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": docker_image_name,
            }
        ],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    }
}

# Specify the model data
model_url = "location in my S3 bucket where I stored my model artifacts"
modelpackage_inference_specification["InferenceSpecification"]["Containers"][0]["ModelDataUrl"] = model_url

Step 2. Create an input dictionary. This includes the model group, a description of the model version and the model status.

create_model_package_input_dict = {
    "ModelPackageGroupName" : model_package_group_name,
    "ModelPackageDescription" : "My new version of the model.",
    "ModelApprovalStatus" : "PendingManualApproval"
}
create_model_package_input_dict.update(modelpackage_inference_specification)

Step 3. Call the create model package method.

create_model_package_response = sm.create_model_package(**create_model_package_input_dict)
model_package_arn = create_model_package_response["ModelPackageArn"]
print('ModelPackage Version ARN : {}'.format(model_package_arn))

4.4.3 View model version#

To view your registered model in the model registry,

Step 1. Click on the "Sagemaker Components and registries" tab.

Step 2. In the drop-down list, click on "Model registry".

alt-text

Step 3. Click on your model group where you have registered your model.

alt-text

Step 4. Click on the version of the model you want to view. If you have just registered a model, it is the latest version.

alt-text

Step 5. Here you can view the history of the model status, associated metrics, ECR image URI, model artifact location, etc.

alt-text

Alternatively, you can view the registered models from the SageMaker notebook, like this,

sm.list_model_packages(ModelPackageGroupName=model_package_group_name)

Viewing more details of your registered model is out of scope in this user documentation. For more info, consult its Developer guide.

4.5 Model hosting#

After training a machine learning model, you can deploy it behind an Amazon SageMaker real-time hosted endpoint. This allows you to make predictions (also called inferencing) from the model dynamically in a production environment.

We shall demonstrate how to

  • Create a real-time hosted endpoint
  • Use the real-time hosted endpoint
  • View the real-time hosted endpoints
  • Delete the real-time hosted endpoint

Note: It is important to realize that a real-time hosted endpoint will generate costs while active. Please delete this endpoint if you are not using it anymore.

For more info on real-time hosted endpoints, go through the AWS SageMaker examples, consult the SageMaker Python SDK docs, or consult its Developer guide.

4.5.1 Creating a real-time hosted endpoint#

Assume that you have trained a model in SageMaker Studio named my_model. You can deploy the model using

my_model_predictor = my_model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

given an initial instance count and the instance type of the hosted endpoint. This returns a predictor object (my_model_predictor) that you can use to send requests to the endpoint.

4.5.2 Using the real-time hosted endpoint#

To use the real-time hosted endpoint, you can send HTTP POST requests and get back predictions. Using the SageMaker Python SDK, however, this is all handled under the hood. To use the SDK for inferencing on the real-time hosted endpoint,

Step 1. Create a serializer and a deserializer using

from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

# Send CSV input to the endpoint and parse its JSON response
my_model_predictor.serializer = CSVSerializer()
my_model_predictor.deserializer = JSONDeserializer()

Step 2. To run inference on a CSV file new_data, use

my_result = my_model_predictor.predict(new_data, initial_args={"ContentType": "text/csv"})
pred = my_result['predictions']

4.5.3 View the real-time hosted endpoints#

To view all the deployed real-time hosted endpoints in the Sagemaker Studio user interface (UI),

Step 1. Click on the "SageMaker Components and registries" tab.

Step 2. In the drop-down list, click on "Endpoints".

alt-text

Step 3. Here, you can see all the active real-time hosted endpoints. Right-click an endpoint and click on "Describe Endpoint".

alt-text

Step 4. A new tab will open. If you click on "AWS Settings", you can see information regarding the creation time, encryption key, etc.

alt-text

4.5.4 Deleting the real-time hosted endpoint#

A real-time hosted endpoint will generate costs while active. To remove the hosted endpoint, use

import sagemaker
sagemaker.Session().delete_endpoint(my_model_predictor.endpoint)

Alternatively, you can delete the hosted endpoint by specifying its name,

import sagemaker
sagemaker.Session().delete_endpoint("name_of_my_endpoint")

4.6 Sharing a Notebook#

It is possible to share your Studio notebooks with other users from your team. The shared notebook is a copy. After you share your notebook, any changes you make to your original notebook aren't reflected in the shared notebook, and any changes your colleagues make in their shared copies of the notebook aren't reflected in your original notebook. If you want to share your latest version, you must create a new snapshot and then share it. The steps to share a notebook are described below.

Step 1: Select the Share option in the Main Working Space, which is located in the upper right corner.

alt-text

Step 2: You have the option of sharing the Git repository information as well as the output of the model. This is entirely up to the preference of the user.

alt-text

Step 3: After the shareable snapshot is made, you will receive a link that you can copy. To share the notebook, you just send this link to your colleagues.

alt-text

4.7 Autopilot (AutoML)#

Amazon SageMaker Autopilot (AutoML) automatically builds, trains, and tunes the best machine learning models based on your data, while allowing you to maintain full control and visibility. In a development context, you might want to try a few popular machine learning models in the wild before tuning a specific model; AutoML is great for this use case.

Note: You can only use SageMaker Autopilot when you have a tabular dataset in CSV format. The dataset should contain a header row indicating the feature names.

You can use AutoML by

  • using the SageMaker Studio user interface (UI)
  • using Python code in the Notebook

We shall demonstrate both options. Additionally, we shall demonstrate how you can explore the Autopilot experiment using the UI.

For more information on the Autopilot feature of Sagemaker, please read this developer guide.

4.7.1 Autopilot using UI#

To run an Autopilot experiment using the SageMaker Studio UI,

Step 1. Click on the folder tab in the Sagemaker Studio.

Step 2. Click on the "+" symbol in the folder tab to open a new launcher.

alt-text

Step 3. Click on "New autopilot experiment" in the launcher.

alt-text

Step 4. In the Autopilot experiment settings, provide an experiment name. Additionally, you can provide tags and specify an existing project for managing your Autopilot experiments.

alt-text

Step 5. Provide the location in the S3 bucket containing your training dataset. Then, provide the target attribute (the column you want to predict) of your dataset.

alt-text

Step 6. Provide the location in the S3 bucket and directory name for storing the result of your Autopilot experiment.

alt-text

Step 7. Select the machine learning problem type: auto (let the Autopilot decide for itself), binary classification, regression or multiclass classification.

Depending on the problem, provide its objective metric (the metric you want to optimize). Furthermore, you can choose to let the Autopilot run a complete experiment or only generate Notebooks to run afterwards.

alt-text

Step 8. Submit your Autopilot experiment by clicking on "Create Experiment".

alt-text

For more info on using Autopilot via the SageMaker Studio UI, please consult its Developer guide.

4.7.2 Autopilot using Notebook#

To run an Autopilot experiment from the SageMaker notebook,

Step 1. Provide information to SageMaker Autopilot regarding the location of the training data, the target attribute name and the output folder for the result.

input_data_config = [{
    'DataSource': {
        'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': 's3://{}/{}/train'.format(bucket, prefix)
        }
    },
    'TargetAttributeName': 'Churn?'
}]

output_data_config = {
    'S3OutputPath': 's3://{}/{}/output'.format(bucket, prefix)
}

Step 2. Submit an Autopilot trial using

auto_ml_job_name = "my_automl_job"

sm.create_auto_ml_job(AutoMLJobName=auto_ml_job_name,
                      InputDataConfig=input_data_config,
                      OutputDataConfig=output_data_config,
                      AutoMLJobConfig={'CompletionCriteria':
                                       {'MaxCandidates': 20}
                                      },
                      RoleArn=my_role)

Step 3. Track the Autopilot trial using

describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
print (describe_response['AutoMLJobStatus'] + " - " + describe_response['AutoMLJobSecondaryStatus'])

An example output of the above code cell could be

InProgress - AnalyzingData

indicating that the Autopilot trial is in progress and is analyzing the data.
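Because an Autopilot job can run for quite a while, you may want to poll its status in a loop; a minimal sketch:

import time

# Poll every 30 seconds until the Autopilot job leaves the 'InProgress' state
while True:
    describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
    status = describe_response['AutoMLJobStatus']
    print(status, '-', describe_response['AutoMLJobSecondaryStatus'])
    if status != 'InProgress':
        break
    time.sleep(30)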

Step 4. Extract the best candidate model using

best_candidate = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)['BestCandidate']
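The best candidate can then be turned into a deployable SageMaker model, for example with the create_model method mentioned below; the model name is an illustrative placeholder:

# Create a SageMaker model from the best candidate's inference containers
sm.create_model(ModelName='my-automl-best-model',
                Containers=best_candidate['InferenceContainers'],
                ExecutionRoleArn=my_role)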

Other analysis of the Autopilot trial can be performed using, for example, the describe_auto_ml_job, list_candidates and create_model methods of the SageMaker client sm. For more info on using Autopilot with the SageMaker Python SDK, go through the AWS SageMaker examples, consult the SageMaker Python SDK docs, or consult its Developer guide.

4.7.3 Explore the Autopilot experiment#

To explore a submitted Autopilot experiment,

Step 1. Click on the "SageMaker Components and registries" tab.

Step 2. In the drop-down list, click on "Experiments and trials".

alt-text

Step 3. Right-click the Autopilot experiment you want to explore and click on "Describe AutoML Job".

alt-text

Step 4. A new tab will show up summarizing all the trials in the experiment. These describe all the models that have been trained on your dataset. Click on "Job profile".

alt-text

Step 5. Here you can view metadata of your experiment (creation time, problem type, status, etc.). Click on "Open candidate generation notebook".

alt-text

Step 6. Here you can view the auto-generated notebook of the experiment containing candidate model definitions.

alt-text

Step 7. Go back to your experiment tab and click on "Open data exploration notebook".

alt-text

Step 8. Here you can view the auto-generated notebook of the experiment containing results from the data exploration process.

alt-text

For more info on these auto-generated notebooks, please consult its Developer guide.

4.8 JumpStart#

SageMaker JumpStart helps you easily and quickly bring machine learning (ML) applications to market using pre-built solutions for common use cases and open-source models from popular model zoos.

Sagemaker JumpStart contains two parts:

• One-click solutions: Completely worked-out machine learning problems, deploying relevant AWS resources and infrastructure.
  • Model Zoo: Deploying a pre-trained model from SageMaker.

We will describe both parts.

Note: It is important to know that real-time hosted endpoints or AWS resources created from either of these two parts of SageMaker JumpStart will generate costs while active. Please delete these endpoints or resources if you are not using them anymore.

For more info on Sagemaker JumpStart, please consult its Developer guide.

4.8.1 One-click solutions#

We will describe how you can launch a One-click solution and delete a One-click solution.

4.8.1.1 Launching a One-click solution#

To launch a One-click solution,

Step 1. Click on the SageMaker JumpStart tab.

Step 2. Click on "Browse Jumpstart".

alt-text

Step 3. Here, you can see the list of One-click solutions. For example, click on the One-click solution "Fraud Detection in Financial Transactions".

alt-text

Step 4. Here, you can read the description of the solution. Click on "Launch" to launch the solution.

alt-text

Step 5. Your solution will be deployed. When it is completed, you can open the associated notebook by clicking on "Open Notebook".

alt-text

alt-text

Step 6. Here, you can see the notebook that instructs you how to consume this one-click solution.

alt-text

4.8.1.2 Deleting a One-click solution#

A real-time hosted endpoint will generate costs while active. To delete a One-click solution,

Step 1. Click on the SageMaker JumpStart tab.

Step 2. In the drop-down list, click on "Solutions".

alt-text

Step 3. Click on the One-click solution you want to delete.

alt-text

Step 4. Click on "Delete all resources".

alt-text

4.8.2 Model Zoo#

We will describe how to deploy a model from the model zoo and how to delete a deployed model from the model zoo.

4.8.2.1 Deploy a model from the model zoo#

To deploy a pre-trained model from the model zoo,

Step 1. Click on the SageMaker JumpStart tab.

Step 2. Click on "Browse Jumpstart".

alt-text

Step 3. Here, you can view the different pre-trained models of SageMaker. For example, click on "BERT Base Uncased".

alt-text

Step 4. Here, you can read the description of the model. Click on "Deploy" to deploy the model.

alt-text

Step 5. Your model will be deployed. When the deployment is completed, you can open the associated notebook by clicking on "Open Notebook".

alt-text

alt-text

Step 6. Here, you can see the notebook that instructs you how to consume the hosted endpoint of the model.

alt-text

4.8.2.2 Delete a deployed model from the model zoo#

A real-time hosted endpoint will generate costs while active. To delete a deployed model from the model zoo,

Step 1. Click on the SageMaker JumpStart tab.

Step 2. In the drop-down list, click on "Model Endpoints".

alt-text

Step 3. Click on the deployed model from the model zoo you want to delete.

alt-text

Step 4. Click on "Delete".

alt-text