Automating ML model re-training and re-evaluation in AWS SageMaker Studio.

Lokesh Kum
Apr 23, 2021 · 4 min read

This story will answer the questions below.

  1. How do you re-train your model on a new data file every week or every 14 days?
  2. What is an AWS Lambda function, and how do you use it in ML projects?
  3. What are the alternatives to this approach?

Let's say we created an ML model and the business has now decided to take it to production, so we need a couple of things in place before we move it out. This answers one very important question (one that runs through all my other articles as well): we are not just creating a statistical model, we are creating an application out of it which will provide insight and business value to us.

Assume you have created an ML model with good performance and you now want this model to be re-trained every time you get new data (the data file can arrive every 1, 2, 3, or 4 weeks, any frequency).

Step 1. Land your data file on S3 and create a trigger on it.

Step 2. This will trigger the Lambda function, and the Lambda will in turn trigger the pipeline execution.

Step 3. Here you will provide the pipeline details.

Once your pipeline gets executed, you can wait for the model to run and give you an evaluation value to interpret. (You can refer to my other article where I shared how we were able to provide custom performance metrics to the model build.)

This is the idea; now let's visualize it.

High-level architecture

The Lambda function here will get a notification from S3 and will trigger the pipeline.

Let's look at the implementation of the S3 notification.

Step 4. Go to S3 and open the bucket. On top you will see the tabs {Objects, Properties, Permissions, ...} → Click on Properties and scroll down to 'Create event notification'. Click on it and enter the needed details as shown below.

Under Event types, specify when this notification should occur: when a file gets uploaded to the bucket, copied to it, or deleted from it. For now we will choose Put, Post, and Copy, so any file that gets copied to the S3 bucket from a Glue job will initiate an event notification.

Now choose the destination, i.e., what this notification will do. When we choose Lambda here, the Lambda will run as soon as the file gets copied to the location. Save changes.
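
If you prefer to set the same notification up with code instead of the console, here is a minimal sketch using boto3. The bucket name, prefix, suffix, and Lambda ARN are placeholders for your own setup.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical names -- replace with your own bucket and Lambda ARN.
BUCKET = "my-training-data-bucket"
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:trigger-retrain-pipeline"

# Fire the notification on Put, Post and Copy, i.e. whenever a new data file
# lands in the bucket (for example, copied there by a Glue job).
# Note: this call replaces the bucket's existing notification configuration,
# and the Lambda must separately allow S3 to invoke it (the console's
# "Add trigger" in Step 5 sets that permission up for you).
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": LAMBDA_ARN,
                "Events": [
                    "s3:ObjectCreated:Put",
                    "s3:ObjectCreated:Post",
                    "s3:ObjectCreated:Copy",
                ],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "weekly-data/"},
                            {"Name": "suffix", "Value": ".csv"},
                        ]
                    }
                },
            }
        ]
    },
)
```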

Step 5. Go to Lambda in AWS and click on 'Create function'.

Choose a Python 3.x runtime as per your need.

After the function is created, select 'Add trigger' and add the trigger. It will ask you to enter the trigger configuration → choose S3 there and then fill in the details of the bucket where we created the event notification.

After filling in all these details, 'Add' the trigger.

You can adapt the code below as per your need.
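
A minimal sketch of such a handler, assuming a SageMaker pipeline named `retrain-pipeline` with a pipeline parameter called `InputDataUrl` (both names are placeholders for your own setup):

```python
import json
import boto3

sm = boto3.client("sagemaker")

# Hypothetical name -- replace with your own pipeline name.
PIPELINE_NAME = "retrain-pipeline"


def lambda_handler(event, context):
    # The S3 event notification carries the bucket and key of the new file.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    input_data_url = f"s3://{bucket}/{key}"

    # Kick off the SageMaker pipeline, pointing it at the freshly arrived file.
    response = sm.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
        PipelineExecutionDisplayName="retrain-on-new-data",
        PipelineParameters=[
            {"Name": "InputDataUrl", "Value": input_data_url},
        ],
    )

    return {
        "statusCode": 200,
        "body": json.dumps(response["PipelineExecutionArn"]),
    }
```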

Here the Lambda function should have a pipeline execution role; your cloud / solution architect will know how to add it, or refer to the AWS articles on how to add the execution role.

Once this is done, check it manually by uploading a file to the S3 bucket. There are a couple of ways to do this: upload from your PC or from a SageMaker notebook. Then check CloudWatch (or the Lambda monitoring console) for the invocation data points to get reflected.
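
For the manual check, a one-liner with boto3 from a SageMaker notebook is enough; the file and bucket names below are just examples and should match the notification's prefix/suffix filter.

```python
import boto3

# Upload a sample data file to the monitored bucket/prefix to fire the
# event notification and, through it, the Lambda and the pipeline.
boto3.client("s3").upload_file(
    Filename="sample_week_data.csv",          # local test file
    Bucket="my-training-data-bucket",         # bucket with the notification
    Key="weekly-data/sample_week_data.csv",   # matches the prefix/suffix filter
)
```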

Scenario 1. No data points are getting reflected under CloudWatch. This means the event notification is not correctly set up. Reconfigure it and upload the sample file again to get the data points here.

Invocation data points will be shown here.

Scenario 2. Data points are getting reflected but the pipeline is not getting executed. Check the name of the pipeline and the execution role.
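
To see whether the Lambda actually reached the pipeline, you can list its most recent executions and their status; a small sketch, assuming the same placeholder pipeline name as above.

```python
import boto3

sm = boto3.client("sagemaker")

# List the latest executions of the pipeline and print their status
# (Executing, Succeeded, Failed, ...).
executions = sm.list_pipeline_executions(
    PipelineName="retrain-pipeline",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=5,
)

for ex in executions["PipelineExecutionSummaries"]:
    print(ex["StartTime"], ex["PipelineExecutionStatus"], ex["PipelineExecutionArn"])
```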

Once the pipeline gets executed and the model gets trained on the new file, it's time to check the evaluation score. Here the pre-defined metrics will show the scores. Compare the new model's score with the last good model and decide whether to approve or reject the model. See below.

Custom metrics
Once the metrics look good, update the status of the model.

Once you update the status of the model, it will replace the last good model, so the endpoint picks up the new model and we don't have to make changes to the endpoint.
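
If your pipeline registers the model in the SageMaker Model Registry, the approval can also be done with a short snippet instead of the Studio UI; a sketch, assuming a model package group named `retrain-model-group` (a placeholder for your own group name).

```python
import boto3

sm = boto3.client("sagemaker")

# Fetch the most recently registered model package in the group.
packages = sm.list_model_packages(
    ModelPackageGroupName="retrain-model-group",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)
latest_arn = packages["ModelPackageSummaryList"][0]["ModelPackageArn"]

# Approve it once the evaluation metrics look good; the approved package can
# then be deployed in place of the last good model behind the same endpoint.
sm.update_model_package(
    ModelPackageArn=latest_arn,
    ModelApprovalStatus="Approved",
)
```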

An alternative approach is to use AWS Step Functions, with the source code kept in our SageMaker notebook.

Thanks for your time

Lokesh

Data Scientist @Useready
