Hello, my name is Mrs. Holborow, and welcome to Computing.
I'm so pleased you've decided to join me for the lesson today.
In today's lesson, we'll be improving a model by cleaning data to enhance accuracy and adding a null class to prevent misclassification.
Welcome to today's lesson from the unit Machine Learning Using the Micro:bit.
This lesson is called Improving a Model.
And by the end of today's lesson, you'll be able to make changes to how your model is trained to improve the accuracy of the output.
Shall we make a start? We will be exploring these key words in today's lesson.
Misclassification.
Misclassification, when a model predicts the wrong class for a sample.
Data cleaning.
Data cleaning, the process of removing incorrect, inconsistent, or irrelevant data to improve the quality of a data set before training a model.
Noise.
Noise, unwanted or random variations in data that don't represent meaningful patterns.
False positive.
False positive, a sample that has been incorrectly classified.
Look out for these keywords in today's lesson.
Today's lesson is split into three parts.
We'll start by considering diverse data to reduce bias.
We'll then clean data to enhance accuracy, and then we'll finish by adding a null class to prevent misclassification.
Let's make a start by considering diverse data to reduce bias.
An AI model was trained to detect pneumonia in chest X-rays.
However, the model had not actually learned medical features related to pneumonia.
The training relied on hospital-specific information in the images, such as differences in scanner types.
The model performed well on test data from the same hospital, but failed when tested on X-rays from different hospitals.
Training data needs to be diverse to improve representation.
Jacob says, "I've only used data from people that I know." Lucas says, "Your model may misclassify valid movements performed by others." That's a really good point, Lucas.
We need to have a range of people and get a range of movements from those individuals.
True or false? Diverse data should only be introduced when improving the model.
Pause the video whilst you have a think.
Did you select false? Well done.
The data you need for a model should be considered at the first stage.
You should start with thinking how to reduce bias and potential harm.
True or false? It would not be helpful to include training data from people with injuries.
Pause the video whilst you have a think.
Did you select false? Well done.
Training data should be diverse.
If you do not include data from everyone who plays the sport, their movements may be misclassified.
Time for the first task of today's lesson.
For your movement, think of how the data in your model could lead to incorrect conclusions or unfair decisions.
Pause the video whilst you have a think.
How did you get on? Obviously, your models are all gonna be different, so let's just have a look at a couple of sample answers from Lucas and Jacob.
Lucas says, "My training data only uses data from injured people." Jacob says, "This could lead to people being trained to move in an inefficient way." Did you manage to identify how your model could lead to incorrect conclusions or unfair decisions? Here's another example.
Andeep says, "My training data is only from people with one running style." Izzy says, "Some people may have unique techniques that are effective but misclassified as bad." Okay, we are now moving on to the second part of today's lesson.
We're going to clean data to enhance accuracy.
An outlier is a sample that significantly differs from the other samples.
The three samples here may be outliers.
That's because if you have a look at them carefully, these three samples differ from the rest of the samples.
The process of removing or replacing these outliers is called data cleaning.
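As a rough illustration outside the CreateAI tool, here's a minimal Python sketch of how outliers might be flagged and removed from a set of samples, assuming each sample is summarised by a single peak x-acceleration value; the values and the two-standard-deviation threshold are made up for illustration, not taken from the lesson data.

```python
# Minimal sketch: flagging and removing outliers from peak x-acceleration values.
# The sample values and the 2-standard-deviation threshold are illustrative only.
from statistics import mean, stdev

peak_x = [1.8, 2.1, 1.9, 2.0, 2.2, 9.5, 1.7, 2.1, 2.0, 2.0]  # 9.5 looks extreme

mu = mean(peak_x)
sigma = stdev(peak_x)

# Keep only samples within two standard deviations of the mean.
cleaned = [x for x in peak_x if abs(x - mu) <= 2 * sigma]
outliers = [x for x in peak_x if abs(x - mu) > 2 * sigma]

print("Outliers removed:", outliers)   # -> [9.5]
print("Cleaned samples:", cleaned)
```

In the CreateAI tool you do this by eye instead, deleting samples whose traces look very different from the rest and recording replacements.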
Time to check your understanding.
Data cleaning can be achieved by A, adding more data samples, B, replacing outliers, or C, retraining the model.
Pause the video whilst you have a think.
Did you select B? Well done.
Data cleaning can be achieved by replacing outliers or removing them from our data set.
When data is clean, the model learns real patterns instead of noise.
This reduces bias caused by incorrect or skewed data and ensures the model works well in different situations.
True or false? The training data here contains an outlier.
Pause the video here and look carefully at the data.
Did you select true? Well done.
The X acceleration value at 12:00:04 is 100, which is much larger than the other values labelled Jump.
So it's an outlier.
Okay, we're moving on to our second task of today's lesson, and you've done a fantastic job so far, so well done.
I'd like you to describe the features of these three samples that make them outliers.
Pause the video whilst you complete the activity.
How did you get on? Did you manage to describe the features of these three samples? The red peaks in these three samples are not as high as the other samples, so that may mean that they're outliers.
I'd like you to now look through your training data for each class, and I'd like you to remove any outliers.
If you do remove outliers, add more training data to make sure that you have at least 10 samples for each class, and then retrain and test your model again.
Pause the video whilst you complete the task.
So I'm going to start by looking through the training data for each of my classes and see if I can spot any outliers.
If I scroll along, I can see that there are a couple that are probably outliers.
For example, here, the sample has quite flat movement of each of the axes, so I'm going to press the cross and delete this sample.
Again, this sample here is quite similar in the fact that there's no kind of dip at the start of the recording, so I'm going to delete that one too.
So that's two samples I've deleted.
I'm also going to delete this one 'cause, again, this sample is quite flat all the way through.
So that's three samples I've deleted.
I'm going to replace those samples with three new recordings, which will hopefully be more accurate samples for the good class.
Okay, that last one wasn't great again, so I'm going to delete it and rerecord it.
That looks better this time.
I'm now going to have a look through the bad samples and see if there's any outliers here.
This sample here looks like a bit of an outlier 'cause it's got a very wavy line compared to the others.
So I'm going to delete this one.
I'm also going to delete this one here 'cause it's very flat compared to the other samples.
So that's two.
I'm going to rerecord those now.
Okay, so I've got two more recordings.
I'm going to click Train model and retrain the model again.
Okay, we are now moving on to the final part of today's lesson where we're going to add a null class to prevent misclassification.
In this example, the model classifies every input as one of the existing classes, even if the data doesn't fit either class well.
So the four samples in the bottom left have been misclassified.
Adding a null class allows the model to recognise when data does not fit existing categories, reducing the chances of incorrect or forced classification.
So you can see here we've added a null class, which is called no movement.
Those four samples in the bottom left have now been classified as no movement, reducing false positives for class A and class B.
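To see why this helps, here's a minimal sketch using a simple nearest-centroid classifier on made-up feature values; this is not how the CreateAI tool works internally, just an illustration of forced classification with and without a null class.

```python
# Minimal sketch: nearest-centroid classification with and without a null class.
# Centroids and feature values are invented for illustration only.
import math

def nearest_class(sample, centroids):
    """Return the class whose centroid is closest to the sample."""
    return min(centroids, key=lambda c: math.dist(sample, centroids[c]))

# Two movement classes, summarised by (peak x, peak y) acceleration.
centroids = {"class A": (2.0, 1.5), "class B": (1.0, 3.0)}

still_sample = (0.05, 0.1)  # almost no movement

# Without a null class, the sample is forced into the nearest existing class.
print(nearest_class(still_sample, centroids))  # -> "class A" (a false positive)

# With a null class trained on "no movement" data, the still sample fits better there.
centroids["null"] = (0.0, 0.0)
print(nearest_class(still_sample, centroids))  # -> "null"
```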
Time to check your understanding.
A false positive is, A, output data that matches more than one class, B, input data that has been misclassified, or C, training data that always shows up in the output.
Pause the video whilst you have a think.
Did you select B? Well done.
A false positive is input data that has been misclassified.
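If you wanted to count false positives yourself, a minimal sketch might look like this, comparing actual labels with a model's predictions; the labels below are invented, and this counts false positives for one particular class.

```python
# Minimal sketch: counting false positives for the "good" class.
# A false positive for "good" is a sample predicted as "good"
# whose actual label is something else. Labels are invented.
actual    = ["good", "good", "bad", "null", "bad", "null", "good"]
predicted = ["good", "bad",  "bad", "good", "bad", "null", "good"]

false_positives_good = sum(
    1 for a, p in zip(actual, predicted) if p == "good" and a != "good"
)

print("False positives for 'good':", false_positives_good)  # -> 1 (the 'null' sample)
```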
A null class can be used to reduce A, uncertainty, B, inputs, or C, training.
Pause the video whilst you have a think.
Did you select A? Well done. A null class can be used to reduce uncertainty.
Okay, we are moving on to our final task of today's lesson.
I'd like you to add a null class to your model.
Call the class null and add 10 samples of no movement.
What do you notice when you retrain and test your model? Pause the video whilst you complete the task.
I'm now going to add a null class to my model.
So I'm going to click the Add action button at the bottom, and this time, I'm going to call the class null.
I'm then going to record 10 samples of no movement.
So you can see I'm trying to keep the micro:bit still to record the null movement.
I've done three.
I'm going to carry on until I've recorded 10 samples.
I now have 10 samples of my null recording, so I'm going to train the model again.
So I'm gonna click the train model button and I'm gonna hit Start training.
My model will then hopefully start training.
I'm now going to test my model again and see if the estimations are a little bit more accurate.
You can see I'm holding the micro:bit still at the moment and the estimated action is null, so that's a positive sign.
Let's start doing some movements.
That was a good shot.
Another good shot.
Okay, let's try some bad shots.
Okay, so we're getting a bit of a mix there.
We're getting a bit of unknown.
We're getting a bit of bad, a bit of good.
If I hold it still again, it does classify it as the null class, though.
How did you get on? Did you manage to add the null class? Well done.
Here's a sample answer.
My model correctly predicts that when there's no movement, the sample should be classified as null.
Did creating a null class improve your model? Okay, we've come to the end of today's lesson, and you've done a fantastic job, so well done.
Let's summarise what we've learned in this lesson.
Data cleaning can remove outliers, so the model learns real patterns instead of noise.
False positives occur when data is misclassified.
Creating a null class reduces the chances of misclassification as the model has new patterns against which input data can be compared.
I hope you've enjoyed today's lesson, and I hope you'll join me again soon.
Bye.