Hello everyone.

It's Mr. Millar here.

In this lesson, we're going to be looking at lines of best fit.

So first of all I hope that you are all doing well.

And last lesson, we looked at this idea of correlation between two variables.

So we looked at positive correlation, negative correlation and no correlation.

I also mentioned last lesson a couple of times about this idea of lines of best fit.

And in this lesson, we're going to look at that in more detail.

But anyway, to start off with, let's have a look at the Try this task.

So here we have got three different scatter graphs.

First of all, what's the same, and what's different between these three graphs? And second of all, can you come up with some examples of pairs of variables that fit these graphs? So pause the video for three or four minutes, write down a couple of sentences, and then we will have a look through.

Okay, great.

So let's talk through it.

So first of all, what's the same? Well, in each of these cases, we have got a positive correlation, which we had looked at last time.

So what positive correlation means is that as one variable goes up, so does the other.

So what's different? Well, you could have thought of a couple of things here.

First of all, in the first two examples, we can see that all of the points are quite close together.

So we would say that this is quite a strong positive correlation in these two cases here.

And in the third case, there's still definitely a positive correlation because we can see that as one variable increases so does the other, but the correlation isn't quite as strong because the points are fairly spread out.

So we would say that this is a fairly weak positive correlation.

Also, what you could have been thinking is different is that if you imagine drawing a straight line that fits each set of points as well as possible, you would find that the lines would intersect the X or Y axis at different places.

So for example, the one in the top right, you can imagine that if you drew a straight line, I'm not going to use a ruler, but if you were using a ruler to do a straight line like this, it would hit the Y axis here.

Whereas for this one, it actually hits the X axis down here.

And this one here would maybe go something like this and hit the Y axis like that.

So they all have a different intercept.

And finally, some examples of pairs of variables.

Well, for the first two, you need to think what two variables would be very strongly correlated.

So for example, you might think that the amount of time that the heating is on compared to the heating bill would give you a very strong positive correlation.

A slightly weaker positive correlation might be, for example, the amount of time spent studying for a test and the score on that test.

Clearly, if you study more you're likely to do better, but, you know, some people might get away with not studying so much.

So you might expect to see a slightly weaker correlation there.

Anyway, let's move on now to the Connect task.


So here is the connect task.

And in this scatter diagram, we have got the temperature outside against the number of ice creams sold.

Now, what we can do, I've done this a few times already, is we can draw a line of best fit going through the data and it would look something like this.

So a couple of points to mention first of all.

First of all, it is a nice straight line and it's one straight line so it doesn't kind of zigzag around going between the data points.

So if you draw your line of best fit like that, which sometimes might happen, definitely don't do that, because you don't want to connect all the points together.

You want one nice straight line that is the best fit between those points.

The other thing to notice is that with the line I've drawn, you've got approximately the same number of pieces of data above that line compared to below that line.

So if I look at the points above that line, we've got one, two, three, four, five, six, and you've got approximately the same number below that line.

So that is the second point to mention.
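
As a rough illustration of this second point, here is a minimal Python sketch that counts how many points sit above and below a candidate line. The data values and the two candidate lines are made up for illustration (they are not read from the graph in the lesson), and `count_above_below` is a hypothetical helper name.

```python
# Hypothetical data: (temperature in degrees C, ice creams sold).
# These numbers are invented for illustration, not taken from the lesson's graph.
points = [(12, 22), (14, 30), (15, 27), (17, 36), (18, 33),
          (20, 42), (21, 39), (23, 48), (24, 45), (26, 54)]

def count_above_below(points, gradient, intercept):
    """Count the points above and below the line y = gradient * x + intercept."""
    above = sum(1 for x, y in points if y > gradient * x + intercept)
    below = sum(1 for x, y in points if y < gradient * x + intercept)
    return above, below

# A candidate line drawn "by eye": y = 2x
balanced = count_above_below(points, 2, 0)
# A second candidate that sits too high: y = 2x + 5
too_high = count_above_below(points, 2, 5)

print("y = 2x     (above, below):", balanced)
print("y = 2x + 5 (above, below):", too_high)
```

A line with roughly equal counts is a reasonable candidate. A line with nearly all the points on one side, like the second one here, has been drawn too high or too low.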

And the third point: we talked about this idea of an outlier last lesson.

So down here, you know, it was a very hot day, but not many ice creams were sold.

Maybe that's because it was hot, but also rainy.

Sometimes that happens.

Maybe because the ice cream van didn't turn up till later, who knows? But anyway, when you're thinking about the line of best fit, you don't worry too much about that outlier.

With the line of best fit, you're thinking: what's the correlation overall? What does the data look like overall? Let's draw the best possible line to go through it.

Anyway, that is how we draw the line of best fit and it's useful for a number of reasons.

First of all, number one, you can more easily see what the correlation between the data is, as we talked about already.

Second of all, it helps us identify the outliers.

So any outliers like this one down here would be a long way away from the line.
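
To make "a long way away from the line" concrete, here is a minimal Python sketch that flags any point whose vertical distance from a line is bigger than some cut-off. The data, the line y = 2x and the cut-off of 10 ice creams are all assumptions chosen for illustration. The lesson itself judges outliers by eye.

```python
# Hypothetical data: (temperature in degrees C, ice creams sold).
# The last point is a made-up hot day with very few sales.
points = [(12, 22), (14, 30), (17, 36), (20, 42), (23, 44), (26, 54), (30, 10)]

def find_outliers(points, gradient, intercept, threshold):
    """Return the points whose vertical distance from the line
    y = gradient * x + intercept is greater than threshold."""
    return [(x, y) for x, y in points
            if abs(y - (gradient * x + intercept)) > threshold]

# Assumed line of best fit (drawn by eye): y = 2x.
# Assumed cut-off: more than 10 ice creams away from the line.
outliers = find_outliers(points, 2, 0, 10)
print(outliers)
```

The cut-off is a judgement call. A smaller value flags more points and a larger value flags fewer.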

And third, we can work out an expected relationship.

So let's talk through what that means: work out an expected relationship. Well, let's have a look at what the first student is saying here. He's saying, "I would expect that if it was 20 degrees out, blank ice creams would be sold." So what do you think? What would go in that missing gap there? Well, it turns out that we can use the line of best fit to work out an expected relationship between the temperature and the ice creams sold.

So say, for example, it was 20 degrees out. We would follow the graph up from 20 degrees until we get to the line of best fit, and then read across to see what number of ice creams that gives, and we can see it gives 40 ice creams. Now that's not saying for sure that if it were 20 degrees out we would definitely sell 40 ice creams. And the reason why we can't say that for sure is because the relationship isn't exactly perfect.

But we can say that we might expect that to be 40 ice creams by looking at the line of best fit.

So it's really important to remember that the line of best fit is helpful for a number of reasons, including predicting what one variable might be if we know another.
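
The same idea can be sketched in Python. In the lesson the line is drawn by eye. The sketch below swaps in least-squares regression, which is one standard way of calculating a line of best fit. The data is invented for illustration, and `fit_line` is a hypothetical helper name.

```python
# Hypothetical data: (temperature in degrees C, ice creams sold).
# These numbers are invented for illustration, not taken from the lesson's graph.
points = [(12, 22), (14, 30), (15, 27), (17, 36), (18, 33),
          (20, 42), (21, 39), (23, 48), (24, 45), (26, 54)]

def fit_line(points):
    """Least-squares line y = gradient * x + intercept through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

gradient, intercept = fit_line(points)

# Expected (not guaranteed) sales on a 20 degree day, read off the fitted line.
expected_sales = gradient * 20 + intercept
print(f"Expected ice creams sold at 20 degrees: about {expected_sales:.0f}")
```

For this made-up data the answer comes out close to 40. As in the lesson, that is an expectation, not a guarantee, because the points do not all lie exactly on the line.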

Let's now have a look at the Independent task.


So three questions to have a look at here for the independent task.

The first question says, which of these lines of best fit is the most appropriate to use.

So we've got three different lines.

The data is the same in all three graphs, but you need to decide, which is the best line.

Second, use your line of best fit to predict the value of a car which is seven years old.

So whichever line of best fit you think is the best one to use in question number one, use that one to predict the value of the car which is seven years old.

Third, state which car is an outlier.

Pause the video, it shouldn't take you any more than a couple of minutes to do these three questions.

Okay, great.

So let's go through this.

The first question, well, if you're thinking that the third line of best fit is the best, then well done.

Why is that? Well, we definitely don't want the second one because as we said before, we want it to be one nice straight line.

What's wrong with the first one? Well, if you have a look at the number of points above the line compared to below the line, there are many more points here below the line compared to above the line, so we don't want that.

The third one is the best one, because we've got a nice straight line of best fit through the data with the same number of points above and below it.

Second, use your line of best fit to predict the value of the car which is seven years old.

So we're going to use the third line of best fit that we decided was the best.

We're going to find at seven years old, which is around here, we're going to go up to our graph and we see that the value is 4,000 pounds.

Finally, state which car is an outlier.

Nice and easy, it's this one up here, which looks like it's seven years old and worth 14,000 pounds.

That is definitely an outlier.


In the Explore task, we're going to have a look to see why these lines of best fit might not always be perfect.

So let's now have a look at the Explore task.


So, Zaki collects data about the height and shoe sizes of some of his class.

What do you think of his statements? Draw a scatter graph and line of best fit to help you decide.

So here we've got 15 different pupils from A to O, and for each one Zaki has recorded the shoe size and the height.

And he is making two statements about that.

What you need to do is, first of all, draw a scatter graph using some graph paper if you have it. If you don't have it, you can use normal paper.

If you're not able to draw the scatter graph, then on the next slide, I will show you the scatter graph.

So pause the video now if you're able to draw a scatter graph.

If not, I will show you a scatter graph in a second.


So here is a graph that reflects the points in the table on the previous slide.

So if you did manage to plot the scatter graph well done.

This is what it should look like.

If not, now is your time to have a think about the two statements that Zaki is making.

So I would expect someone 157 centimetres tall to have size seven shoes.

And someone 135 centimetres tall to have size zero shoes.

Pause the video if you haven't already to write down a sentence for each of these statements.

Okay, great.

So well done for doing that.

Let's think about the results.

So first of all, if I were to plot a line of best fit going through the data, I would plot something like that, where I can see that there are roughly the same number of points above the line compared to below the line.

And if I looked at my first statement, well, I can see someone who is 157 centimetres tall, would be around here on my graph.

And I could see that roughly they would have a shoe size of seven.

So I would agree with this statement and I would agree with it because I feel comfortable saying, "Okay, if my height was 157 centimetres, I might have size seven shoes." So that one I agree with.

But the second one, I would be much more careful about.

And can you have a think why? Can you think why I drew my line of best fit like this, rather than like this? Well, the reason why I only extended it to here and here was because that is roughly where the data I recorded lay.

So I don't feel comfortable extending my line of best fit above and below the data that I looked at.

And the reason that I don't is because the relationship between the two variables might look very different.

For example, if you looked at the shoe sizes and heights of some adults, maybe some adults are, you know, 180, 190 centimetres tall, and the graph would tell us that if someone was 190 centimetres tall, they might have size, I don't know, size 15 shoes.

But we know that there's very few people who have shoe sizes that big.

So you might not feel comfortable extending the graph for higher heights.

The same is true when you think about extending the graph below a height of around 150 centimetres.

You haven't collected the data, so you shouldn't necessarily feel comfortable extending that line of best fit.

Because for example, someone 135 centimetres tall, well, not many people have a size zero shoe.

That may not even be possible.

So I would not feel comfortable with this statement.

So what this tells us is that you need to be careful with your lines of best fit because they may not always follow the same pattern if you extend it beyond the data that you have already gathered.

So you always need to think, does this make sense? Is it realistic to do this? Okay.
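
One way to build that caution into a calculation is to refuse to predict outside the range of the data collected. The Python sketch below does this. The heights and shoe sizes are invented for illustration (they are not Zaki's table), the line is fitted by least squares instead of by eye, and `fit_line` and `predict_if_in_range` are hypothetical helper names.

```python
# Hypothetical data: (height in cm, shoe size). Invented for illustration;
# these are not the values from Zaki's table.
pupils = [(150, 5), (152, 6), (155, 6), (157, 7), (160, 7),
          (163, 8), (165, 8), (168, 9), (170, 9)]

def fit_line(points):
    """Least-squares line y = gradient * x + intercept through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

def predict_if_in_range(points, x):
    """Predict y from the fitted line, but only if x lies inside the
    range of x values that were actually collected. Otherwise return None."""
    xs = [px for px, _ in points]
    if not min(xs) <= x <= max(xs):
        return None  # outside the data: the pattern may not continue here
    gradient, intercept = fit_line(points)
    return gradient * x + intercept

inside = predict_if_in_range(pupils, 157)   # within the collected heights
outside = predict_if_in_range(pupils, 135)  # below the shortest pupil measured

print("157 cm:", inside)
print("135 cm:", outside)
```

Returning `None` for 135 centimetres mirrors the reasoning above: no pupil that short was measured, so the line gives no trustworthy answer there.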

That is it for today's lesson.

Hope you've enjoyed it.

And I will see you next time.

Thanks so much for watching.

Have a great day and bye bye.