# Lesson video

Hello everyone, it's Mr Millar here.

Welcome to the second lesson on statistics.

And in this lesson, we're going to be looking at forming and testing a hypothesis.

So first of all, I hope that you're all doing well.

In the previous lesson on statistics, the first lesson in this topic, we had a look at this picture, which we call the data handling cycle.

And I spoke a little bit about it, but in this lesson we're going to be looking at it in a bit more detail, because the topic of this lesson, as I just said, is the hypothesis, which is the first step in the data handling cycle.

So it's really, really important because it's going to inform everything that follows.

So before we collect our data, before we analyse it, we need to know what we are actually looking at, and this is where writing a hypothesis is going to come in.

So let's find out what a hypothesis is.

So, a hypothesis (it's quite hard to say) is an unproven statement which can be tested.

So it's a statement which can be tested and we're going to test it with our experiments.

So here's an example of a good hypothesis, something that is valid, something that we could use.

There are fewer buses running in the local bus service than there were five years ago.

So there are a number of things which make this a good hypothesis.

First of all, it's a statement, not a question.

So if instead it said, "Are there fewer buses running?", then that would not be a hypothesis, because that is a question and a hypothesis needs to be a statement.

Secondly, it states exactly what we are testing.

So we're talking about buses, we're comparing them to five years ago, and we're talking about the local bus service so it's really, really clear.

It's not going to be too confusing or too complicated, it states exactly what we're testing.

Third, it's written with clear, simple language.

So yeah, again, it's clear, it's easy to understand, it makes sense, really important.

And finally, it's manageable and measurable.

So we're testing something that we can clearly have information on.

So, you know, we're counting on the fact that there are records of the service from five years ago, so we can work that out.

If there weren't any records, then we wouldn't be able to run this kind of experiment.

So our hypothesis must be manageable and it must be measurable.

There must be some way to determine using data if this is true or not.
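
As a rough sketch of what "measurable" means here, the bus hypothesis could be checked in a few lines of Python. The two counts below are made up purely for illustration, and the function name `fewer_buses_now` is just a name chosen for this example; real figures would have to come from the bus service's records.

```python
# Hypothetical numbers of buses running per weekday.
# These figures are invented for illustration; they are not real records.
buses_five_years_ago = 120
buses_now = 95


def fewer_buses_now(then: int, now: int) -> bool:
    """Return True if the data supports 'there are fewer buses now than then'."""
    return now < then


# The hypothesis is a statement, so the data can only agree or disagree with it.
if fewer_buses_now(buses_five_years_ago, buses_now):
    print("The data supports the hypothesis.")
else:
    print("The data does not support the hypothesis.")
```

If there were no records from five years ago, there would be nothing to put in the first variable, which is exactly why the hypothesis would not be testable.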

So let's have a look at some examples of hypotheses and you can think if they are good or not.

Okay, so here are four hypotheses for you to read.

So pause the video now and have a read of these four hypotheses. Which of them do you think are good ones, and which of them do you think there's a problem with? So pause the video for a minute or two and have a think about these four different hypotheses.

Brilliant.

So I hope you've had a look and let's discuss this.

So the first one: is it hotter in London than Brighton? Well, if you're thinking that a hypothesis must be a statement rather than a question, then well done. You can see that this has a question mark here, so it's a question, so I'm going to say no, this is not a good hypothesis.

A better one would be something like London is hotter than Brighton.

That is a statement saying London is hotter than Brighton, and that would be a good one, that would be something that we could test.

Second, goals in football are more likely in the second half than the first.

Well, I actually think this is a good hypothesis, I don't see anything wrong with it.

There's lots of data that we could collect on football matches.

So we could tell whether goals are more likely in the second half than the first, so that's absolutely fine.

Next, cats are better than dogs.

Well, I don't really like this one, because what does "better" really mean? It's impossible to tell, it's impossible to know if cats are better than dogs. That just doesn't really mean anything.

We could say something like, people prefer cats to dogs. That would be fine because a preference is a thing we can measure, but saying cats are better than dogs, there's no way we can judge that.

So that's not a good one.

Finally, teenagers prefer TikTok to Instagram, I think that's fine.

I think that's something that we could test, as we can ask teenagers which one they prefer. I think there's nothing wrong with that.

So here are some examples of hypotheses, and later in the lesson you're going to be having a go at writing some yourself.

Let's move on to the connect task.

Okay, so Xavier and Yasmin are discussing different sources of data they could use to test their hypothesis.

So we've discussed what a hypothesis is, and now we need to find somewhere to test it.

And there are a number of places that we could go to.

And there are six that these two students are discussing.

And what I want you to think about is what's the same and what's different about them.

So pause the video now to read through these six sources of data and have a think about what's the same and what's different.

Could you group them into two or three different groups? Pause the video now, have a think.

Okay, great.

So there are lots of different things that you could be talking about here. You may see a similarity between an online survey and an interview. Those two are fairly similar, I guess: interviewing customers would be a face-to-face interview, whereas an online survey would be something you did online.

You could see a similarity between a government website and a survey from a news article, maybe the two are often seen together.

But the real similarity and difference here is the fact that some of these sources are what we call primary sources of data and some are called secondary.

So what's the difference? Well, a primary source of data is something that you collect yourself.

So out of these six, which do you collect yourself? Well, if you're thinking, run a lab experiment, yeah, you do that yourself.

You conduct an online survey yourself, and one more, you interview customers yourself.

So primary is something that you do yourself.

Secondary are the other three.

What do you think is different about secondary? Well, if primary is data that is collected yourself, then secondary is something that's been collected by other people.

So the government website, that's data that's been collected by people that work for the government.

A news article has been written by journalists, and an academic journal has been written by academics.

So really important, the difference between primary and secondary sources of information.

So, write this down: a primary source is data you collect yourself, and a secondary source is data that someone else has already collected.

Write this down, because the difference is very important.
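
The rule for telling the two apart can be sketched in Python as a single yes-or-no question: did you collect the data yourself? This is only an illustration using the six sources from the slide; the function name `classify_source` and the dictionary layout are choices made for this example.

```python
def classify_source(collected_by_you: bool) -> str:
    """Primary if you collect the data yourself, secondary if someone else did."""
    return "primary" if collected_by_you else "secondary"


# The six sources discussed in the lesson, with True meaning "you collect it yourself".
sources = {
    "lab experiment": True,
    "online survey you conduct": True,
    "interviews with customers": True,
    "government website": False,
    "survey from a news article": False,
    "academic journal": False,
}

for name, collected_by_you in sources.items():
    print(f"{name}: {classify_source(collected_by_you)}")
```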

Okay, it's time now for the independent task.

So three questions for you to have a think about.

In the first two, you have to write a hypothesis and answer the other question here.

So give yourself six or seven minutes to pause the video, read through these questions and write down a couple of sentences for each of them, what do you think? We're going to go through this after.

Pause the video now.

Great, I hope that this got you thinking, let's talk them through.

So the first one, Mr Millar is interested in whether boys or girls complete their maths homework more often.

Well, a hypothesis could be, boys complete their maths homework more often than girls, or girls complete it more often than boys. It doesn't matter which way round we do it.

Remember, a hypothesis is a statement, and we're interested in whether boys or girls complete their maths homework more often, so this will do.

Now, the primary source and the secondary source.

Well, a primary source: given that Mr Millar, that's me, teaches pupils at my own school, I could collect data from my own class, and that would be data that I collected myself.

And secondary data, well, that's data gathered by other people, so that could be data that other teachers at my school have gathered or data that other schools have gathered.

Second one: you're interested in whether London or Manchester is more rainy. What's the hypothesis, and why should you use a secondary source? Well, a hypothesis: London gets more rainfall, in millimetres per year, than Manchester.

So I'm making it really clear that I'm talking about the whole year, so that is a very clear hypothesis. And I would definitely use a secondary source, because imagine using a primary source and going out and collecting that data yourself. That would just be very, very silly: it would take a long time, you'd have to stand out there collecting rainfall for a very long time, it would be very expensive, and you shouldn't do that.

Secondary source, well definitely.

People collect weather data all the time. It's widely available, it's probably free, and there are lots of weather reports which have this kind of data. So you shouldn't collect data on rainfall yourself, because it's already out there, you don't need to do it yourself.

Question three: the UK Government is interested in whether people exercised more or less in the first month of lockdown.

So, whenever you're watching this video, explain why you couldn't use gyms as a secondary source.

Well, as you probably know, or remember, gyms were shut in the first few months of lockdown, so no one could use them. So if you're testing whether people exercised more or less based on use of gyms, then that would be very silly, because gyms were open before lockdown but not once it started. So this wouldn't work, it wouldn't give you a reliable result.

You could use a primary source.

So you could use a survey, for example: you could ask people how much they exercised before and after lockdown began.

Let's move on to the explore task to finish off with.

Great, so, just a reminder: what are we dealing with? In the last lesson, which you should have watched, I said that we are talking about a designer clothes brand, which I call Cool Sports Inc, who want to hire a pupil to do some data analysis.

So in the previous lesson, I showed you some different types of data and I asked you to classify them.

We're building up to a pitch or a report that we're doing at the end of these four lessons.

And today they have some questions, and for each of these questions, they want you to, first of all, write it as a hypothesis rather than a question.

And secondly, think about a primary or secondary source you could use to test each of these hypotheses.

So here are the three questions, you can have a read of them.

So first of all, write it as a hypothesis instead of a question.

And second of all, have a think.

How could you find out, how could you test this? What sources of data could you use?

Okay, that is it for today's lesson, I hope that you have enjoyed it.

And I will see you next time, when we're going to be talking about sampling.

Thanks so much, take care, have a lovely day.

Bye bye.