Hello, my name's Mrs. Jones, and I'm really pleased you decided to join this lesson today.
In this lesson, we will look at the investigative cycle and how this can help us work through from the initial problem to conclusions using data.
So let's get started.
Welcome to today's lesson.
Today's lesson is called the "Investigative Cycle" from the unit "Data Science," and by the end of this lesson, you'll be able to solve a problem by implementing the steps of the investigative cycle on a data set.
There are three key words to today's lesson: "PPDAC." A framework for us to follow when asking and answering real-world problems using data.
"Correlation." A correlation is simply measuring how two things move together.
"Outlier." An outlier is a data point that significantly differs from the rest of the data in a data set.
There are three sections to today's lesson.
The first is identify the steps of the investigative cycle.
The second is implement the investigative cycle on a data set.
And the third: use findings to support a recommendation.
So let's start with: identify the steps of the investigative cycle.
We've investigated data sets to see patterns or to extract meaning.
The PPDAC cycle is a framework that helps us ask and answer real-world problems using data.
And you can see from this illustration here, we have problem, plan, data, analysis, conclusions, and the first letters of those steps spell PPDAC.
We start with the problem.
Pose a question that you think the data will help you to answer.
Context is important when framing questions.
Sam has posed a question: "What is the average number of goals scored in the first half for teams in the Premier League?" In Sam's example, the question includes variables that can be compared with one another.
The plan.
This step involves working out where we will get the data from and, if we are going to collect it ourselves, how it will be collected. The steps of the plan may include: predict an answer, find a data set, evaluate the quality of data, and plan how to collect the data.
Let's have a quick check.
What does the P in the first step of PPDAC stand for? Is it A, plan; B, problem; C, hole; or D, pattern? Pause the video to consider your answer, and then we'll check it.
Let's check your answer.
The answer was B, problem.
Well done if you got that correct.
The data is the next step.
In this step, we gather the data.
Once we have the data needed to help us answer the question, we should look through the data to see if it needs cleaning. That means detecting, and then correcting or removing, any corrupt or inaccurate data.
Analysis.
This step is all about making sense of the data.
To do this, you need to visualise the data, spot any patterns, trends, correlations, or outliers.
Write down your observations of what the data is showing you.
Conclusions.
What's the answer to your question? How does the data help prove the answer? Is the answer reliable? What can we do with the results? For example, can we use this data to make a case for action, or has it led to further questions that need to be answered?
Let's do a quick check.
What is the missing stage of the PPDAC cycle? You can see them on the right in that diagram.
Which is the missing one in that green section? Pause the video to consider your answer, and then we'll check it.
Let's check your answer.
The answer was analysis, the step where you make sense of the data.
Well done if you got that correct.
Let's do an activity, and you'll need your worksheet.
Name the five steps of the PPDAC cycle.
Briefly describe each step and explain how each one could be applied in a real-world situation.
Pause the video, go back through the slides, use your worksheet, and then we'll go through an answer.
Let's check your answer.
The first one is problem.
This is where we start by asking the question we want to answer.
For example, what's the most popular fruit in our class?
Plan.
Now we decide how to find the answer.
We might plan to ask everyone their favourite fruit and write down their answers.
Three, data.
Time to collect the data.
We go and ask our classmates about their favourite fruits and note down their answers.
Four, analysis.
Here we look at all the data we've collected.
We might count how many people like apples, bananas, or oranges, and maybe even make a chart to visualise the data.
Five, conclusion.
Finally, we figure out the answer to our original question.
If most people like apples, then apples are the most popular fruit in our class.
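If you wanted to do that counting with code rather than by hand, here is a minimal sketch in Python. The list of answers is made up for illustration, and the function names `tally` and `most_popular` are just ones chosen for this example.

```python
from collections import Counter

# Hypothetical survey answers; a real class would collect its own data.
answers = ["apple", "banana", "apple", "orange", "apple", "banana"]


def tally(responses):
    """Analysis step: count how many times each answer appears."""
    return Counter(responses)


def most_popular(responses):
    """Conclusion step: return the answer with the highest count."""
    counts = tally(responses)
    return counts.most_common(1)[0][0]


print(tally(answers))
print(most_popular(answers))
```

If two fruits tie for first place, this sketch simply reports whichever of them appeared first in the answers, so a tie would be worth checking by looking at the full tally.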
That was an example of how to complete that activity.
Yours might be different.
But well done for completing that activity and looking at those five steps of the investigative cycle.
Let's move on to the next step: implement the investigative cycle on a data set.
River Kingdom is a new theme park that is opening in the UK.
They want you to recommend design considerations that will help make a great experience for the visitors.
One of the main restrictions that they know of is that they can't build a rollercoaster over 350 feet tall due to limitations of the site.
Andeep says, "We could ask people what makes a really cool rollercoaster." Is Andeep's suggestion a good idea? Consider if that question is a good one.
Well, Andeep's suggestion would be considered a poorly defined problem.
It doesn't clearly state what we are measuring for the term "cool."
What variables about rollercoasters could we measure in order to make the design recommendations?
Alex says, "We could use speed. How fast do other rollercoasters go?"
Sam suggests, "We could use height. How tall is the tallest rollercoaster?"
And Izzy says, "We could use the number of twists and loops, inversions."
Some really good suggestions.
Let's have a quick check.
We can get data on existing rollercoasters.
What other variables or features could we measure? Pause the video and consider what are the variables or features we could measure.
And then we'll go through the answer.
Let's check your answer.
We could check speed, height, drop; number of twists and loops, inversions; the length or distance of the rollercoaster; the duration, how long it lasts; or the riding position: are we sitting down or are we suspended, for example?
Well done if you got those correct.
You can get the data from the website CODAP, which you can also use to visualise the data.
You can see a screenshot on the right here of what it looks like.
The data is on the left, in the middle we have the graph, and on the right there is a tutorial to help you work through.
If you click an individual dot, it will show you which rollercoaster this refers to in the data.
So you can see in the graph there are lots of little dots, and if you click on one of those, it will highlight the row in the data table that the dot refers to.
If you scroll across the data table, you will see additional columns of data.
You can see the column headers across the top, and the scroll bar runs along the bottom of the data table.
You can change the axis of the graph by clicking a column header and dragging it to the graph's axis.
You can see that highlighted there: click a column heading and drag it to the axis on the graph.
So drag it over to the dots.
And you can see on the left and the right, you have the X and the Y axes.
In this example, the maximum height has been dragged across to the vertical axis, the Y.
Notice that the dots have changed.
You can also see that max height in feet now labels the axis on the left, with the numbers 0, 100, 200, 300, 400, so you can see that the data has been dragged across.
But at the moment there is nothing along the bottom, on the X, the horizontal axis.
To get some useful data, we need to compare two features.
In this example, top speed has been dragged onto the horizontal axis, and now you can see on the graph in the middle that we have top speed across the horizontal from 0 to 140 in miles per hour, and the max height is on the left on the vertical Y axis from 0 to 400 in feet.
So now we can see there is a correlation between the maximum height and the top speed.
Alex says, "That makes sense. The higher the rollercoaster goes, the faster it is due to the drops."
A correlation simply measures how two things move together.
A positive correlation means two things move in the same direction.
When one goes up, the other tends to go up too.
When one goes down, so does the other.
A negative correlation means that as one variable increases, the other tends to decrease and vice versa.
Note: just because there is a correlation, it doesn't necessarily mean that one thing causes the other to change.
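To put a number on how strongly two things move together, one common measure is the Pearson correlation coefficient. This lesson doesn't name it, so treat the sketch below as one possible approach rather than the method used in CODAP. The heights and speeds are made-up values for illustration. A result close to 1 suggests a strong positive correlation, and a result close to -1 suggests a strong negative one.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


# Hypothetical rollercoaster data, not taken from the CODAP data set.
heights_ft = [100, 150, 200, 250, 300]
speeds_mph = [50, 60, 70, 85, 90]

print(round(pearson(heights_ft, speeds_mph), 2))
```

Even a value very close to 1 only tells us the two variables move together. It doesn't tell us that one causes the other.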
Let's have a quick check.
Why should we ignore the two circled dots on the graph for our analysis? Pause the video, have a look at that graph, and consider why we should ignore those two dots, and then we'll go through the answer.
Let's check your answer.
These rollercoasters are over 350 feet high, which is over our size limitation.
The data on these rollercoasters is not useful for our investigation.
Well done if you got that correct.
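If you were working with the data in code rather than in CODAP, leaving out those rollercoasters might look something like this sketch. The rollercoaster names and numbers are invented for illustration, and the field names are assumptions rather than the real column headers in the data set.

```python
# River Kingdom cannot build a rollercoaster over 350 feet tall.
HEIGHT_LIMIT_FT = 350

# Hypothetical rows; names and values are made up for this example.
coasters = [
    {"name": "Coaster A", "max_height_ft": 120, "top_speed_mph": 55},
    {"name": "Coaster B", "max_height_ft": 310, "top_speed_mph": 93},
    {"name": "Coaster C", "max_height_ft": 420, "top_speed_mph": 120},
    {"name": "Coaster D", "max_height_ft": 456, "top_speed_mph": 128},
]


def within_limit(rows, limit=HEIGHT_LIMIT_FT):
    """Keep only the rollercoasters that are no taller than the limit."""
    return [row for row in rows if row["max_height_ft"] <= limit]


useful = within_limit(coasters)
print([row["name"] for row in useful])
```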
If you want to compare a third value, you can drag the column heading into the middle of the graph where the dots are, and it will colour-code the dots based on the data.
This graph now categorises wooden and steel rollercoasters, and you can see that there is now a key at the bottom showing the colours: a pinky red dot is a wooden rollercoaster, and a green dot is a steel one.
Let's do an activity.
Write two precise questions to help us find the answer to the larger question of what makes a cool rollercoaster.
What data do you need to collect to answer the question? Use the website at oak.link/codap to visualise the data and help you find answers to your questions.
And what does the data tell you? Remember, we cannot build a rollercoaster over 350 feet.
Pause the video, use your worksheet, and use that link, and then we'll go through a possible answer.
Let's check your answer.
So in this one, the question is: what are the fastest rollercoasters made from? For the data, we will need the height and speed of the rollercoasters and what material they are made from.
The visualisation is that graph there.
We have the maximum height and top speed, and the type: wooden and steel.
What does the data tell you? The data shows that the taller the rollercoaster, the faster it travels.
It also shows that the majority of the fastest rollercoasters are made from steel.
Well done for getting that correct, and I hope your visualisations and your conclusions have come up well too.
Let's move on to the last part of this lesson: use findings to support a recommendation.
Izzy says, "How can I present my findings?" You could present your findings as a report, a presentation, a visualisation, or a combination of those three.
Reports.
A report puts facts and ideas in order so anyone reading it, like your teacher or classmates, can see what you found out.
They can see what the important findings are and what should happen next.
Presentations.
A presentation can allow you to summarise the findings of your report to present to a group.
Presentations allow for the clear and structured delivery of information, making complex ideas more accessible and understandable for the audience.
Visualisations.
Pictures and charts can help us understand information faster than just reading words or numbers.
When we look at data in a graph or infographic, it's easier to see patterns and see trends in the data.
Imagine looking through all the data that was used to create that graph to find the same information. By just looking at all those dots, we can see and understand it, and come to a conclusion, a lot quicker.
Let's have a quick check.
What is the primary purpose of a report? Is it A, to advertise a product or service? B, to entertain the audience? C, to present information and findings? Pause the video to consider your answer, and then we'll check it.
Let's check your answer.
The answer was C, to present information and findings.
Well done if you got that correct.
Look carefully at the red dot highlighted in the visualisation.
What does this dot represent? The red dot is an outlier.
Most of the fastest rollercoasters are made from steel, but this wooden rollercoaster is one of the faster rollercoasters in the data set.
An outlier is a data point that significantly differs from the rest of the data in a data set.
Outliers can happen for a variety of reasons, including data entry errors, measurement errors, or genuine variation in the data.
Identifying and understanding outliers is important, as they can influence the decisions you make and may indicate anomalies or errors.
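In this lesson we spot outliers by eye on the graph, but they can also be flagged with a simple rule. The sketch below uses one possible rule, which is not covered in this lesson: flag any value that sits more than three times the typical distance away from the median. Both that rule and the threshold of 3 are assumptions for illustration, and the speeds are made-up values for wooden rollercoasters.

```python
from statistics import median


def find_outliers(values, threshold=3):
    """Flag values that sit unusually far from the median.

    "Far" means more than `threshold` times the median absolute
    deviation, which is the typical distance of a value from the median.
    """
    middle = median(values)
    typical_distance = median(abs(v - middle) for v in values)
    if typical_distance == 0:
        # At least half the values are identical, so this rule cannot
        # judge what counts as unusual; flag nothing.
        return []
    return [v for v in values if abs(v - middle) > threshold * typical_distance]


# Hypothetical top speeds (mph) of wooden rollercoasters.
wooden_speeds_mph = [50, 52, 55, 57, 60, 95]

print(find_outliers(wooden_speeds_mph))
```

A flagged value isn't automatically a mistake. It still needs checking to decide whether it is an error or genuine variation, like the fast wooden rollercoaster.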
Let's do an activity.
Write a conclusion based on your findings and make a recommendation to River Kingdom.
Consider: what is your recommendation? How does the data help support it? Are there any outliers in your data? Is this data enough to support your recommendation, or is there any further action or research that they should do? Pause the video, consider your answer, create your conclusion, and then we'll go through one.
Let's check your answer.
Based on my research, my recommendation is that River Kingdom should build a steel rollercoaster between 250 feet and 300 feet tall.
The rollercoaster should have a top speed of 90 miles per hour.
This would make it one of the tallest and fastest rollercoasters, which would make it popular for visitors to the theme park and thrill seekers.
Well done for completing that activity.
In summary, the PPDAC cycle is a framework that can be followed when asking and answering real-world problems using data.
A correlation shows that there is a relationship between two or more variables, but that doesn't guarantee that one causes the other.
Data that sits outside a trend is known as an outlier.
Well done for completing this lesson on the investigative cycle.