Lesson video

Hello.

My name is Mrs. Jones and I'm really pleased that you've decided to learn with me today.

In this lesson, we will look at what data science is and explore visualising data by looking at the benefits and the software that we can use to support this.

So let's get started.

Welcome to today's lesson.

Today's lesson is called Using data to support decision making, from the unit Data science, and by the end of this lesson, you'll be able to explain how visualising data can help identify patterns and trends to gain insights.

There are four keywords to today's lesson.

Data science.

Data science is extracting meaning from large data sets to gain insights that support decision-making.

Visualisation.

Visualisation is a variety of techniques used to illustrate a problem and/or its solution to make it easier to understand.

Insight.

Insight is the actionable understanding gained from analysing data.

Infographic.

An infographic is a visual representation of data, often involving pictures that reflect patterns and help tell a story.

There are three sections to today's lesson.

The first is Define data science, followed by Explain benefits of visualising data, and then Use a software tool to visualise data.

So let's start with Define data science.

Data science is extracting meaning from large data sets to gain insights that support decision-making.

The key elements of data science include: data collection, which is gathering raw data from different sources.

Data cleaning.

Data cleaning is fixing or removing inaccurate or incomplete data.

Data analysis.

Data analysis is using statistical and machine learning methods to explore data.

Data visualisation.

Data visualisation is creating charts, graphs, and diagrams to help understand data.

And decision-making.

Decision-making is using data insights to support or automate decisions.
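
To make these key elements a little more concrete, here is a minimal Python sketch that walks through collection, cleaning, analysis, and decision-making on a tiny list of survey answers. The answers are invented for illustration and the function names are hypothetical; the visualisation step is left out to keep the sketch short.

```python
from collections import Counter

# Data collection: hypothetical raw survey answers (None or "" means no answer).
raw_answers = ["pizza", "Pasta ", None, "pizza", "", "salad", "PIZZA"]

def clean(answers):
    """Data cleaning: drop missing answers and normalise spacing and case."""
    return [a.strip().lower() for a in answers if a and a.strip()]

def analyse(answers):
    """Data analysis: count how often each answer appears."""
    return Counter(answers)

counts = analyse(clean(raw_answers))

# Decision-making: pick the most common answer as the option to act on.
top_choice, votes = counts.most_common(1)[0]
print(top_choice, votes)
```

Real data science projects work on far larger data sets, but the order of the steps is the same.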

TV streaming services use data science to recommend TV programmes and movies that you may like.

They analyse your watching history and compare it to what other users watched.
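
As a rough illustration of comparing watching histories, the sketch below finds the other user whose history overlaps most with yours and suggests what they watched that you have not. The user names, show titles, and the choice of Jaccard similarity are all assumptions made for this example; real streaming services use far more sophisticated methods.

```python
# Hypothetical watch histories; names and titles are made up for illustration.
histories = {
    "you": {"Show A", "Show B", "Show C"},
    "user_1": {"Show A", "Show B", "Show D"},
    "user_2": {"Show E", "Show F"},
}

def jaccard(a, b):
    """Share of titles two viewers have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, histories):
    """Suggest titles watched by the most similar other user but not by target."""
    mine = histories[target]
    others = {name: seen for name, seen in histories.items() if name != target}
    best = max(others, key=lambda name: jaccard(mine, others[name]))
    return sorted(others[best] - mine)

print(recommend("you", histories))
```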

Let's have a quick check.

Which of the following best defines data science? A, the study of computers; B, a method of creating websites; C, using data to support decision-making; or D, solving mathematical equations with data.

Pause the video, consider your answer, and then we'll check it.

Let's check your answer.

The answer was C, using data to support decision-making.

Well done if you got that correct.

Sam asks, "Why is data science important?" Really good question.

Data science helps businesses make better decisions, it helps governments plan better services, and it helps healthcare providers improve diagnosis and treatment.

Your school wants to improve the lunch menu.

They've collected a survey from 200 students about their lunch preferences, satisfaction, and suggestions.

What could the school find out using the data collected?

Well, the school could use the data to answer questions such as: what are the most or least popular menu items? How much should they charge for menu items? What could they do to improve the school lunches?

Let's have a quick check.

Which of the following is not a likely use of data science? A, predicting weather; B, designing a logo; C, recommending products online; or D, detecting fraud in bank transactions.

Pause the video to consider your answer and then we'll check it.

Let's check your answer.

The answer was B, designing a logo.

Well done if you got that correct.

Let's do an activity, and you'll need your worksheet for this.

There are two parts to this.

The first is in two or three sentences, define what is meant by data science.

And the second is give a real-world example of where data science is used.

Pause the video, go back through the slides, use your worksheet, and then we'll check your answers.

Let's check your answers.

For the first part, in two or three sentences define what is meant by data science, the answer is: data science is the process of collecting, analysing, and interpreting large amounts of data to discover patterns, gain insights, and make decisions.

And for a real-world example of where data science is used, we have how a music streaming service recommends songs.

It analyses what a user listens to, skips, and likes, and then uses that data to suggest music they might enjoy.

Well done if you got those correct.

Let's move on to the second part of today's lesson, Explain benefits of visualising data.

What is this data showing you? What information can you extract from it? Does it tell you a story? We've got a table here with column headers and lots of data there.

Just take a moment to look at that (you do have access to this) and consider: what is this showing us? What can we extract from it? Does it tell us a story?

Joseph Minard used these numbers in 1869 to find meaning and tell a story with the data.

The data you looked at relates to Napoleon's march on Russia in 1812.

The numbers by themselves don't tell much of a story, but Joseph Minard created what is widely regarded to be the best statistical graph of all time.

This visual representation shows the size of the French army as it advanced into Russia and then retreated.

It incorporates data on troops, geographical location, direction of travel, and temperature.

And this shows a zoom in on some of those areas with some of the figures that are attached to it so that you can see some of the data close up.

In 1854, there was an outbreak of cholera in the Soho area of London.

At the time, it was widely believed that cholera was caused by pollution in the air.

John Snow's observation of the evidence led to him discounting this belief, but he could not prove how people became infected.

John Snow made a dot map of Soho.

The dots, or shaded-in parts of the map, represent where a cholera-related death had occurred.

You can see on the map there all the dots and how they are located in that area, and you can also see a key at the top that shows that the dot means deaths by cholera.

You can also see the yards to give you perspective of the size there.

What do the dots tell you about the cholera-related deaths in Soho? Pause the video, look at that map, discuss, and consider your answer.

What does it show us about cholera-related deaths in Soho? Let's check your answer.

The people who died were in a small area of Soho.

The dots are all very close together, and you can see that on the map, that all the dots are in one area and then become more dispersed as you move away from that central area.

Well done if you got that correct.

John Snow highlighted on the map the position of a water pump on Broad Street.

You can see the X there at the end of Broad Street, very clearly labelled.

It's also on the key at the top where it says X Pump.

So that was where that was located on the map.

What conclusions do you think John Snow was able to make using this visualisation? Pause the video, consider the position of that X representing the pump in relation to all those cholera-related deaths, and what do you think you can conclude from that? And then we'll check your answers.

Let's check your answer.

That the cholera outbreak was not caused by air pollution, but by people drinking water from the water pump on Broad Street.

Well done if you got that correct.

This data visualisation helped him to prove his theory that all the people that died had used this water pump for drinking water.

This map helped convince the local council to immediately remove the pump handle, and many lives were saved.

Data visualisations are visual representations of data, such as charts and graphs, intended to help an audience process the information more easily and get a clear idea about the data at a glance.

Infographics are visual representations of data often involving pictures that reflect patterns and help tell a story.

Infographics often include various visualisations.

We've got an example here on the right, which is an infographic on the impact of code clubs.

You can see the different graphs, the use of colour, and also a map at the top.

There are lots of benefits of visualising data from large data sets.

Charts, graphs, and illustrations are sometimes easier to understand than raw numbers, as they make the data easier to read and highlight patterns.

Presenting data in a visual format can help you to see relationships, how data changes over time, and it can even help you to spot errors.
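
To see why a picture can be easier to read than raw numbers, here is a small Python sketch that turns a few figures into a text bar chart. The show names and viewing figures are invented for illustration, and `text_bar_chart` is a hypothetical helper, not part of any tool used in this lesson.

```python
def text_bar_chart(values, width=40):
    """Return one line per item, with a bar scaled against the largest value."""
    largest = max(values.values()) or 1  # avoid dividing by zero
    lines = []
    for label, value in values.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<10} {bar} {value}")
    return lines

# Hypothetical viewing figures, in thousands.
views = {"Show A": 120, "Show B": 80, "Show C": 40}

for line in text_bar_chart(views):
    print(line)
```

Even with only three rows, the longest bar stands out at a glance, which is much harder to achieve with a long table of numbers.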

Let's do an activity, and you'll need your worksheet.

Explain the benefits of visualising data.

Pause the video, go back through the slides, use your worksheet, and then we'll check your answers.

Let's check your answer.

Visualising data helps you understand large amounts of information quickly.

Presenting data in a visual format can help you to see relationships, how data changes over time, and can even help you to spot errors.

For example, if a company has data on what millions of people watch on a TV streaming service, they can use a pie chart or a bar graph to see which shows are most popular.

This makes it easier to spot trends instead of looking through thousands of rows of numbers.

Well done if you got that correct.

Let's move to the last section of today's lesson, Use a software tool to visualise data.

Analysing large amounts of data can be time-consuming and difficult.

Software tools can help you analyse data sets and create visualisations.

Using tools for data visualisation is helpful because they make it easy to turn big sets of data into clear charts, graphs, and infographics.

These tools save time, are easy to use, and often let you change colours or styles to make your work look better and more understandable.

In this part of the lesson, we're going to use some data about TV viewing figures.

You are going to visualise some data about the top TV programmes and how they were viewed, for example, on a TV or another device.

Some sample data has been provided as an additional resource for this lesson, and it's called viewing-data.csv.

You will provide insights on which devices, other than TVs, are most popular for watching TV programmes.

You're going to use an online tool for data visualisation, which can be accessed at oak.link/datawrapper, and this is a screenshot of what you should see when you open that link.

You click onto Start creating.

To start, you need to upload your data.

Click on the XLS/CSV upload button and navigate to the viewing-data.csv that you have saved onto your computer.

Remember, that's the file that we said we have provided, so you'll need to save that to your computer to complete this step.

And you'll see there on the left on the screenshot where it says XLS/CSV upload.

The data from the CSV file will appear in the preview panel.

You can see that on the screenshot there, where it has appeared in that area on the right.

Click onto Check & Describe.

You can see that at the top, that's now highlighted in red, section 2 Check & Describe.

This shows the data that you're going to visualise.

You may not want to include all the data in our visualisation.

Click on the column heading of the data you do not want to include.

In this case, we do not want to include the channel, as this is not needed for our investigation.

To do that, click on the column header C to highlight that column.

Then tick the check box Hide column from visualisation.

You can see that over on the left there, where there is a check box to tick, which will hide it.

And what you'll find then is that this will strike through the text in the selected column.

Let's have a quick check.

Is there another column you want to exclude from your visualisation? Here's a hint.

You are going to visualise some data about the top TV programmes and how they were viewed.

Pause the video, have a look at the data, and decide which column do we not want to include in our visualisation, and then we'll go through the answer.

Let's check your answer.

The answer is column D.

Broadcaster group is not needed for your visualisation, as it does not relate to the device used.

Well done if you got that correct.
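
The column-hiding steps above can also be sketched in Python. This is only an illustration under stated assumptions: the sample rows and figures are invented, the column headers are guesses based on the names mentioned in this lesson, and the real viewing-data.csv may be laid out differently. `hide_columns` is a hypothetical helper.

```python
import csv
import io

# Invented sample rows; the real viewing-data.csv may use different
# headers and figures.
SAMPLE = """Programme,Channel,Broadcaster group,TV set,Tablet
Programme A,Channel 1,Group X,9000,120
Programme B,Channel 2,Group Y,8000,95
"""

def hide_columns(rows, hidden):
    """Return copies of the rows without the columns named in `hidden`."""
    return [{k: v for k, v in row.items() if k not in hidden} for row in rows]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
visible = hide_columns(rows, {"Channel", "Broadcaster group"})

print(list(visible[0].keys()))
```

As in the online tool, the original data is untouched; the hidden columns are simply left out of what gets visualised.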

Click onto the 3 Visualise tab across the top there.

So you can see number 3 Visualise is now in red.

The graph will default to a line graph.

What is the problem with this graph? Have a look at that graph, pause the video, consider your answer, and then we'll go through the answer.

Let's check your answer.

Because there is such a big gap or difference between the total and TV set viewings and the viewings on other devices, the graph isn't very helpful and doesn't tell us very much.

So how can we fix this? By removing TV set and total from the visualisation, like we did for channel and broadcaster group, we can focus more on the data for the other devices.

Go back to step 2 Check & Describe and remove TV set and total from the visualisation.

Click 3 Visualise to generate the visualisation again.
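
A quick calculation shows why removing the largest columns helps. In the Python sketch below, each value is expressed as a percentage of the tallest value on the chart, which is roughly how much of the axis height it would fill. The figures and device names are invented for illustration and `share_of_axis` is a hypothetical helper.

```python
def share_of_axis(values):
    """Express each value as a percentage of the largest value plotted."""
    top = max(values.values())
    return {name: round(100 * v / top, 1) for name, v in values.items()}

# Hypothetical viewing figures for one programme, in thousands.
all_columns = {"Total": 9180, "TV set": 9000, "Tablet": 120, "Smartphone": 60}
other_devices = {k: v for k, v in all_columns.items()
                 if k not in {"Total", "TV set"}}

before = share_of_axis(all_columns)
after = share_of_axis(other_devices)

print(before["Tablet"], after["Tablet"])
```

With Total and TV set included, the other devices fill only a sliver of the chart; once they are removed, the differences between the other devices become visible.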

The graph is more even now, but is this the best type of chart to help us understand the data?

Let's do the activity, and there are four sections to this activity.

Follow the steps shown to upload the CSV file viewing-data.csv and visualise the data.

Explore the graph options and select the one you think best communicates the information.

Explain the reasons for your choice.

Are there any other columns of data you could remove to improve the visualisation?

What insights have you gained from your visualisation?

Pause the video, go back through the slides, use the data and the software tool, and then we'll go through the answers.

Let's check the answer.

I decided to use the grouped bars chart type, as this showed the viewing figures for each of the different types of devices in a format that was easy to understand.

I also removed the rank column from the data visualisation as it wasn't helpful for showing the information I was interested in.

The visualisation has provided me with the following insights.

"Ant and Dec's Saturday Night Takeaway" was the most popular programme.

Tablets were the most popular devices, other than TVs, used for viewing.

This is probably due to their good balance of portability and screen size.

Well done if you got those insights correct.
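
Insights like these can also be checked with a few lines of code. The Python sketch below adds up the viewing figures for each device and finds the programme with the highest combined figure. The programme names, device names, and numbers are invented for illustration, so the output does not reproduce the results from the lesson data.

```python
# Hypothetical viewing figures (in thousands); not the real lesson data.
rows = [
    {"Programme": "Programme A", "Tablet": 120, "Smartphone": 60, "PC/laptop": 80},
    {"Programme": "Programme B", "Tablet": 95, "Smartphone": 70, "PC/laptop": 50},
]
DEVICES = ["Tablet", "Smartphone", "PC/laptop"]

def totals_by_device(rows, devices):
    """Add up the viewing figures for each device across all programmes."""
    return {d: sum(row[d] for row in rows) for d in devices}

def top_programme(rows, devices):
    """Name the programme with the highest combined figure on these devices."""
    return max(rows, key=lambda row: sum(row[d] for d in devices))["Programme"]

device_totals = totals_by_device(rows, DEVICES)
most_popular_device = max(device_totals, key=device_totals.get)

print(most_popular_device, top_programme(rows, DEVICES))
```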

In summary, data science is extracting meaning from large data sets to gain insights that support decision-making.

Data visualisations are visual representations of data, such as charts and graphs.

Infographics are visual representations of data, often involving pictures that reflect patterns and help tell a story.

Some online tools can be used to create data visualisations.

Well done for completing this lesson, Using data to support decision making.

File you will need for this lesson

Download these files to use in the lesson.
  • viewing-data 1.38 KB (CSV)