Hello, my name is Mrs. Holborow, and welcome to Computing.

I'm so pleased you've decided to join me for the lesson today.

In today's lesson, we're going to be exploring the term big data and what we mean by big data.

We'll also be exploring some of the challenges associated with big data.

Welcome to today's lesson from the unit Databases and SQL.

This lesson is called Big Data, and by the end of today's lesson, you'll be able to explain what big data is and describe some of the challenges it creates for processing and storage.

Shall we make a start?

We will be exploring these keywords throughout today's lesson. Shall we take a look at them now?

Big data: data that is too large and complex to be dealt with by traditional processing methods and systems.

Volume: the total amount of data stored, processed and transmitted, typically measured in bytes.

Variety: the diverse range of data types and formats.

Velocity: the speed at which data is created, processed and analysed.

Look out for these key words throughout today's lesson.

Today's lesson is broken down into two parts.

We'll start by defining the term big data, and then we'll move on to explain the challenges associated with big data.

Let's make a start by defining the term big data.

Can you think of any examples of data? Maybe pause the video whilst you have a quick think.

Did you think of some examples of data? Let's share some now.

So we could have purchases, location, mouse movements, search history, content you've streamed or downloaded, which adverts you've watched or clicked, demographic information (for example, age, gender and job), computer characteristics (such as browser type, battery life and screen size), banking details, conversations, contact details, and medical records.

These are just some examples.

I'm sure you thought of some others too.

Big data is defined as large data sets that are analysed in order to identify patterns and trends, often used by organisations to better understand their customers.

Big data is a term used to cover all data that cannot be handled using traditional processing methods and systems. Big data is different from other data sets, because it has the following set of characteristics.

Volume, the capacity required to store the data exceeds a single server.

Variety, the data is very diverse.

Data can appear in different types.

For example, text, video, images.

And in different forms, structured, unstructured and semi-structured.

We're going to look at those in a bit more detail later on in the lesson.

Velocity.

The data is produced and/or processed at very high speed.

In other words, big data refers to enormous quantities of data that are generated and captured at breakneck speed, and most of it is unstructured.

Sam says, "I can't imagine all that data." Sofia says, "Think about social media platforms. They deal with millions of photos, messages, posts, and videos streamed by their users every single minute." That's a really good example there from Sofia.

Humans aren't the only ones creating data.

Internet of Things, or IoT devices, like smart thermostats or security cameras also collect lots of data using sensors.

These devices are used in places like smart homes, traffic systems, farms and weather stations.

They collect and send data all the time, which needs to be stored and processed quickly.

For example, a home security camera constantly checks for movement and sends an alert to your phone when it sees something.

Time to check your understanding.

Which of the following are the three main characteristics of big data? A, speed, safety, source, B, volume, velocity, variety, C, storage, structure, scale, or D, value, viewing, verification.

Pause the video whilst you think about your answer.

Did you select B? Well done.

Volume, velocity, and variety are the three main characteristics of big data.

Big data sets are analysed using data analytics.

These are the techniques used to combine different types of data and extract meaning from them.

This might involve searching for common patterns in a data set or trying to characterise the behaviour of certain types of data subject.

Data analytics may involve machine learning, text analysis, predictive analysis, optimisation problems, and cleaning and combining data sets.
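As a small illustration of searching for common patterns in a data set, the Python sketch below counts which items appear most often in a list of purchases. The purchase records and the function name most_common_items are invented for this example; real data analytics works on far larger data sets and usually with more sophisticated techniques.

```python
from collections import Counter

# Hypothetical purchase records (customer, item), invented for illustration.
purchases = [
    ("alice", "headphones"),
    ("bob", "keyboard"),
    ("alice", "keyboard"),
    ("carol", "headphones"),
    ("dave", "headphones"),
]


def most_common_items(records, n=2):
    """Return the n most frequently purchased items with their counts."""
    counts = Counter(item for _customer, item in records)
    return counts.most_common(n)


print(most_common_items(purchases))
```

Here the pattern is simply "which item is bought most often", but the same idea of counting and comparing sits underneath many trend analyses.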

Time to check your understanding.

We're going to fill in the gaps here.

So the words you have provided are volume, format, velocity, variety and size.

I'm going to read through the sentences now and then you can pause your video and think about what should be in the gaps.

It is estimated that popular social media platforms receive and process millions of posts every minute.

Therefore, a defining characteristic of big data is its ______.

However, what is more challenging is that big data lacks a predetermined ______.

Each post can contain text, images, or video combined in unpredictable ways.

Pause the video here whilst you fill in the gaps.

How did you get on? Did you manage to identify what should be in the gaps? Let's have a look at the answers together.

It is estimated that popular social media platforms receive and process millions of posts every minute.

Therefore, a defining characteristic of big data is its velocity.

However, what is more challenging is that big data lacks a predetermined format.

Each post can contain text, images, or video combined in unpredictable ways.

Okay, we're moving on to our first task of today's lesson.

Big data is used in many real world systems such as weather forecasting and traffic control.

For part one, I'd like you to define the term big data, and then for part two, I'd like you to describe two characteristics of big data.

Pause the video here whilst you complete the tasks.

How did you get on with the tasks? I'm sure you did a fantastic job.

Let's have a look at some sample answers together.

For part one, you were asked to define the term big data.

Big data refers to extremely large and complex sets of data that are generated at high speed and come in many different formats.

These data sets are too big to be handled by traditional data processing tools.

Big data is often analysed in order to identify patterns and trends.

For part two, you were asked to describe two characteristics of big data.

In this sample answer, we've chosen volume and velocity.

Volume.

Big data involves huge amounts of data.

For example, social media platforms collect millions of posts and messages every day.

Big data cannot be stored on a single server.

Velocity.

Data is created and collected at very high speed.

For instance, sensors in smart traffic systems constantly send live data that needs to be processed in real time.

Remember, if you want to pause your video to make any corrections, you can do that now.

Okay, so we've defined the term big data.

Let's now move on to explain the challenges associated with big data.

Traditionally, data is organised in relational databases that accept data based on a predetermined model with tables, fields and relations.

This type of data is considered structured as it follows a strict predefined format.

So here's an example of structured data.

We have some customer information.

So we have a customer ID, we have the customer's first name, surname, their date of birth and their order ID.
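To make the idea of a predetermined model concrete, here is a minimal sketch of how a customer table like this might be declared, using Python's built-in sqlite3 module. The table name, column names, column types and the two rows are assumptions made for illustration; the lesson only names the fields.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# The structure (fields and their types) is fixed before any data arrives.
conn.execute(
    """
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        first_name    TEXT NOT NULL,
        surname       TEXT NOT NULL,
        date_of_birth TEXT NOT NULL,  -- stored as YYYY-MM-DD
        order_id      INTEGER
    )
    """
)

# Invented example rows; every row must fit the columns defined above.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Sam", "Example", "2009-04-12", 101),
        (2, "Sofia", "Sample", "2008-11-30", 102),
    ],
)

rows = conn.execute(
    "SELECT first_name, surname FROM customers ORDER BY customer_id"
).fetchall()
print(rows)
```

Because every row follows the same format, the database can answer questions about the data with a simple query.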

Semi-structured data is not as predictable as structured data, but it does contain elements that can be used to identify the underlying structure of the data.

For example, CSV files are plain text files that use delimiters, typically commas, to separate different data values.

So here we have a CSV file called players, and we can see it's storing some information about players in a game. We have the headings player name and player score, and then we have the name of each player followed by their score.

Notice that each individual element is separated by the comma.
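The sketch below shows how a program might use those commas to recover the structure of a file like this, using Python's built-in csv module. The file contents, the header names and the function name read_scores are assumptions made for illustration, not the actual file from the lesson.

```python
import csv
import io

# Invented contents standing in for the players file described above.
players_csv = """player_name,player_score
Sam,120
Sofia,95
"""


def read_scores(text):
    """Parse CSV text into a dictionary mapping player name to score."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["player_name"]: int(row["player_score"]) for row in reader}


scores = read_scores(players_csv)
print(scores)
```

Nothing in the file itself enforces that a score is a number, which is part of what makes the data semi-structured rather than structured: the program has to convert and check the values itself.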

Unstructured data has no predefined format and cannot be modelled into distinct components before it's received.

This makes unstructured data much more difficult to collect, process and analyse than structured or semi-structured data.

Unstructured data comes from many sources such as emails, presentations, images, videos and audio content.

This data may be stored in a structured database, but all of the actual information that these files contain is not processed or understood.

Big data is mostly unstructured, so it doesn't fit into the row and column format of relational databases.

Even if it could be made to fit, relational databases don't work well when the data is spread across many servers.

Time to check your understanding.

What would the contents on a webpage be considered? A, structured, B, semi-structured, or C, unstructured? Pause the video whilst you think carefully about your answer.

Did you select C? Well done.

A webpage can contain images, videos, text, and other functionalities that are considered to be unstructured data.

The sheer volume of big data means that it is difficult to store and process data using traditional systems.

Big data is typically processed through either batch processing, where the data is processed in large chunks periodically (for example, every hour or at the end of each day), or stream processing, where data is processed as it arrives in real time.
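The difference between the two approaches can be sketched in a few lines of Python. The readings and the function names batch_total and stream_totals are invented for this example, and a real big data system would spread this work across many machines.

```python
# Hypothetical sensor readings, invented for illustration.
readings = [3, 7, 2, 9, 4]


def batch_total(data):
    """Batch: all collected readings are processed together in one go."""
    return sum(data)


def stream_totals(data):
    """Stream: a running total is updated as each reading arrives."""
    total = 0
    for value in data:
        total += value
        yield total


print(batch_total(readings))
print(list(stream_totals(readings)))
```

The batch version gives one answer after all the data has been collected, whereas the stream version has an up-to-date answer after every single reading.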

Handling big data raises concerns about protecting personal and sensitive information.

Maintaining privacy and security of big data is challenging.

Managing and processing big data can be expensive due to the need for advanced technology, storage requirements and security measures.

Time to check your understanding.

We have a true or false statement here.

The sheer volume of big data means that it's easy to store and process data using traditional systems. Is this statement true or false? Pause the video whilst you have a think.

Did you select false? Well done.

But why is it false? The sheer volume of big data means that it is difficult to store and process data using traditional systems.

Okay, we're moving on to our second task of today's lesson, and you've done a fantastic job to get this far, so well done.

For part one, I'd like you to explain the challenges associated with big data.

Pause the video whilst you construct your answer to the task.

How did you get on with the task? Did you manage to explain some of the challenges associated with big data? Well done.

Let's have a look at a sample answer together.

Big data refers to extremely large data sets that are difficult to store, manage, and analyse using traditional methods.

One challenge is storage, as big data requires large amounts of space and often needs cloud-based solutions.

Another issue is processing speed.

Analysing such large amounts of data can be time-consuming and requires powerful hardware.

Implementing and maintaining the infrastructure required for big data storage, processing and analysis can be expensive.

Finally, security and privacy are concerns, because sensitive data must be protected against misuse or cyber attacks.

There are lots of points there.

Did you have those in your answer? Remember, if you want to take some time to pause the video and add any extra detail to your answers, you can do that now.

Okay, we've come to the end of today's lesson, Big Data, and you've done a great job, so well done.

Let's summarise what we've learned together during this lesson.

Big data refers to extremely large and complex sets of data that are generated at high speed and come in many different formats.

It is often categorised using the three Vs, volume, variety, and velocity.

Storing big data is difficult and often needs cloud systems. Processing big data takes a lot of time and computing power.

Big data raises security and privacy concerns due to the amount of personal data.

I hope you've enjoyed today's lesson and I hope you'll join me again soon.

Bye.