Loading...
Hello, I'm Dr.
Das.
Welcome to Computing.
I'm so pleased that you've decided to join me for this lesson today.
In today's lesson, we'll be looking at run length encoding.
Welcome to today's lesson from the unit Data Compression.
This lesson is called Run Length Encoding.
And by the end of this lesson, you should be able to describe and use run length encoding to compress data.
Some of the key words that we will be using in this lesson are run length encoding, run length encoding, a common type of lossless compression.
Data patterns, data patterns, parts that repeat or follow a regular order.
Redundancy, redundancy, repeated or extra data that isn't needed.
This lesson is made up of two sections.
The first section talks about describing run length encoding.
The second section looks at using run length encoding to compress data.
Let's start by looking at the first section where you will learn how to describe run length encoding.
Data compression is the process of reducing the size of a file.
To compress a file, compression algorithms are used.
Data compression algorithms work by removing unnecessary data or rearranging information more efficiently.
There are two categories of compression algorithms, lossy compression, such as MP3 and JPEG, and lossless compression, such as PNG and ZIP.
The choice of compression method will depend on factors like the need for data accuracy, the acceptable level of quality loss, and the importance of reducing file size.
Run length encoding is a common type of lossless compression.
It works by finding runs of repeated data patterns and replacing them with a single instance of the pattern and a number that specifies how many times the pattern is repeated.
The data in a bitmap image is stored as bits.
So run length encoding finds runs of repeated bit patterns.
Right, let's do a quick check before we move on.
Run length encoding is a lossy compression method.
What do you think? Is this statement true or is it false? Why don't you pause the video here and have a quick think.
The correct answer is false.
Run length encoding is not a lossy compression method.
It is a common type of lossless compression.
Well done.
A bitmap image can be used as an example to explain run length encoding.
If you look at the image here, the black pixels are represented with 00, and the white pixels are represented as 11.
Let's see how you can use run length encoding on a bitmap.
First, you identify a run of bit patterns or pixels that are the same.
So we are going to look at the first row of the pattern.
Next, you count the number of matching bit patterns in that run.
So if you look at that first row, the first run, you have three similar bit patterns.
Finally, you create a data pairing with the number of occurrences and the bit pattern.
So it'll be three, and the bit pattern is 00.
You can then repeat this for the entire bitmap representation.
So in that first row, in that first run, you have three instances of the pattern 00 and one instance of 11, and you record that.
The next line, the next run, has three 11s, and one occurrence of 00.
The next run has three occurrences of 00 and one occurrence of 11.
Finally, the last run has four occurrences of 11.
Run length encoding can be a useful compression technique.
However, it does come with some advantages and disadvantages as outlined in the table below.
So the advantages are that it is simple to implement and it's good for certain types of data.
The disadvantages are that it is inefficient if the data has few repeats, and it can increase the file size.
Note that run length encoding can be a useful compression technique, but it can actually increase the file size if not chosen carefully.
Encoding the string "Hello" With run length encoding results in one H, one E two L, one O.
Because the encoded version contains more characters than the original file, it requires more bits to store.
Right, let's do another quick check.
Run length encoding works by finding data patterns that repeat, and replacing them with a single value and count.
What do you think? Is the statement true or is it false? Why don't you pause the video here and have a think and try and work this out.
The correct answer is true.
Run length encoding is a compression method which works by finding data patterns that repeat and replacing them with a single value and count.
Well done.
Another check.
Run length encoding, and you've been given three options.
Why don't you pause the video again and see if you can work this out.
I'm sure you'll be able to.
Did you choose option A? You were right.
Run length encoding does work well on repeated patterns of data.
Well done.
This brings us to the end of this section.
Before we move on, why don't you make sure that you have understood what we covered in this section by attempting this task.
Task A is made of two parts.
The first part asks you to write two or three sentences to describe run length encoding.
The second part asks you to write two or three sentences to describe some advantages and disadvantages of run length encoding.
Why don't you pause the video here and try and work these parts out.
I'm sure you'll be able to.
Okay, let's look at the answers.
For the first part where you had to write two or three sentences to describe run length encoding.
Run length encoding is a type of lossless data compression.
It works by finding parts of data that repeat, like the same letter or pixel.
It then replaces the repeated values with just one value and the number of times it appears.
For the second part where you were asked to write two to three sentences to describe some advantages and disadvantages of run length encoding, you could have said run length encoding is simple to implement and can be an efficient way to reduce file size when data has lots of repetition, like in basic images.
It doesn't lose any data because it's a type of lossless compression.
A disadvantage is that if the data doesn't have many repeated values, it can actually increase the file size.
Well done on attempting that task.
Let's move on.
The next section looks at how you can compress data with run length encoding.
Jun says, "Sofia, I understand the idea of using run length encoding to compress file size, but I heard someone mention that it works by relying on something called redundancy? What's redundancy?" Sofia says, "Redundancy means that some data is repeated more than it needs to be." Run length encoding relies on a concept called redundancy.
Data redundancy is when the same data is repeated and can be stored more efficiently using compression techniques.
The more redundancy in the data, the more effective run length encoding becomes at reducing the compressed file size without losing any important information.
The previous example of run length encoding used a decimal number to state the number of occurrences used for each bit pattern.
So as you can see in this image, the first run, just for example, the first run has 3 as the number and 00 as the pattern that's repeating.
In reality, these decimal numbers need to be in binary in order for a computer to interpret them.
Before you can allocate a binary number for the number of occurrences, you need to know the maximum number of bits that are required to store that particular number.
You work this out by finding out the longest run length in a row that is possible in the bitmap image.
The longest possible run length for a four by four pixel image is four.
Three bits will allow you to store the number four as the binary number 100.
This means that you need to allocate three bits to store the number of occurrences.
You can then go through each decimal number and change it to its binary equivalent.
Make sure that you use three bits for each number.
So if we change row number one, 3 becomes 011 and 1 becomes 001.
So we are using three bits to represent each number.
The second row is 011, 11, and 001, 00.
Third row, 011, 00, 001, 11, and finally, 100, 11.
The number of bits must stay the same in order for the computer to know which bits are stating the number of occurrences And which bits are representing the colour.
Right, let's do a quick check.
Decimal numbers need to be in binary in order for a computer to interpret them.
What do you think? Is this statement true or is it false? Why don't you pause the video here and have a quick think.
The correct answer is true.
Yes, decimal numbers need to be in binary in order for the computer to interpret them.
Data redundancy is about data that, you've been given three options, why don't you pause the video here again And try and work this out.
The correct option is B, data redundancy is about data that is repeated and can be stored more efficiently.
Well done on working that one out.
Let's move on.
That brings us to the end of this section.
Just to make sure that you have understood what we covered in this section, why don't you attempt this task? Task B is made up of two parts.
The first part asks you to compress the bit map image using run length encoding and log the number of occurrences as a decimal.
The first line has been completed for you.
So in this image, black is represented as 00 and white is represented as 11.
So the first run has been done for you.
There are three instances of 11, two instances of 00, and three instances of 11.
Why don't you pause the video here and try and work this one out.
Let's look at the solution for the first part.
The first row was already completed for you.
So there are three occurrences of the pattern 11, two occurrences of the pattern 00, and three occurrences of the pattern 11.
In the next run, you have one occurrence of the pattern 11, six occurrences of the pattern 00, and one occurrence of the pattern 11 again.
The rule after that has two occurrences of pattern 00, then one occurrence of pattern 11, two occurrences of 00, one of 11 ending with two occurrences of 00.
The rule after that has eight occurrences of 00.
The row after that has one occurrence of 11, then two occurrences of 00, two occurrences of 11, two occurrences of 00, ending with one occurrence of 11.
The rule after that has two occurrences of 11, four occurrences of 00, and two occurrences of 11 again.
The rule after that has one occurrence of 11, six occurrences of 00, and one occurrence of 11.
Finally, two 00s, four 11s, and two 00s.
Well done on working that one out.
Let's look at the next part.
In this one, they're asking you to convert the number of occurrences in the encoded data you created in question one from a decimal value to a binary value.
As you can see, the first row has been completed for you.
Remember, we start by looking at the longest run.
In this case, the longest run is the eight occurrences of the pattern 00.
The first row has been completed for you.
Why don't you pause the video here and try and work the rest out.
You've been doing so well, I'm sure you'll be able to.
Let's look at how begin work this one out.
To start off, you must look at the longest run, which in this case is the eight occurrences of the pattern 00.
Now, eight in binary is written as 1000, so you need four bits.
So each number will be represented using four bits.
Keeping that in mind, row two looks like 0001, 11, 0110, 00, 0001, 11.
Row three looks like 0010, 00, 0001, 11, 0010, 00, 0001, 11, 0010, 00.
The next row looks like 1000, 00.
Row number five is 0001, 11, 0010, 00, 0010, 11, 0010, 00, and 0001, 11.
The next row is 001011, 0100, 00, 0010, 11.
The row after that is 0001, 11, 0110, 00, 0001, 11.
Finally, the last row is 0010, 00, 010011, and 0010, 00.
Well done on attempting this part.
You have done brilliantly.
Let's move on to the summary.
To summarise, run length encoding is a lossless compression technique that reduces file size by replacing repeated data patterns with a single value and a count.
Run length encoding uses the concept of redundancy to reduce file size.
Redundancy is when the same data is repeated and can be stored more efficiently using compression techniques without losing important information.
Run length encoding may not always reduce file size and may not be suitable for all applications.