Loading...
Hello, my name is Mrs. Holborow and welcome to Computing.
I'm so pleased you've decided to join me for the lesson today.
In today's lesson, we're going to be comparing CSV and text files, and then we'll learn how to read data from a CSV file into a Python programme.
Welcome to today's lesson from the unit, "Programming Dictionaries and Data Files." This lesson is called, "Reading CSV files," and by the end of today's lesson, you will be able to read and structure data in a CSV file.
Shall we make a start? We will be exploring these keywords in today's lesson.
CSV, CSV, comma-separated values are a plaintext data file where each value is separated by a single comma.
Tabular, tabular, display data in a grid of rows and columns.
Strip(), strip(), a Python method which removes any leading or trailing white spaces or characters that you specify.
Split(), split(), a Python method that is used to break a string into a list based on a specified delimiter.
Watch out for these keywords throughout today's lesson.
Today's lesson is split into three sections.
We'll start by comparing CSV and text files.
We'll then read data from a CSV file.
We'll then finish by reading a CSV file to a 2D list.
Let's make a start and compare CSV and text files.
CSV stands for comma-separated values.
CSV files can be read by a range of different spreadsheet software packages, as well as Python files.
So here's an example of a CSV file called players.
In the file we have, Player name, Player score.
On the next line, we have Aisha, 0.
And then on the next line we have Jun, 2.
CSV files contain data structured so that a comma separates individual items in the file, and each record is on a new line of the file.
So, "Player name" here is an individual item.
The comma separates the individual items. And each line is a record, so "Jun, 2" is a record.
A CSV file can be created in a text editor by changing the extension of the file to.
csv.
You can also create CSV files in spreadsheet software by exporting them as a.
csv file.
When you open a CSV file in spreadsheet software, the commas are used as a guide to display the data neatly in rows and columns.
When a CSV file is opened in spreadsheet software, it is displayed in a tabular format.
So here we have the CSV file and, on the right hand side, we have how this would be displayed in spreadsheet software.
A CSV file is still a text file.
Each row is stored as a string.
Modern spreadsheet software will typically recognise a number and store it as that data type.
Python will not do this automatically though.
Any data is always read as a string and will need to be casted before it is used as any other data type.
Time to check your understanding.
What does CSV stand for? A, computer system variables, B, comma-separated variables, or C, comma-separated values.
Pause the video whilst you have a think.
Did you select C? Well done.
CSV stands for comma-separated values.
Each row in a CSV file represents A, an attribute, B, a record, or C, an item.
Pause the video whilst you have a think.
That's right, B, a record.
Each row in a CSV file represents an individual record.
This table compares some of the features of CSV and text files.
Let's have a look at each feature.
Format.
In a text file, the format is plaintext, which can be structured or unstructured.
In a CSV file, structured text is in a tabular form.
Feature, delimiter.
In a text file you may or may not use delimiters.
In CSV files, commas are used as delimiters.
Data structure.
A text file can contain any type of text, often unformatted.
A CSV file is organised in rows and columns.
Let's have a look at some common use cases.
Text files are commonly used for notes, logs, configuration files.
CSV files are commonly used for storing tabular data, like spreadsheets or databases.
Okay, we've come to our first task of today's lesson.
Task A.
For part one, explain the difference between a text file, a.
txt file, and a.
csv file.
For part two, give one example of a type of data that could be stored in a CSV file.
Pause the video whilst you answer the questions.
How did you get on? Did you manage to answer the questions? Well done.
Let's look at some sample answers together.
For part one, you were asked to explain the difference between a.
txt file and a.
csv file.
A CSV file is structured in tabular form and uses commas as delimiters.
Data in a.
csv file is organised into rows and columns.
A.
txt file can be unstructured and may not include delimiters.
For part two, you were asked to give one example of a type of data that could be stored in a CSV file.
A CSV file could be used to store usernames and passwords.
Did you think of any other examples? So, we've compared CSV and text files.
Now let's move on to read data from a CSV file.
A CSV file is just another text file, so you can use all of the same methods that you already know with a CSV file.
The only difference here, you'll see, is the file name has the extension.
csv rather than.
txt.
Using the read method, we will read all of the data from the file in the same way as it did with your text file.
So here we have the line, data = file.
read() That will read the entire contents of the CSV file.
So in this example, you can see our CSV file contains Player name, Player score and then it has Aisha and Jun and their scores.
The modes are also the same.
So you can use "r" for read, "w" for write, and "a" for append.
The difference with a CSV file is that you might want to perform different operations with it in comparison to a standard text file.
This means that you need to think carefully about how your data might be structured as it is read into your Python programme.
Data represented in a CSV file is often in tabular form.
This means that it's stored in rows and columns.
Lucas says, "What data structure could I use for data that is stored in rows and columns?" Can you think about what data structure might be appropriate? Maybe pause the video here and have a think.
Ah, a 2D list or a list of records.
Did you have that as your answer? Well done.
Reading data from a CSV file and holding it in a list is very similar to the method used for standard text files.
Here is a CSV file with one column of data.
So we have playernames.
csv and we have three names, Aisha, Jun, and Laura.
When you assign the contents of the file to a variable, it's held like this.
So here we have data and it holds Aisha, Jun and Laura, but notice, the new line characters after Aisha and Jun.
We don't have a new line character after Laura 'cause it's the last line in the file.
Just as with a standard text file, the new line character is included.
To assign this data to a list data structure, you need to append each line to a list.
So we can do this using a for loop.
So, for line in file: data.
append(line) The list, which we've called "data," now looks like this.
So you can see we have each item separated by a comma.
But notice we still have the new line characters here.
Each item is now in the list, but you still have an extra piece of data to deal with.
The new line character.
What method could you use to remove unwanted pieces of data? Is it A, remove(), B, pop(), C, strip(), or D, delete()? Pause the video here whilst you have a think.
That's right, the strip() method can be used to remove unwanted pieces of data.
For each line in the file, you need to strip() the new line character.
We can do this by adding the line, line = line.
strip() The strip() method defaults to removing any white space from the start and end of the string unless you specify the characters or arguments.
You now have a list with all of the extra pieces of data removed.
So we now have open square brackets for the list and then we have our three names, Aisha, Jun and Laura.
You can now interrogate this data as you would with any other list.
Okay, we're moving on to our second task of today's lesson and you're doing a fantastic job so far, so well done.
I'd like you to open the starter programme at oak.
link/game-scores.
Note, this programme already contains a CSV file called schools.
csv.
Create a programme that A, reads all the data from schools.
csv, B, appends the data from the file to a list, C, displays the highest score, and D, displays the lowest score.
Pause the video here whilst you complete the activity.
How did you get on? Did you manage to create your programme? Great work.
Let's have a look at some sample code together.
So on line one we are opening the file in read mode.
So, scores.
csv On line two, we're initiating a list called "data," but we're keeping it empty at this point.
On line four, we start our for loop.
So, for line in file: line = line strip() to remove the new line character.
And then on line six, we're appending the data to the list.
So, data.
append(line) We are then closing the file on line eight and on line 10 we have, datalength = len(data) Remember the LEN function returns the length of a list.
The program's then continued.
So, on line 12 we have another for loop.
So, for score in range(datalength): data[score] = int(data[score]) And then, we're using the sort method to sort the list.
So, data.
sort() That will put all of the scores in order from lowest to highest.
We can then work out which is the highest score and which is the lowest score.
So, highest = data[datalength-1] lowest = data[0] On line 20 and 21, we're then printing out these results to the user.
Did you have something similar or did you code it in a slightly different way? Remember if you need to pause the video here and make any corrections to your programme to get it to work, you can do that now.
So, we've read data from a CSV file, but now we're going to read a CSV file into a 2D list.
CSV files present data in tabular form.
In the last example, you used a CSV file with only one column of data.
You are now going to work with a CSV file that has two columns of data.
So this time we have the player name and then their score.
The players.
csv file has data stored in two columns and three rows.
If you use the same code as the previous activity with this CSV file, then the data is held in a single list.
So you can see we have, [`Player name, Player score', `Aisha, 0', `Jun, 2,'] all in the same list.
Each row in the CSV file is held as an item in the list.
This means that each item has two pieces of data held within it, and that isn't very practical.
Note that the commas from the CSV file are part of the string here and they're not separating the items in the list.
So actually, Player name and Player score are one item in the list.
For each piece of data to be held as an item, the data needs to be split() at the comma of each line.
The split() method takes a string and splits it into a list of items. Which method takes a string and separates it into a list of items? Is it A, split(), B, strip(), or C, slice()? Pause the video whilst you have a think.
Did you select A, split()? Well done.
The "words" variable holds a string of words.
So here we've got our string, `here are lots of words' The split() method splits the string where it finds a space and returns it into a list.
So we now have, `here are lots of words' as single items in the list.
The character that is passed as an argument into the method will be used to split() the string.
So in this case, we use the space character.
Changing the space to an "o" will result in a different list.
Here we have, [`here are L', `ts', `f w', and then, `rds'] Note that the letter "o" is also removed from the string.
How might the split() method help with the CSV file? Pause the video here whilst you have a think.
That's right.
The split() method can be used to split each line when it finds a comma in the CSV file.
By splitting each line into a list, you now have a 2D list.
So we have data, and notice we've got double square brackets now, [[`Player name', `Player Score'] close the first square bracket, and then we have the next list, which is [`Aisha', `0'], and then on the next line, the next list, which is, [`Jun', `2'] And then we have two square brackets because we're closing both lists there.
This will now allow you to perform operations on the data.
Okay, we've come to our last activity of today's lesson and you've done a fantastic job so far, so well done.
I'd like you to open the starter programme at oak.
link/display-scores.
Note, this programme already contains a CSV file called players.
csv.
For part two, create a programme that performs the following.
A, reads all the data from the players.
csv file.
B, asks the user to enter a player name.
C, displays the score for that player.
Hint, the in operator may help you with this programme.
Pause the video whilst you have a go.
How did you get on? Did you manage to create your programme? Well done.
Let's have a look at the sample answer.
I've split this over two slides so that we can see it a bit more clearly.
So, we start by opening the file, players.
csv in read mode and we initiate a list called "data," but at this point we're going to have it as an empty list.
We then go through each line in the file with a for loop.
So, for line in file: line = line.
strip() to remove the new space character.
And then we're using, line = line.
split(",") to split it with the commas.
And then we're appending that to line.
And then on line nine we're closing the file.
Okay, so here's the programme continued.
On line 11, we have a print statement which asks the user to enter the player's name.
And we hold that in a variable called "player" on line 12.
On line 14, we're setting an empty string variable called "location." On line 16, we start our second for loop.
So, for item in data: if player in item: location = (data.
index(item)) On line 20, we have, score = data[Location][1] And then, finally, we print out the message to the user.
So, print(f"The last score for {player} was {score}") Did you manage to create your programme successfully? If you need to, you can always pause your video here and make any corrections.
We've come to the end of today's lesson, "Reading CSV files." You've done a fantastic job to get this far, so well done.
Let's summarise what we've learned together.
Comma-separated values, CSV text files, group data together on individual lines and separate values in these groups by commas.
Each item in a CSV file is separated by a comma.
Each record is separated over a new row in the file.
The strip() method defaults to removing any white space from the start and end of a string unless you specify the characters as arguments.
The split() method is used to break a string into a list based on a specified delimiter.
I hope you've enjoyed today's lesson and I hope you'll join me again soon.
Bye.