Hello, my name is Mrs. Jones and I'm really pleased that you're here to learn with me today.
Today's lesson is called "Substrings" and we're going to look at how to identify substrings, how to check the contents of a substring, and how to use these techniques in Python programme code.
So let's get started.
Welcome to today's lesson.
Today's lesson is called "Substrings" from the unit "Programming: strings and lists." And by the end of this lesson, you'll be able to slice a string to create a substring and then use list methods on the substring.
There are three keywords to today's lesson.
Substring.
Substring is part of a string.
Slice, slice is the process of extracting a specific substring.
And ASCII, ASCII is a character set that represents each character with a unique numerical value.
There are two sections to today's lesson.
The first is "Identify substrings" and the second is "Check the contents of a substring." So let's start with "Identify substrings." A substring is a part of a string.
Separating a string into substrings is often called string slicing.
You can see here that the word something is a string, and when we slice it, the pieces are now substrings.
And that process of cutting it, slicing it into different sections, is called string slicing.
Andeep is asking, "Why do you need to know about substrings?" Well, they can be used for many purposes, and these include validation.
For example, if we want to check an email address or web address contains the right parts.
Creating usernames from first and last names.
For example, 24SmitGeo.
Identifying file extensions.
For example, checking if a file is a PDF or a JPEG.
And detecting inappropriate words in user input.
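As a rough sketch of two of these uses, the code below builds a username in the style of 24SmitGeo and picks out a file extension. The name, year, and filename are made-up values for illustration only; the lesson shows just the finished username. It relies on Python's square bracket slicing, which the lesson explains shortly.

```python
# Hypothetical inputs: the lesson only shows the result "24SmitGeo".
first_name = "George"
last_name = "Smith"
year = "2024"

# Last two digits of the year, first four letters of the surname,
# then the first three letters of the first name.
username = year[2:4] + last_name[0:4] + first_name[0:3]
print(username)  # 24SmitGeo

# Hypothetical filename: take the last three characters as the extension.
filename = "report.pdf"
extension = filename[len(filename) - 3:len(filename)]
print(extension == "pdf")  # True
```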
Let's have a quick check.
Fill in the gaps.
A something is part of a string.
Separating a string into substrings is often called something.
Pause the video, go back through the slides to consider your answer, and then we'll check it.
Let's check your answer.
A substring is a part of a string.
Separating a string into substrings is often called string slicing.
Well done if you got that correct.
In Python, you can use the square bracket syntax to slice a string into substrings.
Instead of using one index number, you use a start index and a stop index.
So here we have a programme that says word = "HELLO" and then first_two = word[0:2]. This is the syntax here: word is the variable that has been set up to store the string, and in the square brackets you have the start and the stop, 0:2. The colon is important; it's part of the syntax.
And then print(first_two) which will output HE.
And Izzy points out, "So it includes the letter at the start index and goes up to, but doesn't include, the stop index." So even though it says 2 as the stop, it does not include that position, that index.
It only outputs what's stored in index 0 and 1.
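Written out as code, the example just described might look like the sketch below. The extra line printing index 0 and index 1 separately is an addition for illustration, to show which positions the slice covers.

```python
word = "HELLO"

# Start index 0 is included; stop index 2 is not.
first_two = word[0:2]
print(first_two)  # HE

# The slice covers the characters at index 0 and index 1 only.
print(word[0], word[1])  # H E
```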
Let's have a quick check.
What would be printed out after running this programme? So we have the variable word storing the string "HELLO." We have sub_string = word and then in the square brackets we have 1:4 and print(sub_string) So would it be A, ELLO? Would it be B, ELL? Or C, HELL? Pause the video to consider your answer and then we'll check it.
Let's check your answer.
The answer was B, ELL.
You can see there that the start position was 1, so we're starting at the E, and the end position was 4, which we do not include, so it's up to 4.
So it would be the letters E-L-L stored in positions 1, 2, and 3.
Well done if you got that correct.
Let's do an activity and you'll need your worksheet for this one.
Complete the missing values in this table if the string in the variable word is "pineapple".
So on the left you have the Python instruction and on the right you have the string produced.
The first one, the first row, is fully completed as an example.
So in the square brackets you have the start position 0 and the end position 2, which will output "pi".
Where the Python instruction is given, fill in the right-hand side of what will be produced.
And where the right-hand side is completed and you have the string that's produced, you need to complete the Python instruction that you would need to get that output.
Pause the video, use your worksheet, go back through the slides if you need to, and then we'll go through the answers.
Let's check your answer.
So for the second one, which had the Python instruction word[2:5] it would output the letters "nea".
The third one, word[5:8] would output "ppl".
For the Python instruction with the start position 4 and end position 9, you would have the output "apple".
On the last four, we had the string that was produced, so when the string "pp" is produced, the code to get that would be word[5:7].
To get the string "ine" you would need word[1:4].
To get the string produced "ple" you would need the Python instruction word[6:9].
And to get the string "pin" you would need the Python instruction word[0:3].
Well done if you got those correct.
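If you want to check the table yourself, one possible approach is to loop over the start and stop positions and print each slice. The list name slices and the results list are assumptions added for this sketch, not part of the worksheet.

```python
word = "pineapple"

# (start, stop) pairs taken from the rows of the activity table.
slices = [(0, 2), (2, 5), (5, 8), (4, 9), (5, 7), (1, 4), (6, 9), (0, 3)]

results = []
for start, stop in slices:
    piece = word[start:stop]
    results.append(piece)
    print(f"word[{start}:{stop}] -> {piece}")
```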
Let's move on to the second part, looking at checking the contents of a substring.
A substring is just another string and its contents can be checked in the same way.
For example, we can use the double equals sign, ==, which means exactly equal to, to check equality.
So in the first example, we have the word "banana" being stored and we've got first_two = word and we have the start position 0 with the end position 2.
last_two = word[4:6] and then print(first_two == last_two) So we're checking if the first two letters match the last two letters. "banana" has six letters, at indexes 0 to 5, so the last two are at indexes 4 and 5.
Now, we can see by looking at it that they don't match, so the output is False.
The second one has got the word "onion" and first two against the last two again, we're going to check if they match.
And this time it is True because at the start you have the O-N and at the end you have O-N.
It is checking if the first two letters are the same as the last two.
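Here is a minimal sketch of that comparison. It assumes a hypothetical helper function, first_two_match_last_two, that works out the last two positions from len(word) so the same code can be used for words of any length.

```python
def first_two_match_last_two(word):
    # Hypothetical helper, not from the lesson slides.
    first_two = word[0:2]
    # The last two characters sit at indexes len(word) - 2 and len(word) - 1.
    last_two = word[len(word) - 2:len(word)]
    return first_two == last_two


print(first_two_match_last_two("banana"))  # False ("ba" vs "na")
print(first_two_match_last_two("onion"))   # True ("on" vs "on")
```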
We can use in to check if a substring is contained in a string.
So on the first one here, we have the word "asparagus".
first_three = word[0:3] print("sp" in first_three) And the output for that would be True because we're checking if the first three letters contain what's inside those quote marks, the S and the P, and they do.
On the second one, we have the word "broccoli" and exactly the same, it's now looking if "sp" is in the first three letters, and the answer is False.
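Both examples can be run together with a short loop, as in this sketch. The dictionary called results is an addition for illustration so the answers can be looked at afterwards.

```python
results = {}
for word in ["asparagus", "broccoli"]:
    first_three = word[0:3]
    # "in" is True only if "sp" appears somewhere inside first_three.
    results[word] = "sp" in first_three
    print(word, first_three, results[word])
```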
We can also use for to iterate through the characters of a substring.
So we have the phrase "Eat your greens".
Remember the spaces are also part of that string.
And we have the last_word = phrase[9:15] So we're looking at the last word here.
for char in last_word: print(char) Here, char is short for character; it's the temporary variable that stores each character in turn within the loop.
And it's outputting G-R-E-E-N-S.
Because print is called once each time round the for loop, each letter is output on its own line.
It's iterating over the characters in the last word of that phrase.
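Laid out as code, the loop might look like this sketch. The letters list is an extra added here to collect the characters; the lesson version only prints them.

```python
phrase = "Eat your greens"

# Indexes 9 to 14 hold the last word; the spaces at 3 and 8 count too.
last_word = phrase[9:15]

letters = []
for char in last_word:
    print(char)  # one letter per line
    letters.append(char)
```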
Let's have a quick check.
What would be output by the following Python programme? We have word = "strawberry" substring = word[5:10] for char in substring: if char in "aeiou": print(char) Would it be A, a followed by e? B, just a? C, just e? Or D, nothing, because it would cause an error? Pause the video to consider your answer and then we'll check it.
Let's check your answer.
The answer was C, it will output e, because that is the only vowel in positions 5 to 9, which is what the slice 5:10 covers.
Well done if you got that correct.
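To see why, here is the programme from the check laid out as code. The found list is an addition for this sketch, used to record which vowels were printed.

```python
word = "strawberry"
substring = word[5:10]  # "berry"; the a at index 3 is outside the slice

found = []
for char in substring:
    if char in "aeiou":
        print(char)
        found.append(char)
```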
ASCII is a character set used to represent all of the characters on a standard American or English keyboard.
Each character has a unique code and you can find the codes by looking at an ASCII table.
And you can see here that you have the space, you have symbols, numbers, uppercase and lowercase letters.
You can see here, so the capital A is represented by a decimal number, 65, or a hexadecimal number, 41.
In the table, the decimal code increases by one for each successive letter of the alphabet.
So you can see here A, decimal 65, hex 41.
B is 66 and 42.
C is 67 and 43.
What will the decimal and hex values of D be? D would be 68 and hex would be 44, and we can see the increase by one on each column.
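The same pattern can be checked in Python. This sketch uses the built-in ord function, which the lesson covers next, together with format to show the hexadecimal value; the rows list is an addition for illustration.

```python
rows = []
for letter in "ABCD":
    code = ord(letter)             # decimal ASCII value
    hex_code = format(code, "X")   # the same value in hexadecimal
    rows.append((letter, code, hex_code))
    print(letter, code, hex_code)
```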
Python has two functions that perform ASCII conversions.
On the left here we have Python expression, so we have chr and inside the brackets the number 97.
And that will take a decimal number, in this case 97, and return its character equivalent.
So on the right we can see what it will return, that will return the letter a in lowercase.
For the Python expression ord("a"), with the letter a inside quote marks in the brackets, that's going to take a character and return its decimal equivalent.
So it will output there, the equivalent is 97.
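A short sketch of the two functions working in opposite directions. The last line is an addition showing that applying one after the other gets you back to where you started.

```python
print(chr(97))   # a
print(ord("a"))  # 97

# chr and ord undo each other.
print(chr(ord("a")) == "a")  # True
```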
Let's have a quick check.
What would be output by the following Python programme? character = "A" and it's a capital A.
print(ord(character)) decimal = 66 print(chr(decimal)) Pause the video.
You might want to go back and look at the ASCII table to consider your answer, and then we'll check it.
Let's check your answer.
The answer was A, 65 and B.
Well done if you got that correct.
Let's have a look at an activity.
For the first part of this activity, I want you to write a Python programme that asks for a character.
Checks if the user has only entered one character.
If they have entered only one character, check if the character is a vowel.
If it is, it prints out "It is a vowel".
Otherwise, it prints out "It is not a vowel".
In both cases, it prints out the ASCII value of the character.
If the user has entered more than one character, the message "Invalid input" is shown.
Pause the video and have a go at creating that Python programme, then we'll go through a solution.
Let's have a look at a solution.
So we have here char = input("Enter a character: ") We have vowels = and inside the quote marks notice that we have the upper and lowercase AEIOU because we need to consider that they may enter it in both formats.
if len(char) != 1: then it's going to print("Invalid input. Please enter a single character.")
Otherwise, in the else branch, we have if char in vowels: print("It is a vowel.") else: print("It is not a vowel.") And then, in both cases, ascii_value = ord(char) print(f"ASCII value: {ascii_value}") Well done if you got that correct.
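One possible way to lay that solution out is sketched below. To make it easy to try without typing at a prompt, the logic is wrapped in a hypothetical function, describe_character, and run on fixed sample values; in the lesson version the character comes from input("Enter a character: ") and the messages are printed directly.

```python
def describe_character(char):
    # Hypothetical helper: returns the lines the lesson programme would print.
    vowels = "AEIOUaeiou"
    if len(char) != 1:
        return ["Invalid input. Please enter a single character."]
    lines = []
    if char in vowels:
        lines.append("It is a vowel.")
    else:
        lines.append("It is not a vowel.")
    # Shown for vowels and non-vowels alike.
    lines.append(f"ASCII value: {ord(char)}")
    return lines


# Sample values standing in for what a user might type.
for sample in ["e", "Z", "hello"]:
    for line in describe_character(sample):
        print(line)
```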
Let's have a look at another one.
This time, write a Python programme to encode a string by shifting each character forward in the alphabet by three characters.
Ask the user to enter a word.
Loop through each character in the input word.
For each character, find its ASCII value.
Shift the ASCII value forward by adding three.
Convert the shifted ASCII value back to a character, and print the encoded character.
Pause the video and have a go at creating that Python programme, and then we'll go through a solution.
Let's have a look at a solution.
print("Enter a word to encode: ") word = input() shift = 3 for char in word: old_ascii = ord(char) new_ascii = old_ascii + shift new_char = chr(new_ascii) print(new_char) Well done if you got that correct.
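Here is a sketch of the same idea as a function, so it can be tried on fixed words instead of typed input. The function name encode and the sample words are assumptions for illustration; unlike the lesson solution, which prints each character on its own line, this version joins the characters into one string.

```python
def encode(word, shift=3):
    # Hypothetical helper: shifts every character forward by `shift`.
    encoded = ""
    for char in word:
        old_ascii = ord(char)
        new_ascii = old_ascii + shift
        encoded = encoded + chr(new_ascii)
    return encoded


print(encode("cat"))  # fdw
print(encode("xyz"))  # {|}
```

Notice that x, y and z do not wrap round to a, b and c. Adding three simply moves on to the next ASCII codes, which are symbols.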
Now I want you to amend that programme to now decode a word rather than encode it.
The programme should ask the user for the amount_to_shift.
Loop through each character in the input word.
For each character, find its ASCII value.
Shift the ASCII value backward by deducting the amount_to_shift.
Convert the shifted ASCII value back to a character.
Print the decoded character.
Pause the video and have a go at amending your programme to match these new details, then we'll go through a solution.
Let's have a look at a solution.
This time we have print("Enter a word to decode: ") word = input() print("How many characters to shift?: ") shift = int(input()) for char in word: old_ascii = ord(char) new_ascii = old_ascii - shift new_char = chr(new_ascii) print(new_char) Well done if you got that correct.
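The decoding version can be sketched in the same way. Again, the function name decode and the sample word are assumptions; in the lesson solution the word and the shift both come from input().

```python
def decode(word, shift):
    # Hypothetical helper: shifts every character backward by `shift`.
    decoded = ""
    for char in word:
        old_ascii = ord(char)
        new_ascii = old_ascii - shift
        decoded = decoded + chr(new_ascii)
    return decoded


print(decode("fdw", 3))  # cat
```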
In summary, a substring is a shorter part of a string.
Separating a string into substrings is often called string slicing.
Substrings can be checked using == to check for equality, using in to check if it contains one or more characters, and using iteration, such as a for loop, to access characters within a string.
ASCII is a character set used to represent characters where each character has a unique number.
These numbers increase in sequence through the alphabet.
You can use chr() to convert a decimal number to a character and ord() to convert a character into its ASCII value.
Well done for completing this lesson on substrings.