Hi, everyone.
I'm Mr. Gratton.
Welcome to another maths lesson where I will be your teacher today as we use different measures of central tendency and spread to compare between at least two datasets.
A statistical summary is a set of statistics that sum up the properties and features of a dataset.
It may contain the mean, median, mode, and range.
If I refer to a summary statistic, that is one of the measures that we will use in a statistical summary.
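As a minimal sketch of what a statistical summary contains, the Python below works out all four measures for one dataset. The scores are made up for illustration and are not the data on screen, and `statistical_summary` is a hypothetical helper name.

```python
from statistics import mean, median, multimode


def statistical_summary(data):
    """Return the four summary statistics used in this lesson."""
    return {
        "mean": mean(data),
        "median": median(data),
        # multimode returns a list, because a dataset can have more than one mode
        "mode": multimode(data),
        "range": max(data) - min(data),
    }


# Hypothetical test scores, invented for illustration only.
scores = [52, 60, 60, 71, 82]
summary = statistical_summary(scores)
print(summary)
```

Each value in the returned dictionary is one summary statistic; together they make up the statistical summary.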
Pause here to take a look at the definitions of central tendency and spread, or dispersion.
The first cycle will involve using one summary statistic to start comparing between different datasets.
And that's actually a true purpose of a summary statistic.
Using the mean or range, for example, gives some insight when analysing one dataset, but it is much more insightful when comparing two datasets or two different populations.
Here are the test scores for Aisha and Sam.
Let's look at each summary statistic one at a time.
Starting with the mean.
Sam's mean is higher than Aisha's.
Therefore, Sam's mean average score, or typical score, is higher.
We can say that, on mean average, Sam scored five marks higher per test than Aisha did.
On the other hand, both of their medians are similar.
Therefore, both of their most central or middle scoring tests were of a similar value.
The median is very good at summarising these middle tests rather than focusing on the tests they did very well or very badly at.
But Aisha's modal score was much higher.
This means that Aisha's most common scores were higher than Sam's most common scores.
This is easy to spot 'cause Aisha's score of 86 appears twice whilst Sam's score of 73 appears twice.
And lastly, Aisha's range is much greater than Sam's range.
The range is a different measure from the other three we've looked at.
It measures how spread out or varied a dataset is, whereas the mean, median, and mode looked at central tendency.
The bigger the range, the more varied the dataset.
Alternatively, the smaller the range, the more consistent the dataset.
Therefore, Aisha had a bigger variety of scores whilst Sam had a more consistent set of scores.
Here is a summary of the four summary statistics that we have just discussed.
Okay, here are three quick checks for understanding.
Which of these statements are true for the mode of Sofia and Andeep's statistical summaries? Pause to look through all of these options.
The answer is B.
The mode looks at the most frequent values, and Sofia's and Andeep's modes were pretty similar at 77 and 76.
Similar again, but this time for the mean.
Pause to see which of these three statements is true using their statistical summaries.
The answer is C.
The mean references mean average or typical values.
In this case, Sofia's typical score is nine higher than Andeep's typical score.
If you add up all of Sofia's scores and share them equally, each score would be nine higher than if Andeep did the same to his set of scores.
Lastly, the range.
Pause now to choose the most representative statement from the statistical summaries.
The answer is A.
The range describes variety, and Andeep had a much greater variety in his scores than Sofia did.
Lucas used dot plots to compare his test scores between Year 7 and Year 8.
Lucas's mean score increased from 63 to 70.
This means his typical score increased over the two years.
On each test in Year 8, he is expected to score seven more marks than the equivalent test in Year 7.
The median will look at his middle scoring test rather than the tests he did best or worst at.
As you can see from his middle scores, which are here, he improved by 10 marks between the two years.
For the mode, we can look at the tallest column of dots.
In Year 7, there was one such column at 75, whilst in Year 8, there are two at 55 and 75.
This means Lucas's data is now bimodal.
Lucas's most frequent score of 75 hasn't changed, but now he also has a second equally common score of 55.
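To see how a bimodal dataset shows up in a calculation, here is a small sketch using Python's `statistics.multimode`, which returns every value tied for most frequent. The two lists are hypothetical, chosen only so that 75 is the single mode in the first and 55 and 75 tie in the second; they are not Lucas's actual scores.

```python
from statistics import multimode

# Hypothetical scores, invented so that the modes match those described above.
year_7_scores = [40, 55, 60, 75, 75, 80]
year_8_scores = [55, 55, 65, 70, 75, 75]

modes_year_7 = multimode(year_7_scores)  # one value is most frequent
modes_year_8 = multimode(year_8_scores)  # two values tie, so the data is bimodal

print("Year 7 mode(s):", modes_year_7)
print("Year 8 mode(s):", modes_year_8)
```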
And finally, the range.
Lucas's range has decreased from Year 7 to Year 8.
This is shown by a smaller distance between the lowest value dot and the highest value dot on each dot plot.
This means Lucas has become more consistent over the two years, which is especially impressive since his mean score has also increased.
Lucas's scores therefore seem to have become better and more consistent across the two years.
On to the next check for understanding.
Which of these statements is accurate for the range of these two dot plots? Pause to evaluate these statements.
The answer is A.
This is because there is more variation in class A because there is a bigger distance between the smallest value and the largest value dot on the dot plot.
On to the mode.
Pause here to evaluate which statement, or statements, about the mode are correct.
Well done if you spotted that there are two correct answers, A and B.
For A, modal frequency means the number of dots or data points on that modal value.
The modal score of class B was 55 compared to 49 in class A, but they both had a modal frequency of three because there were three dots or data points representing that value.
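The difference between a modal value and its modal frequency can be sketched in Python with `collections.Counter`. The class scores below are hypothetical, invented so that the modes are 49 and 55 with a frequency of three each, and `mode_and_frequency` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical scores, invented so the modal values match those quoted above.
class_a = [42, 49, 49, 49, 53, 58, 61]
class_b = [50, 55, 55, 55, 57, 60]


def mode_and_frequency(data):
    """Return (modal value, modal frequency) for a dataset with a single mode."""
    value, frequency = Counter(data).most_common(1)[0]
    return value, frequency


mode_a, frequency_a = mode_and_frequency(class_a)
mode_b, frequency_b = mode_and_frequency(class_b)

print("Class A: mode", mode_a, "appearing", frequency_a, "times")
print("Class B: mode", mode_b, "appearing", frequency_b, "times")
```

The modal values differ, but the modal frequencies are the same, which is why both statements can be correct at once.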
Right, time for some practice.
For question 1, which statement is correct for the median, and which one is correct for the range? The other three statements are not correct for these two summary statistics.
Pause now to read through all five options.
Here are Lucas's scores in English over two years.
Calculate and interpret these two datasets.
Pause now to calculate the mean.
Here are the answers.
For question 1, A was the median whilst D was the range.
For question number 2, Lucas's mean or typical English score increased by three marks across the two years.
Using one summary statistic to compare two datasets is good, but using multiple summary statistics will help build a more well-rounded comparison of the datasets, especially when the distribution of each one is very different from the other.
For example, these dot plots show the number of goals of two football players.
The mean number of goals for both Mary and Frank is three per match.
But the two dot plots look dramatically different from each other.
So how can the mean number of goals be the same? By using a second summary statistic, in this case the range, we can look at their scores in more detail.
Mary had a lot of variance in the goals that she scored.
Her range was seven, but Frank's range was only three, meaning he more consistently scored around three goals whilst Mary sometimes scored a lot of goals but other times didn't score many at all.
By using a third summary statistic, we can help support this claim.
Mary's modal score was zero, whilst Frank's bimodal scores were three and four.
Using more summary statistics helps make a better comparison between two datasets.
Comparing the datasets with just the mean gives a little insight into the typical number of goals scored but without the context of how varied each match can be.
Comparing the ranges of the two people in addition to the mean provides us with more detail, this time on the variance between the two people.
Using a further measure of central tendency, such as the mode, will help us justify even further the assessment that we've made with the mean and the range.
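Here is a rough sketch of how two datasets can share a mean yet differ in range and mode. The goal counts are hypothetical: they were invented to reproduce the statistics quoted for Mary and Frank (mean of 3, ranges of 7 and 3, modes of 0 and of 3 and 4) and are not read from the dot plots. `goal_range` is an illustrative helper name.

```python
from statistics import mean, multimode

# Hypothetical goals per match, invented to reproduce the quoted statistics.
mary = [0, 0, 0, 2, 4, 5, 6, 7]
frank = [1, 2, 3, 3, 3, 4, 4, 4]


def goal_range(goals):
    """Range: the difference between the highest and lowest values."""
    return max(goals) - min(goals)


for name, goals in [("Mary", mary), ("Frank", frank)]:
    print(
        name,
        "| mean:", mean(goals),
        "| range:", goal_range(goals),
        "| mode(s):", multimode(goals),
    )
```

The means match, so the mean alone cannot separate the two players; the range and the mode show the difference.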
On to some checks.
Two footballers, Laura and David, played in 10 matches.
The range of each player was five goals.
Which of these statements is correct? Pause to consider which statements are definitely true with this limited information.
The answers are B and D.
The range shows variation.
An equal range means an equal amount of variation.
However, the range gives no insight into the mean or any other of the averages.
And it certainly does not tell you anything about the number of goals scored or the distribution of their dot plots.
Next check.
Sticking with David and Laura, which of these statements are correct interpretations of their dot plots? Pause for time to compare the dot plots and these statements.
The answer is B.
The only correct statement about the mode is that David's modal score is three compared to Laura's modal score of two.
By considering each dot on each dot plot as a data point, calculate the mean for each football player and then select the correct statement about the mean.
Pause to give time to do this.
And the answer is A.
Laura's typical score is half a goal higher than David's.
This means that, on average, Laura will score half a goal more per match than David.
Lastly for these checks, here is a statistical summary of Laura and David's goals.
Which of these conclusions are sensible? Pause to look through the statistical summary table, and choose any of these sensible conclusions.
And the answer is all of them.
Even if we use many summary statistics to compare two datasets, the comparisons we make are ultimately open to different interpretations by different people.
This is especially true if the results are reasonably close, like with David and Laura's scores.
It is a very important skill to be able to communicate the comparisons you make with the summary statistics that you have calculated.
In which of these two businesses did its employees work a greater number of hours per week? In your comparison, it is best to always use at least one measure of central tendency, the mean, the median, or the mode, and one measure of spread.
In this case, the range.
Here is a model explanation communicating all of the summary statistics that you could find for these two datasets.
Whilst the average typical number of hours worked for people in Data Incorporated is higher with a mean of 33.9 compared to 32.3, Stats & Co had less variance in the number of hours worked with a range of 6 compared to 17.
In conclusion, I think that Data Incorporated works a greater number of hours per week, especially since their modal number of hours worked is also higher at 37 compared to 34.
This is a model explanation comparing these two datasets because we have a measure of central tendency, the mean in this case, we have a measure of dispersion, the range, and in the conclusion, we have a third statistic, in this case, the mode, to support the interpretation that we've already made.
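The structure of that model explanation can be sketched in Python using only the summary statistics quoted above. The dictionary layout and the helper names `higher` and `lower` are assumptions made for this illustration.

```python
# Summary statistics quoted in the model explanation (hours worked per week).
summaries = {
    "Data Incorporated": {"mean": 33.9, "range": 17, "mode": 37},
    "Stats & Co": {"mean": 32.3, "range": 6, "mode": 34},
}


def higher(stat):
    """Name of the business with the larger value of the given statistic."""
    return max(summaries, key=lambda name: summaries[name][stat])


def lower(stat):
    """Name of the business with the smaller value of the given statistic."""
    return min(summaries, key=lambda name: summaries[name][stat])


higher_mean = higher("mean")      # measure of central tendency
more_consistent = lower("range")  # measure of spread: smaller range, more consistent
higher_mode = higher("mode")      # third statistic to support the conclusion

print(f"{higher_mean} has the higher mean number of hours worked.")
print(f"{more_consistent} has the smaller range, so its hours are more consistent.")
print(f"{higher_mode} also has the higher modal number of hours worked.")
```

The three printed lines follow the same order as the explanation: central tendency first, then spread, then a supporting statistic.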
Final few checks.
This table of statistical summaries shows the number of hours worked by the employees of two different businesses.
By selecting three of these statements, create a well-structured comparison of these two businesses.
Pause here to give yourself time to read through these six statements and consider which combination makes the most sense for these two statistical summaries.
The correct combination of statements is A, C, and D.
However, the conclusion would be even more effective if a further statistic, the mode, was used in addition to the other two.
On to the final set of practice questions.
Select the correct combination of three statements that compare the results of Ella and Wayne.
Pause now to give yourself some time to look through all six of these options and choose the most representative three.
For question number 2, by looking at the statistical summary of these two locations, complete each sentence using either the words Oxford or Stornoway or the correct number or calculation using the summary table above.
Pause now to fill in all of those gaps.
And similar again for question 3.
Complete the sentences for parts A, B, C, and D using both these dot plots.
I will pause four times, once for each question that will appear on screen.
Pause now to answer question part A.
And pause again for question part B.
And again for part C, pause now.
And here is part D, pause for this last part of the question.
By using the raw data for both Tiree and Valley, calculate the mean, median, and mode in order to complete the statistical summary table.
Afterwards, interpret each pair of summary statistics by making a comparison between Tiree and Valley.
Pause now to give yourself some time to calculate all of these summary statistics and make those interpretations.
Here are the answers.
A, D, and E are the correct statements and conclusions.
For question number 2, Stornoway had the lower mean days of air frost.
Oxford had the higher range.
And Oxford had, in conclusion, more air frost, supported by a higher median.
Question 3A, the means were equal at around 12.5 to 30 minutes each.
Part B, class B had the much higher median of 13 compared to 9.
Class B also had a slightly higher modal time.
For part C, class A had a much higher range, showing more variation at 31 minutes compared to 4 minutes.
And in conclusion, class B took longer on average due to the higher modal and median results.
Here are the summary statistics and interpretations for Tiree and Valley.
Pause here to give yourself some time to compare your interpretations to the ones on screen.
Very well done on getting through a very analytical and communication-heavy lesson based on calculating statistical summaries and interpreting what they mean in the context of comparing between two or more datasets, where using more summary statistics will help build a better, but never full, picture of these datasets.
Thank you so much for joining me in today's lesson.
I hope to see you again for another maths lesson.
But for now, have a nice day.