Hello, my name's Mrs. James.

Welcome to Computing.

I'm so pleased that you decided to join me for the lesson today.

In today's lesson, you will be exploring Understanding bias in AI.

Welcome to today's lesson from the unit, Using AI and digital tools responsibly.

This lesson is called Understanding bias in AI.

And by the end of today's lesson, you will be able to explain how data patterns lead to AI bias and describe why outputs may be unfair or unrepresentative.

Shall we make a start?

There are four keywords for today's lesson.

Bias.

Bias is when something is unfair towards or against something or someone.

Dataset.

A dataset is a collection of information, such as images, text, or numbers, used to train an AI system.

Data bias.

Data bias is when data is unfair or unbalanced, which can lead to wrong or unfair results.

Fairness.

Fairness is when all groups of people are treated equally and there are no outcomes that disadvantage or misrepresent some people more than others.

There are three sections to today's lesson.

The first section is called Define AI bias and its causes.

The second section is called Identify reasons for bias in AI outputs.

And the third section is called Explain how AI bias can be reduced.

Let's make a start.

What is AI bias?

Artificial intelligence systems are often thought of as being objective because they are machines.

However, AI bias can occur.

This happens when an AI system produces outputs that are unfair, inaccurate, or unrepresentative of the real world and certain groups of people.

You may have heard of AI being treated as a black box.

Well, in our diagram, we've got a purple box.

And the arrows show that bias in means bias out.

And we'll explain a little bit more about that in a minute.

Bias in an AI system is not random.

It's not a malfunction.

It's a reflection of problems in how the system was built or trained.

When machine learning models are being trained, they identify patterns within large amounts of information called training data, which is composed of datasets.

If a dataset contains human prejudices or reflects existing inequalities in society, the AI system will treat them as rules to follow.

The system then applies these biased rules to new situations, often making the unfairness even worse.

Developers choose what data to collect, what to label, and what problem to solve, and these choices can also introduce bias.

Okay, first question.

True or false?

AI systems are naturally neutral and objective because they are machines.

True or false?

Take a moment to think about this.

Let's look at the answer.

If you said, "False," well done, you're correct.

AI systems are not naturally neutral or objective.

They identify patterns in data created by humans.

If that data contains prejudice or gaps, the AI system can amplify them.

For example, an AI tool used to recognize fruit has been given the images and labels shown below as a training dataset.

The training dataset contains two categories, apples and tomatoes.

The apples are all green and the tomatoes are all red.

What do you think will happen if it's asked to identify a red apple?

Take a moment to think about it before I show you the answer.

If you said that the AI system will predict that it's a tomato, you're probably right.

This AI model is giving inaccurate answers because its training dataset contains data bias.

In reality, the developers should have provided lots of different pictures of apples, some red ones, some green ones, to make this AI system more accurate.
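
For anyone who would like to see this idea in code, here is a minimal Python sketch of the fruit example. It is not how a real image recognition tool works: the tiny dataset, the `train` and `predict` functions, and the use of color as the only feature are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training dataset: every apple is green and every tomato is red.
training_data = [
    ("green", "apple"),
    ("green", "apple"),
    ("green", "apple"),
    ("red", "tomato"),
    ("red", "tomato"),
    ("red", "tomato"),
]


def train(examples):
    """Count how often each label appears with each color."""
    counts = defaultdict(Counter)
    for color, label in examples:
        counts[color][label] += 1
    return counts


def predict(model, color):
    """Return the label seen most often with this color, or None if unseen."""
    if color not in model:
        return None
    return model[color].most_common(1)[0][0]


biased_model = train(training_data)

# To this model, a red apple is just "red", so it follows the only rule it has.
red_apple_guess = predict(biased_model, "red")
print(red_apple_guess)

# Adding red apples means "red" is now linked to both fruits in the data.
balanced_data = training_data + [("red", "apple")] * 3
balanced_model = train(balanced_data)
labels_seen_for_red = set(balanced_model["red"])
print(labels_seen_for_red)
```

With the first dataset, the only rule the model can find is that red means tomato. Once red apples are added, color alone no longer separates the two fruits, so a real system would need other clues, such as shape.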

We're going to have a look at some types of bias now.

So the first type we're going to look at is called historical bias.

The training dataset reflects past inequalities.

For example, if a recruitment AI tool is trained on decades of hiring records that favored just one group, it will continue to replicate that pattern.

Another type of bias is called representation bias.

Some groups are underrepresented in the training dataset.

For example, a facial recognition system trained mostly on lighter-skinned faces will then perform poorly on others.

And another type of bias is called measurement bias.

The way that the data is collected or labeled is the thing that introduces the error.

For example, if one community is policed more heavily, then crime data will overrepresent that community, skewing any future predictions.

Okay, another question.

Here's a statement.

"An AI tool to recognize voices doesn't work with someone with a strong accent."

What kind of bias is this describing?

Is it A, historical bias, B, representation bias, or C, measurement bias?

Have a think.

If you said, "B, representation bias," you'd be correct.

Well done.

Okay.

How does bias get into an AI system?

Well, it could be when the data is collected.

Because human developers gather data, it could be unrepresentative or biased data.

It could happen at the data labeling stage.

Human developers apply subjective or prejudiced labels perhaps.

Or it could happen at the model training phase.

An algorithm is trained on patterns that include the biased data.

These biased outputs will affect decisions in the real world.

Okay, another question.

Select all the ways that bias can enter an AI system.

A, data labeling, B, hardware malfunction, C, model training, or D, data collection.

So it says, "Select all the ways."

So have a think and select all the answers that could introduce bias.

Let's have a look at the answers.

If you chose A, C, and D, well done.

Okay, the first lesson task.

You've been given six statements and you're asked to put them into one of three categories.

The three categories are: definition of AI bias, cause of AI bias, not a cause of AI bias.

And the six statements you've been given are: AI systems produce unfair or unrepresentative outcomes for certain groups, a hardware malfunction corrupts a computer system, training data reflects past inequalities in society, developers choose what data to collect and label, some groups are underrepresented in the datasets used, an AI system learns patterns from datasets.

So put those statements into those three categories and see how you get on.

Let's take a look at the answers.

So the definition of AI bias was AI systems produce unfair or unrepresentative outcomes for certain groups.

Then we have four causes of AI bias.

One, developers choose what data to collect and label.

Two, some groups are underrepresented in the datasets used.

Three, training data reflects past inequalities in society.

And four, an AI system learns patterns from datasets.

The one statement that was not a cause of AI bias was a hardware malfunction corrupts a computer system.

Well done if you got some of those right.

Okay, we now move into the second section of this lesson, identify reasons for bias in AI outputs.

So we're now gonna talk through some actual real-world examples of bias.

Recruitment AI.

An AI tool used to screen job applications devalued CVs that included the word "women's" or phrases like "women's chess club."

Why?

Well, the AI tool was trained on historical hiring datasets in which men were predominantly hired for senior roles.

The next example is called predictive policing.

Algorithms predicted higher crime risk in certain postcodes, which led to increased police patrols in those areas.

Why?

Well, the training datasets included arrest data shaped by the over-policing of particular communities, which then created a self-reinforcing cycle.
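
To see how a self-reinforcing cycle like this can build up, here is a small Python sketch. Everything in it is an assumption made for illustration, not a description of any real policing tool: the two areas, the starting numbers, and the rule that the area with the most recorded arrests gets more patrols. Both areas are given the same underlying rate of one recorded arrest per patrol.

```python
def simulate(recorded, rounds, patrols_high=7, patrols_low=3):
    """Repeatedly send more patrols to whichever area has more recorded arrests."""
    recorded = dict(recorded)  # copy so the starting data is left unchanged
    for _ in range(rounds):
        # The "prediction": the area with the most recorded arrests so far.
        hot = max(recorded, key=recorded.get)
        for area in recorded:
            patrols = patrols_high if area == hot else patrols_low
            # Same rate in both areas: one recorded arrest per patrol.
            recorded[area] += patrols
    return recorded


# Hypothetical starting records: "north" has just one more arrest than "south".
start = {"north": 6, "south": 5}
after = simulate(start, rounds=5)

print(start)  # the small gap the cycle begins with
print(after)  # the gap after five rounds of prediction and patrols
```

Even though both areas behave identically in this sketch, the area that started with one extra record ends up with far more, because the records decide where the patrols go and the patrols create the records.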

Facial recognition was another real-world example.

Studies found that error rates for darker-skinned women were significantly higher than for lighter-skinned men.

Why?

Well, the training datasets contained far more images of lighter-skinned and male faces, which made the model less accurate for others.

And the last real-world example is in the area of language translation.

When translating gender-neutral pronouns, the AI tools defaulted to male pronouns for roles like doctor and female pronouns for roles like nurse.

Why?

Well, the training datasets from the internet reflected existing gender stereotypes embedded within the language.

Okay, another question.

Which of these examples of an AI output shows bias?

A, a music streaming app recommends a new song based on your listening history, B, a translation app defaults to "he" when translating the pronoun to be used with the word "engineer", C, a weather app predicts a 70% chance of rain using local temperature data, or D, a spell checker flags an incorrectly spelled word in an essay.

Have a think and choose your answer.

Let's look at the answer.

If you chose B, really well done.

You're now beginning to spot bias in outputs.

Izzy is saying, "These stories are worrying.

What can we do about bias in these tools?"

Sam replies, "Maybe there are some signs we could use to spot bias."

How to spot bias in an AI output?

Who is missing?

Does the output represent all relevant groups?

Are there certain people or perspectives absent?

Who is harmed?

Does the output place greater risk or disadvantage on a particular group?

What assumptions are made?

Does the output rely on stereotypes or generalizations that don't apply to everyone?

And what data was it trained on?

Would the training datasets have included a balanced range of people and situations?
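
The last question, about whether a dataset is balanced, is one that developers can check with a simple count. Here is a minimal Python sketch, assuming each item in a dataset has been tagged with a group label. The group names, the numbers, and the 10% threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical group labels attached to each item in a training dataset.
dataset_groups = ["group_a"] * 80 + ["group_b"] * 15 + ["group_c"] * 5


def group_shares(groups):
    """Return the fraction of the dataset that belongs to each group."""
    counts = Counter(groups)
    total = len(groups)
    return {group: count / total for group, count in counts.items()}


def underrepresented(groups, threshold=0.10):
    """List groups whose share falls below the (assumed) threshold."""
    shares = group_shares(groups)
    return sorted(group for group, share in shares.items() if share < threshold)


shares = group_shares(dataset_groups)
flagged = underrepresented(dataset_groups)

print(shares)
print(flagged)
```

A count like this only shows who is missing from the data. It cannot show whether the labels themselves are fair, so the other questions still need to be asked.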

Here's a scenario.

A school uses an AI tool to predict which students are likely to need extra academic support next year.

The system flags a high number of students from one neighborhood as high risk, but very few from other areas.

What happened?

Well, students from one postcode are more likely to be labeled as needing intervention, even when their grades are similar to others.

Why?

Well, the tool was trained on historical school datasets.

In the past, students from that neighborhood did have lower results.

Now, this tool has identified this pattern and treats the postcode as a predictor of future performance.

Why is this a problem?

Well, it may unfairly label students based on where they live.

It may reinforce existing inequalities, assuming that past patterns will always continue.

Okay, another question.

True or false?

If an AI system's prediction reflects past inequalities in society, it may reinforce those inequalities.

True or false?

Take a moment to think.

Let's look at the answer.

If you said, "True," well done.

AI systems learn from historical data, so if that data reflects inequality and shows bias, the system may then strengthen those patterns.

Okay, next task for you.

Read each scenario and describe why that output is biased.

So we've got six scenarios.

Think about what are the reasons for the bias.

First scenario.

A health app on a wearable device gives less accurate heart rate readings for people with darker skin tones.

Next scenario.

An AI tool that suggests careers mainly recommends technical jobs to boys and caring jobs to girls.

Next scenario.

A speech-to-text tool often mishears people with regional accents.

And the next scenario, an AI system used to set car insurance prices charges higher premiums in certain areas.

Next scenario.

A homework marking tool gives lower scores to essays that include references it does not recognize.

And the final scenario, a fitness app sets daily step targets that are too high for some users with disabilities.

Okay, quite a lot to think about there, but you should be able to come up with some ideas for the reasons for bias for all six of those scenarios.

Have a think.

Okay, let's take a look at some answers.

So, for the scenario about the health app giving less accurate heart rates for people with darker skin, a reason for bias could be that the system was tested mainly on lighter-skinned users, so the data did not represent everyone equally.

The scenario for the AI tool suggesting stereotypical careers.

The reason for bias might be that the training data reflects gender stereotypes about jobs, so the system has learnt these patterns.

The scenario about the speech-to-text tool mishearing people with regional accents.

The reason for bias is that the training data may not have included enough examples of different accents.

The scenario about the AI system used to set car insurance prices, charging higher premiums in certain areas.

The reason for bias might be that the model relies on historical data that reflects past inequalities or patterns linked to postcode rather than an individual's driving behavior.

The scenario about the homework marking tool giving lower scores to essays that include references it doesn't recognize.

The system may have been trained on a narrow range of examples, so it doesn't fairly assess unfamiliar points in essays.

And the final scenario about a fitness app setting daily step targets that are too high for users with disabilities.

The reason for bias could be that the training datasets did not include people with different physical needs.

So quite a lot to write down there, quite a lot to think about.

Really well done if you've got any of those reasons for bias for those scenarios.

Okay, the final section of the lesson.

You're doing brilliantly.

This section's called Explain how AI bias can be reduced.

Izzy is now saying, "I'm worried about bias now.

It looks like it could affect a lot of things in my life.

What can be done about it?"

Sam replies, "I'm not sure this is something we can do alone."

There are several things that can be done.

Better data.

Developers can use more diverse and representative datasets, so systems work well for different groups of people.

They could test for fairness.

AI systems must be checked regularly to see whether they perform differently for different groups.

Human oversight.

Important decisions should not rely only on AI systems.

People should review and question automated decisions.

There should be clear rules and laws.

Governments can create regulations to make sure that AI systems are used responsibly.

Diverse development teams.

Teams with varied backgrounds and perspectives are much more likely to notice assumptions and blind spots in design.

Sam says, "And that's why it's so important to get a wide range of people studying computing."

If we only have a small group of people who build these AI systems or contribute to the development of algorithms, we will only ever get a very narrow viewpoint.

And so it's very important to get as many people studying computing and being in those rooms when decisions like this get made.

Okay, another question.

Which of the following is a strategy to reduce bias?

Select all that apply.

Is it A, human oversight, B, diverse development teams, C, more diverse datasets, or D, less government regulation?

So more than one is correct.

See how you get on.

Let's look at some answers.

So if you picked A, B and C, well done.

So Izzy is saying, "So if AI developers put all these strategies in place, then bias will be removed forever, right?"

Sam replies, "I think there may always be some bias in AI systems.

Perhaps we should always be critical of answers given by an AI system."

And that is really important.

As long as you're aware of what AI bias is, you'll be more likely to spot it in answers given by AI tools.

It is important to remember that AI systems are made by humans.

Humans design, test, and improve these systems.

As more people understand bias, there is more pressure to make AI tools fairer and more accountable.

Okay, another question.

True or false?

Bias in AI systems can be completely removed with better data, diverse teams, more testing, and new laws from governments.

True or false?

What do you think?

Let's look at the answer.

Unfortunately, it's false.

No, bias in AI systems will always be present to some extent, which is why it's important to critically evaluate any response from an AI system.

Okay, some tasks for you to do.

So on the screen, there are six strategies.

You need to sort each strategy into one of two categories.

The categories are: helps to reduce bias, does not help to reduce bias.

And the six strategies are: using more diverse and representative datasets, removing all government regulation, including people from different backgrounds in development teams, regularly testing systems for different error rates across groups, ensuring humans review important AI decisions, ignoring small differences in performance between groups.

So put those strategies into one or other of the categories below.

Let's take a look at the answers.

Okay, so we've got four strategies in the category called helps to reduce bias and two strategies in the category called does not help to reduce bias.

So the four strategies that help reduce bias are: including people from different backgrounds in development teams, ensuring humans review important AI decisions, using more diverse and representative datasets, and regularly testing systems for error rates across different groups.

And then the two that do not help to reduce bias are: removing all government regulation and ignoring small differences in performance between groups.

Hopefully you've got some of those right.

Well done if you've got all of those right.

The final task for this lesson is to read the scenario and answer the question below.

A company discovers that its AI loan approval system rejects a higher percentage of applicants from one community.

Choose two strategies from the lesson and explain how they could help reduce bias in this system.

So something to think about there.

Think back to the strategies we've discussed and pick two that might help answer this question.

Okay, let's have a look at the answer.

So the student has said, "One strategy would be to test the system for fairness.

The company could check if the AI system is rejecting more people from one community than others and then try to find out why.

Another strategy would be to use better training data.

If the data is unfair or missing certain groups, the AI system can replicate the wrong patterns.

Using a more balanced dataset could help make its outputs fairer."

Hopefully your answer was similar to that, or you used some of the other strategies we discussed in the lesson.

Well done for your attempt.
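
If you're curious what testing the loan system for fairness might look like in practice, here is a minimal Python sketch. It assumes the company has a record of each decision along with the applicant's community. The community names, the made-up decisions, and the 0.1 tolerance are all hypothetical.

```python
from collections import defaultdict

# Hypothetical loan decisions: (community, was the application approved?)
decisions = (
    [("community_x", False)] * 6 + [("community_x", True)] * 4
    + [("community_y", False)] * 2 + [("community_y", True)] * 8
)


def rejection_rates(records):
    """Return the fraction of applications rejected in each community."""
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for community, approved in records:
        totals[community] += 1
        if not approved:
            rejected[community] += 1
    return {community: rejected[community] / totals[community] for community in totals}


rates = rejection_rates(decisions)

# Gap between the highest and lowest rejection rate.
gap = max(rates.values()) - min(rates.values())

# Assumed tolerance: a gap bigger than this is passed to a human to investigate.
needs_review = gap > 0.1

print(rates)
print(needs_review)
```

A gap like this does not prove the system is biased on its own, but it tells the company where to look and gives a person the chance to find out why.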

Okay, we've reached the end of the lesson.

So in summary, AI bias occurs when a system produces inaccurate results because it has learned from flawed training datasets.

Because AI systems identify patterns in human-created data, they can repeat existing social stereotypes and amplify unfairness.

Training AI tools on datasets where certain groups are missing or underrepresented leads to tools that perform poorly.

And critically evaluating AI outputs is essential for identifying bias.

Well done for completing this lesson on understanding bias in AI.
