AI systems make mistakes differently than traditional software. While a calculator gives wrong answers only when broken, AI can produce incorrect results even when working perfectly. This happens because AI operates on probability, making educated guesses based on patterns in data.

These errors take many forms. A photo app might label your cat as a dog. A music service might recommend heavy metal to a jazz lover. A navigation system might suggest a route through a flooded road. Each represents a different type of failure with unique causes and solutions. What makes AI errors particularly complex is that users and systems often disagree about what counts as an error. When a recommendation system suggests something unexpected, is it discovering your hidden interests or simply getting it wrong? The answer depends on context, user expectations, and how well the system understands your current situation.

Understanding these error types helps product teams build better experiences. By recognizing patterns in how AI fails, designers can create appropriate safeguards, set realistic expectations, and provide users with meaningful ways to recover when things go wrong.

Exercise #1

AI errors vs traditional errors

Traditional software follows clear rules. A calculator processes the same equation the same way every time. When you type 2+2, you always get 4. When something goes wrong, you can trace the exact cause and fix it so the error never happens again.

AI systems work differently. They make predictions based on patterns learned from data. This means they can give wrong answers even when functioning perfectly. IBM's Watson for Oncology demonstrated this clearly. The supercomputer analyzed cancer cases and made treatment recommendations, but doctors found themselves rejecting its suggestions, not because Watson was broken, but because its probabilistic recommendations didn't align with their medical judgment.[1][2]

Exercise #2

Errors, failures, and disruptions

AI systems experience three distinct types of issues, each requiring a different design response:

  • Errors happen when AI produces unexpected or inaccurate results. Think of a recipe app suggesting desserts when you search for salads. Users can usually work around these with some extra effort. The system works but gives wrong answers.
  • Failures are more serious. They occur when AI faces inherent limitations or stalled processes. Imagine a translation app that simply can't process a rare dialect. It's not broken. It's hitting a fundamental boundary. Failures leave users stuck, unable to complete their task through the AI.
  • Disruptions derail users entirely. A smart speaker that randomly announces weather updates during your meditation session doesn't just fail at one task. It actively interferes with what you're trying to do. The AI interrupts your primary activity.

Teams must identify which category they're dealing with to respond appropriately.

Exercise #3

Alignment errors and user assumptions

Alignment errors happen when AI makes incorrect assumptions about what users want, when they want it, or why. These errors feel personal because the AI seems to ignore obvious clues about your context.

Imagine opening a recipe app at 6 AM to plan dinner. The app suggests breakfast recipes because it assumes morning equals breakfast time. For most users, this makes sense. But if you work night shifts or plan meals ahead, this assumption fails. The AI lacks the broader context of your daily routine.

These misunderstandings multiply across cultures. A music app might play upbeat party songs during a religious fasting period. A fitness tracker might push intense workout notifications when you're sick. The AI correctly identifies patterns but misses important context about your current state.

The challenge is that AI systems see patterns, not purposes. They process data without understanding the meaning. A spike in restaurant searches might mean you're hungry, planning a birthday dinner, or writing a food blog. The AI guesses based on statistics, not the situation. Each recommendation follows logical data patterns but can feel completely wrong based on your actual needs.
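
To make the pattern-versus-purpose gap concrete, here is a minimal Python sketch of the kind of time-of-day heuristic described above, with a hypothetical user-supplied override standing in for the missing context. The function and field names are illustrative, not drawn from any real app.

```python
from datetime import datetime

# Hypothetical sketch of the assumption described above: a recipe app maps
# the current hour to a meal type, and an optional user-provided override
# stands in for the broader context the model lacks.

def suggest_meal_type(now: datetime, user_override: str | None = None) -> str:
    """Guess which meal to suggest, preferring explicit user context."""
    if user_override:                 # e.g. a night-shift user planning dinner
        return user_override
    hour = now.hour
    if hour < 11:
        return "breakfast"            # the pattern-based assumption
    if hour < 16:
        return "lunch"
    return "dinner"

# At 6 AM the pattern says "breakfast", even if the user is planning dinner.
print(suggest_meal_type(datetime(2024, 5, 1, 6, 0)))                          # breakfast
print(suggest_meal_type(datetime(2024, 5, 1, 6, 0), user_override="dinner"))  # dinner
```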

Exercise #4

System limitations and failstates

Every AI system has boundaries. These limits aren't bugs to fix but fundamental constraints based on training data and design choices. Understanding these boundaries helps users work effectively within them rather than fighting against them.

Consider a plant identification app trained on common garden plants. Show it a rare orchid from the Amazon, and it fails. This isn't a malfunction. The app correctly recognizes that this plant falls outside its knowledge. It's like asking a French translator to handle Mandarin. The limitation is built into the system's design.

Users often expect AI to handle anything within its general domain. A weather app should know the weather everywhere. A translation tool should handle every dialect. But AI systems have specific training that creates natural boundaries. They excel within their focus area but fail at the edges.

Clear communication about these limits builds trust. Instead of vague error messages, specific explanations help users understand what went wrong. "This app identifies plants native to North America" sets better expectations than "Plant not found." Users can then decide whether the tool meets their needs rather than discovering limitations through frustration.
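
As a rough illustration of this messaging approach, the sketch below turns a missing or low-confidence prediction into a scoped explanation rather than a generic failure message. The scope string, threshold, and prediction values are assumptions for the example, not a real API.

```python
# A minimal sketch of communicating a system boundary, assuming a
# hypothetical plant-identification model with a confidence score.

SUPPORTED_SCOPE = "plants native to North America"
CONFIDENCE_THRESHOLD = 0.6

def describe_result(label: str | None, confidence: float) -> str:
    """Turn a missing or low-confidence prediction into a scoped explanation."""
    if label is None or confidence < CONFIDENCE_THRESHOLD:
        # Specific boundary message instead of a vague "Plant not found"
        return (f"This app identifies {SUPPORTED_SCOPE}. "
                "This plant may fall outside that range.")
    return f"Identified as {label} ({confidence:.0%} confidence)."

print(describe_result(None, 0.0))               # out-of-scope explanation
print(describe_result("Eastern redbud", 0.92))  # confident identification
```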

Exercise #5

Understanding error types through outcomes

AI teams use a framework called the confusion matrix to categorize different types of correct and incorrect predictions. This helps weigh the real-world impact of errors. Consider a hypothetical running app, RUN:

  • A true positive happens when RUN suggests a trail you love and choose to run. The AI correctly predicted what you'd want.
  • A true negative occurs when RUN avoids suggesting steep trails after you've indicated you dislike inclines. It correctly identified what to exclude.
  • False positives frustrate users with irrelevant suggestions. RUN might recommend a mountain trail to someone who only runs on flat paths. The AI wrongly predicted the trail would appeal.
  • False negatives represent missed opportunities. RUN might skip suggesting a perfect waterfront trail because it misunderstood your preferences.

Not all errors carry equal weight. A false positive in RUN wastes a few seconds. A false negative in a medical AI could miss critical symptoms. A false positive in allergy detection means avoiding safe foods. A false negative could trigger dangerous reactions. Teams must consider these outcome types when optimizing their systems. The same technical error creates vastly different human consequences.
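
The bookkeeping behind this framework is simple enough to sketch. The Python below classifies hypothetical RUN suggestions into the four outcome types and then applies assumed cost weights to show why raw counts alone don't capture impact; the data and weights are invented for illustration.

```python
from collections import Counter

# A minimal sketch of confusion-matrix bookkeeping, using made-up
# trail suggestions for the hypothetical RUN app.

def outcome(suggested: bool, wanted: bool) -> str:
    """Classify one prediction against what the runner actually wanted."""
    if suggested and wanted:
        return "true_positive"       # suggested a trail you love
    if not suggested and not wanted:
        return "true_negative"       # correctly skipped a steep trail
    if suggested and not wanted:
        return "false_positive"      # irrelevant suggestion
    return "false_negative"          # missed a perfect trail

# (suggested_by_RUN, user_actually_wanted) pairs -- illustrative data only
predictions = [(True, True), (True, False), (False, False), (False, True), (True, True)]
counts = Counter(outcome(s, w) for s, w in predictions)
print(counts)

# Outcomes can then be weighted by real-world cost rather than treated equally.
costs = {"false_positive": 1, "false_negative": 10}   # assumed weights
total_cost = sum(costs.get(name, 0) * n for name, n in counts.items())
print("weighted error cost:", total_cost)
```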

Exercise #6

From useful inaccuracies to harmful errors

AI errors exist on a spectrum from potentially helpful to seriously harmful. Understanding this range helps teams design appropriate safeguards.

Some "errors" actually benefit users during creative tasks. A writing assistant that suggests an unexpected metaphor might spark better ideas than what users originally intended. These useful inaccuracies help with ideation and expand creative options.

Minor errors slow progress without causing real harm. When a search engine returns one irrelevant result among nine good ones, users simply skip it. These errors create friction, not failures. With good controls and interaction design, users recover quickly.

But errors can escalate to serious harm. A financial AI giving wrong tax advice could trigger audits. A medical AI missing allergies could risk lives. These situations demand conservative design and human oversight.

The most severe errors involve policy violations. Hate speech, dangerous content, misinformation, or child sexual abuse material (CSAM) require immediate intervention. These aren't just inconveniences but potential catastrophes. Design teams must map where each feature falls on this spectrum and build appropriate safeguards. Creative tools can embrace useful inaccuracies. Critical systems need conservative designs that prevent harmful errors.
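
One lightweight way to make this mapping explicit is a simple table of features, risk tiers, and design responses, as in the sketch below. The feature names, tiers, and responses are placeholder assumptions, not a prescribed taxonomy.

```python
# A rough sketch of mapping features along the spectrum described above.
# All names, tiers, and responses are illustrative assumptions.

RISK_TIERS = {
    "creative_suggestions": ("useful_inaccuracy", "surface freely, invite editing"),
    "search_ranking":       ("minor_error",       "easy skip and refine controls"),
    "tax_guidance":         ("serious_harm",      "conservative output, human review"),
    "content_safety":       ("policy_violation",  "block immediately, escalate"),
}

def safeguard_for(feature: str) -> str:
    """Look up a feature's risk tier and the safeguard it implies."""
    tier, response = RISK_TIERS.get(feature, ("unmapped", "classify before launch"))
    return f"{feature}: {tier} -> {response}"

for feature in RISK_TIERS:
    print(safeguard_for(feature))
```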

Exercise #7

Background errors and hidden failures

The most dangerous AI errors are invisible. These background errors operate silently, affecting decisions without anyone noticing. Unlike obvious mistakes that users report, these failures can persist for months or years.

Search engines demonstrate this perfectly. When they return incorrect information that seems plausible, users have no way to identify the error. Someone searching for historical facts might receive and trust the wrong dates. The system logs this as a successful search while misinformation spreads. Users think they're learning facts when they're absorbing fiction.

Detection requires looking beyond surface metrics. High usage doesn't equal satisfaction. Users might engage with a system while being poorly served by it. IBM Watson showed this pattern: doctors used the system regularly, creating positive engagement metrics, but deeper analysis revealed they disagreed with its recommendations most of the time.

Teams must actively hunt for these hidden failures through user research, A/B testing, and monitoring of real outcomes rather than engagement numbers alone.
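
A minimal sketch of what that deeper analysis can look like: comparing an engagement metric against an outcome metric over the same session logs. The log fields and numbers here are invented for illustration.

```python
# Invented session records for a hypothetical recommendation system.
# "used_system" is the engagement signal; "recommendation_accepted"
# is the outcome signal that surfaces the hidden failure.

sessions = [
    {"used_system": True, "recommendation_accepted": False},
    {"used_system": True, "recommendation_accepted": False},
    {"used_system": True, "recommendation_accepted": True},
    {"used_system": True, "recommendation_accepted": False},
]

engagement_rate = sum(s["used_system"] for s in sessions) / len(sessions)
acceptance_rate = sum(s["recommendation_accepted"] for s in sessions) / len(sessions)

print(f"engagement: {engagement_rate:.0%}")   # looks healthy on a dashboard
print(f"acceptance: {acceptance_rate:.0%}")   # the outcome metric tells another story
```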

Exercise #8

Error impact and situational stakes

Context determines whether an AI error is annoying or dangerous. The same mistake can waste seconds or risk lives. Understanding these stakes helps teams prioritize their error prevention efforts.

Mental state affects error tolerance. A relaxed user browsing recipes laughs off weird suggestions. A stressed parent making dinner with hungry children has no patience for mistakes.

Research helps teams gauge error risk systematically. Users with expertise tolerate errors better than novices, who can't spot mistakes. Multitasking users have less mental capacity to catch problems. The stakes matter too. Experimentation welcomes errors, but health decisions don't. One study found that people allowed to slightly modify AI algorithms felt more satisfied and trusted the system more. This matters especially in high-stakes domains.[1]

Different domains need different approaches. Entertainment apps can experiment and learn from failures. Healthcare AI requires conservative designs with human oversight. A bad movie recommendation wastes two hours. A bad medical recommendation ruins lives. Teams must match their error tolerance to user consequences.

Exercise #9

Input and training data errors

AI errors often stem from problems in training data or user input. When models learn from flawed data, they confidently repeat those flaws for every user. Input errors happen when systems can't understand what users really mean.

Consider training data problems first. A voice assistant trained mostly on American English fails consistently for British users. Every British person experiences identical recognition failures. These aren't random bugs but systematic blind spots.

Mislabeled training data causes especially confusing mistakes. If training data mixed up muffins and cupcakes, the AI now confidently misidentifies every muffin. Users see obvious errors and wonder how advanced AI fails at simple tasks. They don't know the AI learned these mistakes from its training.

Input errors frustrate differently. Users expect AI to understand typos and context. Searching for "resturant near me" should obviously return restaurant results. When AI processes this literally and returns nothing, users feel the system is being deliberately obtuse. They expect intelligence, not just pattern matching.

Both error types require different solutions. Training data problems need new data collection and model retraining. Input errors need better preprocessing and understanding of user intent.
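
For the input side, a small preprocessing step can absorb common typos before the query reaches the model. The sketch below uses Python's standard difflib for fuzzy matching against a tiny assumed vocabulary; a production system would rely on a much larger dictionary or a learned spelling model.

```python
from difflib import get_close_matches

# A minimal sketch of typo-tolerant preprocessing, assuming a tiny
# hand-picked vocabulary of known search terms.

KNOWN_TERMS = ["restaurant", "rest stop", "restroom", "reservation"]

def normalize_query(raw: str) -> str:
    """Map each misspelled word to the closest known term before searching."""
    words = []
    for word in raw.lower().split():
        match = get_close_matches(word, KNOWN_TERMS, n=1, cutoff=0.8)
        words.append(match[0] if match else word)
    return " ".join(words)

print(normalize_query("resturant near me"))   # -> "restaurant near me"
```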

Exercise #10

System conflicts and hierarchy errors

Modern homes often have multiple AI systems that don't communicate well. Your smart thermostat, energy monitor, and home assistant might work at cross purposes. When these systems conflict, users get caught in confusing situations without clear solutions.

Voice commands trigger multiple responses. Say "play music" and your phone, smart speaker, and TV might all respond. Each device correctly heard you, but their overlapping responses create noise instead of help. Users can't process three different audio streams or figure out which device to address next.

These conflicts need designed solutions, not technical fixes. Users should control which system takes priority in different situations. Clear interfaces must show when systems disagree and explain why. Without this coordination, adding more AI to daily life creates confusion instead of convenience.
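
One possible shape for such a designed solution is a user-set priority list that picks a single responder when several devices hear the same command, as in the sketch below. The device names and ordering are assumptions for illustration, not any real smart-home API.

```python
# A toy sketch of user-controlled priority between overlapping assistants.
# Device names and the priority order are illustrative assumptions.

USER_PRIORITY = ["smart_speaker", "tv", "phone"]   # set by the user per room

def choose_responder(devices_that_heard: list[str]) -> str | None:
    """Pick one device to answer when several heard the same command."""
    for device in USER_PRIORITY:
        if device in devices_that_heard:
            return device
    return devices_that_heard[0] if devices_that_heard else None

# "Play music" was heard by all three devices; only one should respond.
print(choose_responder(["phone", "smart_speaker", "tv"]))   # -> smart_speaker
```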
