Conversational UX & Multimodal Interfaces course lesson

Conversational and multimodal interfaces change how we interact with AI. Instead of clicking buttons, people chat, speak, or use visual cues with these systems. This creates experiences that feel more like talking to another person. Designing these interactions comes with unique challenges.

AI conversations need a natural back-and-forth flow. Problems occur when the system interrupts users, forgets previous messages, or loses track of the conversation topic. Each way of interacting has its own rules: chat interfaces work differently from voice systems, and visual AI has its own approach. Things get more complex when these methods combine. Users might speak a command, see a visual response, and then type their next request. Keeping the conversation smooth across these changes requires careful design.

These interfaces also need special attention to accessibility. Chat histories must work well with screen readers. Voice systems need options for people who can't speak clearly. Visual AI elements need text descriptions for those who can't see them.

The best multimodal AI experiences feel natural and work well for everyone, regardless of how they need to interact with technology. When designed thoughtfully, users can focus on what they want to accomplish rather than struggling with how to communicate with the system.

Exercise #1

Conversation design patterns for AI interfaces

Every AI conversation follows specific patterns that shape how users interact with the system. These patterns guide expectations and create structured experiences:

Query-response pattern: Users ask questions and the AI provides direct answers. Works well for information retrieval but can feel mechanical over time.
Guided dialogue pattern: The AI proactively suggests next steps or asks clarifying questions to narrow down options. Creates more engaging interactions.
Task-completion pattern: Breaks interactions into clear stages with confirmation points, giving users a sense of progress through complex processes.
Mixed-initiative pattern: Both users and the AI can steer the dialogue, creating more human-like exchanges that flow in multiple directions.

The pattern choice should match user goals and context. Information-seeking benefits from direct query-response, while complex decision-making works better with guided dialogues. Shopping experiences often combine patterns, starting with open exploration before shifting to task completion during checkout. Pattern consistency builds user confidence by making interactions predictable while allowing for natural variations within the established framework.
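As a rough illustration, the task-completion pattern can be modeled as an ordered list of stages, each of which the user confirms before moving on. The sketch below is only one possible shape: the `TaskFlow` class and the checkout stage names are hypothetical and do not come from any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskFlow:
    """Hypothetical task-completion flow: ordered stages with confirmation points."""

    stages: list[str]
    confirmed: list[str] = field(default_factory=list)

    @property
    def current_stage(self) -> str | None:
        """Return the stage awaiting confirmation, or None when the task is done."""
        if len(self.confirmed) < len(self.stages):
            return self.stages[len(self.confirmed)]
        return None

    def confirm(self) -> None:
        """Record the user's confirmation of the current stage and advance."""
        stage = self.current_stage
        if stage is None:
            raise ValueError("All stages are already confirmed")
        self.confirmed.append(stage)

    def progress(self) -> str:
        """Describe progress so the user keeps a sense of where they are."""
        if self.current_stage is None:
            return "All steps complete"
        return f"Step {len(self.confirmed) + 1} of {len(self.stages)}: {self.current_stage}"
```

A checkout flow built as `TaskFlow(["cart", "shipping", "payment"])` would report "Step 1 of 3: cart" until the user confirms the cart, which is the kind of progress cue the pattern relies on.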

Exercise #2

Crafting AI personality and voice guidelines

How an AI "speaks" shapes how people feel about it. A clear personality creates a consistent experience that users recognize across conversations. This personality comes through in word choices, response styles, and how the AI behaves. Begin by defining core traits that match your brand and what users expect. Should your AI be friendly and casual or professional and efficient? These traits guide decisions about vocabulary, sentence structure, and conversation style.

Document these decisions in voice guidelines with examples of good responses for common situations. Include how the AI should respond to different emotions, what level of formality to use, and how to adapt to different contexts while staying consistent. Consider cultural differences when creating personality traits to ensure the AI communicates respectfully with diverse users. The AI's personality should stay recognizable while slightly adjusting to user preferences over time. This balance creates an experience that feels familiar yet responsive, building trust through repeated interactions.

Pro Tip: Write the same response in several versions with increasing personality to find the right balance between bland and over-the-top.

Exercise #3

Managing user expectations in chat interactions

Setting clear expectations is vital for chat interfaces. Users often expect AI to understand and do more than it actually can. Good onboarding that shows both what the AI can do and what it cannot helps align user expectations with reality.

Here are key strategies to manage user expectations effectively:

Use visual cues: Provide clickable response choices that show users what the AI can understand and act upon.
Format responses thoughtfully: Use structured templates for routine tasks and conversational formats for flexible interactions, so the format itself signals what the AI can do.
Provide honest explanations: When the AI cannot fulfill a request, acknowledge limitations without excessive apologies to maintain trust.
Introduce features gradually: Progressive disclosure prevents overwhelming new users while letting experienced ones discover more capabilities.
Show system status clearly: Let users know when the AI is processing information or encountering difficulties.

These transparent approaches help users build an accurate understanding of what the system can do, reducing frustration and increasing satisfaction with the actual benefits the AI provides.
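Two of these strategies, clickable choices and honest explanations, can be combined in a single fallback reply. The following is a minimal sketch under stated assumptions: the `SUPPORTED_INTENTS` table, the intent names, and the payload keys are all hypothetical, and a real system would take them from its own intent model.

```python
from __future__ import annotations

# Hypothetical capabilities of an example assistant; not taken from any real product.
SUPPORTED_INTENTS = {
    "track_order": "Track an order",
    "return_item": "Start a return",
    "store_hours": "Check store hours",
}


def build_reply(intent: str) -> dict:
    """Return a reply payload with text, a handled flag, and clickable choices."""
    if intent in SUPPORTED_INTENTS:
        return {
            "text": f"Sure, let's get started: {SUPPORTED_INTENTS[intent]}.",
            "handled": True,
            "quick_replies": [],
        }
    # Brief, honest acknowledgement of the limit, followed by what is possible.
    return {
        "text": "I can't help with that yet. Here is what I can do:",
        "handled": False,
        "quick_replies": list(SUPPORTED_INTENTS.values()),
    }
```

The unsupported branch states the limitation once, without repeated apologies, and immediately shows the options the assistant does understand.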

Exercise #4

Voice interface affordances and constraints

Voice interfaces have unique benefits and limitations that affect design decisions. Unlike content on a screen, spoken words vanish once uttered, leaving nothing to reference later. This makes users work harder to remember what they said and how the AI responded. Voice also presents information in sequence, one piece at a time. Users can't scan ahead or quickly review options as they would on a screen. Background noise and different accents create recognition challenges not found in other interfaces.

Despite these limits, voice works well for hands-free situations like cooking or driving. It helps users with motor or visual impairments and often feels more natural for conversation-based tasks. Some advanced systems like ChatGPT address the ephemeral nature of voice by transcribing spoken requests into text, allowing users to reference their previous commands and the conversation history.

Good voice interfaces work within these constraints by keeping interactions brief, confirming understanding frequently, and using consistent command patterns. They avoid long option lists and instead break complex information into smaller chunks. The best designs include backup options when voice recognition fails and alternative interaction methods when available, recognizing that voice works best as part of a broader interaction approach.
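Breaking a long option list into smaller spoken chunks could look something like the sketch below. The chunk size of three and the wording of the prompts are assumptions chosen for illustration, not recommendations from the lesson.

```python
from __future__ import annotations


def chunk_options(options: list[str], size: int = 3) -> list[list[str]]:
    """Split a long list of options into chunks short enough to hold in memory."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [options[i:i + size] for i in range(0, len(options), size)]


def voice_prompt(chunk: list[str], has_more: bool) -> str:
    """Build one spoken prompt for a chunk, offering 'more' if chunks remain."""
    if len(chunk) == 1:
        spoken = chunk[0]
    elif len(chunk) == 2:
        spoken = f"{chunk[0]} or {chunk[1]}"
    else:
        spoken = ", ".join(chunk[:-1]) + f", or {chunk[-1]}"
    prompt = f"You can choose {spoken}."
    if has_more:
        prompt += " Say 'more' to hear other options."
    return prompt
```

With four menu items and the default chunk size, the user hears three options first and is offered the rest only on request, instead of sitting through the full list.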

Pro Tip: Test voice interactions in noisy environments to ensure they work well in real-world conditions.

Exercise #5

Keeping track of conversation history

Maintaining conversation history creates experiences that feel natural and reduce repetition. Unlike human conversations, where context flows naturally, AI systems must deliberately track and use relevant information.

Context works at different levels. Short-term context tracks the current topic and recent messages. Session context remembers information during a single conversation. Long-term context stores preferences across multiple conversations.

Despite advances, many current AI applications struggle with context management. Users often find themselves reminding the AI of information they've already provided earlier in the conversation, creating frustrating experiences. Good systems balance remembering enough without bringing up irrelevant information. Context should be applied naturally. Instead of saying "As you mentioned earlier," the system simply shapes responses using what it knows.

Privacy matters in context management. Users should control what information persists and understand how it's used. The system should know when previous information still applies versus when circumstances have changed, creating experiences where users feel understood without repetition.

Pro Tip: Create rules for how different types of information expire at different rates based on how useful they remain over time.
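One way to express such expiry rules is a small store that attaches a lifetime to each context level. This is a sketch under stated assumptions: the `ContextStore` class and the lifetimes in `TTL_SECONDS` are hypothetical placeholders, and the current time is passed in explicitly so the behavior is easy to test.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical lifetimes in seconds for each context level; tune to your product.
TTL_SECONDS = {
    "short_term": 5 * 60,            # current topic, recent messages
    "session": 60 * 60,              # facts given during one conversation
    "long_term": 90 * 24 * 60 * 60,  # preferences kept across conversations
}


@dataclass
class ContextItem:
    value: str
    level: str
    stored_at: float  # timestamp in seconds


class ContextStore:
    """Keeps context items and drops them once their level's lifetime has passed."""

    def __init__(self) -> None:
        self._items: dict[str, ContextItem] = {}

    def remember(self, key: str, value: str, level: str, now: float) -> None:
        if level not in TTL_SECONDS:
            raise ValueError(f"Unknown context level: {level}")
        self._items[key] = ContextItem(value, level, now)

    def recall(self, key: str, now: float) -> str | None:
        """Return the stored value, or None if it is missing or has expired."""
        item = self._items.get(key)
        if item is None:
            return None
        if now - item.stored_at > TTL_SECONDS[item.level]:
            del self._items[key]
            return None
        return item.value

    def forget(self, key: str) -> None:
        """Let the user remove stored information on request."""
        self._items.pop(key, None)
```

The `forget` method reflects the privacy point above: users should be able to clear what the system holds rather than wait for it to expire.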

Exercise #6

Giving users control to interrupt and redirect

In chat-based AI conversations, users need ways to interrupt or redirect the interaction when the AI isn't providing what they need. Unlike human conversations, where interruption happens naturally, AI interfaces require specific design features to support this control.

When generating lengthy responses, AI systems should include visual indicators showing progress and allow users to stop the generation midway. This prevents users from waiting for irrelevant information to finish loading before they can redirect the conversation.

Some advanced interfaces now include:

"Stop generating" buttons that immediately halt response creation
Edit buttons for modifying already-sent requests without starting over
Inline correction features that let users fix misunderstandings in the AI's responses
Visible thinking/typing indicators that show when interruption is possible

These controls give users agency when conversations go off-track. Without them, users often abandon the entire conversation and start over, creating frustration and inefficiency.
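A "Stop generating" control can be approximated with a shared stop flag that the response stream checks before emitting each piece of text. The sketch below assumes the response arrives as an iterable of tokens; `stream_response` is a hypothetical helper, and the flag is a standard `threading.Event` that a button handler would set.

```python
import threading
from typing import Iterable, Iterator


def stream_response(tokens: Iterable[str], stop: threading.Event) -> Iterator[str]:
    """Yield tokens one at a time, ending early once the stop flag is set."""
    for token in tokens:
        if stop.is_set():
            return  # the user asked to stop; emit nothing further
        yield token


# Usage: simulate a user pressing "Stop generating" after the second token.
stop_flag = threading.Event()
shown = []
for index, token in enumerate(stream_response(["The", "answer", "is", "long"], stop_flag)):
    shown.append(token)
    if index == 1:
        stop_flag.set()
```

Because the flag is checked before every token, the text already shown stays on screen and the user can redirect the conversation without waiting for the rest.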

For voice interfaces, clear audio or visual signals should indicate when the system is listening versus processing, allowing natural interruption points similar to human conversation.

Exercise #7

Designing quick action menus for AI interfaces

AI experiences benefit from offering users quick action shortcuts that streamline common tasks across different interaction modes. These action menus provide efficient ways to modify, refine, or transform AI-generated content without requiring extensive instructions.

When designing these action menus:

Keep options limited to the most frequently needed actions
Use single, clear verbs that communicate the transformation
Show examples of what each action does during onboarding or in tutorials

These quick actions reduce the need for long instructions and make common changes faster, helping users move smoothly between creating content and improving it.

Exercise #8

Making conversational interfaces work with screen readers

Creating accessible chat interfaces requires understanding how screen reader users experience conversations. Unlike visual users who scan messages, screen reader users hear content in sequence, making organization especially important.

Key accessibility considerations for screen reader users include:

Clear structure: Use proper headings to distinguish between user and AI messages
Speaker identification: Include clear labels for who is speaking in each message
Descriptive elements: Ensure all buttons and interactive elements have clear labels
Alternative text: Provide text alternatives for any images or icons
Hierarchical descriptions: Create structured descriptions for complex visuals like charts
Timing control: Avoid rapid message sequences that create overwhelming output

Testing with actual screen readers is essential rather than relying only on simulation tools, as implementation details significantly affect usability. Include users with visual impairments throughout design testing to identify issues that automated tools might miss. This approach creates conversation experiences that work well for everyone.
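Speaker identification and timing control can be prototyped before any markup is written. The sketch below is illustrative only: the `Message` type, the "said" label format, and the two-second gap are assumptions. It merges rapid messages from the same speaker into one labeled announcement so a screen reader is not flooded with separate updates.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    speaker: str    # for example "You" or "Assistant"
    text: str
    sent_at: float  # seconds since the conversation started


def label(message: Message) -> str:
    """Prefix the message with its speaker so the source is always announced."""
    return f"{message.speaker} said: {message.text}"


def batch_announcements(messages: list[Message], min_gap: float = 2.0) -> list[str]:
    """Merge same-speaker messages that arrive less than min_gap seconds apart."""
    batches: list[str] = []
    previous: Message | None = None
    for message in messages:
        same_burst = (
            previous is not None
            and message.speaker == previous.speaker
            and message.sent_at - previous.sent_at < min_gap
        )
        if same_burst:
            batches[-1] += " " + message.text
        else:
            batches.append(label(message))
        previous = message
    return batches
```

Whatever batching rule is chosen, it still needs to be checked with real screen readers, since how announcements are queued differs between them.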

Exercise #9

Creating inclusive voice and visual AI interactions

Inclusive AI interfaces work well for people with different needs and abilities. Creating universal designs requires consideration across multiple dimensions of accessibility.

Key approaches for inclusive voice and visual AI interfaces:

Speech diversity: Design voice interfaces to recognize varied accents, speech impediments, and speaking rates.
Input alternatives: Provide text and gesture input alongside voice.
Timing flexibility: Let users who need more processing time adjust response timing.
Visual transcription: Offer visual transcripts of AI speech with adjustable text size and reading speed.
Visual contrast: Maintain sufficient color contrast and respect system settings for reduced motion or high contrast.
Touch targets: Make interactive elements large enough for people with limited motor control.
Cultural awareness: Pay attention to language variations and cultural references that may not translate universally.

These inclusive design practices benefit everyone, not just people with disabilities, by creating adaptable, personalized experiences that also work in challenging situations like noisy environments or bright sunlight.
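"Sufficient color contrast" can be checked numerically. The sketch below follows the relative luminance and contrast ratio formulas published in WCAG 2.x and uses its AA thresholds of 4.5:1 for normal text and 3:1 for large text. The lesson itself does not name a standard, so treating WCAG AA as the target is an assumption, and the function names are illustrative.

```python
from __future__ import annotations


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: tuple[int, int, int], bg: tuple[int, int, int],
             large_text: bool = False) -> bool:
    """Check the assumed WCAG AA threshold: 4.5:1 normal text, 3:1 large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, `meets_aa((118, 118, 118), (255, 255, 255))` passes for gray text on white, while the slightly lighter `(119, 119, 119)` falls just short for normal-size text.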