Speech Synthesis

The Speech Synthesis component provides an interface to the text-to-speech part of the Web Speech API, allowing your Progressive Web App to convert text into spoken words. This enables accessible, hands-free experiences such as reading content aloud or announcing feedback.

This component is particularly useful for:

  • Accessibility features for users with visual impairments or reading difficulties

  • Educational applications (language learning, pronunciation guides)

  • Navigation and turn-by-turn directions

  • Content reading (news articles, books, messages)

  • Voice-enabled interfaces and assistive technologies

  • Multi-language support with native voice selection

Browser Support

The Speech Synthesis API is widely supported across modern browsers:

  • Chrome/Edge: Full support with extensive voice libraries

  • Safari: Full support with high-quality voices

  • Firefox: Full support with system voices

  • Mobile browsers: Excellent support on both iOS and Android


Voice availability varies by platform and language. Desktop browsers typically offer more voices than mobile browsers. The controller automatically handles voice detection and selection.
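To illustrate what locale-based voice selection can look like, here is a minimal sketch. It is not the controller's actual algorithm: the `VoiceLike` type and the `pickVoice` helper are hypothetical names. The sketch assumes that an exact locale match is preferred, then any voice in the same language, and that on-device voices win ties.

```typescript
// Minimal structural type; browser SpeechSynthesisVoice objects satisfy it.
interface VoiceLike {
  name: string;
  lang: string;
  localService: boolean;
}

// Hypothetical helper, not part of the component's API.
function pickVoice(voices: VoiceLike[], locale: string): VoiceLike | undefined {
  // Some platforms report "en_US" instead of "en-US", so normalize both sides.
  const normalize = (tag: string): string => tag.toLowerCase().replace("_", "-");
  const wanted = normalize(locale);
  const language = wanted.split("-")[0];

  // First choice: exact locale. Second choice: same language, any region.
  const exact = voices.filter((v) => normalize(v.lang) === wanted);
  const sameLanguage = voices.filter(
    (v) => normalize(v.lang).split("-")[0] === language
  );
  const candidates = exact.length > 0 ? exact : sameLanguage;

  // Prefer on-device voices, then fall back to the first candidate.
  return candidates.find((v) => v.localService) ?? candidates[0];
}
```

In a browser, the array returned by `speechSynthesis.getVoices()` can be passed in directly once voices have loaded.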

Usage

Basic Text-to-Speech

Speaking HTML Elements

Voice Selection with Custom Parameters

Multi-Language Content

Playback Controls

Custom Internationalization

Per-Item Custom Parameters

Immediate vs Queue Mode
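The difference between the two modes (see enqueueValue in the API Reference) can be modeled with a small queue. This is only a sketch of the described behavior: `UtteranceQueue` is a hypothetical class, and the controller's internal queue may work differently.

```typescript
// Hypothetical model of queue mode (enqueue = true) vs immediate mode (enqueue = false).
class UtteranceQueue {
  private pending: string[] = [];
  private readonly enqueue: boolean;

  constructor(enqueue: boolean = true) {
    this.enqueue = enqueue;
  }

  // Adds a text and returns the resulting queue size.
  add(text: string): number {
    if (!this.enqueue) {
      // Immediate mode: the new utterance replaces whatever was waiting.
      // In a browser this is where speechSynthesis.cancel() would be called.
      this.pending = [];
    }
    this.pending.push(text);
    return this.pending.length;
  }

  // Removes and returns the next text to speak, if any.
  next(): string | undefined {
    return this.pending.shift();
  }

  get size(): number {
    return this.pending.length;
  }
}
```

With `new UtteranceQueue(true)`, two calls to `add` leave two texts waiting in order; with `new UtteranceQueue(false)`, only the most recent text remains.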

Common Use Cases

1. Accessible News Reader

2. Language Learning App

3. Form Validation Feedback

4. Navigation Assistant

5. Interactive Tutorial System

API Reference

Values

The controller accepts the following configuration values:

localeValue (String, default: "en-US")

  • Default language/locale for speech synthesis

  • Examples: "en-US", "fr-FR", "es-ES", "de-DE"

rateValue (Number, default: 1)

  • Speech speed/rate

  • Range: 0.1 to 10 (typical: 0.5 to 2)

  • 1 = normal speed, < 1 = slower, > 1 = faster

pitchValue (Number, default: 1)

  • Speech pitch

  • Range: 0 to 2

  • 1 = normal pitch, < 1 = lower, > 1 = higher

volumeValue (Number, default: 1)

  • Speech volume

  • Range: 0 to 1

  • 0 = silent, 1 = maximum volume

voiceValue (String, default: undefined)

  • Specific voice name to use

  • If not set, automatically selects best voice for locale

  • Example: "Google US English", "Microsoft David Desktop"

enqueueValue (Boolean, default: true)

  • Queue mode behavior

  • true: Utterances are queued and played sequentially

  • false: New utterance cancels previous (immediate mode)

i18nValue (Object, default: {})

  • Internationalization strings for status messages

  • Keys: loading, ready, unsupported, playing, paused, canceled, finished

Example:

Targets

item

  • Marks HTML elements as speakable items

  • Can be clicked individually or queued together

  • Supports custom data attributes for per-item configuration

voiceSelect

  • A <select> element that will be automatically populated with available voices

  • Automatically formatted as: "Voice Name (locale) • local" or "Voice Name (locale)"

status

  • Element where status messages are displayed

  • Shows: "Loading voices…", "Ready", "Playing", "Paused", etc.

  • Content controlled by i18nValue or defaults

Example:
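The option label format described for voiceSelect can be reproduced with a short helper, which is useful if you render your own voice list. The `VoiceInfo` type and `formatVoiceLabel` function are hypothetical names; the field names follow the name, lang, and local properties listed for the voicesloaded event.

```typescript
// Shape assumed from the voicesloaded event detail (name, lang, local).
interface VoiceInfo {
  name: string;
  lang: string;
  local: boolean;
}

// Hypothetical helper: builds "Voice Name (locale) • local" or "Voice Name (locale)".
function formatVoiceLabel(voice: VoiceInfo): string {
  const base = `${voice.name} (${voice.lang})`;
  return voice.local ? `${base} • local` : base;
}
```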

Actions

speak({ params })

  • Speaks the provided text

  • Parameters:

    • text (String, required): Text to speak

    • locale (String, optional): Override default locale

    • voice (String, optional): Override default voice

    • rate (Number, optional): Override default rate

    • pitch (Number, optional): Override default pitch

    • volume (Number, optional): Override default volume

speakItem(event)

  • Speaks the clicked item target element

  • Uses data-speech-* attributes or element's textContent

  • Supports per-item parameters via data attributes

enqueueItems({ params })

  • Queues all item targets for sequential playback

  • Parameters can override defaults for all items

  • Automatically starts playback

pause()

  • Pauses current speech

  • Can be resumed with resume()

resume()

  • Resumes paused speech

  • No effect if not paused

cancel()

  • Stops current speech and clears queue

  • Cannot be resumed

changeVoiceFromSelect()

  • Updates voice from the voiceSelect target value

  • Automatically bound to voiceSelect change event

setRate({ params })

  • Updates the speech rate

  • Parameters: rate (Number)

setPitch({ params })

  • Updates the speech pitch

  • Parameters: pitch (Number)

setVolume({ params })

  • Updates the speech volume

  • Parameters: volume (Number)

Example:

Item Data Attributes

When using item targets, you can customize speech parameters per element:

data-speech-text

  • Text to speak (overrides element's textContent)

data-speech-locale

  • Language/locale for this item

data-speech-voice

  • Voice name for this item

data-speech-rate

  • Speech rate for this item (Number)

data-speech-pitch

  • Speech pitch for this item (Number)

data-speech-volume

  • Speech volume for this item (Number)

Example:
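As a rough sketch of how these attributes translate into speech parameters, the helper below reads them from an element's `dataset`. It relies on the standard DOM mapping of `data-speech-rate` to `dataset.speechRate`; the `ItemParams` type and `readItemParams` function are hypothetical, and the controller's own parsing may differ (for example in how it treats invalid numbers).

```typescript
interface ItemParams {
  text: string;
  locale: string | undefined;
  voice: string | undefined;
  rate: number | undefined;
  pitch: number | undefined;
  volume: number | undefined;
}

// Hypothetical helper: `dataset` is structurally compatible with HTMLElement.dataset.
function readItemParams(
  dataset: Record<string, string | undefined>,
  textContent: string
): ItemParams {
  // Attribute values are strings; ignore empty or non-numeric ones.
  const toNumber = (raw: string | undefined): number | undefined => {
    if (raw === undefined || raw.trim() === "") return undefined;
    const value = Number(raw);
    return Number.isFinite(value) ? value : undefined;
  };

  return {
    // data-speech-text overrides the element's text content.
    text: dataset["speechText"] ?? textContent.trim(),
    locale: dataset["speechLocale"],
    voice: dataset["speechVoice"],
    rate: toNumber(dataset["speechRate"]),
    pitch: toNumber(dataset["speechPitch"]),
    volume: toNumber(dataset["speechVolume"]),
  };
}
```

In a browser this could be called as `readItemParams(element.dataset, element.textContent ?? "")`.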

Events

All events are prefixed with pwa:speech-synthesis: and dispatched on the controller element.

pwa:speech-synthesis:ready

  • Dispatched when the controller is initialized and ready

  • No event detail

pwa:speech-synthesis:unsupported

  • Dispatched if Speech Synthesis API is not supported

  • No event detail

pwa:speech-synthesis:voicesloaded

  • Dispatched when voices are loaded and available

  • Event detail:

    • voices (Array): List of available voices with name, lang, local properties

pwa:speech-synthesis:start

  • Dispatched when speech starts

  • Event detail:

    • text (String): The text being spoken

    • lang (String): The language used

    • sourceEl (Element|undefined): The source element if speaking an item

pwa:speech-synthesis:end

  • Dispatched when speech finishes

  • Event detail:

    • text (String): The text that was spoken

    • lang (String): The language used

    • sourceEl (Element|undefined): The source element if speaking an item

pwa:speech-synthesis:pause

  • Dispatched when speech is paused

  • Event detail:

    • text (String): The text being paused

    • sourceEl (Element|undefined): The source element

pwa:speech-synthesis:resume

  • Dispatched when speech is resumed

  • Event detail:

    • text (String): The text being resumed

    • sourceEl (Element|undefined): The source element

pwa:speech-synthesis:error

  • Dispatched when a speech error occurs

  • Event detail:

    • error (String): Error message or type

    • sourceEl (Element|undefined): The source element

pwa:speech-synthesis:boundary

  • Dispatched at word or sentence boundaries during speech

  • Event detail:

    • name (String): Boundary type ("word" or "sentence")

    • charIndex (Number): Character index in the text

    • charLength (Number): Length of the current word/sentence

    • elapsedTime (Number): Elapsed time in milliseconds

    • sourceEl (Element|undefined): The source element

pwa:speech-synthesis:queued

  • Dispatched when an utterance is added to the queue

  • Event detail:

    • size (Number): Current queue size

pwa:speech-synthesis:dequeue

  • Dispatched when an utterance is removed from the queue for playback

  • Event detail:

    • remaining (Number): Number of items remaining in queue

pwa:speech-synthesis:cancel

  • Dispatched when speech is canceled

  • No event detail

pwa:speech-synthesis:voicechange

  • Dispatched when the voice is changed

  • Event detail:

    • name (String): New voice name

pwa:speech-synthesis:ratechange

  • Dispatched when the rate is changed

  • Event detail:

    • rate (Number): New rate value

pwa:speech-synthesis:pitchchange

  • Dispatched when the pitch is changed

  • Event detail:

    • pitch (Number): New pitch value

pwa:speech-synthesis:volumechange

  • Dispatched when the volume is changed

  • Event detail:

    • volume (Number): New volume value

Example:

Properties

The controller exposes the following read-only properties:

voices

  • Returns array of available SpeechSynthesisVoice objects

  • Updated when voices are loaded

locales

  • Returns array of unique locale codes from available voices

  • Example: ["en-US", "fr-FR", "es-ES"]

isSpeaking

  • Returns true if currently speaking, false otherwise

  • Boolean property

Example:
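The relationship between voices and locales can be sketched as a simple de-duplication. This is an illustration of the idea, not the controller's source; `uniqueLocales` is a hypothetical function name.

```typescript
// Hypothetical helper: derives the unique locale codes from a list of voices,
// keeping the order in which each locale first appears.
function uniqueLocales(voices: { lang: string }[]): string[] {
  return Array.from(new Set(voices.map((voice) => voice.lang)));
}
```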

Best Practices

1. Always Check Browser Support
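Besides listening for the pwa:speech-synthesis:unsupported event, a feature check can be done directly. The sketch below takes the global object as a parameter so it can be tested outside a browser; `isSpeechSynthesisSupported` is a hypothetical helper name.

```typescript
// Hypothetical helper: checks for the two globals the Web Speech API
// text-to-speech feature relies on.
function isSpeechSynthesisSupported(scope: object): boolean {
  return "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope;
}

// In a browser: isSpeechSynthesisSupported(window)
```

When the check fails, hide or disable the speech controls rather than leaving buttons that do nothing.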

2. Provide Visual Feedback

Always show the current state of speech synthesis:
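One way to drive your own visual indicator is to map controller events to a state and then to a label. The sketch below is an assumption-laden illustration: the event-to-state mapping is a guess based on the event descriptions, and only the "Loading voices…", "Ready", "Playing" and "Paused" defaults come from the status target description; the other default strings are invented here.

```typescript
type SpeechState =
  | "loading" | "ready" | "unsupported"
  | "playing" | "paused" | "canceled" | "finished";

// Assumed defaults; keys match the documented i18nValue keys.
const DEFAULT_LABELS: Record<SpeechState, string> = {
  loading: "Loading voices…",
  ready: "Ready",
  unsupported: "Speech synthesis is not supported", // assumed wording
  playing: "Playing",
  paused: "Paused",
  canceled: "Canceled", // assumed wording
  finished: "Finished", // assumed wording
};

// Assumed mapping from documented event names to a display state.
const STATE_FOR_EVENT: Record<string, SpeechState> = {
  "pwa:speech-synthesis:ready": "ready",
  "pwa:speech-synthesis:unsupported": "unsupported",
  "pwa:speech-synthesis:start": "playing",
  "pwa:speech-synthesis:resume": "playing",
  "pwa:speech-synthesis:pause": "paused",
  "pwa:speech-synthesis:end": "finished",
  "pwa:speech-synthesis:cancel": "canceled",
};

// Returns the custom label if one is provided, otherwise the default.
function statusLabel(
  state: SpeechState,
  i18n: Partial<Record<SpeechState, string>> = {}
): string {
  return i18n[state] ?? DEFAULT_LABELS[state];
}
```

In a browser, an event listener on the controller element could look up `STATE_FOR_EVENT[event.type]` and write `statusLabel(...)` into a visible badge or toggle a CSS class.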

3. Handle Long Content Appropriately

For long articles, break content into manageable chunks:
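A minimal sketch of such chunking, assuming sentence boundaries at `.`, `!` and `?` and a hypothetical `chunkText` helper with an arbitrary default of 200 characters per chunk. Real content may need smarter splitting (abbreviations, other scripts), and a single sentence longer than the limit is kept whole here.

```typescript
// Hypothetical helper: groups whole sentences into chunks of at most
// `maxLength` characters (a single longer sentence becomes its own chunk).
function chunkText(text: string, maxLength: number = 200): string[] {
  // Each match is a sentence with its terminator and trailing whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current !== "" && (current + sentence).trim().length > maxLength) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }

  if (current.trim() !== "") {
    chunks.push(current.trim());
  }
  return chunks;
}
```

Each chunk can then be rendered as its own item target or spoken as a separate utterance, so that pausing, canceling, and progress tracking work at a finer granularity.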

4. Use Appropriate Speech Rates

Match speech rate to content type:

  • 0.5-0.8: Learning content, complex information, pronunciation guides

  • 0.9-1.0: Standard reading, news articles, general content

  • 1.1-1.5: Quick scanning, familiar content, user preference

  • 1.6-2.0: Rapid reading for advanced users

5. Respect User Preferences

6. Optimize for Multi-Language Apps

7. Use Boundary Events for Highlighting

8. Provide Keyboard Controls

Make speech controls accessible via keyboard:
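The key handling can be kept separate from the DOM so it is easy to test. In the sketch below, the key bindings (Space or "k" to toggle pause, Escape to cancel) and the names `PlaybackState` and `actionForKey` are illustrative assumptions, not something the component defines.

```typescript
type SpeechAction = "pause" | "resume" | "cancel" | undefined;

interface PlaybackState {
  speaking: boolean;
  paused: boolean;
}

// Hypothetical helper: maps a KeyboardEvent.key value to the controller
// action that should be triggered, or undefined when the key is not handled.
function actionForKey(key: string, state: PlaybackState): SpeechAction {
  if (key === "Escape") {
    return state.speaking || state.paused ? "cancel" : undefined;
  }
  if (key === " " || key === "k") {
    if (state.paused) return "resume";
    if (state.speaking) return "pause";
  }
  return undefined;
}
```

In a keydown listener, skip the mapping while focus is inside a text field, and call `preventDefault()` only when an action is returned so that Space keeps scrolling the page otherwise.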

Troubleshooting

No Voices Available

Problem: Voice list is empty or voices don't load

Solutions:

  1. Wait for voicesloaded event before using voices

  2. Some browsers load voices asynchronously - the controller handles this automatically

  3. On some systems, text-to-speech voices may need to be installed separately

Speech Not Working on iOS

Problem: Speech synthesis doesn't work or requires user interaction on iOS Safari

Solutions:

  1. iOS requires user interaction before speech can play - ensure speech is triggered by user action (click, tap)

  2. Don't auto-play speech on page load

  3. Test on actual iOS devices, not just simulators

Voice Selection Not Working

Problem: Selected voice is not being used

Solutions:

  1. Ensure voice name matches exactly (case-sensitive)

  2. Voice must be available in the voices list

  3. Some voices are locale-specific - check voice's lang property matches utterance locale

Speech Cuts Off or Stops

Problem: Speech stops unexpectedly or gets interrupted

Solutions:

  1. Check for multiple speech synthesis controllers on the same page

  2. Ensure enqueue mode is set correctly

  3. Verify no JavaScript errors in console

  4. Some browsers limit queue size or cut off very long utterances - break long content into smaller chunks

Rate/Pitch/Volume Not Applied

Problem: Speech parameters don't seem to work

Solutions:

  1. Some voices ignore certain parameters (especially pitch)

  2. Valid ranges: rate (0.1-10), pitch (0-2), volume (0-1)

  3. Not all voices support all parameters - try different voices
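If values come from user input such as sliders, clamping them to the valid ranges listed above before passing them on avoids out-of-range surprises. A minimal sketch, where `clampParam` is a hypothetical helper and the fallback of 1 for non-numeric input mirrors the documented defaults:

```typescript
// Valid ranges as documented: rate (0.1-10), pitch (0-2), volume (0-1).
const PARAM_RANGES = {
  rate: [0.1, 10],
  pitch: [0, 2],
  volume: [0, 1],
} as const;

type ParamName = keyof typeof PARAM_RANGES;

// Hypothetical helper: keeps a value inside the valid range for its parameter.
function clampParam(param: ParamName, value: number): number {
  if (Number.isNaN(value)) {
    return 1; // documented default for rate, pitch and volume
  }
  const [min, max] = PARAM_RANGES[param];
  return Math.min(max, Math.max(min, value));
}
```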

Language/Accent Mismatch

Problem: Wrong accent or language is used

Solutions:

  1. Set correct locale value (e.g., "en-US" vs "en-GB")

  2. Explicitly select a voice with desired accent using voiceValue

  3. Use data-speech-locale per item for multi-language content

Memory Issues with Long Content

Problem: Browser becomes slow or unresponsive with very long content

Solutions:

  1. Break content into smaller chunks (use item targets)

  2. Limit queue size - don't enqueue hundreds of items at once

  3. Cancel and clear queue when user navigates away

Accessibility Considerations

1. ARIA Attributes

2. Screen Reader Compatibility

The component works alongside screen readers. Provide an option to disable speech synthesis if the user prefers their own assistive technology:
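A simple way to persist such a preference is a flag in storage. The sketch below is illustrative only: the storage key, the "on"/"off" values, and the function names are hypothetical, and the `KeyValueStore` interface is a subset of the browser `Storage` interface so that `localStorage` can be passed in.

```typescript
// Subset of the browser Storage interface; localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SPEECH_PREF_KEY = "speech-enabled"; // hypothetical storage key

// Speech stays enabled unless the user explicitly turned it off.
function isSpeechEnabled(store: KeyValueStore): boolean {
  return store.getItem(SPEECH_PREF_KEY) !== "off";
}

function setSpeechEnabled(store: KeyValueStore, enabled: boolean): void {
  store.setItem(SPEECH_PREF_KEY, enabled ? "on" : "off");
}
```

Check `isSpeechEnabled(localStorage)` before triggering any speech action, and expose the toggle as a clearly labeled control.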

3. Visual Indicators

Always provide visual feedback that complements audio:
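One common form of visual feedback is highlighting the word currently being spoken, using the charIndex and charLength values from the boundary event. The sketch below only computes the text segments; `splitAtBoundary` and `Segments` are hypothetical names, and the fallback to the next whitespace covers the case where charLength is missing or zero, which may happen in some browsers.

```typescript
interface Segments {
  before: string;
  current: string;
  after: string;
}

// Hypothetical helper: splits the spoken text around the current boundary
// so that `current` can be wrapped in a highlight element.
function splitAtBoundary(
  text: string,
  charIndex: number,
  charLength?: number
): Segments {
  let end: number;
  if (charLength !== undefined && charLength > 0) {
    end = charIndex + charLength;
  } else {
    // No length reported: highlight up to the next whitespace character.
    const gap = text.slice(charIndex).search(/\s/);
    end = gap === -1 ? text.length : charIndex + gap;
  }
  return {
    before: text.slice(0, charIndex),
    current: text.slice(charIndex, end),
    after: text.slice(end),
  };
}
```

In the browser, the three segments can be written back into the source element with `current` wrapped in a `<mark>` element, and the original text restored when speech ends or is canceled.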
