
Optimizing AI Videos and Audio: Voice Quality Tips

Optimize AI video and audio voice quality. Learn techniques for fixing pronunciation, controlling pacing, and creating natural-sounding delivery using punctuation, breaks, and phonetic spelling.

Updated this week

What You'll Learn


When to Use This Guide

Use these optimization techniques when your AI-generated video or audio card:

  • Mispronounces company names, acronyms, or technical terms

  • Has awkward pacing or sounds rushed

  • Needs more natural pauses between sentences or ideas

  • Requires emphasis on specific words or phrases

  • Sounds unnatural despite having correct text

These techniques work for both:

  • Video cards with AI avatars and voiceover

  • Audio cards with AI voices (you can choose from multiple voice options per language)

💡 Not a pronunciation issue? If your video was rejected for content policy reasons, see: Avoid your video getting rejected


Quick Tips for Better AI Voice Quality

Follow these essential practices to improve your AI-generated content:

🚫 Don't mix languages
Example: Don't use English words in a Spanish script. The AI detects the language automatically, and mixing languages confuses the voice engine.

📝 Spell words correctly
Make sure you've used correct spelling in your script. Misspellings can cause pronunciation errors or unexpected results.

💬 Insert breaks if needed
You can add pauses to your script by inserting break tags:
<break time="2s" /> (creates a 2-second pause)

✍️ Use punctuation marks
A script without proper commas and periods sounds too fast and is hard to follow. Use periods, commas, hyphens, and question marks to help the AI sound natural.

🗣️ Fix pronunciation with hyphens
Sometimes splitting words with hyphens helps the AI pronounce them correctly:
Example: "con-tent" instead of "content"

Most Important Tip

Improving voice quality is all about creative use of:

  • Periods and commas for pacing

  • Break tags for strategic pauses

  • Hyphens for pronunciation control

Don't be afraid to experiment with different combinations to get the sound you want!


Adding Breaks and Pauses

Our AI voices support SSML (Speech Synthesis Markup Language). The most useful feature is the ability to add custom breaks wherever you need them.

How to Add Breaks

Wherever you want a pause in your text, insert a break tag. You can specify the time in seconds (s) or milliseconds (ms):

<break time="2s" />

Example Usage

Original text:

Hey John! How are you doing today?

Problem: The default break after "John!" might feel too short or too long.

Solution: Add a custom break:

Hey John!<break time="50ms"/>How are you doing today?

When to use breaks:

  • Separate sentences for clarity

  • Add dramatic pauses for emphasis

  • Create breathing room between ideas

  • Slow down rushed sections

Works for both video and audio cards!
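If you prepare many scripts, a small helper can build break tags for you. The Python sketch below is only an illustration: `break_tag` and `join_with_pause` are hypothetical names (not part of 7taps), and it assumes the `<break time="..." />` syntax shown above.

```python
def break_tag(duration_ms: int) -> str:
    """Return an SSML break tag for a pause of duration_ms milliseconds."""
    if duration_ms <= 0:
        raise ValueError("pause duration must be positive")
    # Use whole seconds when possible, otherwise milliseconds.
    if duration_ms % 1000 == 0:
        return f'<break time="{duration_ms // 1000}s" />'
    return f'<break time="{duration_ms}ms" />'


def join_with_pause(first: str, second: str, duration_ms: int) -> str:
    """Join two pieces of script text with a custom pause between them."""
    return f"{first}{break_tag(duration_ms)}{second}"


script = join_with_pause("Hey John!", "How are you doing today?", 50)
print(script)
```

Paste the printed text into your card script as usual, then adjust the duration until the pause sounds right.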


Correcting Pronunciation

Pronouncing company names, acronyms, business terms, or technical language can be challenging for AI. Getting pronunciation right is usually a matter of inserting hyphens or adjusting how you spell words.

Words

Try inserting hyphens to make the word sound how you want:

Example:

  • Content → con-tent

  • Project → pro-ject

  • Research → re-search

Tip: Break words at syllable boundaries where pronunciation problems occur.
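If the same words keep causing trouble, you could keep a list of respellings and apply it to each script before pasting it into a card. This is a minimal Python sketch, not a 7taps feature: `RESPELLINGS` and `apply_respellings` are hypothetical names, and the table only contains the examples from this section.

```python
import re

# Hypothetical respelling table (keys in lowercase); add the words your voice gets wrong.
RESPELLINGS = {
    "content": "con-tent",
    "project": "pro-ject",
    "research": "re-search",
}


def apply_respellings(script: str, respellings: dict[str, str]) -> str:
    """Replace whole words with their hyphenated respelling, ignoring case."""
    if not respellings:
        return script
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in respellings) + r")\b",
        re.IGNORECASE,
    )

    def replace(match: re.Match) -> str:
        respelled = respellings[match.group(0).lower()]
        # Keep a leading capital so sentence starts still look right.
        return respelled.capitalize() if match.group(0)[0].isupper() else respelled

    return pattern.sub(replace, script)


print(apply_respellings("Content for the research project is ready.", RESPELLINGS))
```

Only whole words are replaced, so a word like "contents" is left alone unless you add it to the table.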

Acronyms

If an acronym is pronounced incorrectly, spell it phonetically, whether you want it read as a word or by its letter names:

  • AI → a-eye

  • AWS → a-"double you"-s

  • NASA → nassa

If you want each letter pronounced separately, add spaces between letters:

  • NYC → N Y C

  • FBI → F B I

  • HR → H R
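The letter-spacing trick is easy to automate if your scripts contain many acronyms. The Python sketch below assumes you supply the set of acronyms to spell out yourself; `spell_out` is a hypothetical helper name, not something 7taps provides.

```python
import re


def spell_out(script: str, acronyms: set[str]) -> str:
    """Put spaces between the letters of each listed acronym (NYC -> N Y C)."""
    if not acronyms:
        return script
    # Longest acronyms first so a longer entry wins over a shorter prefix.
    ordered = sorted(acronyms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(a) for a in ordered) + r")\b")
    return pattern.sub(lambda m: " ".join(m.group(0)), script)


print(spell_out("Our NYC office works with HR and the FBI.", {"NYC", "FBI", "HR"}))
```

Acronyms that are not in the set (for example NASA, which you may want read as a word) are left untouched.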

Numbers

Change how you spell numbers depending on how you want them to sound:

  • Ten eighty-nine → 10 89

  • Two five eight six → 2 5 8 6

  • One hundred and forty-eight → 148

For years:

  • 2024 as "two thousand twenty-four" → 2,024

  • 1999 as "nineteen ninety-nine" → leave as 1999

For phone numbers: Add spaces to get natural pronunciation:
(206) 555-3131 → 2 0 6 5 5 5 31 31
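To apply the phone number pattern above consistently, you could format numbers with a short script. This Python sketch assumes 10-digit US-style numbers and copies the spacing from the example (single digits, then the last four digits as two pairs); `speakable_phone` is a hypothetical name.

```python
import re


def speakable_phone(number: str) -> str:
    """Format a 10-digit phone number using the spacing pattern shown above."""
    digits = re.sub(r"\D", "", number)  # drop brackets, spaces, and hyphens
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    area, prefix, line = digits[:3], digits[3:6], digits[6:]
    # Area code and prefix digit by digit, last four digits as two pairs.
    parts = list(area) + list(prefix) + [line[:2], line[2:]]
    return " ".join(parts)


print(speakable_phone("(206) 555-3131"))
```

If the pairs at the end sound odd with your chosen voice, try spacing every digit instead.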


Using Punctuation for Natural Delivery

Punctuation isn't just for grammar. It's a powerful tool for controlling AI voice delivery.

How Different Punctuation Affects Delivery

Commas (,)

  • Add shorter pauses than periods

  • Create natural breathing points

  • Help separate ideas within sentences

Periods (.)

  • Add longer breaks

  • Create downward inflection (statement tone)

  • Best for breaking long sentences into shorter ones

Quotation marks ("")

  • Add emphasis to the quoted word or phrase

  • Make the AI "pay attention" to specific content

  • Example: This is the "most important" step

Example: Same Sentence, Different Results

❌ Without punctuation:

Here's a demonstration of how a sentence without any breaks or commas at all compares to a sentence that has them as you can see the video without them can be difficult to follow because there are no breaks or pauses in it.

Result: Rushed, hard to follow

✅ With strategic punctuation:

Here's a demonstration of how a sentence, without any breaks or commas at all, compares to a sentence that has them. As you can see, the video without them can be difficult to follow, because there are no breaks or pauses in it.

Result: Natural, easy to understand

Pro Tips for Punctuation

  • Questions: End with question marks to get upward inflection

  • Emphasis: Use quotes around key phrases: "the most critical step"

  • Lists: Use commas between items for natural pacing

  • Long sentences: Break into two with a period, even if grammatically you wouldn't

This works identically for video and audio cards!
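One way to spot rushed passages before generating is to look for long stretches of text with no pause punctuation. The Python sketch below is a rough check, not a 7taps feature: `long_runs` is a hypothetical name, and the 15-word threshold is an arbitrary assumption you should tune by ear.

```python
import re


def long_runs(script: str, max_words: int = 15) -> list[str]:
    """Return stretches with more than max_words words and no pause punctuation."""
    # Split wherever the voice would already pause: commas, periods, and so on.
    chunks = re.split(r"[,.;:!?]", script)
    return [chunk.strip() for chunk in chunks if len(chunk.split()) > max_words]


draft = (
    "Here's a demonstration of how a sentence without any breaks or commas "
    "at all compares to a sentence that has them."
)
for run in long_runs(draft):
    print("Consider adding a comma or period in:", run)
```

Any stretch it prints is a candidate for a comma, a period, or a break tag.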


Advanced Phonetic Spelling (Basic Introduction)

Sometimes hyphens aren't enough. For difficult words, you can use phonetic spelling to tell the AI exactly how to pronounce each syllable.

Basic Example

Word: Desert (the dry place, not dessert)
Phonetic spelling: de-zert

Word: Content (the stuff, not being satisfied)
Phonetic spelling: con-tent

When You Need Advanced Techniques

If you're still struggling with pronunciation after trying:

  • Hyphens for syllable breaks

  • Different punctuation combinations

  • SSML break tags

Then you may need advanced respelling techniques with detailed phonetic charts.

The Advanced Pronunciation Techniques guide covers these in depth, including:

  • Full phonetic alphabet charts

  • Respelling system with :: notation

  • Complex vowel and consonant combinations

  • Emphasis techniques

  • Upward inflection methods


Supported Languages and Voices

AI-generated videos and audio cards support different language options. Below are the complete lists for each card type.

How It Works

  • Type your script in any supported language

  • Language is automatically detected from your text

  • Audio cards: Choose from multiple voice options per language (2-4 voices per language)

  • Video cards: Voice is matched to your selected avatar

Video Card Languages

Video cards support the following languages:

Afrikaans - Natural • Arabic - Natural • Austrian (AT) - Natural • Austrian (CH) - Natural • Bulgarian - Natural • Burmese - Natural • Catalan - Natural • Chinese (CN) - Natural • Chinese (HK) - Natural • Chinese (TW) - Natural • Croatian - Natural • Czech - Natural • Danish - Natural • Dutch (BE) - Natural • Dutch (NL) - Natural • English (AU) - Natural • English (CA) - Natural • English (GB) - Natural • English (NZ) - Natural • English (US) - Professional/Natural • Estonian - Natural • Filipino - Natural • Finnish - Natural • French (BE) - Natural • French (CA) - Natural • French (CH) - Natural • French (FR) - Natural • Galician - Natural • German - Natural • Greek - Natural • Gujarati - Natural • Hebrew - Natural • Hungarian - Natural • Indonesian - Natural • Irish - Natural • Italian - Natural • Japanese - Original • Javanese - Natural • Kannada - Original • Khmer - Natural • Korean - Natural • Latvian - Natural • Lithuanian - Natural • Malay - Natural • Maltese - Natural • Marathi - Natural • Norwegian - Natural • Persian - Natural • Polish - Natural • Portuguese (BR) - Natural • Portuguese (PT) - Natural • Romanian - Natural • Russian - Natural • Slovak - Natural • Slovenian - Natural • Spanish (ES) - Natural • Spanish (MX) - Natural • Spanish (US) - Natural • Swedish - Natural • Thai - Natural • Ukrainian - Natural • Vietnamese - Default • Welsh - Natural • Zulu - Natural

Audio Card Languages

Audio cards support the following languages with multiple voice options per language:

Chinese (Mandarin) • Chinese (Cantonese) • English (US) • English (UK) • English (AU) • French • French (BE) • French (CA) • German • Italian • Japanese • Korean • Portuguese • Portuguese (BR) • Spanish • Spanish (MX)

Note: Each audio card language includes 2-4 different voice options. If one voice struggles with specific pronunciations, try selecting a different voice to see if it handles your content more naturally.

Choosing Between Video and Audio

Use video cards when:

  • You want visual engagement with an avatar

  • Content benefits from facial expressions and presence

  • You need the broader language support (60+ languages)

Use audio cards when:

  • Visual elements aren't necessary for comprehension

  • You want faster generation times

  • You prefer voice-only delivery

  • You want to test pronunciation before committing to video


Tips for Specific Card Types

Video Cards

  • Avatar selection matters: Some avatars may handle certain pronunciations better than others

  • Regeneration is quick: If pronunciation isn't right, adjust your script and regenerate (5-10 minutes)

  • Subtitles are permanent: Yellow auto-subtitles are burnt into the video and can't be edited afterward, so test pronunciation before finalizing

Audio Cards

  • Try different voices: Each language has multiple voice options, and some may pronounce your specific content more naturally

  • Voice selection affects tone: Different voices have different delivery styles

  • Character limits: Same 700-character limit as video cards (~1 minute of audio)

  • Faster testing: Audio cards generate faster than videos, making them good for testing pronunciation techniques
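Hyphens, spaced-out letters, and break tags all make a script longer, so it can help to check the length before pasting. This Python sketch uses the 700-character limit mentioned above; it assumes every character counts toward the limit, including break tags, which this guide does not confirm. `chars_remaining` is a hypothetical name.

```python
MAX_CHARS = 700  # limit stated above for both video and audio cards


def chars_remaining(script: str, limit: int = MAX_CHARS) -> int:
    """Return how many characters are left; a negative value means over the limit."""
    # Assumption: every character counts, including break tags and punctuation.
    return limit - len(script)


script = 'Hey John!<break time="50ms" />How are you doing today?'
print(f"{chars_remaining(script)} characters left")
```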


Quick Troubleshooting Guide

| Problem | Solution |
| --- | --- |
| Word is mispronounced | Try hyphens: "pro-ject" instead of "project" |
| Sounds too fast | Add commas for shorter pauses, periods for longer ones |
| Need specific pause | Use break tags: <break time="1s" /> |
| Acronym sounds wrong | Add spaces for letters: "N Y C" or use phonetic: "nassa" |
| Number sounds weird | Spell differently: "2 5 8 6" instead of "2586" |
| No emphasis on key word | Use quotes: "This is the 'most important' step" |
| Unnatural inflection | Try different punctuation combinations |
| Still not working | See Advanced Pronunciation Techniques for detailed phonetic spelling |


Common Questions

Q: Do these techniques work the same for video and audio cards?

A: Yes! Both use the same AI voice technology, so all techniques (hyphens, breaks, punctuation, phonetic spelling) work identically for both card types.

Q: Can I use these techniques in multiple languages?

A: Absolutely. Hyphens, break tags, and punctuation work across all supported languages. The phonetic spelling techniques may need language-specific adjustments.

Q: How do I know which voice to use for audio cards?

A: Try generating with different voices to hear which one handles your specific content best. Each voice has slightly different characteristics and may pronounce certain words more naturally.

Q: Why does the same script sound different with different avatars?

A: Each video avatar is paired with a specific voice profile that may have subtle pronunciation differences. If one avatar struggles with your content, try selecting a different one.

Q: Can I preview before finalizing?

A: For video cards, you need to wait for full generation (up to 10 minutes). For audio cards, generation is typically faster, making them good for testing pronunciation before committing to a video version.

Q: What if none of these techniques work?

A: Check out our Advanced Pronunciation Techniques guide, which includes detailed phonetic spelling charts and advanced emphasis methods. If you're still stuck, contact support. We can help!

Q: Do punctuation changes affect the subtitles/captions?

A: Yes, punctuation appears in auto-generated subtitles and closed captions, so consider readability when adding commas, periods, or quotes for voice control.

Q: Is there a limit to how many break tags I can use?

A: No specific limit, but excessive breaks make content feel choppy. Use them strategically for natural pacing.
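To judge whether a script has become choppy, you could add up how much silence its break tags introduce. The Python sketch below is an illustration only: it assumes tags are written exactly as `<break time="..." />` with an `s` or `ms` unit, and `total_pause_seconds` is a hypothetical name.

```python
import re

# Matches tags such as <break time="2s" /> or <break time="50ms"/>.
BREAK_RE = re.compile(r'<break\s+time="(\d+(?:\.\d+)?)(ms|s)"\s*/>')


def total_pause_seconds(script: str) -> float:
    """Sum the duration, in seconds, of every break tag in the script."""
    total = 0.0
    for value, unit in BREAK_RE.findall(script):
        total += float(value) / 1000 if unit == "ms" else float(value)
    return total


script = 'Welcome.<break time="2s" />Let us begin.<break time="500ms"/>Step one.'
print(total_pause_seconds(script), "seconds of added pauses")
```

If the total is a large share of a roughly one-minute card, consider replacing some breaks with commas or periods.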


Related Resources

Video and audio card guides:

Need help? Contact 7taps support through the Help button in your course editor or email our support team.


This article is part of the 7taps Help Center. For more guides on creating effective microlearning, visit our complete documentation.
