Audio

Last updated: May 15, 2026

Here you will find all the information about the most important audio settings for your AI calls.

In the technical settings, you can control the following features of your AI assistant:

Speaking Speed
Sensitivity

Speaking Speed

With speaking speed, you can control how fast your assistant speaks. Depending on the use case and target audience, it may be useful to adjust the speed. We generally recommend a speaking speed approximately in the middle of the available range.

The speaking speed only regulates the tempo at which the AI speaks its responses. It does not improve latency.

Reducing the speaking speed can also improve articulation when pronunciation issues persist.

Sensitivity

Sensitivity determines how easily the AI can be interrupted. Here you can control how sensitive the AI is to speech as well as to noises. Depending on the use case, it may be useful to adjust this setting.

If many of your calls take place in a noisy environment with strong background noise (for example, on a construction site), call quality can improve if you reduce the sensitivity so that the AI reacts less easily to noises.
Conversely, it may be useful to increase sensitivity if your callers typically speak quietly and in a quiet environment (for example, elderly people).

If your assistant interrupts callers who are providing multi-item lists or long sequences of information, combine reduced sensitivity with prompt instructions that explicitly tell the assistant to wait for natural pauses of 2–3 seconds before responding, and to avoid using confirmation words (“yes,” “okay”) while the caller is still speaking.

Technical Terms and Pronunciation Optimization

Additional techniques help improve the pronunciation of names and specialized terminology:

Phonetic Spelling in the Prompt: Specify the desired pronunciation directly (e.g., "Müller, pronounced 'Mül-ler'" or "Schmidt = Shmit").
Register Technical Terms: Add important names or terms as specialized vocabulary so that speech recognition captures them more reliably.
Spelling Fallback: Incorporate a safety check for critical names (e.g., "To be sure, could you please spell your last name?").
Numeric Data Formatting in Prompt: When communicating phone numbers, addresses, or other data that callers need to write down, format the output so numbers are spoken in groups, e.g., “The number is: plus four nine – two five seven one – nine nine seven nine eight – four nine”. This grouping yields naturally slower and clearer pronunciation.
- Convert numbers to words: For dates and times, spell out numbers as words rather than digits (e.g., “the thirteenth of March” instead of “13.03”).
- Add commas for pauses: In long number sequences (IDs, article numbers, postal codes), place commas between each digit to force clear articulation (e.g., “8, 9, 1, 2, 2”).
- Use month names: Always use month names (January, February, etc.) instead of month numbers to avoid confusion.
- Separate time components: For times, put the spoken hour word between the hours and the minutes (e.g., “14 Uhr 30” in German or “2 o’clock 30” in English).
- Slow down for numbers: Instruct the assistant in your prompt to speak at 80% normal speed when reading numbers or dates, and to overemphasize consonants at the end of words.
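
The formatting rules in this list can be applied before the text reaches your prompt or knowledge base. The following is a minimal Python sketch of that idea, not a feature of the platform: all function names (speak_phone_number, comma_digits, speak_date, speak_time) are hypothetical, the group sizes are chosen by you, and the plain hyphen separator is an assumption that you can swap for any separator your voice handles well.

```python
from datetime import date

# Hypothetical helpers for illustration; none of these are platform APIs.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
_BASE = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
         "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
         "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
         "nineteenth", "twentieth"]
# Ordinal words for day 1 to day 31.
ORDINALS = _BASE + [f"twenty-{w}" for w in _BASE[:9]] + ["thirtieth", "thirty-first"]


def speak_phone_number(number, group_sizes, separator=" - "):
    """Spell a phone number digit by digit, split into spoken groups."""
    digits = [ch for ch in number if ch in DIGIT_WORDS]
    groups, pos = [], 0
    for size in group_sizes:
        chunk = digits[pos:pos + size]
        if chunk:
            groups.append(" ".join(DIGIT_WORDS[d] for d in chunk))
        pos += size
    if pos < len(digits):  # leftover digits form one final group
        groups.append(" ".join(DIGIT_WORDS[d] for d in digits[pos:]))
    spoken = separator.join(groups)
    return f"plus {spoken}" if number.strip().startswith("+") else spoken


def comma_digits(code):
    """Put a comma between every character of an ID or postal code."""
    return ", ".join(ch for ch in code if not ch.isspace())


def speak_date(d):
    """Render a date with an ordinal day and a month name."""
    return f"the {ORDINALS[d.day - 1]} of {MONTHS[d.month - 1]}"


def speak_time(hour, minute, hour_word="Uhr"):
    """Separate hours and minutes with a spoken hour word."""
    return f"{hour} {hour_word} {minute}"


print(speak_phone_number("+49 2571 99798 49", [2, 4, 5, 2]))
print(comma_digits("89122"))
print(speak_date(date(2026, 3, 13)))
print(speak_time(14, 30))
```

With the inputs shown, the helpers reproduce the examples from the list: the grouped phone number, "8, 9, 1, 2, 2", "the thirteenth of March", and "14 Uhr 30".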

These techniques are particularly valuable when standard audio settings alone are not enough to ensure consistent pronunciation.

Troubleshooting Voice Instability

If you notice that your AI assistant’s voice fluctuates in pitch or tempo during longer conversations (especially when reading out number sequences), the following adjustments may help:

Voice Processing Model for Stability

Selecting Deepgram as the voice processing model can give more stable voice output during longer conversation segments and can reduce fluctuations in pitch and tempo. This is particularly relevant if your AI phone calls frequently involve longer monologues or the reading of number sequences.

Speaking Speed for Stabilization

Reducing the speaking speed can improve voice stability if the voice fluctuates during longer conversations. It can resolve inconsistent speech output without compromising intelligibility.

Recommendation: Test these settings systematically if you notice instability in the voice output, especially during longer conversation segments or when reading out data.

Advanced Email Address and Data Recognition Techniques

Email Address Phonetic Formatting

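Format emails as "name at domain punkt De-e" (or "punkt com", "punkt org", etc.). Here "punkt" is the German word for "dot", and "De-e" spells out the letters of the ".de" ending.
This phonetic approach works better than standard spelling for complex email structures.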
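
If you generate prompt text or test phrases programmatically, the phonetic format can be produced from a regular address. This is a minimal sketch under stated assumptions: speak_email and TLD_SPOKEN are hypothetical names, only the "De-e" spelling comes from the example above, and speaking dots in the part before the "at" as "punkt" is an assumption you should verify against your own calls.

```python
# Spoken forms for domain endings. Only "de" is taken from the example above;
# any other ending is passed through unchanged (assumption).
TLD_SPOKEN = {"de": "De-e"}


def speak_email(address, at_word="at", dot_word="punkt"):
    """Turn 'name@domain.de' into 'name at domain punkt De-e'."""
    local, sep, domain = address.strip().partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    labels = domain.split(".")
    if len(labels) > 1:
        # Replace the last label with its spoken form when one is defined.
        labels[-1] = TLD_SPOKEN.get(labels[-1].lower(), labels[-1])
    spoken_local = f" {dot_word} ".join(local.split("."))
    spoken_domain = f" {dot_word} ".join(labels)
    return f"{spoken_local} {at_word} {spoken_domain}"


print(speak_email("name@domain.de"))
print(speak_email("vorname.nachname@firma.com"))
```

The first call yields "name at domain punkt De-e", matching the format described above. Add further entries to TLD_SPOKEN if other endings need to be spelled letter by letter.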

Precise Information Processing Feature

Add example format patterns (e.g., vorname.nachname@firma.de, v-nachname@uni.de) to improve recognition accuracy.
This is particularly effective for email addresses and standardized data formats.

Enhanced Email Confirmation Strategy

Add confirmation prompts: "I understood the email as 'name at domain punkt De-e'. Is that correct? If not, please spell it slowly."
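
The confirmation step can be combined with a simple plausibility check on the recognized address. The sketch below is illustrative only: confirmation_prompt and EMAIL_SHAPE are hypothetical names, the regular expression is a deliberately loose shape check rather than a full validator, and the fallback sentence is an invented example of a spelling request.

```python
import re

# Loose shape check: something@label.label (assumption, not a full validator).
EMAIL_SHAPE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def confirmation_prompt(recognized, spoken):
    """Build the sentence the assistant should say after capturing an email.

    recognized: the address as transcribed, e.g. 'name@domain.de'
    spoken:     its phonetic form, e.g. 'name at domain punkt De-e'
    """
    if EMAIL_SHAPE.fullmatch(recognized.strip()):
        return (f"I understood the email as '{spoken}'. "
                "Is that correct? If not, please spell it slowly.")
    # Hypothetical fallback wording when the transcript is not address-shaped.
    return "I did not catch the email address. Could you please spell it slowly?"


print(confirmation_prompt("name@domain.de", "name at domain punkt De-e"))
print(confirmation_prompt("name at domain", "name at domain"))
```

Asking for spelling right away when the transcript does not look like an address avoids reading back a result that is already known to be wrong.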