Gemini 3.1 Flash TTS Preview | Models | Pixio

Gemini 3.1 Flash TTS Preview

Gemini 3.1 Flash TTS Preview on Pixio is Google's low-latency text-to-speech model for natural single-speaker narration. Use it when you want expressive voiceover from text with a selectable prebuilt voice and prompt-controlled delivery.

When to use it

You need fast text-to-speech for narration, ads, explainers, or character voiceover.
You want to steer tone, accent, pace, or style with natural language.
You want inline audio tags such as [whispers], [laughs], or [short pause].
You do not need custom voice cloning or multi-speaker dialogue, at least for a first pass.

Inputs

Input	Notes
Text	The transcript to speak. Tags can be included inline.
Voice	One of Google's prebuilt Gemini TTS voices, such as Kore, Puck, or Sulafat.
Style Instructions	Optional performance direction for tone, pace, accent, or narrator style.
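
As an illustration of how the three inputs fit together, here is a minimal Python sketch. The `TtsInputs` class, its field names, and the `inline_tags` helper are hypothetical and exist only for this example; they are not Pixio's or Google's API. The sketch only shows a transcript with inline tags, a voice name, and optional style instructions collected in one place.

```python
import re
from dataclasses import dataclass, asdict


@dataclass
class TtsInputs:
    """Hypothetical container mirroring the three inputs in the table above."""
    text: str                      # transcript, may contain inline tags
    voice: str                     # a prebuilt voice name, e.g. "Kore"
    style_instructions: str = ""   # optional performance direction


def inline_tags(text: str) -> list[str]:
    """Return the bracketed tags found in the transcript, in order."""
    return re.findall(r"\[([^\[\]]+)\]", text)


inputs = TtsInputs(
    text="[whispers] The doors open at nine. [short pause] Don't be late. [laughs]",
    voice="Kore",
    style_instructions="Warm, unhurried narrator with a slight smile.",
)

# Plain dict form, convenient for logging or reviewing before generation.
payload = asdict(inputs)
print(payload)
print(inline_tags(inputs.text))
```

Listing the tags before generating is a cheap way to catch a mistyped bracket in a long script.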

Pricing

Gemini 3.1 Flash TTS Preview costs 30 credits per 1,000 characters, rounded up per request.
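
To budget a script before generating, the rate can be turned into a rough estimate. The sketch below assumes that "rounded up per request" means the fractional credit total for each request is rounded up to the next whole credit; if Pixio instead rounds up to whole 1,000-character blocks, the result would differ. It also assumes every character of the text, including inline tags, is counted. The function name `estimate_credits` is hypothetical.

```python
CREDITS_PER_1000_CHARS = 30  # rate stated on this page


def estimate_credits(text: str) -> int:
    """Estimate credits for one request.

    Assumption: the per-request total is rounded up to a whole credit.
    Assumption: all characters, including inline tags, are billed.
    """
    chars = len(text)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-chars * CREDITS_PER_1000_CHARS // 1000)


script = "a" * 1500
print(estimate_credits(script))
```

Under these assumptions a 1,500-character request comes to 45 credits and a 1,001-character request to 31.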

Alternatives

Need	Use
Google single-speaker TTS	Gemini 3.1 Flash TTS Preview
Voice cloning or custom voices	ElevenLabs TTS or MiniMax Voice Clone
Multi-speaker dialogue	ElevenLabs Text to Dialogue
Music generation	Pixio Music, Lyria, Songcraft, or Stable Audio
