Text to speech with TTSEngine

TTSEngine loads a text-to-speech (TTS) model from a local directory and synthesizes speech as Float PCM audio samples.

Load a TTS model

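import EdgeInference
import Foundation  // for URL

// TTSEngine is @MainActor, so create and use it from the main actor.
let engine = TTSEngine()
let modelURL = URL(fileURLWithPath: "/path/to/tts-model")

// Loading is asynchronous and can throw.
try await engine.loadLocal(directory: modelURL)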

Generate speech

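// Synthesize the whole utterance and return a single AudioResult.
let audio = try await engine.speak(
    "Hello from an on-device voice model.",
    voice: "serena"
)

print(audio.sampleRate)  // sample rate of the generated audio
print(audio.duration)    // length of the generated audio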

AudioResult.samples contains Float PCM samples in the [-1.0, 1.0] range.
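Many file formats and audio APIs expect 16-bit integer PCM rather than Float samples. The following is a minimal sketch of that conversion. It assumes only that the samples are a `[Float]` array in the [-1.0, 1.0] range described above. `pcm16(from:)` is a hypothetical helper written for illustration and is not part of EdgeInference.

```swift
/// Hypothetical helper (not part of EdgeInference): converts Float PCM
/// samples in [-1.0, 1.0] to signed 16-bit PCM.
func pcm16(from samples: [Float]) -> [Int16] {
    samples.map { (sample: Float) -> Int16 in
        // Clamp first so out-of-range values cannot overflow Int16.
        let clamped = max(-1.0, min(1.0, sample))
        return Int16((clamped * Float(Int16.max)).rounded())
    }
}

// Stand-in data; in practice you would pass `audio.samples`.
let converted = pcm16(from: [0.0, 1.0, -1.0, 2.0])
print(converted)  // [0, 32767, -32767, 32767]
```

Scaling by `Int16.max` on both sides keeps the mapping symmetric, so -1.0 becomes -32767 rather than -32768.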

Advanced generation

let audio = try await engine.generate(
    text: "Read this sentence naturally.",
    speaker: "serena",
    instruct: nil,
    language: "auto",
    temperature: 0.9,
    topK: 50,
    maxTokens: 2048
)

Streaming speech

Use speakStream when you want per-token progress and audio chunks as they are generated. The stream ends with the final audio result.

for try await event in engine.speakStream(
    "Generate this as streaming audio.",
    voice: "serena"
) {
    switch event {
    case .progress(let tokenID):
        print("Generated token:", tokenID)
    case .audioChunk(let chunk):
        print("Chunk", chunk.chunkIndex, chunk.audioDuration)
    case .audio(let result):
        print("Final audio:", result.duration)
    }
}
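If you need to drive a progress indicator, one option is to fold the events into a small summary value as they arrive. The sketch below is self-contained and uses stand-in types: `DemoEvent` only mirrors the three cases shown above, and its payload types (an `Int` token ID, durations as `Double` seconds) are assumptions, not the actual `TTSEvent` definition. `StreamSummary` is likewise a hypothetical helper.

```swift
// Stand-in for the three event cases shown above. The real TTSEvent comes
// from EdgeInference; the payload types here are assumptions.
enum DemoEvent {
    case progress(Int)            // token ID
    case audioChunk(Int, Double)  // chunk index, chunk duration (assumed seconds)
    case audio(Double)            // final duration (assumed seconds)
}

// Hypothetical accumulator that tracks what has been received so far.
struct StreamSummary {
    var tokenCount = 0
    var chunkCount = 0
    var chunkSeconds = 0.0
    var finalSeconds: Double? = nil

    mutating func record(_ event: DemoEvent) {
        switch event {
        case .progress:
            tokenCount += 1
        case .audioChunk(_, let duration):
            chunkCount += 1
            chunkSeconds += duration
        case .audio(let duration):
            finalSeconds = duration
        }
    }
}

// Simulated event sequence standing in for a real stream.
var summary = StreamSummary()
let demoEvents: [DemoEvent] = [
    .progress(7), .progress(9),
    .audioChunk(0, 0.5), .audioChunk(1, 0.25),
    .audio(0.75),
]
for event in demoEvents {
    summary.record(event)
}
print(summary.tokenCount, summary.chunkCount, summary.chunkSeconds)  // 2 2 0.75
```

With the real stream, the same bookkeeping would go inside the corresponding cases of the `switch` in the `for try await` loop.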

Model info

print(engine.availableSpeakers)  // speakers exposed by the loaded model
print(engine.ttsModelType)       // type of the loaded TTS model
print(engine.sampleRate)         // output sample rate

Play audio

Convert AudioResult.samples to an AVAudioPCMBuffer in your agent's audio layer. Keep playback code outside the engine so your agent can choose between AVAudioEngine, AVAudioPlayerNode, or a custom audio pipeline.
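As a rough sketch of that conversion, the helper below copies Float samples into a one-channel AVAudioPCMBuffer. It assumes the audio is mono, that the samples are available as `[Float]`, and that the sample rate can be passed as a `Double`. `makePCMBuffer(samples:sampleRate:)` is a hypothetical name, not an EdgeInference API.

```swift
// AVFoundation is only available on Apple platforms.
#if canImport(AVFoundation)
import AVFoundation

/// Hypothetical helper (not part of EdgeInference): copies mono Float
/// samples into an AVAudioPCMBuffer for playback.
func makePCMBuffer(samples: [Float], sampleRate: Double) -> AVAudioPCMBuffer? {
    // The standard format is deinterleaved 32-bit float; one channel here.
    guard
        let format = AVAudioFormat(standardFormatWithSampleRate: sampleRate,
                                   channels: 1),
        let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                      frameCapacity: AVAudioFrameCount(samples.count)),
        let channel = buffer.floatChannelData?[0]
    else { return nil }

    // Copy the samples, then mark how many frames in the buffer are valid.
    for (index, sample) in samples.enumerated() {
        channel[index] = sample
    }
    buffer.frameLength = AVAudioFrameCount(samples.count)
    return buffer
}
#endif
```

The returned buffer can then be handed to whichever playback path you choose, for example by scheduling it on an AVAudioPlayerNode with `scheduleBuffer(_:completionHandler:)`.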

API surface

Member	What it does
`TTSEngine()`	Create a text-to-speech engine. `@MainActor`.
`loadLocal(directory:)`	Load a local TTS model.
`speak(_:voice:)`	Generate a single `AudioResult`.
`speakStream(_:voice:)`	Stream `TTSEvent` values.
`generate(text:speaker:)`	Generate with explicit parameters such as `instruct`, `language`, `temperature`, `topK`, and `maxTokens`.
`availableSpeakers`	Speakers exposed by the loaded model.

Full signatures → EdgeInference API Reference

Try it next

Voice assistant example — Full duplex voice pipeline.
Speech to text — Complete the voice loop.
