Text to speech with TTSEngine

TTSEngine loads a local TTS model and returns PCM audio samples.

Load a TTS model

import EdgeInference

let engine = TTSEngine()
let modelURL = URL(fileURLWithPath: "/path/to/tts-model")

try await engine.loadLocal(directory: modelURL)

Generate speech

let audio = try await engine.speak(
    "Hello from an on-device voice model.",
    voice: "serena"
)

print(audio.sampleRate)
print(audio.duration)

AudioResult.samples contains Float PCM samples in the [-1.0, 1.0] range.
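Many file formats and audio APIs expect 16-bit integer PCM rather than Float samples. The following is a minimal sketch of that conversion, assuming the samples are mono and have already been copied into a plain [Float]. The helper names int16PCM(from:) and durationSeconds(sampleCount:sampleRate:) are hypothetical and not part of EdgeInference.

```swift
import Foundation

/// Hypothetical helper (not part of EdgeInference): converts Float PCM
/// samples in the [-1.0, 1.0] range to signed 16-bit PCM.
func int16PCM(from samples: [Float]) -> [Int16] {
    samples.map { (sample: Float) -> Int16 in
        // NaN would trap in the Int16 initializer, so treat it as silence.
        guard !sample.isNaN else { return 0 }
        // Clamp defensively in case a sample overshoots the nominal range.
        let clamped = max(-1.0, min(1.0, sample))
        // Scale by 32767 so both -1.0 and 1.0 stay inside the Int16 range.
        return Int16((clamped * 32767.0).rounded())
    }
}

/// Hypothetical helper: duration in seconds of a mono sample buffer.
/// Assumes one sample per frame; returns 0 for a non-positive sample rate.
func durationSeconds(sampleCount: Int, sampleRate: Double) -> Double {
    sampleRate > 0 ? Double(sampleCount) / sampleRate : 0
}
```

Under these assumptions, int16PCM(from: audio.samples) would give a buffer suitable for writing into a 16-bit WAV container. Scaling by 32767 rather than 32768 keeps the conversion symmetric around zero.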

Advanced generation

let audio = try await engine.generate(
    text: "Read this sentence naturally.",
    speaker: "serena",
    instruct: nil,
    language: "auto",
    temperature: 0.9,
    topK: 50,
    maxTokens: 2048
)

Streaming speech

Use speakStream when you want per-token progress events and audio chunks as they are generated. The stream also delivers the final audio result.

for try await event in engine.speakStream(
    "Generate this as streaming audio.",
    voice: "serena"
) {
    switch event {
    case .progress(let tokenID):
        print("Generated token:", tokenID)
    case .audioChunk(let chunk):
        print("Chunk", chunk.chunkIndex, chunk.audioDuration)
    case .audio(let result):
        print("Final audio:", result.duration)
    }
}

Model info

print(engine.availableSpeakers)
print(engine.ttsModelType)
print(engine.sampleRate)

Play audio

Convert AudioResult.samples to an AVAudioPCMBuffer in your agent's audio layer. Keep playback code outside the engine so your agent can choose AVAudioEngine, AVAudioPlayerNode, or a custom audio pipeline.
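The conversion could look roughly like the sketch below. It assumes mono output and takes the samples and sample rate as plain values, so it makes no assumptions about the exact property types on AudioResult. If sampleRate is an integer in your version, convert it to Double first. makePCMBuffer(samples:sampleRate:) is a hypothetical helper, not part of EdgeInference. The AVFoundation calls are standard.

```swift
import AVFoundation

/// Hypothetical helper (not part of EdgeInference): copies mono Float
/// samples into an AVAudioPCMBuffer. Returns nil if the input is empty
/// or if the format or buffer cannot be created.
func makePCMBuffer(samples: [Float], sampleRate: Double) -> AVAudioPCMBuffer? {
    guard !samples.isEmpty,
          // Assumption: the model produces a single (mono) channel.
          let format = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: sampleRate,
                                     channels: 1,
                                     interleaved: false),
          let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: AVAudioFrameCount(samples.count)),
          let channel = buffer.floatChannelData?[0]
    else { return nil }

    // Copy the samples into the buffer's only channel.
    for (index, sample) in samples.enumerated() {
        channel[index] = sample
    }
    // frameLength starts at 0; set it so the player knows how much is valid.
    buffer.frameLength = AVAudioFrameCount(samples.count)
    return buffer
}
```

The resulting buffer can then be handed to AVAudioPlayerNode.scheduleBuffer(_:completionHandler:) on a node attached to an AVAudioEngine and connected with the same format.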

API surface

| Method | What it does |
| --- | --- |
| TTSEngine() | Create a text-to-speech engine. @MainActor. |
| loadLocal(directory:) | Load a local TTS model. |
| speak(_:voice:) | Generate a single AudioResult. |
| speakStream(_:voice:) | Stream TTSEvent values. |
| generate(text:speaker:) | Generate with explicit parameters. |
| availableSpeakers | Speakers exposed by the loaded model. |

Full signatures → EdgeInference API Reference