Speech to text

Edge Kit exposes Developer Preview speech-to-text APIs for local transcription.

Use EdgeVoice for microphone recording. Use STTEngine from EdgeInference for native automatic speech recognition (ASR) in builds that include the speech runtime. WhisperEngine is a preview bridge for future whisper.cpp integration; in edge-kit@1.0.0-rc97 it is only a skeleton and does not perform real transcription.

Record audio

import EdgeVoice

let recorder = AudioRecorder()
let recordingURL = try await recorder.startRecording()

// Later, for example after the user taps Stop.
// stopRecording() returns an optional URL, so fall back to the URL from startRecording().
let finalURL = recorder.stopRecording() ?? recordingURL

Transcribe an audio file with native STT

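import Foundation
import EdgeInference

// Create the engine and point it at a local ASR model directory.
let engine = STTEngine()
let modelURL = URL(fileURLWithPath: "/path/to/asr-model", isDirectory: true)

try await engine.loadLocal(directory: modelURL)

// `finalURL` is the recording file from the previous step.
let result = try await engine.transcribe(audioURL: finalURL)

print(result.text)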
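loadLocal(directory:) takes a directory URL, so a pre-flight check can catch a wrong path before you call it. The sketch below uses only Foundation's FileManager. `isModelDirectory(_:)` is a hypothetical helper, not an Edge Kit API, and it only confirms that the path exists and is a directory; it does not check that the directory holds a valid ASR model.

```swift
import Foundation

/// Hypothetical helper (not part of Edge Kit): returns true when `url`
/// points at an existing directory. It does not validate the model contents.
func isModelDirectory(_ url: URL) -> Bool {
    var isDirectory: ObjCBool = false
    let exists = FileManager.default.fileExists(atPath: url.path, isDirectory: &isDirectory)
    return exists && isDirectory.boolValue
}

// Usage: check the path before handing it to STTEngine.loadLocal(directory:).
let candidate = URL(fileURLWithPath: "/path/to/asr-model", isDirectory: true)
if !isModelDirectory(candidate) {
    print("ASR model directory not found at \(candidate.path)")
}
```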

Stream native transcription

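transcribeStream(audioURL:language:) returns an asynchronous sequence of STTStreamEvent values, which you iterate with for try await. In the example below, .token events carry partial text as it is decoded and .result carries the final transcription.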

for try await event in engine.transcribeStream(audioURL: finalURL) {
    switch event {
    case .token(let text):
        // Partial text, printed as it arrives.
        print(text, terminator: "")
    case .result(let result):
        // The final transcription.
        print("\nFinal:", result.text)
    default:
        // Ignore other event types.
        break
    }
}
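When a UI is driven by the stream, a common pattern is to append each token to a running partial transcript and switch to the final text once the result arrives. The sketch below uses a hypothetical `DemoSTTEvent` enum as a stand-in for STTStreamEvent so it runs without Edge Kit; only the `.token` and `.result` cases mirror the loop above, and the real result type carries a `text` property rather than a bare string.

```swift
/// Hypothetical stand-in for STTStreamEvent so this sketch runs without Edge Kit.
enum DemoSTTEvent {
    case token(String)
    case result(String)
    case other  // event kinds the loop ignores, like `default:` above
}

/// Appends streamed tokens to a partial transcript and keeps the final
/// text once a result event arrives.
struct TranscriptAccumulator {
    private(set) var partial = ""
    private(set) var finalText: String?

    mutating func handle(_ event: DemoSTTEvent) {
        switch event {
        case .token(let text):
            partial += text
        case .result(let text):
            finalText = text
        case .other:
            break
        }
    }

    /// The best text available so far: the final result if present, else the partial text.
    var display: String { finalText ?? partial }
}

let events: [DemoSTTEvent] = [.token("Hello"), .token(", world"), .other, .result("Hello, world.")]
var accumulator = TranscriptAccumulator()
for event in events {
    accumulator.handle(event)
}
print(accumulator.display)  // prints "Hello, world."
```

Keeping the partial and final text separate lets a UI show live tokens while still replacing them with the engine's final output, which may differ in punctuation or casing.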

Whisper preview bridge

WhisperEngine remains in EdgeVoice as a preview bridge for apps that plan to embed a whisper.cpp xcframework. In the current preview tag, load(_:) only marks the skeleton as loaded, transcribe(audioURL:language:) returns a placeholder string, and startRealtime(language:) completes immediately.

Use STTEngine for runnable native ASR examples until your app provides a real Whisper binding.

Supported audio

Use file URLs recorded by AudioRecorder, or WAV/PCM data prepared by your agent. If you convert sample rates before transcription, validate the conversion on each device you support.
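Before passing WAV data to the engine, it helps to confirm the sample rate, channel count, and bit depth you actually have. The sketch below reads those fields from the `fmt ` chunk of a RIFF/WAVE file using Foundation only. `WAVFormat` and `readWAVFormat(_:)` are hypothetical names, not Edge Kit APIs, and the sketch does not decode every WAV variant (for example, `WAVE_FORMAT_EXTENSIBLE` files are reported by their raw format code).

```swift
import Foundation

/// Hypothetical summary of a WAV file's `fmt ` chunk (not an Edge Kit type).
struct WAVFormat: Equatable {
    let audioFormat: UInt16   // 1 = integer PCM, 3 = IEEE float
    let channels: UInt16
    let sampleRate: UInt32
    let bitsPerSample: UInt16
}

/// Reads the `fmt ` chunk of a RIFF/WAVE file. Returns nil if the data is
/// not a RIFF/WAVE file or no complete `fmt ` chunk is found.
func readWAVFormat(_ data: Data) -> WAVFormat? {
    let bytes = [UInt8](data)
    // RIFF fields are little-endian.
    func u16(_ at: Int) -> UInt16 { UInt16(bytes[at]) | UInt16(bytes[at + 1]) << 8 }
    func u32(_ at: Int) -> UInt32 {
        (0..<4).reduce(UInt32(0)) { $0 | UInt32(bytes[at + $1]) << (8 * $1) }
    }

    // Header: "RIFF" <4-byte size> "WAVE"
    guard bytes.count >= 12,
          Array(bytes[0..<4]) == Array("RIFF".utf8),
          Array(bytes[8..<12]) == Array("WAVE".utf8) else { return nil }

    // Walk the chunk list until the "fmt " chunk is found.
    var offset = 12
    while offset + 8 <= bytes.count {
        let id = Array(bytes[offset..<offset + 4])
        let size = Int(u32(offset + 4))
        let body = offset + 8
        if id == Array("fmt ".utf8) {
            guard size >= 16, body + 16 <= bytes.count else { return nil }
            return WAVFormat(audioFormat: u16(body),
                             channels: u16(body + 2),
                             sampleRate: u32(body + 4),
                             bitsPerSample: u16(body + 14))
        }
        // Chunk bodies are padded to an even number of bytes.
        offset = body + size + (size % 2)
    }
    return nil
}
```

If a recording is a WAV file, you could call `readWAVFormat(try Data(contentsOf: finalURL))` and compare `sampleRate` and `channels` with what your model expects before transcribing.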

API surface

| Method | What it does |
| --- | --- |
| `STTEngine()` | Create a native speech-to-text engine. |
| `loadLocal(directory:)` | Load a local ASR model directory. |
| `transcribe(audioURL:)` | Transcribe an audio file. |
| `transcribe(samples:sampleRate:)` | Transcribe PCM samples. |
| `transcribeStream(audioURL:)` | Stream transcription events. |
| `AudioRecorder()` | Record audio from the microphone. |

Full signatures → EdgeInference API Reference and EdgeVoice API Reference
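`transcribe(samples:sampleRate:)` accepts PCM samples, but this page does not state the sample type the engine expects. The sketch below assumes mono `[Float]` samples in [-1, 1], a common convention for ASR front ends that you should confirm in the EdgeInference API Reference. It converts interleaved 16-bit PCM to mono floats and resamples with linear interpolation. `monoFloatSamples(fromInt16:channels:)` and `resampleLinear(_:from:to:)` are hypothetical helpers, not Edge Kit APIs.

```swift
/// Converts interleaved 16-bit PCM to mono Float samples in [-1, 1]
/// by averaging the channels in each frame.
func monoFloatSamples(fromInt16 pcm: [Int16], channels: Int) -> [Float] {
    precondition(channels > 0, "channels must be positive")
    let frameCount = pcm.count / channels
    var out = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for ch in 0..<channels {
            sum += Float(pcm[frame * channels + ch]) / 32768
        }
        out[frame] = sum / Float(channels)
    }
    return out
}

/// Resamples mono audio with linear interpolation between neighboring samples.
/// This is a simple illustration; it applies no anti-aliasing filter.
func resampleLinear(_ input: [Float], from sourceRate: Double, to targetRate: Double) -> [Float] {
    guard !input.isEmpty, sourceRate > 0, targetRate > 0 else { return [] }
    if sourceRate == targetRate { return input }
    let outputCount = Int((Double(input.count) * targetRate / sourceRate).rounded())
    var output = [Float](repeating: 0, count: outputCount)
    let step = sourceRate / targetRate
    for i in 0..<outputCount {
        let position = Double(i) * step
        let index = Int(position)
        let fraction = Float(position - Double(index))
        // Clamp at the end of the input so the last samples stay in bounds.
        let a = input[min(index, input.count - 1)]
        let b = input[min(index + 1, input.count - 1)]
        output[i] = a + (b - a) * fraction
    }
    return output
}

// Usage: two stereo frames at 48 kHz, downmixed and resampled to 16 kHz.
let pcm: [Int16] = [16384, -16384, 32767, 32767]
let mono = monoFloatSamples(fromInt16: pcm, channels: 2)
let resampled = resampleLinear(mono, from: 48_000, to: 16_000)
```

If your model expects 16 kHz audio (an assumption; check the model's documentation), you might then pass the resampled array to `transcribe(samples:sampleRate:)` with a sample rate of 16,000. Linear interpolation keeps the sketch short; for production audio on Apple platforms, AVAudioConverter performs band-limited conversion and reduces aliasing when downsampling.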

Try it next

Voice assistant example — ASR → LLM → TTS pipeline.
Text to speech — Complete the voice loop.
