EdgeInference API reference

EdgeInference contains the primary model engines and shared inference types. Inference is powered by Edge Engine with DSR Attention for efficient long-context multi-turn sessions.

LLMEngine

@MainActor
public final class LLMEngine: ObservableObject

LLMEngine loads text-generation models and streams generated text.

Properties

| Property | Type | Description |
| --- | --- | --- |
| state | EngineState | Current engine state. |
| loadedConfig | ModelConfig? | Registered model metadata when available. |
| downloadProgress | Double | Download or load progress from 0 to 1. |
| lastPolicy | InferencePolicy.Resolved? | Last high-level policy summary. |
| lastMetrics | InferenceMetrics? | Metrics from the last completed generation. |
| memoryPolicy | KVCacheMemoryPolicy? | Automatic KV cache management policy. |
| promptCache | PromptCacheManager | Conversation cache manager. |

Methods

| Method | Description |
| --- | --- |
| init() | Creates an engine. |
| load(config:onProgress:) | Preview metadata hook. The native default build does not download remote models here; use EdgeModelKit to prepare a local directory, then call loadLocal(directory:). |
| loadLocal(directory:onProgress:) | Loads a local model directory. |
| loadLocal(directory:options:onProgress:) | Loads a local model directory with runtime options such as memoryIntent. |
| generate(messages:tools:onToolCall:parameters:bypassPolicy:) | Streams GenerateChunk values. |
| generateStream(messages:parameters:) | Convenience stream wrapper around generate(messages:parameters:). |
| generateOnce(messages:parameters:) | Returns a single accumulated string. |
| clearPromptCache() | Clears conversation cache. |
| unload() | Releases the loaded model. |
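
As a rough illustration of how the streaming and one-shot methods relate, the sketch below accumulates streamed chunks into a single string, which is the behavior described for generateOnce. It does not call EdgeInference: GenerateChunk here is a local stand-in that mirrors only the documented text field, and fakeGenerate() and accumulate(_:) are hypothetical helpers. The real generate signature and its stream type may differ.

```swift
// Local stand-in for the documented GenerateChunk; only `text` comes from the reference.
struct GenerateChunk {
    let text: String
}

// Hypothetical chunk source standing in for a streaming generate call.
func fakeGenerate() -> AsyncStream<GenerateChunk> {
    AsyncStream { continuation in
        for piece in ["Hello", ", ", "world"] {
            continuation.yield(GenerateChunk(text: piece))
        }
        continuation.finish()
    }
}

// Concatenates every chunk's text into one string once the stream ends.
func accumulate(_ stream: AsyncStream<GenerateChunk>) async -> String {
    var output = ""
    for await chunk in stream {
        output += chunk.text
    }
    return output
}
```

With these stand-ins, await accumulate(fakeGenerate()) returns "Hello, world". Streaming suits UIs that render text as it arrives; accumulating suits callers that only need the final answer.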

VLMEngine

@MainActor
public final class VLMEngine: ObservableObject

VLMEngine loads vision-language models and streams generated text from image plus text input.

Properties

| Property | Type | Description |
| --- | --- | --- |
| state | EngineState | Current engine state. |
| downloadProgress | Double | Download or load progress. |
| lastPolicy | InferencePolicy.Resolved? | Last high-level policy summary. |
| lastMetrics | InferenceMetrics? | Metrics from the last completed generation. |
| memoryPolicy | KVCacheMemoryPolicy? | Automatic KV cache management policy. |
| promptCache | PromptCacheManager | Conversation cache manager. |
| visionOffloaded | Bool | Whether the preview runtime is using a memory-saving vision path. |

Methods

| Method | Description |
| --- | --- |
| init() | Creates an engine. |
| loadLocal(directory:onProgress:) | Loads a local VLM directory. |
| load(config:onProgress:) | Preview metadata hook. The native default build does not download remote VLMs here; prepare a local directory and call loadLocal(directory:). |
| generate(messages:images:tools:onToolCall:parameters:) | Streams text from URL images. |
| generate(messages:ciImages:tools:onToolCall:parameters:) | Streams text from in-memory CIImage values. |
| generateStream(messages:images:parameters:) | Convenience stream wrapper around URL-image generation. |
| unload() | Releases the loaded model. |

TTSEngine

@MainActor
public final class TTSEngine: ObservableObject

TTSEngine loads a text-to-speech model and returns PCM audio.

Properties

| Property | Type | Description |
| --- | --- | --- |
| state | EngineState | Current engine state. |
| downloadProgress | Double | Load progress. |
| availableSpeakers | [String] | Speakers exposed by the loaded model. |
| ttsModelType | String | Model type string. |
| sampleRate | Int | Output sample rate. |

Methods

| Method | Description |
| --- | --- |
| init() | Creates an engine. |
| loadLocal(directory:onProgress:) | Loads a local TTS model. |
| speak(_:voice:) | Generates a single AudioResult. |
| generate(text:speaker:instruct:language:temperature:topK:maxTokens:) | Generates with explicit parameters. |
| speakStream(_:voice:instruct:streamingInterval:) | Streams TTSEvent values. |
| unload() | Cancels active streaming and releases the model. |
| unloadAsync() | Waits for active streaming cleanup before releasing the model. |

STTEngine

@MainActor
public final class STTEngine

STTEngine is the native speech-to-text preview engine. It is available when the speech runtime is enabled.

| Method | Description |
| --- | --- |
| loadLocal(directory:) | Loads a local ASR model. |
| transcribe(audioURL:language:maxTokens:temperature:) | Transcribes an audio file. |
| transcribe(samples:sampleRate:language:) | Transcribes PCM samples. |
| transcribeStream(audioURL:language:) | Streams STTStreamEvent values. |

EdgeRuntime

@MainActor
public final class EdgeRuntime

EdgeRuntime detects a local model category and returns an AnyEngine wrapper.

| Method | Description |
| --- | --- |
| loadLocal(directory:) | Detects model type and loads the matching engine. |
| loadRecommendedModel() | Loads a recommended LLM for the current device. |
| load(_:) | Loads a registered LLM by model ID. |

AnyEngine

public struct AnyEngine

| Property | Type | Description |
| --- | --- | --- |
| category | ModelCategory | Detected model category. |
| llm | LLMEngine? | LLM engine when category == .llm. |
| vlm | VLMEngine? | VLM engine when category == .vlm. |
| tts | TTSEngine? | TTS engine when category == .tts. |
| stt | STTEngine? | STT engine when category == .stt. |
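
Because only the engine matching category is expected to be non-nil, callers typically branch on category before touching an engine. The sketch below shows one way to do that. It is self-contained and does not import EdgeInference: every type is a local stand-in that mirrors the documented names, the memberwise initializer is an assumption (the reference does not list one), and activeEngineName(_:) is a hypothetical helper.

```swift
// Local stand-ins mirroring the documented names; the real types live in EdgeInference.
enum ModelCategory { case llm, vlm, tts, stt }
final class LLMEngine {}
final class VLMEngine {}
final class TTSEngine {}
final class STTEngine {}

struct AnyEngine {
    let category: ModelCategory
    var llm: LLMEngine? = nil
    var vlm: VLMEngine? = nil
    var tts: TTSEngine? = nil
    var stt: STTEngine? = nil
}

// Hypothetical helper: names the engine slot that matches the detected category,
// or returns nil when that slot is unexpectedly empty.
func activeEngineName(_ engine: AnyEngine) -> String? {
    switch engine.category {
    case .llm: return engine.llm == nil ? nil : "LLMEngine"
    case .vlm: return engine.vlm == nil ? nil : "VLMEngine"
    case .tts: return engine.tts == nil ? nil : "TTSEngine"
    case .stt: return engine.stt == nil ? nil : "STTEngine"
    }
}
```

Switching exhaustively over ModelCategory means the compiler flags any category added later, which is the main reason to prefer it over a chain of optional checks.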

Supporting types

| Type | Description |
| --- | --- |
| EngineState | .idle, .loading, .ready, .generating. |
| GenerateChunk | Streaming text chunk with text. |
| ChatMessage | Role and content pair with .system, .user, .assistant, and .tool helpers. |
| EdgeGenerateParameters | Generation parameters such as temperature, top-p, and max tokens. |
| ModelConfig | Registered model metadata and lookup helpers. |
| ModelCategory | .llm, .vlm, .tts, .stt. |
| EdgeMemoryIntent | Product-level memory intent: .balanced, .longSession, .exactRecall, .batteryFriendly. |
| NativeRuntimeLoadOptions | Low-level native runtime options. Prefer setting only memoryIntent unless you are running a measured experiment. |
| InferenceMetrics | TTFT, decode TPS, token counts, memory delta, and cache summary. |
| AudioResult | PCM samples and sample rate. |
| AudioChunkResult | Streaming audio chunk. |
| TTSEvent | .progress, .audioChunk, .audio. |
| TranscriptionResult | Speech-to-text output and metrics. |
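
Since AudioResult carries PCM samples together with a sample rate, the playback duration can be derived from the two. The sketch below is a minimal example under stated assumptions: AudioResult is a local stand-in, the field names samples and sampleRate and the Float sample type are guesses based on the description above, and the audio is assumed to be mono. durationSeconds(of:) is a hypothetical helper, not part of EdgeInference.

```swift
// Local stand-in for the documented AudioResult; field names and types are assumptions.
struct AudioResult {
    let samples: [Float]
    let sampleRate: Int
}

// Duration in seconds, assuming mono PCM (one sample per frame).
// Returns 0 for a non-positive sample rate instead of dividing by zero.
func durationSeconds(of audio: AudioResult) -> Double {
    guard audio.sampleRate > 0 else { return 0 }
    return Double(audio.samples.count) / Double(audio.sampleRate)
}
```

For example, 48,000 samples at a 24,000 Hz sample rate give a duration of 2 seconds. Interleaved multi-channel audio would need the sample count divided by the channel count first.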