Cross-Domain Personal Data Integration on Consumer Hardware Produces Emergent Insights
Large Model + Small Data
Smart strangers who don't know you
Medium Model + Rich Data
Your personal AI that truly knows you
Same model, same users; only the data input changes. Across all users, full panoramic integration (Config H) consistently outperforms every single-domain and dual-domain configuration.
| Config | Data Sources | Avg Score |
|---|---|---|
| A | Finance only (Dailyn) | 66.3 |
| B | Diet only (Mealens) | 65.1 |
| C | Mood only (Ururu) | 63.2 |
| D | Reading only (Narrus) | 55.4 |
| E | Finance + Diet | 76.6 |
| F | Finance + Mood | 76.3 |
| G | Diet + Mood | 74.1 |
| H | Full Panoramic (All 4) | 92.6 |
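
To make the size of the gap concrete, the following sketch recomputes how far Config H sits above the best single-domain and best dual-domain averages. The scores are copied from the table; the variable names are illustrative, and the "gain" computed here is plain subtraction, not a metric defined by the study.

```python
# Average scores copied from the configuration table.
avg_scores = {
    "A": 66.3, "B": 65.1, "C": 63.2, "D": 55.4,  # single-domain
    "E": 76.6, "F": 76.3, "G": 74.1,              # dual-domain
    "H": 92.6,                                    # all four domains
}

single_domain = ["A", "B", "C", "D"]
dual_domain = ["E", "F", "G"]

best_single = max(avg_scores[c] for c in single_domain)
best_dual = max(avg_scores[c] for c in dual_domain)
full = avg_scores["H"]

# Point gains of the full configuration (illustrative arithmetic only).
gain_over_single = round(full - best_single, 1)
gain_over_dual = round(full - best_dual, 1)

print(f"H vs best single-domain: +{gain_over_single} points")
print(f"H vs best dual-domain:   +{gain_over_dual} points")
```

On these numbers, the full configuration is 26.3 points above the best single-domain average (Finance, 66.3) and 16.0 points above the best dual-domain average (Finance + Diet, 76.6).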
IIR ≥ 1.0 indicates that cross-domain integration adds value. Target: ≥ 1.5x
The jump from 2B to 9B (+14.9) is about 2.7x the gain from 9B to 35B (+5.6). A 9B model already captures most of the insight quality, which supports the "medium model" thesis.
All models run locally on Apple Silicon via MLX. No cloud, no API calls, and no data leaves the device.
| Device | Model | TPS (tokens/s) | TTFT (time to first token) |
|---|---|---|---|
| M2 Ultra 192G | 0.8B Q8 | 137.2 | 0.088s |
| M2 Ultra 192G | 2B Q8 | 105.1 | 0.139s |
| M2 Ultra 192G | 35B-A3B Q8 | 49.9 | 0.365s |
| M2 Ultra 192G | 9B Q8 | 41.3 | 0.471s |
| M1 Max 32G | 9B Q8 | 21.9 | 1.138s |
| M2 Pro 32G | 9B Q8 | 12.7 | 1.762s |
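
As a rough way to read these numbers, the sketch below estimates the wall-clock time for one response as TTFT plus output length divided by TPS. This assumes a constant decode rate and a 200-token insight; both are simplifying assumptions for illustration, not measurements from the benchmark.

```python
# 9B Q8 rows from the benchmark table:
# device -> (tokens per second, time to first token in seconds)
BENCHMARKS_9B = {
    "M2 Ultra 192G": (41.3, 0.471),
    "M1 Max 32G": (21.9, 1.138),
    "M2 Pro 32G": (12.7, 1.762),
}


def estimated_latency(tps: float, ttft: float, n_tokens: int) -> float:
    """Approximate response time in seconds: TTFT + n_tokens / TPS.

    Assumes the decode rate stays constant for the whole response.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return ttft + n_tokens / tps


# 200 output tokens is an assumed insight length, not a figure from the study.
latencies = {
    device: estimated_latency(tps, ttft, n_tokens=200)
    for device, (tps, ttft) in BENCHMARKS_9B.items()
}

for device, seconds in latencies.items():
    print(f"{device}: ~{seconds:.1f}s for 200 tokens")
```

Under these assumptions, a 200-token insight from the 9B model takes roughly 5.3 s on the M2 Ultra, 10.3 s on the M1 Max, and 17.5 s on the M2 Pro.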
Vertical apps collect data naturally. The home server federates summaries (never raw data) across devices to produce panoramic insights.
Raw data never leaves the originating device. Only compressed summaries traverse the local network. The federation protocol achieves 125.5x compression while preserving insight quality.
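
The summary-only idea can be illustrated with a toy example. The record schema, the summary fields, and the byte-count definition of "compression ratio" below are all assumptions made for this sketch; the actual federation protocol and how its 125.5x figure was measured are not described here.

```python
import json
from collections import Counter


def summarize_expenses(records: list[dict]) -> dict:
    """Reduce raw expense records to a few aggregates (hypothetical schema)."""
    if not records:
        raise ValueError("records must not be empty")
    amounts = [r["amount"] for r in records]
    categories = Counter(r["category"] for r in records)
    return {
        "domain": "finance",
        "n_records": len(records),
        "total": round(sum(amounts), 2),
        "mean": round(sum(amounts) / len(amounts), 2),
        "top_category": categories.most_common(1)[0][0],
    }


def compression_ratio(records: list[dict], summary: dict) -> float:
    """Serialized size of the raw records divided by that of the summary."""
    raw_bytes = len(json.dumps(records).encode("utf-8"))
    summary_bytes = len(json.dumps(summary).encode("utf-8"))
    return raw_bytes / summary_bytes


# Deterministic toy data: 90 days, three purchases per day.
CATEGORIES = ["food", "transport", "rent"]
records = [
    {
        "day": day,
        "amount": 5.0 + (day * 7 + i * 3) % 40,
        "category": CATEGORIES[(day + i) % 3],
        "note": f"purchase {day}-{i}",
    }
    for day in range(90)
    for i in range(3)
]

summary = summarize_expenses(records)  # this is all that would leave the device
ratio = compression_ratio(records, summary)
print(summary)
print(f"compression ratio: {ratio:.1f}x")
```

Only `summary` would be sent to the home server in this sketch; individual purchases and their notes stay on the originating device.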
Each model-generated insight is scored by Claude Opus 4.6 (Anthropic) on four dimensions, each worth 0–25 points, for a total of 0–100. The judge receives the user's full data context and the generated insight, and returns scores with detailed justifications.
| Dimension | Score | What It Measures |
|---|---|---|
| Relevance | 0–25 | Does the insight address something meaningful in the user's life? |
| Specificity | 0–25 | Does it reference concrete data points, or is it generic advice? |
| Cross-Domain | 0–25 | Does it connect patterns across multiple life domains? |
| Actionability | 0–25 | Can the user take a concrete action based on this insight? |
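
One way to represent a judged insight under this rubric is sketched below. The class name, field names, and the example scores are hypothetical; only the four dimensions and the 0–25 range per dimension come from the table.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InsightScore:
    """Four rubric dimensions, each 0-25, summing to a 0-100 total."""

    relevance: float
    specificity: float
    cross_domain: float
    actionability: float

    def __post_init__(self) -> None:
        # Reject any dimension outside the rubric's 0-25 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 25:
                raise ValueError(f"{f.name} must be in [0, 25], got {value}")

    @property
    def total(self) -> float:
        return sum(getattr(self, f.name) for f in fields(self))


# Made-up scores for illustration; not results from the study.
example = InsightScore(relevance=20, specificity=18, cross_domain=22, actionability=19)
print(example.total)
```

Validating the range at construction time catches malformed judge output early, before it is averaged into a configuration's score.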
Methodology reference: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (Zheng et al., NeurIPS 2023)
Each user has 90 days of synthetic data across 4 apps, with an injected life event to simulate temporal drift.
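
A minimal sketch of what "an injected life event" could look like for one domain follows. It models the event as a level shift in daily spending from a chosen day onward; the function name, parameters, and the level-shift model itself are assumptions for illustration, not the generator used in the study.

```python
import random
from statistics import mean


def synthetic_daily_spend(
    days: int = 90,
    event_day: int = 60,
    baseline: float = 40.0,
    shift: float = 25.0,
    noise: float = 5.0,
    seed: int = 0,
) -> list[float]:
    """Daily spend with Gaussian noise and a level shift starting at event_day."""
    rng = random.Random(seed)  # seeded so the series is reproducible
    series = []
    for day in range(days):
        level = baseline + (shift if day >= event_day else 0.0)
        series.append(round(max(0.0, rng.gauss(level, noise)), 2))
    return series


series = synthetic_daily_spend()
before, after = series[:60], series[60:]
print(f"mean before event: {mean(before):.1f}, after: {mean(after):.1f}")
```

A real generator would apply a correlated change to the other three apps at the same time, so that the event is visible as a cross-domain pattern.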
Early Warning for Life Crises
Cross-domain data integration enables detection of cascading life crises that single-domain analysis cannot see, such as financial stress spiraling into health emergencies or burnout escalating into medical errors.
Privacy is essential: users are more likely to keep honest records when they trust that no external party will see their data.