V3: From Technical Proof to Social Validity

Prism V3

Cross-domain crisis detection and socially diverse user modeling through on-device integration of personal data. Simple rules + rich data = effective crisis detection. The crisis detectors themselves require no machine learning.

Builds on Prism V2 (IIR 1.48x, federation protocol, model-scale curve)

L3 Crisis F1: 0.77
Users: 14 (ages 15-71)
IIR (GLM, self-judged): 1.34x
Precision gain, L1→L3: 7x

Core Finding

Cross-domain signal convergence improves crisis detection precision roughly 7x (L1: 0.10 → L3: 0.71) using only threshold rules: no machine learning, no training data, no parameter tuning. The "intelligence" comes entirely from the data architecture, which provides multiple independent domains that correlate only during genuine crises.

Cross-Domain Crisis Detection
Five rule-based detectors (finance, diet, mood, reading, data absence) aggregate anomaly signals into three severity levels. Cross-domain convergence is the key precision amplifier.
| Level | Precision | Recall | F1 |
| --- | --- | --- | --- |
| L1 Watch | 0.10 | 0.80 | 0.17 |
| L2 Warning | 0.57 | 1.00 | 0.73 |
| L3 Crisis | 0.71 | 0.83 | 0.77 |

KEY: Precision rises roughly 7x from L1 to L3 purely through cross-domain convergence. Real crises perturb multiple domains simultaneously; noise typically affects only one.
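
The convergence idea can be illustrated with a minimal Python sketch. It is not the project's detector code: the `DomainSignal` type, the relative-deviation rule, the tolerance values, and the mapping "number of anomalous domains → severity level" are all assumptions made for illustration. Only the five domain names come from the text above.

```python
from dataclasses import dataclass

# The five detector domains named in the text.
DOMAINS = ("finance", "diet", "mood", "reading", "data_absence")


@dataclass(frozen=True)
class DomainSignal:
    domain: str       # one of DOMAINS
    value: float      # today's observation (hypothetical unit per domain)
    baseline: float   # the user's typical value
    tolerance: float  # allowed relative deviation (hypothetical threshold)


def is_anomalous(sig: DomainSignal) -> bool:
    """Plain threshold rule: flag when relative deviation exceeds tolerance."""
    if sig.baseline == 0:
        return sig.value != 0
    return abs(sig.value - sig.baseline) / abs(sig.baseline) > sig.tolerance


def severity_level(signals: list[DomainSignal]) -> int:
    """Assumed mapping: 0 = none, 1 = L1 Watch, 2 = L2 Warning, 3 = L3 Crisis.

    The level is the number of distinct anomalous domains, capped at 3.
    """
    flagged = {s.domain for s in signals if is_anomalous(s)}
    return min(len(flagged), 3)


# Noise: one domain spikes, the others stay near baseline.
noise_day = [
    DomainSignal("finance", 300.0, 100.0, 0.5),
    DomainSignal("mood", 6.8, 7.0, 0.3),
]
# Convergence: three domains deviate on the same day.
crisis_day = [
    DomainSignal("finance", 300.0, 100.0, 0.5),
    DomainSignal("diet", 900.0, 2000.0, 0.3),
    DomainSignal("mood", 3.0, 7.0, 0.3),
]
print(severity_level(noise_day), severity_level(crisis_day))
```

Under these assumptions a single-domain spike never rises above level 1, which is the mechanism the text credits for the precision gain at L3.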

| Drift Class | Precision | Recall | F1 | n |
| --- | --- | --- | --- | --- |
| Normal | 0.111 | 0.667 | 0.191 | 6 |
| Unexpected | 0.286 | 1.000 | 0.444 | 5 |
| Severe | 0.412 | 0.875 | 0.560 | 3 |

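
As a consistency check, the F1 values in the drift-class table follow from the standard harmonic-mean formula applied to the listed precision and recall. The sketch below recomputes them from the rounded table entries, so agreement is only expected to within about 0.001; the variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the drift-class table.
drift_rows = {
    "Normal": (0.111, 0.667, 0.191),
    "Unexpected": (0.286, 1.000, 0.444),
    "Severe": (0.412, 0.875, 0.560),
}

for name, (p, r, reported) in drift_rows.items():
    recomputed = f1_score(p, r)
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```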
Dual-Model Ablation Study
8 data configurations × 14 users × 2 models = 224 inferences. Two architecturally distinct models both show IIR > 1.0 across all user groups.
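
The size of the ablation grid can be reproduced by enumerating it. This is a sketch only: the configuration letters A-H, the user IDs, and the model names are taken from elsewhere on this page, while the flat `(config, user, model)` tuple layout is an assumption.

```python
from itertools import product

CONFIGS = list("ABCDEFGH")                      # 8 data configurations
MODELS = ["Qwen3.5-35B-A3B", "GLM-4.7-Flash"]   # 2 models
USERS = ["lixiang", "wangguilan", "zhangxiuying", "chenmo"] + [
    f"user_{i:02d}" for i in range(1, 11)       # user_01 .. user_10
]

# One inference per (config, user, model) combination.
runs = list(product(CONFIGS, USERS, MODELS))
print(len(USERS), len(runs))  # 14 224
```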

Qwen3.5-35B-A3B
MoE, Q8_K_XL, ~39 GB
Average IIR (self-judged): 1.17x

GLM-4.7-Flash
Dense, BF16, ~55 GB
Average IIR (self-judged): 1.34x

| Config | Data Sources | Qwen | GLM | Type |
| --- | --- | --- | --- | --- |
| A | Dailyn (finance) | 72.4 | 67.4 | Single |
| B | Mealens (diet) | 73.4 | 70.1 | Single |
| C | Ururu (mood) | 71.0 | 70.8 | Single |
| D | Narrus (reading) | 70.5 | 66.1 | Single |
| Single avg | | 71.8 | 68.6 | |
| E | Finance × Diet | 89.3 | 82.6 | Dual |
| F | Finance × Mood | 89.8 | 84.0 | Dual |
| G | Diet × Mood | 88.9 | 84.3 | Dual |
| H | Panoramic (all 4) | 84.2 | 91.8 | Full |

FINDING: Both models consistently score the panoramic configuration above the single-domain ones. The cross-domain benefit is architecture-independent: it holds for both the MoE (Qwen) and the dense (GLM) model. The gain is not monotonic for Qwen, however, whose dual-domain configurations (88.9-89.8) score above its panoramic one (84.2).
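
The page does not spell out how IIR is computed. A reading that reproduces both reported averages is the panoramic score (config H) divided by the mean of the four single-domain scores (configs A-D). The sketch below applies that assumed definition to the table values; treat it as a consistency check, not as the project's scoring code.

```python
from statistics import mean

# Scores copied from the configuration table (single-domain A-D and panoramic H).
scores = {
    "Qwen": {"A": 72.4, "B": 73.4, "C": 71.0, "D": 70.5, "H": 84.2},
    "GLM": {"A": 67.4, "B": 70.1, "C": 70.8, "D": 66.1, "H": 91.8},
}


def iir(model_scores: dict) -> float:
    """Assumed definition: panoramic score / mean single-domain score."""
    single_avg = mean(model_scores[c] for c in "ABCD")
    return model_scores["H"] / single_avg


for model, table in scores.items():
    print(f"{model}: IIR = {iir(table):.2f}x")
# Qwen: IIR = 1.17x
# GLM: IIR = 1.34x
```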

| Drift Class | Qwen IIR | GLM IIR | Crisis F1 |
| --- | --- | --- | --- |
| Normal | 1.19 | 1.39 | 0.191 |
| Unexpected | 1.16 | 1.27 | 0.444 |
| Severe | 1.17 | 1.38 | 0.560 |

Simulated Expert Blind Evaluation
Three LLM-simulated expert personas independently rate 112 blinded insights on 5 dimensions. The evaluation reveals a critical actionability paradox.

Expert A: Social Worker (15 years of experience). Focus: Accuracy + Actionability
Expert B: Cognitive Psychologist (PhD). Focus: Depth + Novelty
Expert C: Senior Data Scientist. Focus: Accuracy + Integration

| Config | Accuracy | Depth | Novelty | Actionability | Integration |
| --- | --- | --- | --- | --- | --- |
| Single (A-D) | 4.18 | 3.96 | 3.09 | 4.44 | 1.72 |
| Dual (E-G) | 3.65 | 4.28 | 3.83 | 3.60 | 3.97 |
| Panoramic (H) | 4.07 | 4.48 | 3.95 | 2.02 | 4.52 |

The Actionability Paradox

PARADOX: Panoramic analysis achieves the highest depth (4.48) and integration (4.52) but the lowest actionability (2.02). Richer data produces broader insights but less specific recommendations. Proposed solution: two-stage output (insight → action plan).
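
One way to read the two-stage proposal is as two chained generation calls, where the second call sees only the first call's insight and is asked for concrete steps. The sketch below is hypothetical throughout: `two_stage_output`, the prompt wording, and the injected `generate` callable are illustrative stand-ins, and no real model API is assumed.

```python
from typing import Callable, Dict


def two_stage_output(context: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Stage 1 produces a broad insight; stage 2 narrows it into an action plan.

    `generate` is any text-in/text-out function (for example a wrapper around
    a local model); it is injected so the sketch stays backend-agnostic.
    """
    insight_prompt = "Describe the cross-domain patterns in this data:\n" + context
    insight = generate(insight_prompt)

    action_prompt = (
        "Based only on the insight below, list at most three concrete next steps:\n"
        + insight
    )
    action_plan = generate(action_prompt)
    return {"insight": insight, "action_plan": action_plan}


def echo_model(prompt: str) -> str:
    """Stand-in generator for demonstration; returns a tagged first line."""
    return "[output for: " + prompt.splitlines()[0] + "]"


result = two_stage_output("finance: income gap; mood: 5-day low", echo_model)
print(result["insight"])
print(result["action_plan"])
```

Separating the stages lets the action plan be scored for actionability on its own, without diluting the depth and integration of the panoramic insight.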

INR (Integrated Novelty Ratio): 1.28x
H Integration (1-5): 4.52
H Actionability (1-5): 2.02
Expert Personas: 3

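
The INR formula is not given on this page. The reported 1.28x is consistent with dividing the panoramic novelty score by the single-domain novelty score from the expert rating table, so the sketch below uses that as an assumed definition.

```python
# Mean novelty ratings (1-5) copied from the expert evaluation table.
novelty = {"single": 3.09, "dual": 3.83, "panoramic": 3.95}

# Assumed definition: panoramic novelty relative to single-domain novelty.
inr = novelty["panoramic"] / novelty["single"]
print(f"INR = {inr:.2f}x")  # INR = 1.28x
```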
14 Users, Ages 15–71
10 users from V2 (with adjusted event severity) + 4 new socially vulnerable users. Event distribution: 6 normal / 5 unexpected / 3 severe.
| User ID | Age | Profile | Name | Drift Class | Day | Event |
| --- | --- | --- | --- | --- | --- | --- |
| lixiang | 15 | Ninth-Grade Student (Middle School) | 李想 | Normal | 40 | Monthly exam rank drops 15 places |
| wangguilan | 71 | Retired Teacher, Living Alone | 王桂兰 | Unexpected | 55 | Falls in bathroom, hides it from her children |
| zhangxiuying | 66 | Caretaker Grandmother | 张秀英 | Unexpected | 30 | Parenting conflict with daughter-in-law |
| chenmo | 26 | Socially Disconnected Youth | 陈默 | Unexpected | 70 | Spring Festival family reunion photos trigger a 5-day low |
| user_01 | 22 | Factory Worker | 小刘 | Unexpected | 40 | Laid off, 2 weeks without income |
| user_02 | 28 | Delivery Rider | 阿强 | Severe | 46 | Traffic accident → hospital → debt |
| user_03 | 31 | Elementary Teacher | 陈老师 | Normal | 35 | Pregnancy confirmed |
| user_04 | 26 | Graduate Student | 小林 | Normal | 45 | Thesis needs major revision, defense postponed |
| user_05 | 27 | E-commerce Ops | 小周 | Normal | 30 | Promoted to team lead |
| user_06 | 29 | Freelance Illustrator | 小夏 | Normal | 60 | Lands ¥50K client project |
| user_07 | 33 | Hospital Resident | 李医生 | Unexpected | 21 | Medication dosage error → departmental warning |
| user_08 | 32 | Big Tech P7 | 张磊 | Severe | 55 | Laid off → divorce → depression spiral |
| user_09 | 45 | Business Owner | 王总 | Severe | 38 | ¥800K bad debt → cash flow collapse |
| user_10 | 38 | Fund Manager | 赵总 | Normal | 25 | Fund suffers a 4% single-day drawdown |

The first four users listed (lixiang, wangguilan, zhangxiuying, chenmo) are new in V3. Event distribution: 6 Normal / 5 Unexpected / 3 Severe.

Evolution from V2
| Dimension | V2 | V3 |
| --- | --- | --- |
| Users | 10 (ages 22-45) | 14 (ages 15-71) |
| Social diversity | Urban professionals | Student, elderly living alone, caretaker grandmother, socially disconnected youth |
| Crisis detection | Qualitative only | L3 F1 = 0.77 |
| Models | Qwen3.5 (1 family) | Qwen + GLM (2 families) |
| Evaluation | LLM-as-Judge (Claude) | 3 methods: self-judge + expert + crisis |
| IIR | 1.48x (Opus judge) | 1.17-1.34x (self-judged) |
| Focus | Technical feasibility | Social validity |