V3: From Technical Proof to Social Validity

Prism V3

Cross-domain crisis detection and socially diverse user modeling through on-device integration of personal data. Simple rules + rich data = effective crisis detection. No ML required.

Builds on Prism V2 (IIR 1.48x, federation protocol, model-scale curve)

0.77

L3 Crisis F1

14

Users (ages 15-71)

1.34x

IIR (GLM, self-judged)

7x

Precision gain L1→L3

Core Finding

Cross-domain signal convergence improves crisis detection precision roughly 7x (L1: 0.10 → L3: 0.71) using only threshold rules: no machine learning, no training data, no parameter tuning. The "intelligence" comes entirely from the data architecture, in which multiple independent domains correlate only during genuine crises.

Experiment 1

Cross-Domain Crisis Detection

Five rule-based detectors (finance, diet, mood, reading, data absence) aggregate anomaly signals into three severity levels (L1 Watch, L2 Warning, L3 Crisis). Cross-domain convergence is the key precision amplifier.

Level	Precision	Recall	F1
L1 Watch	0.10	0.80	0.17
L2 Warning	0.57	1.00	0.73
L3 Crisis	0.71	0.83	0.77

KEY: Precision jumps 7x from L1 to L3 purely through cross-domain convergence. Real crises perturb multiple domains simultaneously; noise usually affects only one.
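The convergence idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's detector: the source names the five domains but gives neither the thresholds nor the rule that maps flagged domains to levels, so `THRESHOLDS`, `flagged_domains`, `severity_level`, and the mapping "1 domain → L1, 2 → L2, 3 or more → L3" are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical per-domain thresholds on a normalized anomaly score.
# The source lists the domains only; these values are assumptions.
THRESHOLDS: Dict[str, float] = {
    "finance": 2.0,
    "diet": 2.0,
    "mood": 2.0,
    "reading": 2.0,
    "data_absence": 2.0,
}


def flagged_domains(scores: Dict[str, float]) -> List[str]:
    """Return the domains whose anomaly score meets its threshold."""
    return sorted(d for d, s in scores.items() if s >= THRESHOLDS[d])


def severity_level(scores: Dict[str, float]) -> Optional[str]:
    """Map the number of flagged domains to a level.

    Assumed mapping: 1 domain -> L1, 2 -> L2, 3 or more -> L3.
    Returns None when no domain is flagged.
    """
    n = len(flagged_domains(scores))
    if n == 0:
        return None
    return f"L{min(n, 3)}"


# Single-domain noise: one unusual purchase, nothing else changes.
noise_day = {"finance": 2.5, "diet": 0.3, "mood": 0.1,
             "reading": 0.0, "data_absence": 0.0}

# Convergent signals: spending, diet and mood all shift together.
crisis_day = {"finance": 3.1, "diet": 2.4, "mood": 2.8,
              "reading": 0.5, "data_absence": 0.0}

print(severity_level(noise_day))   # L1
print(severity_level(crisis_day))  # L3
```

Under this sketch an isolated anomaly never rises above L1, which is the mechanism the precision gain relies on: a false alarm would need several independent domains to misfire on the same day.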

Drift Class	Precision	Recall	F1	n
Normal	0.111	0.667	0.191	6
Unexpected	0.286	1.000	0.444	5
Severe	0.412	0.875	0.560	3
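As a quick consistency check, the F1 column can be recomputed from the precision and recall columns with the standard harmonic-mean formula. The sketch below uses the rounded values from the table, so agreement is only expected to within rounding; the helper name `f1_score` and the `ROWS` structure are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per drift class, copied from the table.
ROWS = {
    "Normal": (0.111, 0.667, 0.191),
    "Unexpected": (0.286, 1.000, 0.444),
    "Severe": (0.412, 0.875, 0.560),
}

for name, (p, r, reported) in ROWS.items():
    recomputed = f1_score(p, r)
    # Inputs are rounded to three decimals, so allow a small tolerance.
    status = "ok" if abs(recomputed - reported) < 0.002 else "mismatch"
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.3f} ({status})")
```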

Experiment 2

Dual-Model Ablation Study

8 data configurations × 14 users × 2 models = 224 inferences. Two architecturally distinct models both show IIR > 1.0 across all user groups.

Qwen3.5-35B-A3B

MoE, Q8_K_XL, ~39 GB

1.17x

Average IIR (self-judged)

GLM-4.7-Flash

Dense, BF16, ~55 GB

1.34x

Average IIR (self-judged)

Config	Data Sources	Qwen	GLM	Type
A	Dailyn (finance)	72.4	67.4	Single
B	Mealens (diet)	73.4	70.1	Single
C	Ururu (mood)	71.0	70.8	Single
D	Narrus (reading)	70.5	66.1	Single
	Single avg (A-D)	71.8	68.6
E	Finance × Diet	89.3	82.6	Dual
F	Finance × Mood	89.8	84.0	Dual
G	Diet × Mood	88.9	84.3	Dual
H	Panoramic (all 4)	84.2	91.8	Full

FINDING: Both models consistently score panoramic above single-domain. The cross-domain benefit appears architecture-independent: it holds for both MoE (Qwen) and dense (GLM) models.
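The reported average IIR values (1.17x for Qwen, 1.34x for GLM) are consistent with dividing the panoramic score (config H) by the mean of the four single-domain scores (configs A-D). That definition is inferred from the numbers, not stated on this page, so the sketch below should be read as a plausibility check; the `SCORES` layout and the `iir` helper are illustrative names.

```python
from statistics import mean

# Scores copied from the configuration table above.
SCORES = {
    "A": {"Qwen": 72.4, "GLM": 67.4},
    "B": {"Qwen": 73.4, "GLM": 70.1},
    "C": {"Qwen": 71.0, "GLM": 70.8},
    "D": {"Qwen": 70.5, "GLM": 66.1},
    "H": {"Qwen": 84.2, "GLM": 91.8},
}
SINGLE_CONFIGS = ["A", "B", "C", "D"]


def iir(model: str) -> float:
    """Assumed definition: panoramic score / mean single-domain score."""
    single_avg = mean(SCORES[c][model] for c in SINGLE_CONFIGS)
    return SCORES["H"][model] / single_avg


print(round(iir("Qwen"), 2))  # 1.17
print(round(iir("GLM"), 2))   # 1.34
```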

Drift Class	Qwen IIR	GLM IIR	Crisis F1
Normal	1.19	1.39	0.191
Unexpected	1.16	1.27	0.444
Severe	1.17	1.38	0.560

Experiment 3

Simulated Expert Blind Evaluation

Three LLM-simulated expert personas independently rate 112 blinded insights (8 configurations × 14 users) on five dimensions, each scored 1-5. The evaluation reveals a critical actionability paradox.

Expert A

Social Worker (15 years' experience)

Focus: Accuracy + Actionability

Expert B

Cognitive Psychologist (PhD)

Focus: Depth + Novelty

Expert C

Senior Data Scientist

Focus: Accuracy + Cross-Domain Integration

Config	Accuracy	Depth	Novelty	Actionability	Integration
Single (A-D)	4.18	3.96	3.09	4.44	1.72
Dual (E-G)	3.65	4.28	3.83	3.60	3.97
Panoramic (H)	4.07	4.48	3.95	2.02	4.52

The Actionability Paradox

PARADOX: Panoramic analysis achieves the highest depth (4.48) and integration (4.52) but the lowest actionability (2.02). Richer data produces broader insights but less specific recommendations. Proposed solution: two-stage output (insight → action plan).

1.28x

INR (Integrated Novelty Ratio)

4.52

Panoramic (H) Integration (1-5)

2.02

Panoramic (H) Actionability (1-5)
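The paradox is easiest to see as per-dimension differences between the panoramic and single-domain rows of the rating table. The sketch below computes those differences and also reproduces the 1.28x INR as panoramic novelty divided by single-domain novelty. That INR definition is inferred from the table values and is an assumption, as are the names `RATINGS`, `by_dim`, and `deltas`.

```python
DIMS = ["accuracy", "depth", "novelty", "actionability", "integration"]

# Mean ratings (1-5) copied from the expert evaluation table.
RATINGS = {
    "single": [4.18, 3.96, 3.09, 4.44, 1.72],
    "dual": [3.65, 4.28, 3.83, 3.60, 3.97],
    "panoramic": [4.07, 4.48, 3.95, 2.02, 4.52],
}


def by_dim(config: str) -> dict:
    """Pair each dimension name with its rating for one configuration."""
    return dict(zip(DIMS, RATINGS[config]))


single, panoramic = by_dim("single"), by_dim("panoramic")

# Assumed INR definition: panoramic novelty / single-domain novelty.
inr = panoramic["novelty"] / single["novelty"]

# Change in each dimension when moving from single-domain to panoramic.
deltas = {d: round(panoramic[d] - single[d], 2) for d in DIMS}

print(f"INR = {inr:.2f}x")
for dim, delta in deltas.items():
    print(f"{dim:>13}: {delta:+.2f}")
```

Integration rises by 2.80 points while actionability falls by 2.42, which is the trade-off the two-stage output is meant to address.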


Population

14 Users, Ages 15–71

10 users from V2 (with adjusted event severity) plus 4 new socially vulnerable users. Event distribution: 6 normal / 5 unexpected / 3 severe.

User	Age	Profile	Drift Class	Event
lixiang	15	Middle School Student (Grade 9)	Normal	Day 40: Monthly exam rank drops 15 places
wangguilan	71	Retired Teacher, Living Alone	Unexpected	Day 55: Falls in bathroom, hides it from her children
zhangxiuying	66	Caretaker Grandmother	Unexpected	Day 30: Parenting conflict with daughter-in-law
chenmo	26	Socially Disconnected Youth	Unexpected	Day 70: Spring Festival family reunion photos trigger 5-day low
user_01	22	Factory Worker	Unexpected	Day 40: Laid off, 2 weeks without income
user_02	28	Delivery Rider	Severe	Day 46: Traffic accident → hospital → debt
user_03	31	Elementary Teacher	Normal	Day 35: Pregnancy confirmed
user_04	26	Graduate Student	Normal	Day 45: Thesis major revision, defense postponed
user_05	27	E-commerce Operations	Normal	Day 30: Promoted to team lead
user_06	29	Freelance Illustrator	Normal	Day 60: Lands ¥50K client project
user_07	33	Hospital Resident	Unexpected	Day 21: Wrong dose prescribed → departmental warning
user_08	32	Big-Tech Employee (P7 level)	Severe	Day 55: Laid off → divorce → depression spiral
user_09	45	Business Owner	Severe	Day 38: ¥800K bad debt → cash flow collapse
user_10	38	Fund Manager	Normal	Day 25: Fund posts 4% single-day drawdown

New V3 users: lixiang, wangguilan, zhangxiuying, chenmo. Event distribution: 6 Normal / 5 Unexpected / 3 Severe

V2 → V3

Evolution from V2

Dimension	V2	V3
Users	10 (ages 22-45)	14 (ages 15-71)
Social diversity	Urban professionals	Adds student, elderly living alone, caretaker grandmother, socially disconnected youth
Crisis detection	Qualitative only	L3 F1 = 0.77
Models	Qwen3.5 (1 family)	Qwen + GLM (2 families)
Evaluation	LLM-as-Judge (Claude)	3 methods: self-judge + simulated expert + crisis detection
IIR	1.48x (Opus judge)	1.17-1.34x (self-judged)
Focus	Technical feasibility	Social validity