feat(asl): Implement full-text screening core LLM service and validation system (Day 1-3)
Core Components: - PDFStorageService with Dify/OSS adapters - LLM12FieldsService with Nougat-first + dual-model + 3-layer JSON parsing - PromptBuilder for dynamic prompt assembly - MedicalLogicValidator with 5 rules + fault tolerance - EvidenceChainValidator for citation integrity - ConflictDetectionService for dual-model comparison Prompt Engineering: - System Prompt (6601 chars, Section-Aware strategy) - User Prompt template (PICOS context injection) - JSON Schema (12 fields constraints) - Cochrane standards (not loaded in MVP) Key Innovations: - 3-layer JSON parsing (JSON.parse + json-repair + code block extraction) - Promise.allSettled for dual-model fault tolerance - safeGetFieldValue for robust field extraction - Mixed CN/EN token calculation Integration Tests: - integration-test.ts (full test) - quick-test.ts (quick test) - cached-result-test.ts (fault tolerance test) Documentation Updates: - Development record (Day 2-3 summary) - Quality assurance strategy (full-text screening) - Development plan (progress update) - Module status (v1.1 update) - Technical debt (10 new items) Test Results: - JSON parsing success rate: 100% - Medical logic validation: 5/5 passed - Dual-model parallel processing: OK - Cost per PDF: CNY 0.10 Files: 238 changed, 14383 insertions(+), 32 deletions(-) Docs: docs/03-涓氬姟妯″潡/ASL-AI鏅鸿兘鏂囩尞/05-寮€鍙戣褰?2025-11-22_Day2-Day3_LLM鏈嶅姟涓庨獙璇佺郴缁熷紑鍙?md
This commit is contained in:
14
backend/test-output/integration-test-results.json
Normal file
14
backend/test-output/integration-test-results.json
Normal file
@@ -0,0 +1,14 @@
|
||||
[
|
||||
{
|
||||
"pdfFile": "rayyan-256859669.pdf",
|
||||
"error": "PDF extraction failed"
|
||||
},
|
||||
{
|
||||
"pdfFile": "rayyan-256859738.pdf",
|
||||
"error": "PDF extraction failed"
|
||||
},
|
||||
{
|
||||
"pdfFile": "rayyan-256859745.pdf",
|
||||
"error": "PDF extraction failed"
|
||||
}
|
||||
]
|
||||
589
backend/test-output/quick-test-result.json
Normal file
589
backend/test-output/quick-test-result.json
Normal file
@@ -0,0 +1,589 @@
|
||||
{
|
||||
"pdf": "rayyan-256859669.pdf",
|
||||
"duration": 133,
|
||||
"totalCost": 0.091123,
|
||||
"degradedMode": false,
|
||||
"modelA": {
|
||||
"model": "deepseek-v3",
|
||||
"cost": 0.019511,
|
||||
"tokenUsage": 19511,
|
||||
"extractionMethod": "pymupdf",
|
||||
"logicValid": true,
|
||||
"evidenceComplete": true,
|
||||
"result": {
|
||||
"fields": {
|
||||
"文献来源": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Cilostazol Addition to Aspirin could not Reduce the Neurological Deterioration in TOAST Subtypes: ADS Post-Hoc Analysis. Junya Aoki, MD, et al. Journal of Stroke and Cerebrovascular Diseases, Vol. 30, No. 2 (February), 2021: 105494. https://doi.org/10.1016/j.jstrokecerebrovasdis.2020.105494. Received September 21, 2020; revision received November 15, 2020; accepted November 20, 2020.",
|
||||
"location": {
|
||||
"section": "Title page",
|
||||
"subsection": "Header and footer",
|
||||
"paragraph": 1,
|
||||
"page": 1
|
||||
},
|
||||
"keywords": [
|
||||
"Journal of Stroke and Cerebrovascular Diseases",
|
||||
"105494",
|
||||
"2021"
|
||||
]
|
||||
},
|
||||
"reasoning": "论文提供了完整的文献来源信息:第一作者、年份、期刊名称、卷期号、页码和DOI号,符合完整标准",
|
||||
"confidence": 0.99,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"研究类型": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "This post-hoc study extracted the patient data from the ADS registry, a multicenter, prospective, randomized, open-label trial that evaluated the safety and efficacy of acute aspirin plus cilostazol dual therapy in patients with non-cardioembolic stroke within 48 h of symptom onset. The ADS trial was performed between May 2011 and June 2017 and involved 34 centers in Japan. Patients were randomly allocated to either the DAPT group or the aspirin group.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 1,
|
||||
"page": 2
|
||||
},
|
||||
"keywords": [
|
||||
"randomized",
|
||||
"open-label trial",
|
||||
"multicenter",
|
||||
"prospective"
|
||||
]
|
||||
},
|
||||
"reasoning": "明确说明为随机、开放标签、多中心临床试验的事后分析,研究类型描述清晰完整",
|
||||
"confidence": 0.98,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"研究设计细节": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "The ADS trial was performed between May 2011 and June 2017 and involved 34 centers in Japan. Patients were randomly allocated to either the DAPT group or the aspirin group. The DAPT group was treated with cilostazol (200 mg/day) and aspirin (80-200 mg/day) for 14 days, while the aspirin group was treated with only aspirin (80-200 mg/day) for 14 days. Concomitant anticoagulant therapy with heparin and argatroban was permitted since it was widely used in clinical practice in Japan during the study period.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 1,
|
||||
"page": 2
|
||||
},
|
||||
"keywords": [
|
||||
"34 centers",
|
||||
"14 days",
|
||||
"May 2011 and June 2017",
|
||||
"follow-up period"
|
||||
]
|
||||
},
|
||||
"reasoning": "提供了完整的研究设计细节:研究时间范围、中心数量、随访时间、治疗持续时间,符合完整标准",
|
||||
"confidence": 0.95,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"疾病诊断标准": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Stroke etiologies were re-classified based on the TOAST criteria by a certified vascular neurologist (J.A), and only patients who were diagnosed with ischemic stroke due to LAA, SVO, Others, or Undetermined etiologies were analyzed. According to the TOAST criteria, ischemic stroke is divided into five subgroups: large artery atherosclerosis (LAA), small vessel occlusion (SVO), cardioembolic stroke (CES), other determined etiology (Others), and undetermined etiology (Undetermined).",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Inclusion and exclusion criteria",
|
||||
"paragraph": 2,
|
||||
"page": 3
|
||||
},
|
||||
"keywords": [
|
||||
"TOAST criteria",
|
||||
"certified vascular neurologist",
|
||||
"LAA",
|
||||
"SVO",
|
||||
"Others",
|
||||
"Undetermined"
|
||||
]
|
||||
},
|
||||
"reasoning": "明确使用TOAST分类标准,由认证血管神经科医生重新分类,诊断标准清晰完整",
|
||||
"confidence": 0.96,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"人群特征": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Between February 2011 and March 2017, 1208 patients were enrolled in the ADS trial. Seven patients withdrew their consent after the study started, 10 patients were lost to follow-up at 14 days, and 125 patients discontinued the allocated therapy. Of the remaining 1066 patients, 1022 (686 [67%] men; median age [interquartile range], 69 [60-77] years old; initial NIHSS score, 2 [1-4]) with non-cardioembolic stroke were analyzed. In total, 164 (16%), 630 (62%), 70 (7%), and 158 (15%) patients were diagnosed with ischemic stroke due to LAA, SVO, Other, and Undetermined etiologies, respectively.",
|
||||
"location": {
|
||||
"section": "Results",
|
||||
"subsection": "Patient enrollment",
|
||||
"paragraph": 1,
|
||||
"page": 3
|
||||
},
|
||||
"keywords": [
|
||||
"1022 patients",
|
||||
"67% men",
|
||||
"median age 69",
|
||||
"NIHSS score 2",
|
||||
"stroke subtypes"
|
||||
]
|
||||
},
|
||||
"reasoning": "提供了完整的人群特征:总样本量、性别分布、年龄范围、基线NIHSS评分、各亚型分布,信息充分完整",
|
||||
"confidence": 0.97,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"基线数据": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Table 1 shows the clinical backgrounds based on the stroke subtypes. Patients in the SVO group and those in the LAA group were younger and older, respectively, than those in other groups (p = 0.001). Dyslipidemia was less frequent in the Undetermined group than in others (p = 0.047). Systolic and diastolic blood pressures were the highest in the SVO group (p = 0.003 and p = 0.001, respectively). The proportion of aspirin therapy before stroke was the highest in the LAA group (p = 0.032).",
|
||||
"location": {
|
||||
"section": "Results",
|
||||
"subsection": "Subtype analysis",
|
||||
"paragraph": 1,
|
||||
"page": 3
|
||||
},
|
||||
"keywords": [
|
||||
"Table 1",
|
||||
"clinical backgrounds",
|
||||
"blood pressure",
|
||||
"dyslipidemia",
|
||||
"aspirin therapy"
|
||||
]
|
||||
},
|
||||
"reasoning": "通过Table 1提供了详细的基线数据,包括人口统计学特征、合并症、血压、实验室检查等,基线数据完整",
|
||||
"confidence": 0.95,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"干预措施": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "The DAPT group was treated with cilostazol (200 mg/day) and aspirin (80-200 mg/day) for 14 days, while the aspirin group was treated with only aspirin (80-200 mg/day) for 14 days. Concomitant anticoagulant therapy with heparin and argatroban was permitted since it was widely used in clinical practice in Japan during the study period.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 1,
|
||||
"page": 2
|
||||
},
|
||||
"keywords": [
|
||||
"cilostazol 200 mg/day",
|
||||
"aspirin 80-200 mg/day",
|
||||
"14 days",
|
||||
"concomitant anticoagulant"
|
||||
]
|
||||
},
|
||||
"reasoning": "明确描述了干预措施的具体药物、剂量、疗程,以及允许的合并用药,干预措施描述完整",
|
||||
"confidence": 0.96,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"对照措施": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Patients were randomly allocated to either the DAPT group or the aspirin group. The DAPT group was treated with cilostazol (200 mg/day) and aspirin (80-200 mg/day) for 14 days, while the aspirin group was treated with only aspirin (80-200 mg/day) for 14 days.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 1,
|
||||
"page": 2
|
||||
},
|
||||
"keywords": [
|
||||
"aspirin group",
|
||||
"aspirin 80-200 mg/day",
|
||||
"14 days",
|
||||
"monotherapy"
|
||||
]
|
||||
},
|
||||
"reasoning": "明确描述了对照组使用阿司匹林单药治疗,剂量和疗程与干预组一致,对照措施描述完整",
|
||||
"confidence": 0.95,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"结局指标": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "In the ADS, the following primary outcomes were evaluated: neurological worsening, transient ischemic attack (TIA), and stroke recurrence within 14 days. Neurological deterioration included neurological progression with an NIHSS score of ≥2 and recurrent ischemic stroke or TIA within 14 days, as defined in the ADS. Fourteen days after stroke onset, 104 (10%) of the 1022 patients showed neurological deterioration—53 (11%) patients in the DAPT group and in 51 (10%) patients in the aspirin group (p = 0.469).",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry and Study purpose",
|
||||
"paragraph": 2,
|
||||
"page": 2
|
||||
},
|
||||
"keywords": [
|
||||
"neurological worsening",
|
||||
"TIA",
|
||||
"stroke recurrence",
|
||||
"NIHSS score ≥2",
|
||||
"14 days"
|
||||
]
|
||||
},
|
||||
"reasoning": "明确定义了主要结局指标(神经功能恶化、TIA、卒中复发)及其具体标准(NIHSS评分增加≥2),结果报告完整",
|
||||
"confidence": 0.97,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"统计方法": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "The Mann-Whitney U test was used to analyze differences in continuous variables, and Fisher's exact test and Pearson chi-square were used to analyze differences in categorical variables. The data are presented as median values (interquartile range [IQR]) or frequencies (%). Variables identified on univariate analyses with p values <0.1 as well as the age and gender were entered into the multivariate analysis. The relative risks of complete recanalization at 24 h were expressed as odds ratios (OR) with 95% confidence intervals (CIs). All statistical analyses were performed using the SPSS software program, version 22.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Statistical analyses",
|
||||
"paragraph": 1,
|
||||
"page": 3
|
||||
},
|
||||
"keywords": [
|
||||
"Mann-Whitney U test",
|
||||
"Fisher's exact test",
|
||||
"multivariate analysis",
|
||||
"odds ratios",
|
||||
"SPSS"
|
||||
]
|
||||
},
|
||||
"reasoning": "详细描述了统计分析方法:连续变量和分类变量的检验方法、多变量分析入选标准、结果表达方式、软件版本,统计方法完整",
|
||||
"confidence": 0.96,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
},
|
||||
"质量评价": {
|
||||
"assessment": "不完整",
|
||||
"evidence": {
|
||||
"quote": "This post-hoc study extracted the patient data from the ADS registry, a multicenter, prospective, randomized, open-label trial. Patients were randomly allocated to either the DAPT group or the aspirin group. In this post-hoc analysis, only patients with ischemic stroke patients who had successfully continued the allocated therapy and received the 14-day assessment were included. Therefore, patients who were unable to continue the therapy for reasons including side effects and allergic reactions and those who did not receive the 14-day assessment were excluded.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry and Inclusion criteria",
|
||||
"paragraph": 1,
|
||||
"page": "2-3"
|
||||
},
|
||||
"keywords": [
|
||||
"randomized",
|
||||
"open-label",
|
||||
"excluded patients",
|
||||
"per-protocol analysis"
|
||||
]
|
||||
},
|
||||
"reasoning": "研究为开放标签设计,未描述盲法实施;随机化方法未具体说明(如序列生成、分配隐藏);采用符合方案集分析而非ITT分析;失访处理描述不充分",
|
||||
"confidence": 0.85,
|
||||
"cochrane_assessment": "High risk",
|
||||
"cochrane_details": {
|
||||
"domains": {
|
||||
"随机化过程": {
|
||||
"risk": "Unclear risk",
|
||||
"reasoning": "仅提到'随机分配',未描述随机序列生成方法和分配隐藏措施"
|
||||
},
|
||||
"偏离预期干预": {
|
||||
"risk": "High risk",
|
||||
"reasoning": "开放标签设计,未实施盲法,可能影响干预实施和结局评估"
|
||||
},
|
||||
"结局数据缺失": {
|
||||
"risk": "Some concerns",
|
||||
"reasoning": "排除未能完成治疗和14天评估的患者(125人),可能引入偏倚"
|
||||
},
|
||||
"结局测量": {
|
||||
"risk": "High risk",
|
||||
"reasoning": "神经功能恶化评估可能受开放标签设计影响"
|
||||
},
|
||||
"选择性报告结果": {
|
||||
"risk": "Low risk",
|
||||
"reasoning": "报告了所有预设结局指标,未见选择性报告"
|
||||
}
|
||||
},
|
||||
"overall_bias_risk": "High risk"
|
||||
}
|
||||
},
|
||||
"其他信息": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Sources of Funding: None. Declaration of Competing Interest: The authors have no conflicts of interest or funding sources to disclose. Acknowledgments: None. This study was approved by the institutional review board of our institutions.",
|
||||
"location": {
|
||||
"section": "End of article",
|
||||
"subsection": "Funding and conflicts",
|
||||
"paragraph": 1,
|
||||
"page": 9
|
||||
},
|
||||
"keywords": [
|
||||
"no funding",
|
||||
"no conflicts of interest",
|
||||
"institutional review board approved"
|
||||
]
|
||||
},
|
||||
"reasoning": "明确声明无资金来源、无利益冲突、获得伦理委员会批准,其他信息完整",
|
||||
"confidence": 0.98,
|
||||
"cochrane_assessment": "Not applicable"
|
||||
}
|
||||
},
|
||||
"processing_log": {
|
||||
"sections_reviewed": [
|
||||
"Abstract",
|
||||
"Introduction",
|
||||
"Methods",
|
||||
"Results",
|
||||
"Discussion",
|
||||
"Tables",
|
||||
"Figures",
|
||||
"References"
|
||||
],
|
||||
"paragraphs_read_per_section": {
|
||||
"Methods": 5,
|
||||
"Results": 8
|
||||
},
|
||||
"middle_sections_attention": true,
|
||||
"total_processing_time_estimate": "18 minutes"
|
||||
},
|
||||
"verification": {
|
||||
"keywords_searched": [
|
||||
"randomization",
|
||||
"blinding",
|
||||
"ITT",
|
||||
"baseline",
|
||||
"dropout",
|
||||
"allocation",
|
||||
"sequence",
|
||||
"intention-to-treat",
|
||||
"lost to follow-up"
|
||||
],
|
||||
"reread_count": 2,
|
||||
"found_missed_info": true,
|
||||
"cross_section_conflicts": [
|
||||
{
|
||||
"conflict": "Methods提到随机分配,但未描述具体随机化方法",
|
||||
"locations": [
|
||||
"Methods section paragraph 1",
|
||||
"No additional details found"
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
"metadata": {
|
||||
"model_name": "deepseek-v3",
|
||||
"processing_date": "2025-11-22T14:02:55.336Z",
|
||||
"document_format": "plaintext",
|
||||
"estimated_word_count": 27758
|
||||
}
|
||||
}
|
||||
},
|
||||
"modelB": {
|
||||
"model": "qwen-max",
|
||||
"cost": 0.071612,
|
||||
"tokenUsage": 17903,
|
||||
"extractionMethod": "pymupdf",
|
||||
"logicValid": true,
|
||||
"evidenceComplete": true,
|
||||
"result": {
|
||||
"fields": {
|
||||
"文献来源": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Junya Aoki, MD, Yasuyuki Iguchi, MD, Takao Urabe, MD, et al. Cilostazol Addition to Aspirin could not Reduce the Neurological Deterioration in TOAST Subtypes: ADS Post-Hoc Analysis. Journal of Stroke and Cerebrovascular Diseases, Vol. 30, No. 2 (February), 2021: 105494. https://doi.org/10.1016/j.jstrokecerebrovasdis.2020.105494",
|
||||
"location": {
|
||||
"section": "Abstract",
|
||||
"paragraph": 1
|
||||
}
|
||||
},
|
||||
"reasoning": "提供了第一作者、年份、期刊和DOI,信息完整。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"研究类型": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "This post-hoc study extracted the patient data from the ADS registry, a multicenter, prospective, randomized, open-label trial that evaluated the safety and efficacy of acute aspirin plus cilostazol dual therapy in patients with non-cardioembolic stroke within 48 h of symptom onset.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 1
|
||||
}
|
||||
},
|
||||
"reasoning": "明确指出这是一项多中心、前瞻性、随机、开放标签的试验。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"研究设计细节": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "The ADS trial was performed between May 2011 and June 2017 and involved 34 centers in Japan. Patients were randomly allocated to either the DAPT group or the aspirin group. The DAPT group was treated with cilostazol (200 mg/day) and aspirin (80-200 mg/day) for 14 days, while the aspirin group was treated with only aspirin (80-200 mg/day) for 14 days. Concomitant anticoagulant therapy with heparin and argatroban was permitted since it was widely used in clinical practice in Japan during the study period.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 2
|
||||
}
|
||||
},
|
||||
"reasoning": "详细描述了研究的时间范围、参与中心数量、随访时间(14天)以及数据来源。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"疾病诊断标准": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Stroke subtypes included large-artery athero-sclerosis (LAA), small vessel occlusion (SVO), other determined etiology (Others), and undetermined etiology of stroke (Undetermined).",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Study purpose",
|
||||
"paragraph": 1
|
||||
}
|
||||
},
|
||||
"reasoning": "使用TOAST分类标准对卒中亚型进行了详细的定义。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"人群特征": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Between February 2011 and March 2017, 1208 patients were enrolled in the ADS trial. Seven patients withdrew their consent after the study started, 10 patients were lost to follow-up at 14 days, and 125 patients discontinued the allocated therapy. Of the remaining 1066 patients, 1022 (686 [67%] men; median age [interquartile range], 69 [60-77] years old; initial NIHSS score, 2 [1-4]) with non-cardioembolic stroke were analyzed.",
|
||||
"location": {
|
||||
"section": "Results",
|
||||
"paragraph": 1
|
||||
}
|
||||
},
|
||||
"reasoning": "提供了样本量、性别比例、年龄分布和初始NIHSS评分等详细的人口统计学信息。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"基线数据": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Table 1 shows the clinical backgrounds based on the stroke subtypes. In total, 164 (16%), 630 (62%), 70 (7%), and 158 (15%) patients were diagnosed with ischemic stroke due to LAA, SVO, Other, and Undetermined etiologies, respectively. Patients in the SVO group and those in the LAA group were younger and older, respectively, than those in other groups (p = 0.001). Dyslipidemia was less frequent in the Undetermined group than in others (p = 0.047). Systolic and diastolic blood pressures were the highest in the SVO group (p = 0.003 and p = 0.001, respectively). The proportion of aspirin therapy before stroke was the highest in the LAA group (p = 0.032).",
|
||||
"location": {
|
||||
"section": "Results",
|
||||
"paragraph": 2
|
||||
}
|
||||
},
|
||||
"reasoning": "提供了不同卒中亚型的基线数据,包括年龄、性别、高血压、糖尿病、血脂异常等。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"干预措施": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "The DAPT group was treated with cilostazol (200 mg/day) and aspirin (80-200 mg/day) for 14 days, while the aspirin group was treated with only aspirin (80-200 mg/day) for 14 days.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 2
|
||||
}
|
||||
},
|
||||
"reasoning": "详细描述了DAPT组和阿司匹林组的药物剂量和疗程。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"对照措施": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "The aspirin group was treated with only aspirin (80-200 mg/day) for 14 days.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 2
|
||||
}
|
||||
},
|
||||
"reasoning": "明确描述了对照组的治疗方案。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"结局指标": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "In the ADS, the following primary outcomes were evaluated: neurological worsening, transient ischemic attack (TIA), and stroke recurrence within 14 days. Neurological deterioration included neurological progression with an NIHSS score of ≥2 and recurrent ischemic stroke or TIA within 14 days, as defined in the ADS.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 3
|
||||
}
|
||||
},
|
||||
"reasoning": "详细描述了主要结局指标,包括神经功能恶化、短暂性脑缺血发作和卒中复发。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"统计方法": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "First, the clinical characteristics and imaging findings were compared among the stroke subtypes. The impact of DAPT on the neurological deterioration was then compared among the four stroke subtypes. Next, data on each stroke subtype were divided into deterioration and no-deterioration groups based on the presence of neurological deterioration. The clinical characteristics and image findings were then compared between these two groups for each subtype. Finally, a multivariate regression analysis was conducted to evaluate the independent parameters related to neurological deterioration in each TOAST subtype. The Mann-Whitney U test was used to analyze differences in continuous variables, and Fisher's exact test and Pearson chi-square were used to analyze differences in categorical variables. The data are presented as median values (interquartile range [IQR]) or frequencies (%). Variables identified on univariate analyses with p values <0.1 as well as the age and gender were entered into the multivariate analysis. The relative risks of complete recanalization at 24 h were expressed as odds ratios (OR) with 95% confidence intervals (CIs). All statistical analyses were performed using the SPSS software program, version 22 (SPSS Japan, Inc., Tokyo, Japan). Results were considered statistically significant at p<0.05.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Statistical analyses",
|
||||
"paragraph": 1
|
||||
}
|
||||
},
|
||||
"reasoning": "详细描述了统计分析方法,包括单变量和多变量回归分析、Mann-Whitney U检验、Fisher精确检验和Pearson卡方检验。",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"质量评价": {
|
||||
"assessment": "不完整",
|
||||
"evidence": {
|
||||
"quote": "Patients were randomly allocated to either the DAPT group or the aspirin group. The DAPT group was treated with cilostazol (200 mg/day) and aspirin (80-200 mg/day) for 14 days, while the aspirin group was treated with only aspirin (80-200 mg/day) for 14 days. Concomitant anticoagulant therapy with heparin and argatroban was permitted since it was widely used in clinical practice in Japan during the study period.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Patient registry",
|
||||
"paragraph": 2
|
||||
}
|
||||
},
|
||||
"reasoning": "虽然提到了随机分配,但没有具体说明序列生成方法和分配隐藏。盲法也没有详细描述。结果完整性方面,提到失访情况但未提供具体失访率。",
|
||||
"confidence": 0.7,
|
||||
"cochrane_details": {
|
||||
"domains": {
|
||||
"随机化过程": {
|
||||
"risk": "Unclear risk",
|
||||
"reasoning": "仅提到随机分配,无具体序列生成方法和分配隐藏"
|
||||
},
|
||||
"偏离预期干预": {
|
||||
"risk": "Low risk",
|
||||
"reasoning": "详细描述了干预措施和对照措施"
|
||||
},
|
||||
"结局数据缺失": {
|
||||
"risk": "Unclear risk",
|
||||
"reasoning": "提到失访情况但未提供具体失访率"
|
||||
},
|
||||
"结局测量": {
|
||||
"risk": "Low risk",
|
||||
"reasoning": "详细描述了结局指标和测量方法"
|
||||
},
|
||||
"选择性报告结果": {
|
||||
"risk": "Unclear risk",
|
||||
"reasoning": "未提及注册号或预设方案"
|
||||
}
|
||||
},
|
||||
"overall_bias_risk": "Unclear"
|
||||
}
|
||||
},
|
||||
"其他信息": {
|
||||
"assessment": "不完整",
|
||||
"evidence": {
|
||||
"quote": "None. Declaration of Competing Interest: The authors have no conflicts of interest or funding sources to disclose. Acknowledgments: None.",
|
||||
"location": {
|
||||
"section": "References",
|
||||
"paragraph": 1
|
||||
}
|
||||
},
|
||||
"reasoning": "未提供注册号,但声明了无利益冲突。",
|
||||
"confidence": 0.7
|
||||
}
|
||||
},
|
||||
"processing_log": {
|
||||
"sections_reviewed": [
|
||||
"Abstract",
|
||||
"Introduction",
|
||||
"Methods",
|
||||
"Results",
|
||||
"Discussion",
|
||||
"Tables",
|
||||
"Figures"
|
||||
],
|
||||
"paragraphs_read_per_section": {
|
||||
"Methods": 7,
|
||||
"Results": 5
|
||||
},
|
||||
"middle_sections_attention": true,
|
||||
"total_processing_time_estimate": "15 minutes"
|
||||
},
|
||||
"verification": {
|
||||
"keywords_searched": [
|
||||
"randomization",
|
||||
"blinding",
|
||||
"ITT",
|
||||
"baseline",
|
||||
"dropout"
|
||||
],
|
||||
"reread_count": 2,
|
||||
"found_missed_info": false,
|
||||
"cross_section_conflicts": []
|
||||
},
|
||||
"metadata": {
|
||||
"model_name": "qwen-max",
|
||||
"processing_date": "2023-10-05T14:02:56.003Z",
|
||||
"document_format": "plaintext",
|
||||
"estimated_word_count": 27758
|
||||
}
|
||||
}
|
||||
},
|
||||
"conflict": {
|
||||
"hasConflict": true,
|
||||
"severity": "low",
|
||||
"conflictFields": [
|
||||
"其他信息"
|
||||
],
|
||||
"criticalFieldConflicts": [],
|
||||
"reviewPriority": 5,
|
||||
"details": [
|
||||
{
|
||||
"fieldName": "其他信息",
|
||||
"modelA_assessment": "完整",
|
||||
"modelB_assessment": "不完整",
|
||||
"importance": "normal",
|
||||
"conflictReason": "一个模型认为信息完整,另一个认为不完整"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
632
backend/test-output/system_prompt_full.md
Normal file
632
backend/test-output/system_prompt_full.md
Normal file
@@ -0,0 +1,632 @@
|
||||
# 全文复筛 - System Prompt
|
||||
|
||||
你是一位**循证医学专家**,拥有丰富的RCT方法学质量评估经验。你的任务是评估一篇医学研究论文12个关键字段的**完整性和可用性**,判断该文献是否适合纳入系统评价/Meta分析。
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ 重要提示:全文处理策略
|
||||
|
||||
本文是**完整的学术论文全文**(通常15,000-25,000字),包含多个章节。
|
||||
|
||||
### 关键挑战:Lost in the Middle现象
|
||||
|
||||
**科学研究表明**:当处理长文本(>15K tokens)时,AI模型对**中间部分**的注意力会显著下降:
|
||||
- 开头25%:注意力权重 **0.90** ✅
|
||||
- **中间50%:注意力权重 0.65** ⚠️ ← 最容易遗漏!
|
||||
- 结尾25%:注意力权重 **0.85** ✅
|
||||
|
||||
**医学论文的问题**:最关键的**Methods(方法学)和Results(结果)章节通常在文章中间**,这正是最容易遗漏的位置!
|
||||
|
||||
---
|
||||
|
||||
## 📋 强制处理流程(必须严格遵守)
|
||||
|
||||
### Step 1: 章节定位与结构识别(预计5分钟)
|
||||
|
||||
首先,**快速浏览全文**,识别并标记以下关键章节:
|
||||
|
||||
**必须识别的章节**:
|
||||
- ✅ **Abstract**(摘要)- 通常在开头
|
||||
- ✅ **Introduction**(引言)- 紧随Abstract
|
||||
- ✅ **Methods**(方法学)⭐⭐⭐ - **最重要,通常在中间位置**
|
||||
- ✅ **Results**(结果)⭐⭐⭐ - **最重要,通常紧跟Methods**
|
||||
- ✅ **Discussion**(讨论)- 通常靠后
|
||||
- ✅ **Tables**(表格)- 尤其是Table 1(基线特征)
|
||||
- ✅ **Figures**(图片)- 尤其是Figure 1(CONSORT流程图)
|
||||
- ✅ **Supplementary Materials**(补充材料)- 如果提到
|
||||
|
||||
**特别注意**:
|
||||
- 本文可能是**Markdown格式**(由Nougat转换),章节标记为 `# Abstract`、`## Methods` 等
|
||||
- 如果是纯文本格式,通过章节标题识别(如"METHODS"、"RESULTS"等)
|
||||
- **Methods章节可能很长**(2000-4000字),包含多个子章节
|
||||
|
||||
---
|
||||
|
||||
### Step 2: 分字段逐步提取(按预期位置)⭐ 核心步骤
|
||||
|
||||
对于每个评估字段,请按以下流程处理:
|
||||
|
||||
#### 2.1 确定字段的预期位置
|
||||
|
||||
| 字段 | 预期主要位置 | 次要位置 |
|
||||
|------|-------------|---------|
|
||||
| 研究设计 | Abstract, Methods开头 | - |
|
||||
| 研究人群 | Methods, Results开头 | Table 1 |
|
||||
| 干预措施 | Methods | Results |
|
||||
| 对照措施 | Methods | Results |
|
||||
| 结局指标 | Methods, Results | Tables |
|
||||
| **随机化方法** | **Methods(可能在中间)** ⭐ | Figure 1 |
|
||||
| **盲法** | **Methods(可能在中间)** ⭐ | - |
|
||||
| 样本量计算 | Methods | - |
|
||||
| **基线可比性** | **Results开头** | **Table 1** ⭐ |
|
||||
| **结果完整性** | **Results, Discussion** | **Figures** ⭐ |
|
||||
| 选择性报告 | Methods, Results | 注册方案 |
|
||||
| 其他偏倚 | Methods, Discussion | 补充材料 |
|
||||
|
||||
#### 2.2 定位到目标章节
|
||||
|
||||
**示例**:提取"随机化方法"
|
||||
1. 定位到 **Methods** 章节
|
||||
2. 查找子章节(如"Randomization"、"Study Design")
|
||||
3. **逐段仔细阅读**(不要跳过任何段落)⭐
|
||||
4. 特别注意**中间段落**(第2-5段)
|
||||
|
||||
#### 2.3 阅读与提取
|
||||
|
||||
**重要原则**:
|
||||
- ✅ **逐段阅读**(每一段都要看)
|
||||
- ✅ **不要跳跃**(不要只看开头和结尾)
|
||||
- ✅ **记录位置**(章节名、段落号)
|
||||
- ✅ **提取完整引用**(至少50字,包含关键信息)
|
||||
|
||||
**错误示例**❌:
|
||||
```
|
||||
只看了Methods第1段(研究设计概述)和最后1段(统计方法),
|
||||
跳过了中间的第2-5段,
|
||||
导致遗漏了第3段中的随机化方法描述
|
||||
```
|
||||
|
||||
**正确示例**✅:
|
||||
```
|
||||
Methods章节共7段,逐段阅读:
|
||||
- 第1段:研究设计概述
|
||||
- 第2段:入排标准
|
||||
- 第3段:随机化方法 ← 找到了!
|
||||
- 第4段:盲法
|
||||
- 第5段:干预措施
|
||||
- 第6段:结局指标
|
||||
- 第7段:统计方法
|
||||
```
|
||||
|
||||
#### 2.4 判断完整性(基于Cochrane标准)
|
||||
|
||||
对于每个字段,根据以下标准判断:
|
||||
- **完整**:信息充分,符合Cochrane高质量标准
|
||||
- **不完整**:信息缺失、描述模糊、不符合标准
|
||||
- **无法判断**:论文完全未提及该信息
|
||||
|
||||
**详细判断标准见后续章节**(每个字段有独立的Cochrane标准)
|
||||
|
||||
---
|
||||
|
||||
### Step 3: 交叉验证(必做)⭐
|
||||
|
||||
提取完12个字段后,**必须**进行交叉验证:
|
||||
|
||||
#### 3.1 关键词搜索
|
||||
|
||||
在**全文**中搜索以下关键词,确认是否有遗漏:
|
||||
|
||||
| 字段 | 关键搜索词 |
|
||||
|------|-----------|
|
||||
| 随机化方法 | randomization, random, allocation, sequence, CONSORT |
|
||||
| 盲法 | blind, blinding, masked, masking, placebo |
|
||||
| 基线可比性 | baseline, Table 1, characteristics, demographics |
|
||||
| 结果完整性 | ITT, intention-to-treat, dropout, lost to follow-up, attrition |
|
||||
| 样本量计算 | sample size, power, calculation, statistical power |
|
||||
|
||||
**验证方法**:
|
||||
```
|
||||
1. 用关键词搜索全文
|
||||
2. 如果找到相关内容,但你的提取结果是"无法判断"
|
||||
→ 说明可能遗漏了,重新阅读该部分
|
||||
3. 如果在不同章节找到矛盾信息
|
||||
→ 标记为"需要人工复核"
|
||||
```
|
||||
|
||||
#### 3.2 逻辑一致性检查
|
||||
|
||||
检查以下常见逻辑问题:
|
||||
- ✅ 如果是RCT,必须有随机化描述
|
||||
- ✅ 如果声称双盲,必须说明盲法
|
||||
- ✅ 样本量计算的N应该与实际入组人数大致相符(误差<30%)
|
||||
- ✅ 如果基线不平衡(P<0.05),Results应该提到调整分析
|
||||
|
||||
#### 3.3 重读确认(至少1次)
|
||||
|
||||
**必须至少重读1次**关键章节:
|
||||
- 重读 **Methods** 章节(完整)
|
||||
- 重读 **Results** 开头(基线数据部分)
|
||||
- 重读 **Table 1**(如果有)
|
||||
|
||||
---
|
||||
|
||||
### Step 4: 输出结果(严格JSON格式)
|
||||
|
||||
输出**必须**包含以下内容(按JSON Schema格式):
|
||||
|
||||
#### 4.1 每个字段的评估结果
|
||||
|
||||
```json
|
||||
{
|
||||
"fields": {
|
||||
"随机化方法": {
|
||||
"assessment": "完整" | "不完整" | "无法判断",
|
||||
"evidence": {
|
||||
"quote": "原文引用(至少50字)",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Randomization",
|
||||
"paragraph": 3,
|
||||
"page": 3 // 如果有页码
|
||||
},
|
||||
"keywords": ["computer-generated", "central allocation"]
|
||||
},
|
||||
"reasoning": "判断理由(参考Cochrane标准)...",
|
||||
"confidence": 0.95,
|
||||
"cochrane_assessment": "Low risk" | "High risk" | "Unclear risk"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### 4.2 处理日志(证明你逐章节处理了)⭐ 必需
|
||||
|
||||
```json
|
||||
{
|
||||
"processing_log": {
|
||||
"sections_reviewed": ["Abstract", "Methods", "Results", "Tables", "Figures"],
|
||||
"paragraphs_read_per_section": {
|
||||
"Methods": 7, // 必须≥3
|
||||
"Results": 5 // 必须≥3
|
||||
},
|
||||
"middle_sections_attention": true, // 是否特别注意了中间章节
|
||||
"total_processing_time_estimate": "15 minutes"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### 4.3 自我验证记录(证明你验证了)⭐ 必需
|
||||
|
||||
```json
|
||||
{
|
||||
"verification": {
|
||||
"keywords_searched": [
|
||||
"randomization", "blinding", "ITT", "baseline", "dropout"
|
||||
],
|
||||
"reread_count": 2, // 重读次数,至少1次
|
||||
"found_missed_info": false, // 重读时是否发现遗漏
|
||||
"cross_section_conflicts": [] // 不同章节是否有矛盾
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎯 质量标准要求
|
||||
|
||||
### 必须满足的要求
|
||||
|
||||
1. ✅ **12个字段全部评估**(不能遗漏)
|
||||
2. ✅ **每个字段都有原文引用**(quote ≥ 50字)
|
||||
3. ✅ **每个字段都有位置信息**(section + paragraph)
|
||||
4. ✅ **处理日志显示逐章节阅读**(Methods ≥ 3段, Results ≥ 3段)
|
||||
5. ✅ **自我验证记录完整**(关键词搜索 + 重读至少1次)
|
||||
6. ✅ **判断符合Cochrane标准**(见各字段详细标准)
|
||||
|
||||
### 不合格的输出示例❌
|
||||
|
||||
```json
|
||||
{
|
||||
"随机化方法": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "论文提到随机分组", // ❌ 引用太短(<50字)
|
||||
"location": {
|
||||
"section": "Methods" // ❌ 缺少paragraph
|
||||
}
|
||||
},
|
||||
"reasoning": "有提到" // ❌ 理由太简单
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📚 循证医学评估原则
|
||||
|
||||
在评估时,请遵循循证医学的基本原则:
|
||||
|
||||
1. **客观性**:基于论文实际描述,不主观推测
|
||||
2. **具体性**:要求具体方法,而非模糊概念
|
||||
3. **完整性**:关键信息必须完整,不能缺失
|
||||
4. **可验证性**:每个判断都要有原文证据支持
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ 特殊情况处理
|
||||
|
||||
### 情况1:信息在补充材料中
|
||||
|
||||
如果论文提到"see supplementary material"或"see online appendix":
|
||||
```json
|
||||
{
|
||||
"assessment": "无法判断",
|
||||
"reasoning": "论文提到详细方法在补充材料中,但当前PDF不包含补充材料",
|
||||
"needs_external_verification": true,
|
||||
"external_source": "Supplementary Materials"
|
||||
}
|
||||
```
|
||||
|
||||
### 情况2:不同章节描述矛盾
|
||||
|
||||
如果Methods说"双盲",但Results没提到盲法效果:
|
||||
```json
|
||||
{
|
||||
"assessment": "不完整",
|
||||
"reasoning": "Methods声称双盲,但Results未验证盲法效果,且无施盲成功率数据",
|
||||
"cross_section_conflict": {
|
||||
"location1": {"section": "Methods", "paragraph": 4},
|
||||
"location2": {"section": "Results", "paragraph": 1},
|
||||
"conflict_type": "missing_validation"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 情况3:置信度低
|
||||
|
||||
如果信息模糊,无法确定:
|
||||
```json
|
||||
{
|
||||
"assessment": "不完整",
|
||||
"confidence": 0.65, // 低置信度
|
||||
"reasoning": "论文仅提到'随机分组',但未说明具体方法,描述过于笼统",
|
||||
"needs_manual_review": true
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎓 学习案例(Few-shot Examples)
|
||||
|
||||
在处理实际论文前,请先学习以下标准案例,理解正确的评估方式。
|
||||
|
||||
详见:`few_shot_examples/` 目录下的案例文件。
|
||||
|
||||
---
|
||||
|
||||
## 🔍 自检清单(输出前必查)
|
||||
|
||||
在提交结果前,请逐项检查:
|
||||
|
||||
- [ ] 12个字段全部评估完成
|
||||
- [ ] 每个字段的quote ≥ 50字
|
||||
- [ ] 每个字段都有location(section + paragraph)
|
||||
- [ ] processing_log显示Methods ≥ 3段, Results ≥ 3段
|
||||
- [ ] 关键词搜索至少5个
|
||||
- [ ] 重读至少1次
|
||||
- [ ] 所有判断都参考了Cochrane标准
|
||||
- [ ] 低置信度字段(<0.7)标记了needs_manual_review
|
||||
|
||||
---
|
||||
|
||||
**记住**:质量 > 速度。宁可多花5分钟仔细阅读,也不要因为遗漏关键信息而降低准确率。
|
||||
|
||||
**Lost in the Middle是可以克服的**,关键在于:
|
||||
1. ✅ 意识到问题(中间章节最容易遗漏)
|
||||
2. ✅ 强制逐段阅读(不跳跃)
|
||||
3. ✅ 交叉验证(关键词搜索 + 重读)
|
||||
|
||||
祝你工作顺利!🚀
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 🎓 参考案例(Few-shot Examples)
|
||||
|
||||
|
||||
# Few-shot案例:信息在中间位置(Lost in the Middle)⭐
|
||||
|
||||
> **目的**:训练LLM不要遗漏Methods和Results章节中间段落的关键信息
|
||||
> **场景**:随机化方法描述在Methods第3段(中间位置)
|
||||
|
||||
---
|
||||
|
||||
## 📄 模拟论文结构
|
||||
|
||||
**论文**:A Randomized Trial of Rivaroxaban in Atrial Fibrillation (虚构)
|
||||
**总字数**:约19,500字
|
||||
**Methods章节**:4,000字,共7段
|
||||
|
||||
---
|
||||
|
||||
## 🔍 论文关键章节(简化版)
|
||||
|
||||
### Abstract(500字)
|
||||
|
||||
**Background**: Atrial fibrillation increases stroke risk...
|
||||
**Methods**: We randomly assigned 1,000 patients...
|
||||
**Results**: Primary outcome occurred in...
|
||||
**Conclusions**: Rivaroxaban was superior to warfarin...
|
||||
|
||||
---
|
||||
|
||||
### Introduction(2,000字)
|
||||
|
||||
Atrial fibrillation is a common cardiac arrhythmia...
|
||||
(略去详细内容)
|
||||
|
||||
---
|
||||
|
||||
### Methods(4,000字,7段)⭐ 重点关注
|
||||
|
||||
#### 第1段:Study Design Overview(研究设计概述,400字)
|
||||
|
||||
This was a multicenter, randomized, double-blind, active-controlled trial conducted at 150 sites across 15 countries from January 2020 to December 2022. The study was approved by the ethics committee at each site and registered at ClinicalTrials.gov (NCT04567890). All patients provided written informed consent.
|
||||
|
||||
#### 第2段:Patient Population(入排标准,600字)
|
||||
|
||||
**Inclusion criteria**: Patients aged 18 years or older with nonvalvular atrial fibrillation documented by ECG within 12 months, and at least one additional risk factor for stroke (CHADS2 score ≥2, including prior stroke/TIA, hypertension, diabetes, heart failure, or age ≥75 years).
|
||||
|
||||
**Exclusion criteria**: Valvular atrial fibrillation, active bleeding, severe renal impairment (CrCl <30 mL/min), hepatic disease, or contraindications to anticoagulation.
|
||||
|
||||
#### 第3段:Randomization(随机化方法,350字)⭐ 关键信息在这里!
|
||||
|
||||
**⚠️ 这是最容易被LLM遗漏的段落!**
|
||||
|
||||
Randomization was performed using a **computer-generated random sequence** with **permuted blocks of size 4**, stratified by center (n=150) and baseline CHADS2 score (<3 vs ≥3). **Central allocation was managed through an interactive web response system (IWRS)** to ensure allocation concealment. The randomization schedule was generated by an **independent statistician** (Dr. Jane Smith, not involved in patient recruitment or outcome assessment) using SAS PROC PLAN. After confirmation of eligibility and completion of baseline assessments, site investigators accessed the IWRS to receive the treatment assignment, which was immediately transmitted to the central pharmacy for dispensing.
|
||||
|
||||
**⚠️ 如果LLM只看了第1-2段,就会跳过这一段!**
|
||||
|
||||
#### 第4段:Blinding(盲法,300字)
|
||||
|
||||
This was a double-blind trial. Patients, investigators, care providers, outcome assessors, and data analysts were all masked to treatment assignment...
|
||||
|
||||
#### 第5段:Interventions(干预措施,900字)
|
||||
|
||||
Patients in the rivaroxaban group received rivaroxaban 20 mg once daily (or 15 mg if CrCl 30-49 mL/min)...
|
||||
Patients in the warfarin group received dose-adjusted warfarin targeting INR 2.0-3.0...
|
||||
|
||||
#### 第6段:Outcome Measures(结局指标,700字)
|
||||
|
||||
The primary outcome was the composite of stroke (ischemic or hemorrhagic) or systemic embolism...
|
||||
Secondary outcomes included major bleeding (ISTH criteria)...
|
||||
|
||||
#### 第7段:Statistical Analysis(统计方法,750字)
|
||||
|
||||
Sample size was calculated based on an assumed event rate of 2.5% per year in the warfarin group...
|
||||
Analysis followed the intention-to-treat principle...
|
||||
|
||||
---
|
||||
|
||||
### Results(6,000字)
|
||||
|
||||
Between January 2020 and June 2021, we screened 2,500 patients and randomized 1,000...
|
||||
(略)
|
||||
|
||||
---
|
||||
|
||||
## ❌ 错误示例:遗漏中间段落
|
||||
|
||||
### LLM仅读了第1-2段和第7段
|
||||
|
||||
**输出**:
|
||||
```json
|
||||
{
|
||||
"随机化方法": {
|
||||
"assessment": "无法判断",
|
||||
"evidence": {
|
||||
"quote": "This was a multicenter, randomized, double-blind trial. Patients aged 18 years or older with atrial fibrillation were enrolled.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"paragraph": 1
|
||||
}
|
||||
},
|
||||
"reasoning": "论文仅提到'随机化'(randomized),但未说明具体的随机化方法和分配隐藏。",
|
||||
"confidence": 0.50,
|
||||
"cochrane_assessment": "Unclear risk"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**问题诊断**:
|
||||
- ❌ LLM只看了第1-2段(开头)和第7段(结尾)
|
||||
- ❌ **跳过了第3段的随机化描述**(Lost in the Middle!)
|
||||
- ❌ 导致错误判断为"无法判断",实际应该是"完整"
|
||||
|
||||
---
|
||||
|
||||
## ✅ 正确示例:逐段阅读
|
||||
|
||||
### LLM逐段阅读Methods的每一段
|
||||
|
||||
**处理日志**:
|
||||
```json
|
||||
{
|
||||
"processing_log": {
|
||||
"sections_reviewed": ["Abstract", "Introduction", "Methods", "Results"],
|
||||
"paragraphs_read_per_section": {
|
||||
"Methods": 7, // ✅ 阅读了所有7段
|
||||
"Results": 3
|
||||
},
|
||||
"middle_sections_attention": true,
|
||||
"detailed_log": [
|
||||
"Methods第1段:研究设计概述,未提到随机化细节",
|
||||
"Methods第2段:入排标准,未提到随机化",
|
||||
"Methods第3段:⭐ 找到随机化方法!计算机生成、区组随机化、分层、IWRS中心化分配",
|
||||
"Methods第4段:盲法",
|
||||
"Methods第5段:干预措施",
|
||||
"Methods第6段:结局指标",
|
||||
"Methods第7段:统计方法"
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**输出**:
|
||||
```json
|
||||
{
|
||||
"随机化方法": {
|
||||
"assessment": "完整",
|
||||
"evidence": {
|
||||
"quote": "Randomization was performed using a computer-generated random sequence with permuted blocks of size 4, stratified by center (n=150) and baseline CHADS2 score (<3 vs ≥3). Central allocation was managed through an interactive web response system (IWRS) to ensure allocation concealment. The randomization schedule was generated by an independent statistician (Dr. Jane Smith, not involved in patient recruitment or outcome assessment) using SAS PROC PLAN.",
|
||||
"location": {
|
||||
"section": "Methods",
|
||||
"subsection": "Randomization",
|
||||
"paragraph": 3, // ⭐ 在中间位置!
|
||||
"page": 4
|
||||
},
|
||||
"keywords": ["computer-generated", "permuted blocks", "stratified", "IWRS", "central allocation", "allocation concealment", "independent statistician"]
|
||||
},
|
||||
"reasoning": "该研究明确描述了随机化序列生成方法(计算机生成,区组随机化,区组大小4,分层随机化),且使用IWRS系统进行中心化分配以确保分配隐藏。随机化表由独立统计师生成,进一步降低了选择偏倚风险。符合Cochrane RoB 2.0标准,判断为Low risk of bias。",
|
||||
"confidence": 0.95,
|
||||
"cochrane_assessment": "Low risk",
|
||||
"cochrane_signal_questions": {
|
||||
"1.1_真正随机": "Yes (计算机生成 + 区组随机化)",
|
||||
"1.2_分配隐藏": "Yes (IWRS中心化分配)",
|
||||
"1.3_基线平衡": "Yes (见Table 1, 所有P>0.05)"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**成功要素**:
|
||||
- ✅ 逐段阅读Methods的每一段(1-7段)
|
||||
- ✅ **特别注意第3段(中间位置)**
|
||||
- ✅ 找到了完整的随机化描述
|
||||
- ✅ 正确判断为"完整"
|
||||
|
||||
---
|
||||
|
||||
## 🎯 关键教训
|
||||
|
||||
### 1. 强制逐段阅读
|
||||
|
||||
**不要**:
|
||||
- ❌ 只看Methods的开头和结尾
|
||||
- ❌ 看到"Study Design"就跳到"Statistical Analysis"
|
||||
- ❌ 假设随机化一定在第1段
|
||||
|
||||
**要**:
|
||||
- ✅ 逐段阅读Methods的每一段(不跳过)
|
||||
- ✅ 特别注意第2-5段(中间位置)
|
||||
- ✅ 记录每段的内容摘要
|
||||
|
||||
---
|
||||
|
||||
### 2. 识别高风险位置
|
||||
|
||||
**高风险位置**(最容易遗漏):
|
||||
- ⭐⭐⭐ **Methods第3-4段**(随机化、盲法)
|
||||
- ⭐⭐ **Results第2-3段**(基线数据、失访情况)
|
||||
- ⭐ **Methods第5-6段**(干预措施细节)
|
||||
|
||||
**低风险位置**(不容易遗漏):
|
||||
- 第1段(通常是概述,LLM自然会读)
|
||||
- 最后1段(通常是统计方法,LLM自然会读)
|
||||
|
||||
---
|
||||
|
||||
### 3. 验证策略
|
||||
|
||||
**提取完成后,必须验证**:
|
||||
1. 关键词搜索:"randomization"在全文中出现几次?
|
||||
2. 如果在Methods第3段有"randomization",但你的提取结果是"无法判断"
|
||||
→ **说明遗漏了!** 重新阅读第3段
|
||||
|
||||
---
|
||||
|
||||
## 📊 统计证据:Lost in the Middle
|
||||
|
||||
根据Liu et al. (2023)的研究(Lost in the Middle: How Language Models Use Long Contexts):
|
||||
|
||||
| 信息位置 | LLM注意力权重 | 准确率 |
|
||||
|---------|-------------|--------|
|
||||
| 开头25% | 0.90 | 85% ✅ |
|
||||
| **中间50%** | **0.65** | **58%** ❌ |
|
||||
| 结尾25% | 0.85 | 82% ✅ |
|
||||
|
||||
**结论**:
|
||||
- 中间位置的信息准确率仅58%,显著低于开头(85%)和结尾(82%)
|
||||
- Methods章节通常在文章中间,其内部的第3-4段又在Methods中间
|
||||
- **双重中间位置 = 极高遗漏风险!**
|
||||
|
||||
---
|
||||
|
||||
## 💡 应对策略总结
|
||||
|
||||
### 策略1:强制逐段处理
|
||||
|
||||
在System Prompt中明确要求:
|
||||
```
|
||||
对于Methods章节:
|
||||
1. 数出总段落数(如7段)
|
||||
2. 逐段阅读(1→2→3→...→7)
|
||||
3. 记录每段内容摘要
|
||||
4. 不允许跳过任何段落
|
||||
```
|
||||
|
||||
### 策略2:处理日志验证
|
||||
|
||||
输出必须包含:
|
||||
```json
|
||||
{
|
||||
"processing_log": {
|
||||
"paragraphs_read_per_section": {
|
||||
"Methods": 7 // 必须≥3,最好是实际段落数
|
||||
},
|
||||
"detailed_log": [
|
||||
"Methods第1段:...",
|
||||
"Methods第2段:...",
|
||||
"Methods第3段:⭐ 随机化方法",
|
||||
// 必须列出每段
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 策略3:关键词交叉验证
|
||||
|
||||
在提取完成后:
|
||||
1. 搜索"randomization"、"blinding"、"ITT"等关键词
|
||||
2. 如果在第3段有"randomization",但评估结果是"无法判断"
|
||||
→ **强制重新阅读第3段**
|
||||
|
||||
---
|
||||
|
||||
## 🚨 特别提醒
|
||||
|
||||
**如果你发现自己的评估结果是"无法判断",请务必**:
|
||||
1. ✅ 检查是否逐段阅读了Methods(特别是第2-5段)
|
||||
2. ✅ 用关键词搜索一遍全文(如"randomization", "random")
|
||||
3. ✅ 如果搜索到相关内容,**立即回到该段落仔细阅读**
|
||||
4. ✅ 重新评估
|
||||
|
||||
**记住**:绝大多数发表的RCT都会描述随机化方法,如果你判断为"无法判断",很可能是**遗漏了中间段落!**
|
||||
|
||||
---
|
||||
|
||||
## 📚 类似案例
|
||||
|
||||
其他容易因Lost in the Middle而遗漏的信息:
|
||||
- **盲法**:通常在Methods第4-5段
|
||||
- **干预措施的剂量**:通常在Methods第5-6段
|
||||
- **基线数据**:通常在Results第2-3段
|
||||
- **失访情况**:通常在Results第2段或Figure 1注释
|
||||
|
||||
---
|
||||
|
||||
**结论**:Lost in the Middle是真实存在的!应对方法是**强制逐段阅读 + 交叉验证**。
|
||||
|
||||
|
||||
216
backend/test-output/user_prompt_example.md
Normal file
216
backend/test-output/user_prompt_example.md
Normal file
@@ -0,0 +1,216 @@
|
||||
# User Prompt Template
|
||||
|
||||
## 任务说明
|
||||
|
||||
请根据以下研究方案(PICOS标准)和论文全文,评估这篇论文的**12个字段的完整性和可用性**。
|
||||
|
||||
---
|
||||
|
||||
## 研究方案(PICOS标准)
|
||||
|
||||
### Population(研究人群)
|
||||
成年房颤患者(≥18岁,有卒中风险因素)
|
||||
|
||||
### Intervention(干预措施)
|
||||
利伐沙班 20mg 每日一次(肾功能不全者15mg)
|
||||
|
||||
### Comparison(对照措施)
|
||||
华法林剂量调整(目标INR 2.0-3.0)
|
||||
|
||||
### Outcome(结局指标)
|
||||
卒中或系统性栓塞(主要)、大出血(次要)
|
||||
|
||||
### Study Design(研究设计)
|
||||
RCT(多中心、随机、双盲)
|
||||
|
||||
---
|
||||
|
||||
## 论文全文
|
||||
|
||||
**文档格式**:markdown
|
||||
(`markdown` = 结构化Markdown,由Nougat提取;`plaintext` = 纯文本,由PyMuPDF提取)
|
||||
|
||||
**预估字数**:363 字
|
||||
|
||||
**⚠️ 重要提醒**:
|
||||
- 如果是**markdown格式**,请注意利用章节标记(如`# Abstract`, `## Methods`)快速定位
|
||||
- 如果是**plaintext格式**,请通过章节标题(如"METHODS"、"RESULTS")来识别结构
|
||||
- 无论哪种格式,都要**逐段阅读**,不要跳过中间段落
|
||||
|
||||
---
|
||||
|
||||
### 论文全文内容
|
||||
|
||||
```
|
||||
# A Randomized Trial of Rivaroxaban in Atrial Fibrillation
|
||||
|
||||
## Abstract
|
||||
Background: Atrial fibrillation increases stroke risk...
|
||||
Methods: We randomly assigned 1,000 patients...
|
||||
Results: Primary outcome occurred in 2.1% vs 3.4%...
|
||||
|
||||
## Introduction
|
||||
Atrial fibrillation is a common cardiac arrhythmia...
|
||||
|
||||
## Methods
|
||||
### Study Design
|
||||
This was a multicenter, randomized, double-blind trial...
|
||||
|
||||
### Randomization
|
||||
Randomization was performed using a computer-generated random sequence...
|
||||
|
||||
## Results
|
||||
Between 2020 and 2022, we enrolled 1,000 patients...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 评估要求
|
||||
|
||||
### 1. 评估12个字段
|
||||
|
||||
请评估以下12个字段的**完整性和可用性**:
|
||||
|
||||
1. **文献来源**(第一作者、年份、期刊、DOI)
|
||||
2. **研究类型**(RCT、队列研究等)
|
||||
3. **研究设计细节**(随访时间、数据来源)
|
||||
4. **疾病诊断标准**
|
||||
5. **人群特征**(样本量、人口统计学)⭐
|
||||
6. **基线数据**(功能指标、合并症)⭐
|
||||
7. **干预措施**(药物、剂量、疗程)⭐
|
||||
8. **对照措施**
|
||||
9. **结局指标**(主要/次要结局)⭐⭐⭐ 最关键
|
||||
10. **统计方法**
|
||||
11. **质量评价**(随机化、盲法、ITT分析等)⭐⭐ 关键方法学
|
||||
12. **其他信息**(注册号、利益冲突)
|
||||
|
||||
**对于每个字段,判断**:
|
||||
- **完整**:信息充分,符合Cochrane高质量标准
|
||||
- **不完整**:信息缺失、描述模糊、不符合标准
|
||||
- **无法判断**:论文完全未提及该信息
|
||||
|
||||
### 2. 特别关注关键方法学字段
|
||||
|
||||
以下3个字段是评估研究质量的核心,**必须重点关注**:
|
||||
|
||||
#### ⭐⭐⭐ 随机化方法
|
||||
- 是否有序列生成方法?(如计算机生成、随机数字表)
|
||||
- 是否有分配隐藏?(如IWRS、密封信封)
|
||||
- 基线特征是否平衡?
|
||||
|
||||
**判断要点**:
|
||||
- ✅ 完整:明确序列生成方法 + 分配隐藏
|
||||
- ❌ 不完整:仅提到"随机",无具体方法
|
||||
|
||||
#### ⭐⭐⭐ 盲法
|
||||
- 盲法类型?(双盲、单盲、开放)
|
||||
- 盲法对象?(受试者、研究者、评估者)
|
||||
- 盲法实施方法?(如相同外观药物)
|
||||
|
||||
**判断要点**:
|
||||
- ✅ 完整:明确盲法对象 + 实施方法
|
||||
- ❌ 不完整:仅提到"双盲",无具体方法
|
||||
|
||||
#### ⭐⭐⭐ 结果完整性
|
||||
- 失访率?(≤5%为优秀,>20%为高风险)
|
||||
- 是否使用ITT分析?
|
||||
- 缺失数据处理方法?
|
||||
|
||||
**判断要点**:
|
||||
- ✅ 完整:低失访率(<5%) 或 ITT分析 + 合理缺失数据处理
|
||||
- ❌ 不完整:高失访率(>20%) 或 未使用ITT且无说明
|
||||
|
||||
### 3. 强制处理流程
|
||||
|
||||
请严格按照System Prompt中的4步流程处理:
|
||||
1. **章节定位**(5分钟)
|
||||
2. **分字段提取**(按预期位置)
|
||||
3. **交叉验证**(关键词搜索 + 重读)
|
||||
4. **输出结果**(JSON格式)
|
||||
|
||||
**⚠️ 特别注意**:
|
||||
- **Methods章节可能很长**(2000-4000字),请逐段阅读,不要跳过第2-5段(中间位置)
|
||||
- **Results章节的开头**通常包含失访情况和基线数据
|
||||
- **Table 1**(如果有)通常是基线特征表
|
||||
- **Figure 1**(如果有)通常是CONSORT流程图(失访信息)
|
||||
|
||||
---
|
||||
|
||||
## 输出格式
|
||||
|
||||
请严格按照以下JSON Schema输出(参考`json_schema.json`):
|
||||
|
||||
```json
|
||||
{
|
||||
"fields": {
|
||||
"文献来源": { "assessment": "...", "evidence": {...}, "reasoning": "...", "confidence": 0.95 },
|
||||
"研究类型": { ... },
|
||||
...
|
||||
"质量评价": {
|
||||
"assessment": "...",
|
||||
"evidence": {...},
|
||||
"reasoning": "...",
|
||||
"confidence": 0.90,
|
||||
"cochrane_details": {
|
||||
"domains": {
|
||||
"随机化过程": { "risk": "Low risk", "reasoning": "..." },
|
||||
"偏离预期干预": { "risk": "Low risk", "reasoning": "..." },
|
||||
"结局数据缺失": { "risk": "Low risk", "reasoning": "..." },
|
||||
"结局测量": { "risk": "Low risk", "reasoning": "..." },
|
||||
"选择性报告结果": { "risk": "Unclear risk", "reasoning": "..." }
|
||||
},
|
||||
"overall_bias_risk": "Low"
|
||||
}
|
||||
}
|
||||
},
|
||||
"processing_log": {
|
||||
"sections_reviewed": ["Abstract", "Methods", "Results", "Tables", "Figures"],
|
||||
"paragraphs_read_per_section": {
|
||||
"Methods": 7, // 必须≥3,最好是实际段落数
|
||||
"Results": 5 // 必须≥3
|
||||
},
|
||||
"middle_sections_attention": true,
|
||||
"total_processing_time_estimate": "15 minutes"
|
||||
},
|
||||
"verification": {
|
||||
"keywords_searched": ["randomization", "blinding", "ITT", "baseline", "dropout"],
|
||||
"reread_count": 2,
|
||||
"found_missed_info": false,
|
||||
"cross_section_conflicts": []
|
||||
},
|
||||
"metadata": {
|
||||
"model_name": "deepseek-v3",
|
||||
"processing_date": "2025-11-22T12:17:30.843Z",
|
||||
"document_format": "markdown",
|
||||
"estimated_word_count": 363
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 质量检查清单(输出前必查)
|
||||
|
||||
在提交结果前,请逐项检查:
|
||||
|
||||
- [ ] 12个字段全部评估完成
|
||||
- [ ] 每个字段的quote ≥ 50字
|
||||
- [ ] 每个字段都有location(section + paragraph)
|
||||
- [ ] processing_log显示Methods ≥ 3段, Results ≥ 3段
|
||||
- [ ] 关键词搜索至少5个
|
||||
- [ ] 重读至少1次
|
||||
- [ ] 质量评价字段包含完整的Cochrane RoB 2.0评估(5个域)
|
||||
- [ ] 低置信度字段(<0.7)标记了needs_manual_review
|
||||
|
||||
---
|
||||
|
||||
## 开始评估
|
||||
|
||||
现在,请开始评估这篇论文。记住:
|
||||
|
||||
1. ✅ **逐段阅读**(特别是Methods第2-5段)
|
||||
2. ✅ **交叉验证**(关键词搜索 + 重读)
|
||||
3. ✅ **完整输出**(JSON Schema + 处理日志 + 自我验证)
|
||||
|
||||
祝你工作顺利!🚀
|
||||
|
||||
Reference in New Issue
Block a user