Distilling the Essence: Efficient Reasoning Distillation via Sequence Truncation An in-depth data-level analysis of reasoning distillation 2025-12-29 Study Notes #LLM
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning Introducing student-model information to assist data augmentation during LLM fine-tuning 2025-12-15 Study Notes #LLM
RETAINING BY DOING: THE ROLE OF ON-POLICY DATA IN MITIGATING FORGETTING In the LLM setting, mode-seeking may preserve the original knowledge better than mean-seeking, and this behavior may stem from the on-policy strategy 2025-12-14 Study Notes #LLM #Compression
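To make the mode-seeking vs. mean-seeking distinction in the last entry concrete, here is a minimal sketch (not from the paper): forward KL is mean-seeking and is estimated under the reference distribution (off-policy data), while reverse KL is mode-seeking and is estimated under the student's own samples (on-policy). All tensor names and shapes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative logits over a small vocabulary; shapes and names are assumptions.
ref_logits = torch.randn(4, 1000)      # reference/old model p (knowledge to retain)
student_logits = torch.randn(4, 1000)  # fine-tuned student q

log_p = F.log_softmax(ref_logits, dim=-1)
log_q = F.log_softmax(student_logits, dim=-1)

# Mean-seeking, forward KL(p || q): expectation taken under p (off-policy data);
# q is pushed to cover all of p's probability mass, spreading itself broadly.
forward_kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)

# Mode-seeking, reverse KL(q || p): expectation taken under q, i.e. under samples
# the student itself would generate (on-policy); q can concentrate on p's modes,
# the behavior the note links to better retention of the original knowledge.
reverse_kl = (log_q.exp() * (log_q - log_p)).sum(dim=-1)

print(forward_kl.mean().item(), reverse_kl.mean().item())
```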