Sophilex's Blog
From Correction to Mastery: Reinforced Distillation of Large Language Model Agents

While the student generates SGOs, the teacher intervenes only when necessary, tightening the theoretical upper bound on the error.
2025-09-28
Study Notes
#LLM #KD
Merge-of-Thought Distillation

During KD, information from different teacher sources is fused in a clever way.
2025-09-28
Study Notes
#LLM #KD
Delta Knowledge Distillation for Large Language Models

Changes the KD alignment target to the delta of model updates rather than a fixed token probability distribution.
2025-09-28
Study Notes
#LLM #KD
Massive Activations in Large Language Models

On abnormally large activation values in LLMs and their relation to attention sinks; interprets them as a uniform, fixed bias in the model. Quite interesting.
2025-09-21
Study Notes
#LLM
TD3: Tucker Decomposition Based Dataset Distillation Method for Sequential Recommendation

Dataset distillation for sequential recommendation; introduces Tucker decomposition to relieve the parameter pressure that grows with dataset size.
2025-09-21
Study Notes
#Dataset_Condensation
Dataset Condensation for Recommendation

Dataset distillation for recommendation.
2025-09-14
Study Notes
#KD #Dataset_Condensation
BOND: Aligning LLMs with Best-of-N distillation

Explicitly represents the Best-of-N generation results as a policy and distills it into the model, compressing N inference passes into a single one.
2025-08-18
Study Notes
#LLM #KD #RLHF
Evaluating Position Bias in Large Language Model Recommendations

In recommendation tasks, the input order of items can affect the model's recommendations.
2025-08-11
Study Notes
#LLM
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pretraining of Deep Networks

Uses KD to build a bridge between supervised and self-supervised learning. Very clever!
2025-08-11
Study Notes
#LLM #KD #Pruning
Distilling the Knowledge in Data Pruning

Combines dataset pruning with KD.
2025-08-10
Study Notes
#LLM #KD #Compression