C2KD: Cross-layer and Cross-head Knowledge Distillation for Small Language Model-based Recommendations — optimizing KD for LLM-based recommendation 2025-08-04 Study notes #LLM #KD
DipSVD: Dual-importance Protected SVD for Efficient LLM Compression — adaptively allocates per-layer matrix compression ratios according to each layer's importance and compressibility 2025-07-07 Study notes #LLM #Matrix_Decomposition
SVD-LLM: TRUNCATION-AWARE SINGULAR VALUE DECOMPOSITION FOR LARGE LANGUAGE MODEL COMPRESSION — introduces a Cholesky decomposition so that, in theory, each truncated singular value corresponds one-to-one to the incurred loss 2025-07-07 Study notes #LLM #Matrix_Decomposition
LANGUAGE MODEL COMPRESSION WITH WEIGHTED LOW-RANK FACTORIZATION — introduces Fisher information to weight parameters during the SVD factorization, and surveys the common routes to LLM compression 2025-07-07 Study notes #LLM #Matrix_Decomposition
ASVD: ACTIVATION-AWARE SINGULAR VALUE DECOMPOSITION FOR COMPRESSING LARGE LANGUAGE MODELS — incorporates input activation information to guide the SVD factorization (a minimal sketch of this idea follows the list) 2025-07-07 Study notes #LLM #Matrix_Decomposition
Dual-Space Knowledge Distillation for Large Language Models — tries to remove the limitation of current white-box LLM distillation frameworks, which only allow distillation between models sharing the same vocabulary 2025-06-23 Study notes #LLM #KD
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation — proposes two metrics for directly observing the error-accumulation phenomenon in language-model generation 2025-06-23 Study notes #LLM #KD
NOT ALL LLM-GENERATED DATA ARE EQUAL: RETHINKING DATA WEIGHTING IN TEXT CLASSIFICATION — tries to mitigate the train-inference mismatch problem by introducing sample-wise loss weights 2025-06-23 Study notes #LLM #KD
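Several of the SVD entries above (weighted low-rank factorization, ASVD) share one idea: rescale a weight matrix by some importance measure before truncating its SVD, then fold the scaling back out. The snippet below is a minimal NumPy sketch of that idea under my own assumptions; the function name `asvd_compress` and the way `act_scale` is produced are placeholders for illustration, not the papers' exact procedures.

```python
import numpy as np

def asvd_compress(W: np.ndarray, act_scale: np.ndarray, rank: int):
    """Hypothetical sketch: activation/importance-weighted truncated SVD of a linear layer.

    W         : (out_features, in_features) weight matrix
    act_scale : (in_features,) positive per-input-channel importance scale
                (e.g. from calibration activations or Fisher information)
    rank      : number of singular values to keep
    """
    # Scale input channels so the truncation preferentially preserves
    # directions that matter more for the layer's actual inputs.
    U, sigma, Vt = np.linalg.svd(W * act_scale[None, :], full_matrices=False)
    A = U[:, :rank] * sigma[:rank]          # (out_features, rank), singular values absorbed
    B = Vt[:rank, :] / act_scale[None, :]   # (rank, in_features), scaling folded back out
    return A, B                             # W ≈ A @ B: two small factors replace one big matrix

# Usage: replace y = W @ x with y ≈ A @ (B @ x)
W = np.random.randn(768, 768)
act_scale = np.abs(np.random.randn(768)) + 1e-3   # stand-in for real calibration statistics
A, B = asvd_compress(W, act_scale, rank=128)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))  # relative reconstruction error
```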