17 articles in total
2025
C2KD: Cross-layer and Cross-head Knowledge Distillation for Small Language Model-based Recommendations
SVD-LLM: TRUNCATION-AWARE SINGULAR VALUE DECOMPOSITION FOR LARGE LANGUAGE MODEL COMPRESSION
Training-Inference Mismatch In LLM KD
Dual-Space Knowledge Distillation for Large Language Models
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
NOT ALL LLM-GENERATED DATA ARE EQUAL: RETHINKING DATA WEIGHTING IN TEXT CLASSIFICATION
Different Designs For LLM KD Loss