11 articles in total
2025
BOND: Aligning LLMs with Best-of-N Distillation
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pretraining of Deep Networks
Distilling the Knowledge in Data Pruning
DA-KD: Difficulty-Aware Knowledge Distillation for Efficient Large Language Models
C2KD: Cross-layer and Cross-head Knowledge Distillation for Small Language Model-based Recommendations
SVD-LLM: Truncation-Aware Singular Value Decomposition for Large Language Model Compression
Training-Inference Mismatch in LLM KD
Dual-Space Knowledge Distillation for Large Language Models
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification