19 posts in total
2025
Different Designs for LLM KD Loss
Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective
Training-Inference Mismatch in LLM KD (II)
From Correction to Mastery: Reinforced Distillation of Large Language Model Agents
Merge-of-Thought Distillation
Delta Knowledge Distillation for Large Language Models
Dataset Condensation for Recommendation
BOND: Aligning LLMs with Best-of-N distillation
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pretraining of Deep Networks