11 articles in total
2025
BOND: Aligning LLMs with Best-of-N Distillation
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pretraining of Deep Networks
Distilling the Knowledge in Data Pruning
DA-KD: Difficulty-Aware Knowledge Distillation for Efficient Large Language Models
C2KD: Cross-layer and Cross-head Knowledge Distillation for Small Language Model-based Recommendations
SVD-LLM: Truncation-Aware Singular Value Decomposition for Large Language Model Compression
Training-Inference Mismatch in LLM KD
Dual-Space Knowledge Distillation for Large Language Models
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification