Sophilex's Blog

60 posts in total


2025

12-29  Distilling the Essence: Efficient Reasoning Distillation via Sequence Truncation
12-29  Instruction tuning with loss over instructions
12-15  Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
12-14  RETAINING BY DOING: THE ROLE OF ON-POLICY DATA IN MITIGATING FORGETTING
12-08  Proximal Gradient and Subgradients
11-17  Different Designs For LLM KD Loss(II)
11-17  Importance-Aware Data Selection for Efficient LLM Instruction Tuning
10-13  Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
10-13  Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective
10-11  Training-Inference Mismatch In LLM KD(II)

