Tag - KD - Sophilex's Blog

08-10

Distilling the Knowledge in Data Pruning

08-04

DA-KD: Difficulty-Aware Knowledge Distillation for Efficient Large Language Models

08-04

C2KD: Cross-layer and Cross-head Knowledge Distillation for Small Language Model-based Recommendations

07-07

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

06-24

Training-Inference Mismatch In LLM KD

06-23

Dual-Space Knowledge Distillation for Large Language Models

06-23

Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation

06-23

Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification

06-10

Different Designs For LLM KD Loss