52 posts in total
2025
Training-Inference Mismatch In LLM KD
Dual-Space Knowledge Distillation for Large Language Models
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
hexo+reveal Guide
Different Designs For LLM KD Loss
Forwarding Server Traffic to a Local Machine
Thoughts on Instrument Practice
On Floating-Point Storage Precision
Bug Collection