Different Designs For LLM KD Loss
A discussion of KL divergence (KLD) and its improvements in the context of LLM knowledge distillation (KD), followed by extensions to Jensen-Shannon divergence (JSD) and Wasserstein distance.
https://sophilex.github.io/posts/418a878e/