Friday, January 12, 2024

[2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model

https://arxiv.org/abs/2305.18290
