papersilove
Friday, January 12, 2024
[2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
https://arxiv.org/abs/2305.18290
No comments:
Post a Comment
Newer Post
Older Post
Home
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment