From: Lj Miranda
A lexical view of contrast pairs in preference datasets
https://ljvmiranda921.github.io/notebook/2024/03/12/contrast-pairs/
Tagged with: notebook, openai, llm, rlhf, preference data, shp, berkeley-nest
Can we spot differences between preference pairs just by looking at their word embeddings? In this blog post, I want to share my findings from examining lexical distances between chosen and rejected responses in preference datasets.
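To make the idea of a lexical distance between chosen and rejected responses concrete, here is a minimal sketch. It is not the post's actual method: the summary does not say which embedding model is used, so this substitutes a toy bag-of-words count vector and cosine distance. The function names and the `chosen` / `rejected` dictionary keys are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Toy stand-in for a word embedding (assumption): lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an empty response is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def pair_distances(pairs):
    """Compute one distance per preference pair (hypothetical record layout)."""
    return [
        cosine_distance(bow_vector(p["chosen"]), bow_vector(p["rejected"]))
        for p in pairs
    ]
```

With a real embedding model, only `bow_vector` would need to change; the per-pair distance computation stays the same. A distance near 0 means the two responses use nearly the same words, while a distance near 1 means they share almost none.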