Investigating the seahorse emoji doom loop using logit lens. | vgel.me
Playing around with the Representation Engineering paper, I made some interesting control vectors and a Python package for making your own.