Task Masking is the art of looking busy, and it is Gen Z's response to being asked to return to the office after remote working.| L'Eurispes
Meritocracy is a narrative that masks structural inequalities by presenting them as the result of individual differences.| L'Eurispes
Heterarchy is an organizational paradigm in which authority is distributed and there are multiple power structures.| L'Eurispes
Sycophancy is an undesirable behavior where models tailor their responses to follow a human user's view even when that view is not objectively correct (e.g., adapting liberal views once a user reveals that they are liberal). In this paper, we study the prevalence of sycophancy in language models and propose a simple synthetic-data intervention to reduce this behavior. First, on a set of three sycophancy tasks (Perez et al., 2022) where models are asked for an opinion on statements with no cor...| arXiv.org
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages o...| arXiv.org
Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless. As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting. We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of ...| arXiv.org
Centaurs, human workers assisted by computers: will we be able to meet the challenge by creating unprecedented synergies, without being overwhelmed by them?| L'Eurispes
Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form ...| arXiv.org