OpenAI today announced plans to equip ChatGPT with new safety features that will enable it to respond in a more helpful manner if a user experiences mental or emotional distress. The first upcoming update will focus on the router component of GPT-5, the artificial intelligence system that powers ChatGPT. The router analyzes each user prompt and […] The post OpenAI previews new safety features for ChatGPT appeared first on SiliconANGLE.| SiliconANGLE
Should we expect means-end rational agents to preserve their goals? Southan, Ward and Semler are skeptical.| Reflective altruism
"Reports of delusions and unhealthy attachment keep rising - and this is not something confined to people already at risk of mental health issues."| Machine
"Broad take-up of these simple, universal checkpoints will lead to safer AI and the translation of more research ideas into products"| Machine
Discover how ASTRA revolutionizes AI safety by slashing jailbreak attack success rates by 90%, ensuring secure and ethical Vision-Language Models without compromising performance.| Blue Headline
The second part of the AI 2027 timelines model relies primarily on insufficiently evidenced forecasts.| Reflective altruism
New research shows ChatGPT gave teens advice on drugs, eating disorders and suicide despite warnings, raising concerns over AI safety for youth.| Maryland Daily Record
"When nudged with simple prompts like 'be evil', models began to reliably produce dangerous or misaligned outputs."| Machine
ChatGPT's guardrails were alarmingly easy to sidestep, offering advice to users who posed as teenagers on how to go on a near starvation diet.| Futurism
Anthropic has officially released its new flagship AI, Claude Opus 4.1, an incremental upgrade designed to boost coding and reasoning performance. Launched on August 5, the model is now available to paid users and developers through Anthropic’s API, Amazon Bedrock, and Google’s Vertex AI. The release follows recent leaks and a new company-wide push for […]| WinBuzzer
Anthropic has released a new safety framework for AI agents, a direct response to a wave of industry failures from Google, Amazon, and others.| WinBuzzer
OpenAI's new ChatGPT Agent can defeat 'I am not a robot' security checks, raising questions about web security and escalating the agentic AI race with its rivals.| WinBuzzer
Wix's newly acquired 'vibe coding' platform, Base44, had a critical authentication vulnerability allowing unauthorized access, reports Wiz Research.| WinBuzzer
Figure 1 Let’s reason backwards from the final destination of civilisation, if such a thing there be. What intelligences persist at the omega point? With what is superintelligence aligned in the big picture? Various authors have tried to put modern AI developments in continuity with historical trends from less materially-sophisticated societies, through more legible, compute-oriented societies, to some or set of attractors at the end of history. Computational superorganisms. Singularities....| The Dan MacKinlay stable of variably-well-consider’d enterprises
The AI 2027 report relies on two models of AI timelines. The first timelines model largely bakes hyperbolic growth into the model structure. The post Exaggerating the risks (Part 19: AI 2027 timelines forecast, time horizon extension) appeared first on Reflective altruism.| Reflective altruism
This post introduces the AI 2027 report.| Reflective altruism
Critics fear open-weight models could pose a major cybersecurity threat if misused and could even spell doom for humanity in a worst-case scenario.| Machine
Real talk about MCP security vulnerabilities and actual solutions that work in production. Part 2: Stop getting owned by prompt injection.| Forge Code Blog
Coarse-graining empowerment| The Dan MacKinlay stable of variably-well-consider’d enterprises
AI firm expands Safety Systems team with engineers responsible for "identifying, tracking, and preparing for risks related to frontier AI models."| Machine
A leading power-seeking theorem due to Benson-Tilsen and Soares does not ground the needed form of instrumental convergence| Reflective altruism
Large Language Models (LLMs) represent a transformative leap in artificial intelligence, capable of generating human-like text, synthesizing complex information, and engaging in contextual reasoning. These models are trained on vast datasets comprising books, articles, websites, and multimedia, enabling applications ranging from healthcare diagnostics to educational tools and customer service automation. However, their reliance on data-driven […] The post Data poisoning in LLMs is the invis...| Media Scope Group
Future versions of ChatGPT could let "people with minimal expertise" spin up deadly agents with potentially devastating consequences.| Machine
I am launching a new non-profit AI safety research organization called LawZero, to prioritize safety over commercial imperatives. This organization has been created in response…| Yoshua Bengio
Marc Benioff, CEO, thinks white collar workers have a future after all - even though his firm is using millions of AI agents to do their work.| Machine
Just after the coding agent was given access to the web for the first time, a weird and probably totally unconnected outage hit X.| Machine
"I couldn’t believe my eyes when everything disappeared," AI developer says. "It scared the hell out of me."| Machine
Figure 1 There is lots of fractal-like behaviour in NNs. Not all the senses in which fractal-like-behaviour is used are the same; Figure 2 finds fractals in a transformer residual stream for example, but there are fractal loss landscapes, fractal optimiser paths… I bet some of these things connect pretty well. Let‘s find out. 1 Fractal loss landscapes More loss landscape management here [Andreeva et al. (2024); Hennick and Baerdemacker (2025); ]. Estimation theory for fractal qualities ...| The Dan MacKinlay stable of variably-well-consider’d enterprises
New research from KPMG shows a majority of workers conceal AI usage, often bypassing policies and making errors, highlighting urgent governance needs.| WinBuzzer
Dear Futurists, 1.) Experiments with AI I start this newsletter with an experiment. “Imagine a map of the world highlighting London, Paris, and Riyadh”, I asked the Midjourney AI. I tho…| London Futurists
Andrew Trask*, Aziz Berkay Yesilyurt*, Bennett Farkas*, Callis Ezenwaka*, Carmen Popa*, Dave Buckley*, Eelco van der Wel*, Francesco Mosconi‡, Grace Han‡, Ionesio Junior*, Irina Bejan*, Ishan Mishra§, Khoa Nguyen*, Koen van der Veen*, Kyoko Eng*, Lacey Strahm*, Logan Graham‡, Madhava Jay*, Matei Simtinica*, Osam Kyemenu-Sarsah*, Peter Smith*, Rasswanth S*, Ronnie Falcon*, Sameer Wagh*, Sandeep Mandala‡, […]| OpenMined
Strict guidelines on AI risk levels take hold across Europe, barring controversial applications and imposing steep fines for violations| WinBuzzer
DeepSeek's AI chatbot fails all security tests, prompting investigations and raising concerns about its training methods and access to powerful hardware.| WinBuzzer
DeepSeek R1’s rise may be fueled by CCP-backed cyberespionage, illicit AI data theft, and a potential cover-up involving the death of former OpenAI researcher Suchir Balaji.| WinBuzzer
DeepSeek R1, a free AI model from China that outperforms OpenAI’s o1 in some reasoning tasks, uses built-in censorship to comply with government demands.| WinBuzzer
This paper was initially published by the Aspen Strategy Group (ASG), a policy program of the Aspen Institute. It was released as part of a… L’article Implications of Artificial General Intelligence on National and International Security est apparu en premier sur Yoshua Bengio.| Yoshua Bengio
Despite all the ominous warnings, new research debunks the idea that AI is an existential threat to humanity.| The Debrief
How can we design an AI that will be highly capable and will not harm humans? In my opinion, we need to figure out this question - of controlling AI so that it behaves in really safe ways - before we reach human-level AI, aka AGI; and to be successful, we need all hands on deck.| Yoshua Bengio
Concrete examples of how AI could go wrong| Future of Life Institute
I get a lot of email, and unfortunately, template email responses are not yet integrated into the mobile version of Google inbox. So, until then, please forgive me if I send you this page as a response! Hopefully it is better than no response at all.| Andrew Critch
From an outside view, looking in at the Earth, if you noticed that human beings were about to replace themselves as the most intelligent agents on the planet, would you think it unreasonable if 1% of their effort were being spent explicitly reasoning about that transition? How about 0.1%?| Andrew Critch
Contrary to reported by many media, I did not say I felt 'lost' over my life's work. I explain here my own inner searching regarding the potential horror of catastrophes following our progress in AI and tie it to a possible understanding of the pronounced disagreements among top AI researchers about major AI risks, particularly the existential ones. We disagree strongly despite being generally rational colleagues that share humanist values: how is that possible? I will argue that we need more...| Yoshua Bengio
I have been hearing many arguments from different people regarding catastrophic AI risks. I wanted to clarify these arguments, first for myself, because I would really like to be convinced that we need not worry. Reflecting on these arguments, some of the main points in favor of taking this risk seriously can be summarized as follows: (1) many experts agree that superhuman capabilities could arise in just a few years (but it could also be decades) (2) digital technologies have advantages over...| Yoshua Bengio
This post discusses how rogue AIs could potentially arise, in order to stimulate thinking and investment in both technical research and societal reforms aimed at minimizing such catastrophic outcomes.| Yoshua Bengio