News

UK: AISI publishes paper on evaluating control measures for AI agents

April 25, 2025
Summary

The AI Security Institute (AISI) published a paper on April 11, 2025, discussing scalable control measures for AI agents as large language models (LLMs) become more autonomous. The paper defines AI control as safeguards to prevent AI from causing harm post-deployment, contrasting with alignment strategies. It introduces a five-level taxonomy of AI Control Levels (ACLs) and emphasizes tailoring control protocols to a model's capabilities. The paper also highlights the need for evolving control measures and new research to manage risks from superhuman agents.

On April 11, 2025, the AI Security Institute (AISI) published a paper on evaluating control measures for artificial intelligence (AI) agents. The paper explores how AI control measures can be scaled as large language models (LLMs) become more autonomous and capable of acting without human oversight.

What is AI control?

The paper defines AI control as a set of technical and procedural safeguards designed to prevent AI models from causing harm, even if they pursue unintended goals. The paper notes that, unlike alignment strategies that aim to shape model behaviour during training, AI control focuses on restricting a model's actions post-deployment. Examples of control measures discussed in the paper include …
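While the paper's specific examples are not reproduced here, the general shape of a post-deployment control measure can be sketched in code: a safeguard sits between an agent's proposed action and the real world, and only reviewed actions execute. The sketch below is purely illustrative; the names (ProposedAction, review, run_agent_step) and the block lists are hypothetical assumptions, not measures taken from the AISI paper.

```python
# Illustrative sketch only: a minimal post-deployment safeguard in the
# spirit of "AI control" as the AISI paper defines it. All names and
# policies here are hypothetical, not drawn from the paper.

from dataclasses import dataclass


@dataclass
class ProposedAction:
    tool: str        # e.g. "shell", "http_request"
    argument: str    # e.g. the command or URL the agent wants to run


BLOCKED_TOOLS = {"shell"}                # deny entire tool classes outright
BLOCKED_SUBSTRINGS = ("rm -rf", "curl")  # crude pattern filter on arguments


def review(action: ProposedAction) -> bool:
    """Return True if the action may execute, False to block it.

    A real deployment would combine automated review with human
    escalation; this sketch only shows the gating structure.
    """
    if action.tool in BLOCKED_TOOLS:
        return False
    return not any(s in action.argument for s in BLOCKED_SUBSTRINGS)


def run_agent_step(action: ProposedAction) -> str:
    # The safeguard sits between the model's output and the real world:
    # the model may *propose* anything, but only reviewed actions execute.
    if not review(action):
        return f"BLOCKED: {action.tool} {action.argument!r}"
    return f"EXECUTED: {action.tool} {action.argument!r}"


if __name__ == "__main__":
    print(run_agent_step(ProposedAction("http_request", "https://example.com")))
    print(run_agent_step(ProposedAction("shell", "rm -rf /")))
```

The point of the sketch is only the structure: the model proposes, and a separate safeguard decides, independent of how the model was trained.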
