OpenAI pledges to publish AI safety test results more often

OpenAI is moving to publish the results of its internal AI model safety evaluations more regularly in what the outfit is pitching as an effort to increase transparency.

On Wednesday, OpenAI launched the Safety Evaluations Hub, a webpage showing how the company’s models score on various tests for harmful content generation, jailbreaks, and hallucinations. OpenAI says that it’ll use the hub to share metrics on an “ongoing basis,” and that it intends to update the hub with “major model updates” going forward.

Introducing the Safety Evaluations Hub, a resource to explore safety results for our models.

While system cards share safety metrics at launch, the Hub will be updated periodically as part of our efforts to communicate proactively about safety. https://t.co/c8NgmXlC2Y

— OpenAI (@OpenAI) May 14, 2025

“As the science of AI evaluation evolves, we aim to share our progress on developing more scalable ways to measure model capability and safety,” wrote OpenAI in a blog post. “By sharing a subset of our safety evaluation results here, we hope this will not only make it easier to understand the safety performance of OpenAI systems over time, but also support community efforts to increase transparency across the field.”

OpenAI says that it may add additional evaluations to the hub over time.

In recent months, OpenAI has raised the ire of some ethicists for reportedly rushing the safety testing of certain flagship models and failing to release technical reports for others. The company’s CEO, Sam Altman, also stands accused of misleading OpenAI executives about model safety evaluations prior to his brief ouster in November 2023.

Late last month, OpenAI was forced to roll back an update to the default model powering ChatGPT, GPT-4o, after users began reporting that it responded in an overly validating and agreeable way. X became flooded with screenshots of ChatGPT applauding all sorts of problematic, dangerous decisions and ideas.

OpenAI said that it would implement several fixes and changes to prevent future such incidents, including introducing an opt-in “alpha phase” for some models that would allow certain ChatGPT users to test the models and give feedback before launch.