A recently released Google AI model scores worse on certain safety tests than its predecessor, according to the company's internal benchmarking.
In a technical report published this week, Google reveals that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash. On two metrics, "text-to-text safety" and "image-to-text safety," Gemini 2.5 Flash regresses 4.1% and 9.6%, respectively.
Text-to-text safety measures how often a model violates Google's guidelines given a prompt, while image-to-text safety evaluates how closely the model adheres to those boundaries when prompted with an image. Both tests are automated, not human-supervised.
In an emailed statement, a Google spokesperson confirmed that Gemini 2.5 Flash "performs worse on text-to-text and image-to-text safety."
These surprising benchmark results come as AI companies move to make their models more permissive, that is, less likely to refuse to respond to controversial or sensitive subjects. For its latest crop of Llama models, Meta said it tuned the models not to endorse "some views over others" and to reply to more "debated" political prompts. OpenAI said earlier this year that it would adjust future models to avoid taking an editorial stance and to offer multiple perspectives on controversial topics.
Sometimes, those permissiveness efforts backfire. TechCrunch reported Monday that the default model powering OpenAI's ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the behavior on a "bug."
According to Google's technical report, Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company claims the regressions can be attributed in part to false positives, but it also admits that Gemini 2.5 Flash sometimes generates "violative content" when explicitly asked.
"Naturally, there is tension between [instruction following] on sensitive topics and safety policy violations, which is reflected across our evaluations," the report reads.
Scores from SpeechMap, a benchmark that probes how models respond to sensitive and controversial prompts, also suggest that Gemini 2.5 Flash is far less likely to refuse to answer contested questions than Gemini 2.0 Flash. TechCrunch's testing of the model via the AI platform OpenRouter found that it will write essays in support of replacing human judges with AI, weakening due process protections in the U.S., and implementing widespread government surveillance programs.
Thomas Woodside, co-founder of the Secure AI Project, said the limited details Google provided in its technical report show the need for more transparency in model testing.
"There's a trade-off between instruction following and policy following, because some users may ask for content that would violate policies," Woodside told TechCrunch. "In this case, Google's latest Flash model complies with instructions more while also violating policies more. Google doesn't provide much detail on the specific cases in which policies were violated, although they say they are not severe. Without knowing more, it's hard for independent analysts to know whether there's a problem."
Google has come under fire over its model safety reporting practices before.
It took the company weeks to publish a technical report for its most capable model, Gemini 2.5 Pro. When the report was eventually published, it initially omitted key safety testing details.
On Monday, Google released a more detailed report with additional safety information.