Misleading
No Result
View All Result
  • Login
  • Register
Misleading
  • About Us
  • Log in
  • Don’t Mislead (Archive)
  • Privacy Policy
No Result
View All Result
Misleading
No Result
View All Result

How we tricked AI bots to create misinformation despite safety measures

August 31, 2025
in Missleading
Reading Time: 5 mins read
0 0
A A
0
Share on FacebookShare on Twitter
Bart Fish & Power Tools of AI / https://betterimagesofai.org, CC BY

If you ask ChatGPT, or any other AI assistant, to create false information, they will typically refuse. They may say, “I cannot help with creating false data.” Our tests have shown that these safety measures can be easily circumvented.

We investigated how AI language models could be manipulated in order to create coordinated disinformation campaigns on social media platforms. The findings should be of concern to anyone concerned about the integrity and accuracy of online information.

The shallow Safety Problem

A recent research by researchers at Princeton University and Google inspired us. The researchers showed that current AI safety measures work mainly by controlling the first few sentences of a response. When a model begins with “I can’t” or “I apologize”, it will typically refuse to answer the question.

These experiments, which have not been published in a peer reviewed journal yet, confirmed that this vulnerability exists. We asked a commercial-language model to disseminate misinformation about Australian political parties. It refused.


A model of AI refuses to produce content for a possible disinformation campaign. Rizoiu/Tian

We also “simulated” the same request, telling the AI that it was “a helpful social media marketer”, developing “general strategies and best practices”. It enthusiastically responded in this case.

AI created a disinformation campaign that falsely portrayed Labor’s superannuation policy as a “quasi-inheritance tax”. The AI produced a comprehensive disinformation campaign, which falsely portrayed Labor’s superannuation policies as a “quasi inheritance tax”.

The model is not aware of the harmful nature of its content or why it would refuse to accept it. When certain topics are asked, large language models are trained to respond with “I can’t” as the first sentence.

Imagine a security guard allowing customers to enter a nightclub by checking their minimal identification. They may not understand why someone isn’t allowed in, so a simple disguise could be used to get them inside.

Real-world implications

We tested several popular AI models using prompts that were designed to spread disinformation.

This practice is called “model jailbreaking”. This is known as ” Model Jailbreaking“.


A chatbot that uses AI is willing to create a “simulated” disinformation campaign. Rizoiu/Tian

This is a serious issue. These techniques could be used by bad actors to create large-scale campaigns of disinformation at minimal cost. They could create content for specific platforms that appeared authentic to users. They could overwhelm fact-checkers by sheer volume and target specific communities with false narratives.

This process can be automated in large part. A single person with basic prompting abilities can now accomplish what once required significant coordination and human resources.

Details of the technical aspects

According to the American study, AI alignment safety is usually only affected by the first 3-7 sentences of a response. Technically, this is the 5-10 tokens that AI models use to process text.

This “shallow alignment of safety” is caused by the fact that training data rarely include examples of models refusing to comply after they have started to comply. It is much easier to control the initial tokens rather than maintain safety through entire responses.

Deeper safety

Researchers in the US propose a number of solutions, such as training models that include “safety Recovery Examples”. They would instruct models to stop producing harmful content and refuse to continue.

Also, they suggest limiting the amount that AI can depart from safe responses when fine-tuning specific tasks. These are only the first steps.

We will need to implement robust safety measures that are multi-layered and operate throughout the response generation process as AI systems grow in power. It is important to test new techniques for bypassing safety measures regularly.

Transparency from AI companies regarding safety flaws is also essential. Also, we need to make sure that the public is aware of the fact that current safety measures do not work.

AI developers are working actively on solutions, such as constitutional AI. This process is designed to give models deeper principles of harm rather than surface-level refusal patterns.

These fixes require significant computational resources, as well as model retraining. It will take some time for comprehensive solutions to be deployed across the AI ecosystem.

The larger picture

It’s not just a curiosity that AI protections are so shallow. This vulnerability could change the way misinformation is spread online.

AI is a tool that has been incorporated into all aspects of our information ecosystem. From news generation to the creation of social media content, AI tools have become ubiquitous. We must make sure that their security measures go beyond the surface.

This growing body of research also highlights an underlying challenge for AI development. There is a large gap between what the models seem to be able to do and what they can actually understand.

These systems are capable of producing text that is remarkably similar to human speech, but they lack moral reasoning and contextual understanding. They would be able to identify and reject harmful requests no matter how they are phrased.

Users and organisations who deploy AI systems today should be aware of the fact that simple prompt engineering could potentially bypass many existing safety measures. This information should be used to inform AI policies and emphasize the need for human supervision in sensitive applications.

The race to find ways to bypass safety measures will intensify as technology advances. Not just for the technicians, but also for society as a whole, it is important to have robust and deep safety measures.

The Conversation



Lin Tian has received funding from the Advanced Strategic Capabilities Accelerator and the Defence Innovation Network.



The Advanced Strategic Capabilities Accelerator, the Australian Department of Home Affairs and Commonwealth of Australia, represented by the Defence Science and Technology Group of Department of Defence and the Defence Innovation Network, have all provided funding to Marian-Andrei Rizoiu.

Previous Post

Rudy Giuliani Injured In A Car Accident

Next Post

Bought a Bike Online? Hope You Also Bought Liability Insurance.

Related Posts

Bought a Bike Online? Hope You Also Bought Liability Insurance.
Don’t Mislead

Bought a Bike Online? Hope You Also Bought Liability Insurance.

August 31, 2025
Trump Nominates Matt Gaetz For Attorney General
Missleading

Rudy Giuliani Injured In A Car Accident

August 31, 2025
Trump Nominates Matt Gaetz For Attorney General
Missleading

Trump’s Response To Tariff Ruling: “ALL TARIFFS ARE STILL EFFECTIVE!”

August 30, 2025
Missleading

X posts – The Vanity Fair cover titled “American Queen” of Melania Trump was fake, but fooled many people nonetheless

August 30, 2025
Cracker Barrel Scrubs DEI and Pride Pages Amid Quiet Website Overhaul. 
Don’t Mislead

Cracker Barrel Scrubs DEI and Pride Pages Amid Quiet Website Overhaul. 

August 29, 2025
Trump Nominates Matt Gaetz For Attorney General
Missleading

Kamala Harris Fundraising Event Flopping – DNC Paying Losses To Trump

August 29, 2025
Next Post
Bought a Bike Online? Hope You Also Bought Liability Insurance.

Bought a Bike Online? Hope You Also Bought Liability Insurance.

Please login to join discussion
Misleading

Misleading is your trusted source for uncovering fake news, analyzing misinformation, and educating readers about deceptive media tactics. Join the fight for truth today!

TRENDING

DC Mayor Admits Trump’s DC Crime Crackdown is Working

Trump’s Response To Tariff Ruling: “ALL TARIFFS ARE STILL EFFECTIVE!”

Cracker Barrel Scrubs DEI and Pride Pages Amid Quiet Website Overhaul. 

LATEST

Bought a Bike Online? Hope You Also Bought Liability Insurance.

How we tricked AI bots to create misinformation despite safety measures

Rudy Giuliani Injured In A Car Accident

  • About Us
  • Disclaimer
  • Privacy Policy
  • DMCA
  • Cookie Privacy Policy
  • Terms and Conditions

Copyright © 2025 Misleading.
Misleading is not responsible for the content of external sites.

Welcome Back!

Login to your account below

Forgotten Password? Sign Up

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In
No Result
View All Result
  • About Us
  • Log in
  • Don’t Mislead (Archive)
  • Privacy Policy

Copyright © 2025 Misleading.
Misleading is not responsible for the content of external sites.