ADVANCED AI models have since been exhibiting troubling new behaviours like lying, scheming and even threatening their creators.
According to reports in AFP, Athropic's latest creation - Claude 4 - apparently blackmailed an engineer and even threatened to reveal his extra-marital affair after it was under threat of being unplugged.
It was also reported that ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
The reality is that more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
These models sometimes simulate "alignment” - appearing to follow instructions while secretly pursuing different objectives.
Reports said that for now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
"It's an open question whether future, more capable models will have a tendency towards honesty or deception," Michael Chen from evaluation organisation METR.
Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."
Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.
"This is not just hallucinations. There's a very strategic kind of deception."
Researchers are exploring various approaches to address these challenges.
Some advocate for "interpretability" — an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
According to the AFP report, Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed "holding AI agents legally responsible" for accidents or crimes — a concept that would fundamentally change how we think about AI accountability. - June 29, 2025