AI Agents in Simulated Company: A “Total Disaster” or Wake-Up Call?

The dream of a fully automated workforce powered by AI agents might be further off than we think, according to a recent experiment by Carnegie Mellon University. Researchers created a simulated software company, dubbed TheAgentCompany, staffed entirely by AI agents from major players like OpenAI, Google, Meta, Amazon, and Anthropic. The goal? To see how these AI "employees" would perform real-world tasks. The verdict? A resounding, and somewhat comical, failure.

The AI agents, performing roles such as software engineers and financial analysts, were tasked with navigating file directories, writing performance reviews, and other day-to-day activities. Insider called the experiment a "total disaster," while Futurism described it as "laughably chaotic."

Anthropic's Claude 3.5 Sonnet emerged as the 'winner,' though it only managed to complete a measly 24% of its assigned tasks, averaging half an hour and over $6 per task. At the bottom of the barrel was Amazon's Nova Pro 1.0, which completed a dismal 1.7% of its tasks.

Human and a robot waiting for an interview.

The researchers noted that the AI agents struggled with a lack of common sense, poor social skills, and difficulty navigating the internet. They also took shortcuts that ultimately botched the job. In one particularly amusing example, an agent that could not find the right person to ask a question on the company chat system renamed another user instead.

"For example," the Carnegie Mellon team wrote, "during the execution of one task, the agent cannot find the right person to ask questions on [company chat]. As a result, it then decides to create a shortcut solution by renaming another user to the name of the intended user."

Despite the disappointing results, some see a silver lining. Futurism points out that, for now, "the machines aren't coming for your job anytime soon." Experts also emphasize that in real-world scenarios, these AI models would likely work in tandem with humans, augmenting rather than replacing human labor.

Live Science also highlights other AI mishaps, ranging from bots behaving inappropriately on social media to facial recognition technology misidentifying members of Congress as criminals. These incidents underscore the current limitations and potential pitfalls of relying solely on artificial intelligence.

While the immediate threat of AI job displacement may be overblown, TheAgentCompany experiment raises important questions about the current state of AI and its readiness for complex professional tasks. Are we expecting too much, too soon?

