February 20, 2025 by Mayuri Punithan, University of Waterloo

Collected at: https://techxplore.com/news/2025-02-platform-ai-complex.html

Imagine asking AI to plan your trip itinerary, book and pay for all your flights, and arrange your airport transport, all with a single click. Fortunately, an international research team is working to make this vision a reality.

The team, composed of researchers from the University of Waterloo, University of Hong Kong, Salesforce Research and Carnegie Mellon University, developed Computer Agent Arena, an evaluation platform that can help create and improve computer agents.

A computer agent is a type of software that can perform tasks on behalf of a person or organization without needing constant human intervention. It can interpret the state of the computer and act autonomously to help users solve problems. Examples of computer agents include voice assistants like Siri and Alexa, which can help users send messages and schedule meetings.

AI-based computer agents struggle with complex computer tasks because such tasks require controlling multiple applications and completing many steps. For example, filing an expense report may be difficult because it requires searching multiple emails and folders filled with bank statements and receipts, then updating a spreadsheet with what was found.

Computer Agent Arena is the first interactive computer-use evaluation platform that focuses on performing diverse tasks across multiple applications. It extends the researchers’ work on OSWorld, the world’s first scalable, real computer environment for multimodal agents.

Video: https://www.youtube.com/embed/VW4nX-CMPGI?color=white
Credit: University of Waterloo

“Computer Agent Arena provides a platform for the research community to develop effective and efficient agents that generalize to real-world computer usage,” says co-developer Dr. Victor Zhong, assistant professor at the Cheriton School of Computer Science. Like other Waterloo researchers, he is investigating human-technology interactions, exploring how to mitigate everyday problems by creating novel technologies.

“Computer Agent Arena is distinct from similar research like Mind2Web and WebArena because it provides unified application programming interfaces for comprehensive observations and actions in an executable environment with multiple applications.”

Through Computer Agent Arena, users can assess and compare computer agents based on large language models (LLMs) and vision-language models. First, users select an operating system, such as Windows, and applications, such as Google Chrome and Excel. Users can then prompt the computer agents with a task, which two AI models perform simultaneously in real time. After completion, users can rate each model’s performance and provide feedback.
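The workflow described above (pick an environment, give two agents the same task, then rate them) can be illustrated with a short Python sketch. This is not the platform's actual code or API: the `Comparison` class, the `record_vote` and `win_rates` functions, the placeholder agent names and the tie-counts-as-half scoring rule are all assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Comparison:
    """One head-to-head session: same environment and task, two agents.

    Hypothetical structure for illustration; not the platform's data model.
    """
    operating_system: str
    applications: List[str]
    task: str
    agent_a: str
    agent_b: str
    preferred: Optional[str] = None  # "a", "b" or "tie", set after the run
    feedback: str = ""


def record_vote(session: Comparison, preferred: str, feedback: str = "") -> None:
    """Store the user's rating once both agents have finished the task."""
    if preferred not in {"a", "b", "tie"}:
        raise ValueError("preferred must be 'a', 'b' or 'tie'")
    session.preferred = preferred
    session.feedback = feedback


def win_rates(sessions: List[Comparison]) -> Dict[str, float]:
    """Share of rated sessions each agent won (assumption: a tie counts as half)."""
    wins: Counter = Counter()
    played: Counter = Counter()
    for s in sessions:
        if s.preferred is None:
            continue  # skip sessions the user has not rated yet
        played[s.agent_a] += 1
        played[s.agent_b] += 1
        if s.preferred == "a":
            wins[s.agent_a] += 1
        elif s.preferred == "b":
            wins[s.agent_b] += 1
        else:
            wins[s.agent_a] += 0.5
            wins[s.agent_b] += 0.5
    return {agent: wins[agent] / played[agent] for agent in played}


if __name__ == "__main__":
    # Placeholder agent names and votes; these are not real results.
    session = Comparison(
        operating_system="Windows",
        applications=["Google Chrome", "Excel"],
        task="Update the expense spreadsheet from emailed receipts",
        agent_a="agent-1",
        agent_b="agent-2",
    )
    record_vote(session, "a", feedback="agent-1 finished with fewer mistakes")
    print(win_rates([session]))
```

A simple win rate is only one way to aggregate such votes; a real leaderboard could just as well use a rating system that accounts for which agents were paired against each other.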

Ultimately, the team seeks to provide a diverse and dynamic platform for building and evaluating agents that can perform real-world computer tasks as safely, effectively and efficiently as humans do.

“Our current findings show that foundation models such as GPT-4 and Claude are far from being able to act safely and effectively as assistant computer agents,” Zhong says. “Computer Agent Arena provides a timely testbed to develop the next generation of AI agents.”
