Recent large-scale testing, including a simulated marketplace built by Microsoft and a red-teaming competition, has exposed significant security vulnerabilities and functional limitations in leading AI agents. Despite recent advances, current AI agents struggle with complex decision-making and collaboration, and they remain susceptible to manipulation, indicating that they are not yet ready for widespread real-world deployment.

