San Francisco, CA – Anthropic has unveiled transformative updates to its Claude AI platform, introducing two upgraded models and a groundbreaking new capability that allows Claude to interact with computers much like a human would.
The upgraded Claude 3.5 Sonnet offers a significant leap in performance across all metrics, with a notable improvement in coding abilities. It now holds the highest score on the SWE-bench Verified coding benchmark, achieving an impressive 49.0% — outperforming all publicly available models. The Sonnet model also excels in agentic tool use, with substantial gains on the TAU-bench evaluation. Early adopters, including GitLab, Cognition, and The Browser Company, are already experiencing considerable improvements in reasoning, planning, and problem-solving capabilities.
Coming later this month, the Claude 3.5 Haiku model offers a seamless balance of cost and speed while outperforming even the previous generation’s largest model, Claude 3 Opus, on numerous intelligence benchmarks. With its strong coding abilities and improved efficiency, Haiku is perfect for user-facing products, specialized applications, and personalization involving large datasets. Haiku ensures users can achieve cutting-edge performance without sacrificing affordability.
The most groundbreaking announcement is the new "computer use" feature, now available in public beta. This innovative capability allows Claude to interact with digital interfaces just as a human does — navigating, clicking, and typing. Through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI, developers can now instruct Claude to perform sophisticated tasks on computers. While still in its experimental phase, early results on the OSWorld benchmark are highly promising. Companies such as Asana, Canva, DoorDash, and Replit are already exploring how "computer use" could automate complex workflows and reshape productivity.
With great innovation comes new challenges. Anthropic is mindful of the risks associated with enabling AI to operate computers, such as the potential rise in spam, misinformation, and fraud. In response, the company has developed advanced classifiers designed to detect and minimize these threats. Anthropic is committed to responsible innovation and is actively seeking feedback from developers to refine and safeguard this emerging technology. Developers can participate by providing feedback through Anthropic's developer forum, submitting issues via GitHub, or joining dedicated feedback sessions organized by the company.
The upgraded Claude 3.5 Sonnet is available now, with Claude 3.5 Haiku arriving later this month. Anthropic believes these advancements, particularly the "computer use" feature, will unlock unprecedented opportunities for human-AI collaboration. The company is eager to see how developers will harness these tools to push the boundaries of innovation even further.