Anthropic aims to fix one of the biggest problems in AI right now

Andrew Tarantola

July 2, 2024 at 2:00 p.m.·2 min read


Hot on the heels of the announcement that its Claude 3.5 Sonnet large language model beat out other leading models, including GPT-4o and Llama-400B, AI startup Anthropic announced Monday that it plans to launch a new program to fund the development of independent, third-party benchmark tests against which to evaluate its upcoming models.

Per a blog post, the company is willing to pay third-party developers to create benchmarks that can “effectively measure advanced capabilities in AI models.”

“Our investment in these evaluations is intended to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem,” Anthropic wrote in the post. “Developing high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply.”

The company wants submitted benchmarks to help measure the relative “safety level” of an AI based on a number of factors, including how well it resists attempts to coerce responses that could pose cybersecurity; chemical, biological, radiological, and nuclear (CBRN); misalignment; social manipulation; and other national security risks. Anthropic is also looking for benchmarks to help evaluate models’ advanced capabilities and is willing to fund the “development of tens of thousands of new evaluation questions and end-to-end tasks that would challenge even graduate students.” Such benchmarks would essentially test a model’s ability to synthesize knowledge from a variety of sources, its ability to refuse cleverly worded malicious user requests, and its ability to respond in multiple languages.

Anthropic is looking for “sufficiently difficult,” high-volume tasks that can involve as many as “thousands” of testers across a diverse set of test formats and that help inform the company’s “realistic and safety-relevant” threat modeling efforts. Interested developers are welcome to submit their proposals to the company, which plans to evaluate them on a rolling basis.
