Business Insider

OpenAI takes another step closer to getting AI to think like humans with new 'o1' model

By Lakshmi Varanasi, Jyoti Mann

1 day ago

OpenAI CEO Sam Altman.
Jason Redmond/AFP/Getty Images

OpenAI unveiled o1, an AI model designed to reason more like humans.
o1 outperforms previous models in complex tasks, especially in science, coding, and math.
Experts remain skeptical, arguing o1 is still far from achieving artificial general intelligence.

The line separating human intelligence from artificial intelligence just got narrower.

OpenAI on Thursday revealed o1, the first in a new series of AI models that are "designed to spend more time thinking before they respond," the company said in a blog post.

The new model can work through complex tasks and, compared with previous models, solve more difficult problems in science, coding, and math. In essence, it thinks a little more like humans than existing AI chatbots do.

While previous iterations of OpenAI's models have excelled on standardized tests from the SAT to the Uniform Bar Examination, the company says that o1 goes a step further. It performs "similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology."

For example, it beat GPT-4o, a multimodal model OpenAI unveiled in May, by a long shot on the qualifying exam for the International Mathematical Olympiad. GPT-4o correctly solved only 13% of the exam's problems, while o1 scored 83%, the company said.

The sharp surge in o1's reasoning capabilities comes, in part, from a prompting technique known as "chain of thought." OpenAI said o1 "learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working."

That's not to say there aren't some tradeoffs compared to earlier models. OpenAI noted that while human testers preferred o1's responses in reasoning-heavy categories like data analysis, coding, and math, GPT-4o still won out in natural language tasks like personal writing.

OpenAI's primary mission has long been to create artificial general intelligence, or AGI, a still hypothetical form of AI that mimics human capabilities. Over the summer, while o1 was still in development, the company unveiled a new five-level classification system for tracking its progress toward that goal. Company executives reportedly told employees that o1 was nearing level two, which the company identified as "reasoners" with human-level problem-solving.

Ethan Mollick, a professor at the University of Pennsylvania's Wharton School who has had access to o1 for over a month, said the model's gains are perhaps best illustrated by how it solves crossword puzzles. Crossword puzzles are typically difficult for large language models to solve because "they require iterative solving: trying and rejecting many answers that all affect each other," Mollick wrote in a post on his Substack. Most large language models "can only add a token/word at a time to their answer."

But when Mollick asked o1 to solve a crossword puzzle, it thought for a "full 108 seconds" before responding. He said that its thoughts were both "illuminating" and "pretty impressive," even if they weren't fully correct.

Other AI experts, however, are less convinced.

Gary Marcus, a New York University professor of cognitive science, told Business Insider that the model is "impressive engineering" but not a giant leap. "I am sure it will be hyped to the sky, as usual, but it's definitely not close to AGI," he said.

Since OpenAI unveiled GPT-4 last year, it's been releasing successive iterations in its quest to invent AGI. In April, GPT-4 Turbo was made available to paid subscribers. One update included the ability to generate responses that are "more conversational."

The company announced in July that it's testing an AI search product called SearchGPT with a limited group of users.

Read the original article on Business Insider
