Business Insider
OpenAI takes another step closer to getting AI to think like humans with new 'o1' model
By Lakshmi Varanasi, Jyoti Mann
1 day ago
OpenAI unveiled o1, an AI model designed to reason more like humans.
o1 outperforms previous models in complex tasks, especially in science, coding, and math.
Experts remain skeptical, arguing o1 is still far from achieving artificial general intelligence.
The line separating human intelligence from artificial intelligence just got narrower.
OpenAI on Thursday revealed o1, the first in a new series of AI models that are "designed to spend more time thinking before they respond," the company said in a blog post.
The new model can work through complex tasks and, compared with previous models, solve more difficult problems in science, coding, and math. In essence, it thinks a little more like a human than existing AI chatbots do.
While previous iterations of OpenAI's models have excelled on standardized tests ranging from the SAT to the Uniform Bar Examination, the company says that o1 goes a step further. It performs "similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology."
For example, it beat GPT-4o, a multimodal model OpenAI unveiled in May, on the qualifying exam for the International Mathematical Olympiad by a long shot. GPT-4o correctly solved only 13% of the exam's problems, while o1 solved 83%, the company said.
The sharp surge in o1's reasoning capabilities comes, in part, from a prompting technique known as "chain of thought." OpenAI said o1 "learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working."
That's not to say there aren't some tradeoffs compared to earlier models. OpenAI noted that while human testers preferred o1's responses in reasoning-heavy categories like data analysis, coding, and math, GPT-4o still won out in natural language tasks like personal writing.
Ethan Mollick, a professor at the University of Pennsylvania's Wharton School who has had access to o1 for over a month, said the model's gains are perhaps best illustrated by how it solves crossword puzzles. Crossword puzzles are typically difficult for large language models to solve because "they require iterative solving: trying and rejecting many answers that all affect each other," Mollick wrote in a post on his Substack. Most large language models "can only add a token/word at a time to their answer."
But when Mollick asked o1 to solve a crossword puzzle, it thought about it for a "full 108 seconds" before responding. He said that its thoughts were both "illuminating" and "pretty impressive" even if they weren't fully correct.
Other AI experts, however, are less convinced.
Gary Marcus, a New York University professor of cognitive science, told Business Insider that the model is "impressive engineering" but not a giant leap. "I am sure it will be hyped to the sky, as usual, but it's definitely not close to AGI," he said.
Since OpenAI unveiled GPT-4 last year, it's been releasing successive iterations in its quest to invent AGI. In April, GPT-4 Turbo was made available to paid subscribers. One update included the ability to generate responses that are "more conversational."