
AI Is Nowhere Near PhD-Level Intelligence, According to Google

Why Today's AI Isn't Really "PhD Smart" — Despite What The Hype Claims

Demis Hassabis, the brilliant mind behind Google DeepMind and a fresh Nobel Prize winner, just dropped a reality check that's making waves across Silicon Valley. His message? All those flashy claims about AI having "PhD-level intelligence" are, frankly, nonsense.36kr+2

Here's the thing that's got everyone talking: these AI systems can absolutely nail some incredibly complex tasks — they're winning gold medals at international math competitions and solving problems that would stump most graduate students. But then they turn around and completely botch simple arithmetic or counting exercises that any elementary school kid could handle.timesofindia.indiatimes+3

The Weird World of "Jagged Intelligence"

This bizarre pattern has earned a catchy name in AI circles: "jagged intelligence". Think of it like having a friend who's a chess grandmaster but can't remember where they put their car keys. These AI models show superhuman abilities in some areas while stumbling over the most basic tasks in others.businessinsider+2

Google CEO Sundar Pichai has been calling it "Artificial Jagged Intelligence," or AJI, which is basically an admission that current AI behaves more like a brilliant but wildly inconsistent student than the steady, reliable intelligence we'd expect from a true PhD.linkedin+1

The Real Problems Nobody Talks About

The Consistency Crisis

The biggest red flag? These systems completely fall apart when you phrase the same question differently. A true PhD-level intelligence shouldn't be confused by whether you ask "How many apples are there?" versus "What's the total number of apples?" But current AI models get tripped up by these trivial changes all the time.techcrunch+1

Researchers at Apple recently demonstrated this perfectly. They took standard math problems and added completely irrelevant details, like mentioning that some kiwis were "smaller than average," and watched as state-of-the-art models such as OpenAI's o1 incorrectly subtracted the smaller kiwis from the total count. It's the kind of mistake that would make a third-grader laugh, yet it stumps our most advanced AI.techcrunch
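
To make that failure mode concrete, here is a minimal sketch of the kind of perturbation test the Apple researchers describe: ask the same word problem twice, once with an irrelevant detail tacked on, and check whether the answer changes. The `ask_model` function and the exact numbers are placeholders for illustration, not the study's actual code or data.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to an LLM and return its final numeric answer as text."""
    raise NotImplementedError("wire this up to your own model client")

BASE = (
    "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "On Sunday he picks 24 more. How many kiwis does Oliver have?"
)

# Same problem with an irrelevant detail appended; the correct answer (126) is unchanged.
PERTURBED = BASE.replace(
    "24 more.",
    "24 more, but five of them are a bit smaller than average.",
)

if __name__ == "__main__":
    base_answer = ask_model(BASE)            # expected: "126"
    perturbed_answer = ask_model(PERTURBED)  # still expected: "126"
    # A robust reasoner gives the same answer both times; a brittle pattern-matcher
    # may subtract the "smaller" kiwis and answer 121 instead.
    print("consistent" if base_answer == perturbed_answer else "jagged")
```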

The Learning Problem

Here's what really separates humans from these AI systems: we can learn new things on the fly and adjust our behavior in real time. If you teach a person a new concept, they can immediately start applying it to different situations. Current AI models? They're stuck with whatever they learned during training.ibm+2

This limitation, called the "continual learning" problem, is a massive roadblock. These systems can't update their knowledge or correct their mistakes through experience the way humans do. They're more like extremely sophisticated lookup tables than actual thinking machines.splunk+2

Why Bigger Doesn't Always Mean Better

The dirty secret of AI development is that the industry has been following a simple playbook: make the models bigger, feed them more data, and hope for the best. This scaling approach, guided by empirical "scaling laws," worked great for a while, but it's starting to hit some serious walls.techcrunch+2
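
For readers who want the technical version, "scaling laws" are empirical fits of model loss against parameter count and training data. One widely cited form, from DeepMind's Chinchilla work, looks roughly like this, where N is the number of parameters, D the number of training tokens, and E, A, B, α, and β are fitted constants:

```latex
L(N, D) \;\approx\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
```

Both correction terms shrink as N and D grow, but with diminishing returns, and the irreducible term E never goes away. That, in a formula, is why "just make it bigger" eventually stops paying off.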

The Data Wall

We're running out of high-quality human-created content to train these models on. Sure, the internet has tons of text, but most of it is repetitive, low-quality, or just plain wrong. Some researchers estimate we'd need 100,000 times more quality data than currently exists to reach the reliability needed for truly advanced AI.foundationcapital

The Compute Wall

Training these massive models is getting absurdly expensive. We're talking about models that cost hundreds of millions of dollars to train, with future versions potentially hitting the $100 billion mark. At some point, throwing more money and computing power at the problem stops being a viable solution.exponentialview

The Architecture Wall

Most fundamentally, the way these models work — predicting the next word in a sequence based on statistical patterns — might not be enough to achieve real intelligence. They're incredibly good at pattern matching, but they don't actually understand concepts the way humans do.ibm+2
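
To see what "predicting the next word from statistical patterns" means mechanically, here is a toy sketch of autoregressive generation in Python. The probability table is invented for illustration; a real model computes these probabilities with a neural network over a huge vocabulary, but the loop is the same: score the candidates, pick one, append it, repeat.

```python
import random

# Toy "language model": for each context, a hand-written probability table over
# the next word. The numbers are purely illustrative; a real LLM computes this
# distribution with a neural network over tens of thousands of tokens.
NEXT_WORD_PROBS = {
    "the cat sat on the": {"mat": 0.7, "sofa": 0.2, "moon": 0.1},
    "the cat sat on the mat": {"and": 0.6, "while": 0.4},
}

def generate(context: str, max_steps: int = 2) -> str:
    """Autoregressive generation: look up the distribution, sample a word, append, repeat."""
    for _ in range(max_steps):
        dist = NEXT_WORD_PROBS.get(context)
        if dist is None:  # the toy table runs out of contexts; a real model never does
            break
        words, weights = zip(*dist.items())
        context += " " + random.choices(words, weights=weights)[0]
    return context

print(generate("the cat sat on the"))  # e.g. "the cat sat on the mat and"
```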

The Counterarguments: Why Some Still Believe the Hype

The Optimist's Case

Not everyone agrees with Hassabis's skeptical take. OpenAI's Sam Altman has been doubling down on claims that his company's latest models really do have PhD-level capabilities. He points to impressive performance on coding tasks and academic benchmarks as evidence.fortune+2

Altman argues that these systems can already perform tasks "equal to that of an entry-level employee" and can even "do problems that I'd expect an expert PhD in my field to do". From this perspective, the inconsistencies are just growing pains that will be solved with better training and more sophisticated architectures.fortune

The "Test-Time Compute" Revolution

Some researchers are betting big on a new approach called "test-time compute," which gives AI models more time and resources to "think" before answering. OpenAI's o1 model uses this technique, and early results are promising. The idea is that even if we can't make models smarter during training, we can make them more reliable by giving them more time to reason through problems.techcrunch
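
One simple flavor of the idea, often called self-consistency or best-of-N sampling, is sketched below: ask the model the same question several times and keep the majority answer, trading extra inference compute for reliability. This is not a description of o1's internal method, which OpenAI has not published in detail; it is just an illustration of the general trade-off, and `ask_model` is again a hypothetical placeholder.

```python
from collections import Counter

def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call that returns a short final answer."""
    raise NotImplementedError

def answer_with_extra_compute(prompt: str, samples: int = 8) -> str:
    # Spend more compute at inference time: sample several independent answers...
    answers = [ask_model(prompt) for _ in range(samples)]
    # ...and keep the one the model agrees with itself on most often. Majority
    # voting tends to average away one-off reasoning slips, at the price of
    # running the model `samples` times per question.
    return Counter(answers).most_common(1)[0][0]
```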

The Benchmark Believers

Supporters also point to consistently improving performance on standardized tests and benchmarks. Models are scoring higher on everything from reading comprehension to scientific reasoning. While they acknowledge the inconsistency problem, they argue it's a temporary technical hurdle rather than a fundamental limitation.epoch+1

The Reality Check: We're Not There Yet

Despite the optimistic spin, the evidence strongly supports Hassabis's skeptical view. Even the most advanced models fail spectacularly on tasks that should be trivial for PhD-level intelligence.epoch+1

The Math Problem

Consider this sobering fact: leading AI models can solve less than 2% of the problems on FrontierMath, a benchmark of research-level mathematical problems, despite achieving over 90% accuracy on easier math benchmarks. It's like crediting someone with PhD-level math skills because they ace algebra, even though they fail calculus completely.epoch

The Reasoning Gap

The core issue is that these models don't actually reason — they recognize patterns. They can generate text that looks like reasoning, but they lack the deep understanding that true reasoning requires. When faced with novel situations that don't match their training patterns, they often produce nonsensical results.elearncollege+2

What This Means for the Future

Hassabis believes we're still 5-10 years away from achieving real AGI, and he thinks it will require "one or two missing breakthroughs" beyond just making models bigger. That timeline is noticeably more cautious than the one offered by many in the industry, who have been predicting AGI within the next few years.cnbc+2

The Path Forward

Rather than just scaling up existing approaches, researchers are exploring fundamentally different architectures. Some are working on hybrid systems that combine neural networks with traditional symbolic reasoning. Others are focusing on solving the continual learning problem.ibm+2
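
As a rough illustration of what a neural-plus-symbolic hybrid can look like, the sketch below lets a language model do what it is good at (turning messy natural language into a formal expression) and hands the actual calculation to the sympy symbolic-math library, which never makes arithmetic slips. The `llm_translate` function is a hypothetical placeholder; real hybrid systems are considerably more involved.

```python
import sympy

def llm_translate(question: str) -> str:
    """Hypothetical placeholder: a language model turns a word problem into a formula,
    e.g. "Oliver picks 44 kiwis, then 58, then 24 more..." -> "44 + 58 + 24"."""
    raise NotImplementedError

def solve(question: str) -> sympy.Expr:
    formula = llm_translate(question)  # neural step: messy language -> formal expression
    return sympy.sympify(formula)      # symbolic step: exact, deterministic evaluation
```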

The Enterprise Reality

Companies like Salesforce are already grappling with these limitations in real-world applications. They've found that AI's "jagged intelligence" makes it unreliable for many business-critical tasks, forcing them to develop new approaches for enterprise deployment.aimmediahouse+1

The bottom line? While today's AI is undeniably impressive, calling it "PhD-level intelligence" is misleading marketing at best. These systems are powerful tools with significant limitations, not the human-level artificial minds that the hype suggests. Understanding these limitations isn't pessimistic — it's essential for using AI effectively and safely as the technology continues to evolve.

As Hassabis puts it, true general intelligence should be consistent, broadly capable, and not break when you phrase a question differently. We're not there yet, and pretending otherwise does everyone a disservice. The real question isn't whether AI will eventually reach human-level intelligence, but how long it will take to solve the fundamental problems that current approaches haven't addressed.
