Turing Award goes to 2 pioneers of AI

Research by Andrew Barto, left, and Richard Sutton at UMass Amherst plays a key role in today’s AI systems.

By Cade Metz, New York Times

SAN FRANCISCO — In 1977, Andrew Barto, as a researcher at the University of Massachusetts Amherst, began exploring a new theory that neurons behaved like hedonists. The basic idea was that the human brain was driven by billions of nerve cells that were each trying to maximize pleasure and minimize pain.

A year later, he was joined by another young researcher, Richard Sutton. Together, they worked to explain human intelligence using this simple concept and applied it to artificial intelligence. The result was "reinforcement learning," a way for AI systems to learn from the digital equivalent of pleasure and pain.

On Wednesday, the Association for Computing Machinery, the world's largest society of computing professionals, announced that Barto and Sutton had won this year's Turing Award for their work on reinforcement learning. The Turing Award, which was introduced in 1966, is often called the Nobel Prize of computing. The two scientists will share the $1 million prize that comes with the award.

Over the past decade, reinforcement learning has played a vital role in the rise of artificial intelligence, including breakthrough technologies such as Google's AlphaGo and OpenAI's ChatGPT. The techniques that powered these systems were rooted in the work of Barto and Sutton.

“They are the undisputed pioneers of reinforcement learning,” said Oren Etzioni, a professor emeritus of computer science at the University of Washington and founding CEO of the Allen Institute for Artificial Intelligence. “They generated the key ideas — and they wrote the book on the subject.”

Their book, "Reinforcement Learning: An Introduction," which was published in 1998, remains the definitive exploration of an idea that many experts say is only beginning to realize its potential.

Psychologists have long studied the ways that humans and animals learn from their experiences. In the 1940s, the pioneering British computer scientist Alan Turing suggested that machines could learn in much the same way.

But it was Barto and Sutton who began exploring the mathematics of how this might work, building on a theory that A. Harry Klopf, a computer scientist working for the government, had proposed. Barto went on to build a lab at UMass Amherst dedicated to the idea, while Sutton founded a similar kind of lab at the University of Alberta in Canada.

“It is kind of an obvious idea when you’re talking about humans and animals,” said Sutton, who is also a research scientist at Keen Technologies, an AI startup, and a fellow at the Alberta Machine Intelligence Institute, one of Canada’s three national AI labs. “As we revived it, it was about machines.”

Reinforcement learning remained an academic pursuit until the arrival of AlphaGo in 2016. Most experts believed that another 10 years would pass before anyone built an AI system that could beat the world's best players at the game of Go.

But during a match in Seoul, AlphaGo beat Lee Sedol, the best Go player of the past decade. The trick was that the system had played millions of games against itself, learning by trial and error. It learned which moves brought success (pleasure) and which brought failure (pain).

The Google team that built the system was led by David Silver, a researcher who had studied reinforcement learning under Sutton at the University of Alberta.

Many experts still question whether reinforcement learning could work outside of games. Games are scored with points, which makes it easy for machines to distinguish between success and failure.

But reinforcement learning has also played an essential role in online chatbots.

Leading up to the release of ChatGPT in the fall of 2022, OpenAI hired hundreds of people to use an early version and provide precise suggestions that could hone its skills. They showed the chatbot how to respond to particular questions, rated its responses and corrected its mistakes. By analyzing those suggestions, ChatGPT learned to be a better chatbot.

Barto and Sutton say these systems hint at the ways machines will learn in the future. Eventually, they say, robots imbued with AI will learn from trial and error in the real world, as humans and animals do.

“Learning to control a body through reinforcement learning — that is a very natural thing,” Barto said.