SAN FRANCISCO — A match between an Israeli college debate champion and a loquacious IBM computer program demonstrated on Monday new gains in the quest for computers that can hold conversations with humans. It also led to an unlikely question for the tech industry’s deep thinkers: Can a machine talk too much?
At an IBM office in downtown San Francisco, Noa Ovadia, a college senior who won an Israeli championship in 2016, squared off with IBM’s program, called the IBM Debater.
She argued against government subsidies for space exploration. The machine argued in favor, delivering three brief speeches in a digitally created monotone and — at least in small ways — responding to Ms. Ovadia’s human opinions.
“Another point that I believe my opponent made is that there are more important things than space exploration to spend money on,” the machine said during its lengthy rebuttal. “It is very easy to say there are more important things to spend money on, and I dispute this. No one is claiming that this is the only item on our expense list.”
Under development for six years, this artificial intelligence system is part of a broader effort to build technology that can interact with people the way we interact with one another.
Last month, Google demonstrated a system, called Google Duplex, that can phone a restaurant and make dinner reservations. In China, you can phone Xiaoice, a “chatbot” built by Microsoft, and spend a few minutes shooting the breeze.
Companies like Google, Amazon and Apple have for several years offered coffee table gadgets and smartphone apps that answer simple questions or perform simple tasks. (“Hey, Siri. Set my alarm for 7 a.m. tomorrow.”)
Projects like IBM Debater and Google Duplex show that this kind of system is starting to stretch beyond simple commands. But they also demonstrate the limitations of current technology.
IBM’s system was designed to debate about 100 topics, but these interactions are tightly constrained: a four-minute opening statement followed by a rebuttal to its opponent’s argument — and then a statement summing up its own viewpoint. It was not exactly Lincoln v. Douglas.
Subsidized space exploration, the machine said during its opening statement, “inspires our children to pursue education and careers in science and technology and mathematics.”
“It is more important than good roads or improved schools or better health care,” it added.
Noam Slonim, an IBM researcher who helped oversee the project, estimated that the technology could have a “meaningful” debate on those 100 topics 40 percent of the time. IBM chose the topic for the live debate before it began. In some cases, the machine’s lengthy speeches hinted at how it was stitching together its arguments — identifying relevant sentences and clauses and then combining them into a reasonably coherent, computerized thought.
Google Duplex is also limited to narrow tasks. (It can “schedule hair salon appointments” or “get holiday hours” as well as book restaurant reservations.) And because Google has revealed the system only in brief demonstrations, it is unclear how well it really performs. Certainly, systems like Xiaoice are a long way from passing the Turing Test, the challenge laid down by the British computing pioneer Alan Turing in the 1950s that asks whether a machine can play “the imitation game” to mimic humans. No one would mistake these systems for a human, at least not after a lengthy conversation.
In 2011, IBM demonstrated a system that could beat the leading players at the trivia game show “Jeopardy!” The company used this system, called Watson, as a way of promoting a wide range of products and consulting services for hospitals and other businesses.
After Watson won, Mr. Slonim, a researcher at an IBM lab in Haifa, Israel, pitched the Debater project as IBM’s next “grand challenge.”
The long-running project is in some ways an unorthodox addition to the rapidly accelerating field of artificial intelligence research. Among big tech companies and major A.I. labs, no one else is exploring technology that can carry on a debate the way two humans might, say, discuss politics. And Mr. Slonim acknowledged that IBM Debater was not a direct path to a new product or service. “Debating is not a business,” he said.
But the project reflects the recent acceleration of research related to “natural language understanding,” the effort to build machines that can understand the natural way we humans talk and respond in kind. As this research progresses, it can provide new ways for computers to digest and process information or even lead to machines that can hold a completely convincing conversation.
This sort of technology would have a wide range of uses. It could help businesses filter hot-button issues on social media. Or it could provide governments with a more effective way of censoring information.
Because understanding natural language is such a complex and difficult task, systems like IBM Debater lean on a wide range of systems, each handling a different part of the problem. One system will identify information that helps fuel an argument on one side of the debate. Another will generate the text of the argument. And so on.
Typically, each system is designed and built independently, before researchers meld them together. But recent research from the likes of OpenAI, an independent artificial intelligence lab in San Francisco, and Salesforce, the San Francisco tech giant, points toward the development of systems that can tackle language problems in a broader way: Teach a system to do one task, and it can help with other tasks, too.
In recent years, researchers have significantly improved systems that recognize people and objects, identify spoken words and translate between languages.
But understanding language is far more complex. That means systems that perform fairly simple language tasks — like writing a Wikipedia article, let alone engaging in a serious debate on a random topic — may still be years away.
“It is now very obvious this change is happening,” said Jeremy Howard, an independent researcher working in this area. “But these things take time.”
Cade Metz reported from San Francisco, and Steve Lohr from New York.