In a lab next to the river on New York’s Upper West Side, a computer will soon start reading. It is part of a cadre of computers that are learning to read more like humans, helping us digest and understand society’s huge volumes of text on a large scale.
Called the Declassification Engine, it will comb through 4.5 million US State Department cables from the 1930s to the 1980s – everything the department has declassified so far. It’s more than any human could read, but the software will analyse the lot, mapping social connections and looking for new narratives about the behaviour of US diplomats and officials abroad in the 20th century, says Owen Rambow, a computer scientist at Columbia University, which runs the project.
“A cable might talk about a meeting with the foreign minister of Turkey,” says Rambow. “If we could extract social networks from these cables then we can study how the networks of the US changed over time. In time of crises do these networks contract or expand?”
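The kind of question Rambow poses can be illustrated with a minimal sketch. It assumes each cable has already been reduced to a year and the set of people it mentions, and it treats two people as connected when they appear in the same cable. The sample data, the field names and the function `network_size_by_year` are invented for illustration; nothing here describes how the Declassification Engine itself works.

```python
from collections import defaultdict
from itertools import combinations

# Invented sample data: each cable reduced to a year and the people it mentions.
cables = [
    {"year": 1961, "people": {"A", "B"}},
    {"year": 1961, "people": {"A", "B", "C"}},
    {"year": 1962, "people": {"A", "C"}},
]

def network_size_by_year(cables):
    """Count the distinct pairs of people mentioned together, per year."""
    pairs = defaultdict(set)
    for cable in cables:
        # Sort names so each pair has one canonical ordering.
        for pair in combinations(sorted(cable["people"]), 2):
            pairs[cable["year"]].add(pair)
    return {year: len(found) for year, found in sorted(pairs.items())}

print(network_size_by_year(cables))  # {1961: 3, 1962: 1}
```

In this toy data the network shrinks from three linked pairs to one between the two years, the sort of contraction the researchers would look for around a crisis.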
The Declassification Engine isn’t the only computer with the ability to read. Software that can understand the words and simple facts in text is already familiar: it is used to answer simple questions, for example. Some software can even pick up on things that humans have missed.
But Rambow’s system and a range of others are going beyond this, learning to understand the relationships between the characters, how time passes in the text, and whether the characters get what they want.
“Computers can operate at a scale and speed that we can’t,” says Tom Mitchell at Carnegie Mellon University in Pittsburgh. His group has spent years training computers to digest reams of online content, trying to get a handle on humanity through its text output. He is working on software that can analyse the relationships in a text to figure out which characters are friends and which are enemies.
Snigdha Chaturvedi of the University of Maryland, Baltimore, says that existing natural language processing systems, like Google Now and Siri, are good at answering fact-based questions. “If you type ‘Who is president of the United States’ it says ‘Obama’,” she says. “They’re very good at fact-based questions, but not very good with opinion.”
Chaturvedi is now working on software that has the potential to go beyond facts to understand people’s opinions through the things they write. For instance, future versions could give answers to the subjective question “What did Obama do to win the election?”, says Chaturvedi. The software would digest the thousands of online news reports, books and magazine stories about his campaign, finding common elements that appeared in the stories of Obama’s victory – perhaps a key person on the campaign, or an important place.
“You could ask similar questions about everyday life,” says Chaturvedi. This would bring up a consensus or several options depending on the situation, mined out of humanity’s collective written knowledge. Software could be pointed at medical discussion forums, for example, reading everything ever written online to see whether people feel that the drugs and treatments they get are effective.
Computers that can make these kinds of distinctions will be a powerful tool, says Mitchell, since they can read faster and more widely than a human ever could, 24 hours a day. “They could acquire experience that could far surpass what we could ever get in our lifetime,” he says. “They’d be a million times better read than you or me.”
There are still some issues to overcome, however, such as dealing with text in unusual formats. For example, names are particularly important to the Declassification Engine, and the system relies on the capital letters at the start of names to find them. But the cables come in all capitals, a relic of the Telex system that conveyed them around the planet. Rambow is confident he can get over the hurdle. “I hope that in the next month or two the humming will start,” he says.
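The capital-letter problem is easy to reproduce. The sketch below is a deliberately naive stand-in, not the Declassification Engine’s actual method: it assumes names can be found as runs of words that start with a capital followed by lower-case letters. The pattern, the function `candidate_names` and the sample sentence are all invented for illustration.

```python
import re

# Assumed heuristic: a name is a run of words, each starting with a capital
# letter followed by lower-case letters.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")

def candidate_names(text):
    """Return runs of capitalised words found in the text."""
    return NAME_PATTERN.findall(text)

mixed_case = "the consul met John Smith in Ankara"  # invented sample sentence
telex_style = mixed_case.upper()                     # same text in all capitals

print(candidate_names(mixed_case))   # ['John Smith', 'Ankara']
print(candidate_names(telex_style))  # []
```

Once the text is in all capitals, the cue the heuristic depends on disappears and no names are found, which is the hurdle the cables present.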
This article appeared in print under the headline “Super-literate computers”