Will Google’s Semantic Search in ‘Talk to Books’ Supplant Keywords?
By Henry Kronk
April 24, 2018
On Friday the 13th this month, Ray Kurzweil took the stage at TED 2018 in Vancouver to talk about his latest work at Google. The company’s new service, Talk to Books, allows you to, well, talk to books. And not just a few books. More like 100,000-plus volumes drawn from Google Books.
Here’s how it works. A curious reader can type in a question, a statement, a sentence, a topic (anything, really), much as they would say it in conversation. Talk to Books then scans every one of those volumes (in roughly half a second) and returns several short passages that likely relate to the query.
Semantic Search in Talk to Books
That’s definitely cool, but I’m also kind of burying the lede. Talk to Books does not use keyword-based search, as Google’s main search engine and nearly every other online library and database do. Instead, it uses what Kurzweil calls ‘semantic search.’
“Semantic search,” writes Kurzweil in a recent blog post, “is based on searching meaning, rather than on keywords or phrases. Developed with machine learning, it uses ‘natural language understanding’ of words and phrases.”
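To make the distinction concrete, here is a minimal sketch in Python. It is not Google's model: the tiny hand-built lexicon below (`TOY_LEXICON`, with made-up "sleep, food, weather" dimensions) is only a stand-in for a learned sentence encoder, and the passages and query are invented for illustration. It shows how a query can share no words with a passage and still rank it first when passages are compared by meaning vectors rather than by matching keywords.

```python
import math
import re

# ASSUMPTION: a hypothetical hand-built lexicon standing in for a learned
# sentence encoder. Dimensions are (sleep, food, weather).
TOY_LEXICON = {
    "tired": (1.0, 0.0, 0.0),
    "exhaustion": (1.0, 0.0, 0.0),
    "rest": (0.8, 0.0, 0.0),
    "hungry": (0.0, 1.0, 0.0),
    "bread": (0.0, 1.0, 0.0),
    "rain": (0.0, 0.0, 1.0),
    "storm": (0.0, 0.0, 1.0),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(query, passage):
    """Keyword search: count the distinct words the two texts share."""
    return len(set(tokenize(query)) & set(tokenize(passage)))


def embed(text):
    """Toy 'meaning' vector: sum the concept vectors of known words."""
    vec = [0.0, 0.0, 0.0]
    for word in tokenize(text):
        for i, value in enumerate(TOY_LEXICON.get(word, (0.0, 0.0, 0.0))):
            vec[i] += value
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_rank(query, passages):
    """Semantic search: order passages by similarity of meaning vectors."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)


# Invented example data.
PASSAGES = [
    "Exhaustion often follows a week of poor rest.",
    "The storm brought heavy rain to the coast.",
    "Fresh bread is best when you are hungry.",
]
QUERY = "Why do I feel so tired?"

if __name__ == "__main__":
    for passage in PASSAGES:
        print(keyword_score(QUERY, passage), passage)
    print("Top semantic match:", semantic_rank(QUERY, PASSAGES)[0])
```

In this toy setup the query shares no words with any passage, so a keyword ranking has nothing to go on, while the vector comparison still surfaces the passage about exhaustion and rest. A real system would presumably learn its vectors from large amounts of text rather than from a hand-written table.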
Talk to Books is part of a larger project at Google implementing AI-powered semantic recognition. It also features word association games to help test (or train?) their algorithms along with a growing library of end-to-end models to be used by developers.
Quartz reports that Kurzweil does not intend semantic search to replace keyword search. The report does not, however, include a direct quote, and it is tempting to respond by asking, ‘Are you sure?’
Keyword search interfaces are generally unintuitive. Let’s take, for example, JSTOR, one of the most popular academic online databases. If I wanted to search for articles authored by Lawrence Lessig, I would need to navigate to the advanced search menu, toggle to the category of ‘author,’ and then type in ‘Lessig, Lawrence.’ If I accidentally typed ‘Lessig, Laurence,’ I would miss out on all relevant results. In some databases, you need to know particular search syntax. (On JSTOR, the above search would look like (au:"Lessig, Lawrence").)
In the online worlds of research, marketing, journalism, legal matters, personal directories, social media, virtual communities, and countless others, users navigate primarily via search. That search function will be like JSTOR’s at best, and much more basic at worst. The fact is that, in our current environment of hypertext and abundant information, it can be very difficult to find what we’re looking for.
Talk to Books, and Kurzweil’s team’s research on semantic search, seem poised to revolutionize how we navigate the internet in general.
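The brittleness described above can be sketched in a few lines of Python. This is an illustration only, not JSTOR's actual behavior or code: the `AUTHORS` list is a made-up index, and both function names are hypothetical. The second function uses the standard library's `difflib` for fuzzy string matching, which is a much simpler technique than semantic search but is enough to show what a strict exact-match field throws away.

```python
import difflib

# ASSUMPTION: a made-up author index, for illustration only.
AUTHORS = ["Lessig, Lawrence", "Benkler, Yochai", "Zittrain, Jonathan"]


def exact_author_search(query, authors):
    """Strict field match: only an identical string counts as a hit."""
    return [name for name in authors if name == query]


def tolerant_author_search(query, authors, cutoff=0.8):
    """Fuzzy match: accept names whose similarity ratio meets the cutoff."""
    return difflib.get_close_matches(query, authors, n=3, cutoff=cutoff)


if __name__ == "__main__":
    typo = "Lessig, Laurence"
    print(exact_author_search(typo, AUTHORS))
    print(tolerant_author_search(typo, AUTHORS))
```

With the one-letter typo, the exact match finds nothing, while the fuzzy match still recovers the intended author. The 0.8 cutoff is an arbitrary choice for this sketch; a lower value would tolerate more typos at the cost of more false matches.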
Machine Learning Algorithms Have Issues of Their Own
Still, deep learning algorithms do not come without their shortcomings. To paint in broad strokes, the semantic search algorithm seeks to understand and recreate (sort of) human language by learning it from other humans. And other humans have biases. As Kurzweil reiterates in a separate post, “In Talk to Books, while we can’t manually vet each sentence of 100,000 volumes, we use a popularity measure which increases the proportion of volumes that are published by professional publishing houses. There are additional measures that could be taken. For example, a toxicity classifier or sensitive topics classifier could determine when the input or the output is something that may be objectionable or party to an unwanted association. We recommend taking bias-impact mitigation steps when crafting end-user applications built with these models.”
The systems are mainly trained using white faces and male voices. This “…can lead to misperceptions of black faces or female voices, which can lead to the AI making negative judgments about [other folks].” https://t.co/GBho5WBKpv
— chris g (@hypervisible) April 22, 2018
When it comes to digital redlining, that’s really just the tip of the iceberg. As a society, we’ve already begun to experience serious growing pains with AI algorithms that have inadvertently displayed prejudice in real-life situations.
The TL;DR: Kurzweil’s team has not taken additional measures with its semantic search function to protect against bias. Taking a hard look at the ethics of semantic search would be a great next step. Its potential for online education and navigation of the internet in general is huge. But for now, in a social sense, it is only as good as the humans who use it.
Cover Image: Ed Schipul, Flickr.