Google’s New Algorithm Creates Original Articles From Your Content by @martinibuster
Google has published research on a new algorithm that can take your webpages and your competitors’ webpages and generate “coherent” articles. By creating original content, Google’s new algorithm can answer a user’s question without having to send them to another webpage.
Google’s new algorithm works by summarizing web content using an algorithm that “extracts” your content then tosses out the irrelevant parts. This is similar to the algorithms used to generate featured snippets.
These are called “extractive summaries” because they extract content from webpages. An extractive summary reduces the original text to its most important sentences.
Next, the algorithm applies a second technique called abstractive summarization. Abstractive summaries are a form of paraphrasing.
A downside of artificial paraphrasing (abstractive summaries) is that almost a third of the summaries contain fake facts.
Here is more information about abstractive summaries and the fake facts problem: Faithful to the Original: Fact Aware Neural Abstractive Summarization
Google’s new research has discovered a way to join the best of both approaches. They use “extractive summaries” to extract the important facts from web documents and then apply the “abstractive” approach to paraphrase the content. This approach creates a new document based on the information found on the web, creating Google’s own version of Wikipedia.
Google’s new algorithm is described in a research paper titled, Generating Wikipedia by Summarizing Long Sequences
According to Google:
“We show that generating English Wikipedia articles can be approached as a multidocument summarization of source documents.”
This means that Google can go out and collect information about a topic from multiple webpages.
“We use extractive summarization to coarsely identify salient information…”
This means that they reduce the webpages to the most important sentences in order to extract the meaning.
The next step is to use:
“…a neural abstractive model to generate the article.”
This means that Google will then take the extracted meanings and use a “neural abstractive model” to summarize those facts (extracted from many websites) into natural looking sentences and paragraphs to create an article.
Google says that the resulting articles can pass human examination.
“We show that this model can generate fluent, coherent multi-sentence paragraphs… When given reference documents, we show it can extract relevant factual information as reflected in… human evaluations.”
Featured snippets are an example of Extractive Summarization. It’s a process of taking an entire webpage then tossing out the irrelevant words and phrases and keeping just the few sentences that communicate the answer to a question.
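To make the idea of extractive summarization concrete, here is a minimal sketch in Python. It is not Google’s method. It assumes a very naive scoring rule (count the words a sentence shares with the question), and the function names, the stopword list and the sample page are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list for this illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and",
             "what", "how", "do", "does", "it", "for", "on", "you"}

def tokenize(text):
    """Lowercase the text and return its word tokens, minus stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def extractive_summary(page_text, question, max_sentences=2):
    """Keep the sentences that share the most words with the question."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    question_words = set(tokenize(question))
    scored = []
    for position, sentence in enumerate(sentences):
        overlap = len(question_words & set(tokenize(sentence)))
        scored.append((overlap, position, sentence))
    # Highest overlap first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Drop sentences with no overlap, then restore the original page order.
    kept = sorted((item for item in best if item[0] > 0), key=lambda item: item[1])
    return [sentence for _, _, sentence in kept]

# Made-up sample page for the demonstration.
page = (
    "Our bakery opened in 1998. "
    "Sourdough bread rises because wild yeast produces carbon dioxide. "
    "We also sell coffee and tea. "
    "A longer rise gives sourdough bread a more sour flavor."
)
summary = extractive_summary(page, "Why does sourdough bread rise?")
print(summary)
```

The sentences about the bakery’s opening date and the coffee are tossed out, and the two sentences about sourdough are kept word for word. Nothing is paraphrased, which is what separates an extractive summary from an abstractive one.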
There is a related Google algorithm that summarizes webpages for Google Voice called, Sentence Compression by Deletion with LSTMs. You can read about it in plain English in my article: Google Voice Search Summary Algorithm.
This algorithm is about taking “multiple documents” and summarizing them. This can be applied to books. This can be applied to open source databases of information. But this can also be applied to any public webpage, including your content.
The research uses Wikipedia topics as the search query and search engine results as the source for extracted summaries that are then paraphrased to create brand new articles. The researchers also ran a side by side test by generating a second set of articles using only the references cited by Wikipedia.
The paper describes the process this way:
“The reference documents are obtained from a search engine, with the Wikipedia topic used as query similar to our search engine references. However we also show results with documents only found in the References section of the Wikipedia articles.”
The translation to plain English is that they use Wikipedia topics as search queries and the Search Engine Results Pages (SERPs), your content, as the source material from which to generate brand new webpages that can be used to answer a question without ever showing a link to your website.
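The multi-document step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper’s implementation: it assumes tf-idf as the scoring method and a simple token budget for how much text is kept, and the function names, the example.com URLs and the sample paragraphs are hypothetical. The neural abstractive step that would paraphrase the selected text is not shown, because it requires a trained model.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_paragraphs(topic, documents):
    """Score every paragraph from every page against the topic with tf-idf."""
    paragraphs = [(url, p) for url, paras in documents.items() for p in paras]
    tokenized = [tokenize(p) for _, p in paragraphs]
    total = len(tokenized)
    # Document frequency: in how many paragraphs each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    query = set(tokenize(topic))
    scored = []
    for (url, text), tokens in zip(paragraphs, tokenized):
        counts = Counter(tokens)
        score = 0.0
        for word in query:
            if tokens and counts[word]:
                tf = counts[word] / len(tokens)
                idf = math.log((1 + total) / (1 + df[word])) + 1  # smoothed idf
                score += tf * idf
        scored.append((score, url, text))
    # sorted() is stable, so equal scores keep their input order.
    return sorted(scored, key=lambda item: -item[0])

def build_source_text(topic, documents, max_tokens=40):
    """Collect the best-scoring paragraphs until the token budget is used up."""
    selected, used = [], 0
    for score, url, text in rank_paragraphs(topic, documents):
        if score <= 0:
            break  # everything after this point is unrelated to the topic
        length = len(tokenize(text))
        if used + length > max_tokens:
            continue
        selected.append((url, text))
        used += length
    return selected

# Hypothetical search results: page URL -> paragraphs found on that page.
documents = {
    "https://example.com/a": [
        "The Eiffel Tower is a wrought-iron tower in Paris.",
        "Subscribe to our newsletter for travel deals.",
    ],
    "https://example.org/b": [
        "The Eiffel Tower was completed in 1889 for the World's Fair.",
        "Paris has many cafes.",
    ],
}
selected = build_source_text("Eiffel Tower", documents)
for url, text in selected:
    print(url, "->", text)
```

In this sketch, one paragraph from each of the two sites survives, and the newsletter pitch and the sentence about cafes are dropped. The surviving text from different publishers is pooled into a single body of source material, which is the part of the process that affects site owners.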
The research paper is silent on whether Google will show their own content created from your content. There is also no discussion as to whether Google will add links to the source materials, either as part of the SERPs or as a footnote link.
The research paper concludes that the experiment was successful. Google can generate its own content by summarizing your content, thereby answering a user’s question without inconveniencing them by having to click through to your site.
Here is what Google’s research paper states:
“We have shown that generating Wikipedia can be approached as a multi-document summarization problem…”
That phrase “multi-document” means any document that is freely available, including your webpages and the webpages of your competitors.
And this is what the research paper says about how successful the algorithm is:
“This model significantly outperforms traditional encoder-decoder architectures on long sequences, allowing us to condition on many reference documents and to generate coherent and informative Wikipedia articles.”
That means Google is able to use many webpages to generate “coherent” and “informative” articles. This is a rather disturbing turn of events.
There is no word yet on when or if Google will begin generating its own content from your content. However, an algorithm like this is a perfect fit for voice assistant search. Voice assistant searches are searches made through a mobile phone or an Internet of Things (IoT) device in your home or in your car.
Thus, a person can ask Google’s voice assistant about a movie star and it can respond in sentences to answer the question, just as if they had asked a real person.
Google has long aspired to be like the voice assistant computer on Star Trek. Back in 2014, it was reported that a previous version of voice search was codenamed after the actress who played the voice of the Star Trek computer. An algorithm like this one would fit perfectly into a voice assistant setting.
Read the research here, Generating Wikipedia by Summarizing Long Sequences. Read more about extractive and abstractive summarization in these research papers:
Images by Shutterstock, Modified by Author