06 Dec How Google Analyzes a Webpage’s Content, Explained by Google’s Martin Splitt
In a recent webinar, Google’s Martin Splitt explained how the search engine analyzes the content of web pages. He also described a feature called the Centerpiece Annotation, which Google applies when analyzing web content.
Google’s method of analyzing web pages
As Martin Splitt explained, Google uses a feature called the Centerpiece Annotation to determine the main topic, or centerpiece, of a page. With that information, Google separates the content of each page into components and weights each one according to its relevance.
“For instance, we have a feature called the Centerpiece Annotation, as well as additional annotations looking at the semantic content, and even the layout tree. In general, we can figure it out from the HTML structure already. As a result of all of the natural language processing we performed on this whole document, it appears to deal primarily with topic A, such as dog food.”
Other parts of the page, such as a section that links to related products, are recognized as supplementary. They are not the main point of the page and are treated as additional information.
Then there is boilerplate, which Splitt illustrated with a site menu: “I just noticed that these pages and lists all have the same menu. As you can see, this menu seems to be similar to the one on all the other pages of this domain, or it has been seen before. Our algorithm does not even consider the domain or even something like, ‘Oh, this is on a menu.’ Instead, we look for what reeks of boilerplate, and then we weigh that differently.”
As a result, the “centerpiece” of the page receives the most weight, while the other sections are treated as less important.
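To make the idea concrete, here is a toy sketch in Python. It is not Google's algorithm, which has not been published. It assumes a page is already split into text blocks, treats any block that repeats on most pages of a site as boilerplate, and uses a crude heuristic (the longest remaining block) to pick the centerpiece. The function names, the 0.8 threshold, and the weights are all made up for illustration.

```python
from collections import Counter

# Hypothetical weights for illustration only; Google has not published real values.
WEIGHTS = {"centerpiece": 1.0, "supplementary": 0.3, "boilerplate": 0.05}


def find_boilerplate(pages, min_share=0.8):
    """Return the blocks that appear on at least `min_share` of the pages."""
    counts = Counter()
    for blocks in pages.values():
        counts.update(set(blocks))  # count each block once per page
    threshold = min_share * len(pages)
    return {block for block, n in counts.items() if n >= threshold}


def label_blocks(blocks, boilerplate):
    """Label each block of one page.

    Assumption: the longest non-boilerplate block is the centerpiece.
    """
    candidates = [b for b in blocks if b not in boilerplate]
    centerpiece = max(candidates, key=lambda b: len(b.split()), default=None)
    labels = {}
    for block in blocks:
        if block in boilerplate:
            labels[block] = "boilerplate"
        elif block == centerpiece:
            labels[block] = "centerpiece"
        else:
            labels[block] = "supplementary"
    return labels


# A made-up three-page site that shares one menu.
pages = {
    "/dog-food": [
        "Home | Shop | Contact",
        "Our guide to choosing dry dog food for large breeds and puppies",
        "Related products: leashes, bowls",
    ],
    "/cat-toys": [
        "Home | Shop | Contact",
        "The best cat toys for indoor cats that get bored easily",
        "Related products: scratching posts",
    ],
    "/about": [
        "Home | Shop | Contact",
        "We are a small family-run pet store",
    ],
}

boilerplate = find_boilerplate(pages)
labels = label_blocks(pages["/dog-food"], boilerplate)
for block, label in labels.items():
    print(f"{WEIGHTS[label]:.2f}  {label:13}  {block}")
```

On this sample site the shared menu is flagged as boilerplate, the dog food guide becomes the centerpiece, and the related-products block is left as supplementary. A real system would rely on far richer signals, such as the semantic and layout annotations Splitt mentions.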
Martin explained:
“If there is content on your page that is not relevant to the main subject of the rest of the page, it might not receive as much attention as you might think. All of this information is still used by us for site structure analysis and link discovery.
“However, if a page contains 10,000 words about dog food and another 3,000, 2,000, or 1,000 about bikes, the content probably isn’t suitable for bikes.”
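The proportions in that example are easy to work out. The short Python sketch below computes each topic's share of the page by word count. Word share is only a rough stand-in chosen here for illustration, not a measure Google has confirmed using, and `topic_shares` is a hypothetical helper.

```python
def topic_shares(word_counts):
    """Return each topic's fraction of the total word count."""
    total = sum(word_counts.values())
    return {topic: count / total for topic, count in word_counts.items()}


# The word counts from the quote: 10,000 on dog food plus a smaller bike section.
for bike_words in (3000, 2000, 1000):
    shares = topic_shares({"dog food": 10000, "bikes": bike_words})
    print(f"bikes: {bike_words} words -> {shares['bikes']:.0%} of the page")
```

The bike section makes up roughly 23%, 17%, and 9% of the page in the three cases, so dog food remains the dominant topic every time.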
The information above gives a clearer idea of how Google analyzes the content of a website’s pages. Content relevance has always been essential, but we now know that it can vary from section to section on a single page.
For content creation and SEO, this means each page should cover one distinct topic in detail. Mixing multiple topics on one page in an attempt to rank for several types of queries isn’t worthwhile. The full video can be viewed here if you are interested.