Entity Indexation and the Future of Google Search - In Search [Episode 14]
February 12, 2019
The In Search SEO Podcast
Don't forget, you can follow the In Search SEO Podcast by subscribing on iTunes or by following the podcast on SoundCloud!
The In Search SEO Podcast Poll Question of the Week!
How much do you think SEO has changed just over the last six months?! What has changed in your estimation? What do these changes mean? (Conversely, why do you feel search has not changed much over the last six months?) Let us know so that we can feature you on the next episode of In Search!
Summary of Episode 14: The In Search SEO Podcast
This week SEO guru Cindy Krum joins the podcast to discuss how mobile-first indexing might have more to do with entities than mobile! Go deep into some heavy SEO theory as Cindy shares how Google's focus on entities changes everything from how Google crawls the web to how Google is creating a more universal understanding of entities!
Why Are PLAs so Stable Whereas Google Ads are Not? [2:09 - 6:43]
We recently released a study showing how consistent, or inconsistent, the display level of some of the most important SERP features was in 2018 as compared to 2017.
We found that while Google Ads were highly volatile, with an inconsistent display level, PLAs were the exact opposite; that is, they were entirely stable.
Just to put some teeth on this, Google Ads had a standard deviation over 14 on mobile while PLAs had a standard deviation of 2!
Of course, this makes some sense: the PLA ad type is more consistent because the keywords it targets are more consistent. Google is always going to show a variety of products that can be bought for those keywords, whereas Google Ads relate to any sort of keyword with any sort of ad applicability.
That is true, but that is only one way to look at it. Google can decide at any moment that any number of products should get a PLA and it can decide the exact opposite. It can experiment to see what keywords work best for PLAs, what don’t, what can change and what can’t. But as the data shows it really doesn’t tend to do that. But why?
Google Ads are all sorts of advertisers trying to target all sorts of users. PLAs, in Mordy’s humble opinion, are about Google targeting one “person”... Amazon.
This makes sense as 30% of Amazon’s traffic comes from Google and is most likely due to people researching a product. Amazon is the opposite of Google here. Where Amazon can bank on a lot of people coming to its site to access their Prime account and go shopping, Google can’t say the same. Google is dealing with an audience that may not be intent on coming to the SERP to find a product inside a PLA. These users, who are researching a product, may see Amazon show up in the organic results and head over there... and there you have Amazon’s search traffic.
This is why PLAs are as stable as... Amazon’s ranking. As long as Amazon’s rankings remain stable so will PLAs and without much experimentation to the keywords they represent.
The Future of Indexation - Entity Indexation: A Conversation with Cindy Krum [7:14 - 59:45]
[This is a general summary of the interview and not a word for word transcript. You can listen to the podcast for the full interview.]
Can you start by telling us what Mobile Moxie is and what it is that you do?
We’re a small consultancy that helps companies deal with mobile issues specializing in Mobile SEO and App SEO. We call ourselves the SEOs for SEOs as our clients have SEO teams and they come to us when they have problems.
And we also have a set of mobile SEO tools, with one of the coolest being that we can test mobile search results in any location and language in the world. We also have a simulator where Google thinks we’re phones.
Awesome! That being settled, let’s get into some serious SEO theory. I know you have this amazing theory about mobile-first indexing and entities, and I hate to do this to you, but could you give us a quick run-through of your thoughts on mobile indexation being entity indexation... just so that everyone’s on the same page in case some folks have not read your five-article series on the topic?
Sure thing. So first, I highly recommend reading the articles, and if you can’t, each article has a video summary as well as an audio version.
So the idea is that mobile-first indexing is meant to change the way Google thinks and organizes its index. So this change is at its core designed to use artificial intelligence in every language together and aggregate information faster, especially with smaller languages where they had less machine learning input to teach the algorithm.
And we know there’s more to mobile-first indexing than crawling with a mobile crawler, because if it were just that it wouldn’t have taken two years, and it wouldn’t help them with their future goals.
The way I think of it is that Google used to use keywords and languages as the primary indexing in queries. Now my theory is that they backed it up a notch and are now using entities and keywords that describe entities. Meaning, you can have an entity that exists in every language and the keywords are now modifiers of the entity rather than the primary. That way you can use machine learning on the entity and entity relationships and when keywords and languages are now a modifier you can advance the algorithm around the world at the same rate and level no matter how small the language.
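To make the idea in this answer concrete, here is a minimal sketch of an index in which the entity is the primary key and language-specific keywords are only modifiers that point to it. Everything in it is an illustrative assumption: the class names, the identifiers such as "ent:train", and the lookup logic are hypothetical and say nothing about how Google's index is actually built.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Entity:
    """A language-independent concept (hypothetical structure)."""
    entity_id: str
    # language code -> keywords that describe this entity in that language
    keywords: Dict[str, Set[str]] = field(default_factory=dict)
    # ids of related entities; relationships are stored once, for all languages
    related: Set[str] = field(default_factory=set)


class EntityIndex:
    """Toy index: the entity is primary, keywords and languages are modifiers."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}
        self._lookup: Dict[Tuple[str, str], str] = {}

    def add_keyword(self, entity_id: str, language: str, keyword: str) -> None:
        entity = self._entities.setdefault(entity_id, Entity(entity_id))
        entity.keywords.setdefault(language, set()).add(keyword.lower())
        self._lookup[(language, keyword.lower())] = entity_id

    def relate(self, first_id: str, second_id: str) -> None:
        # A relationship learned once applies to every language at the same time.
        self._entities.setdefault(first_id, Entity(first_id)).related.add(second_id)
        self._entities.setdefault(second_id, Entity(second_id)).related.add(first_id)

    def resolve(self, language: str, keyword: str) -> Optional[Entity]:
        entity_id = self._lookup.get((language, keyword.lower()))
        return self._entities.get(entity_id) if entity_id else None


index = EntityIndex()
index.add_keyword("ent:train", "en", "train")
index.add_keyword("ent:train", "de", "Zug")
index.add_keyword("ent:railway_station", "en", "railway station")
index.relate("ent:train", "ent:railway_station")

english = index.resolve("en", "train")
german = index.resolve("de", "zug")
```

In this sketch the English and German keywords resolve to the same entity object, so the relationship to the railway station entity is available for the German query even though it was never added in German. That is the sense in which a smaller language could advance at the same rate as a larger one.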
It almost seems then that Google has a qualitative understanding of an entity, or rather is touching on that sort of understanding… a more ethereal understanding of an entity, as if it can grasp the "meaning” of concepts as lofty as "parents” or as mundane as "vacation”. Based on your understanding of entity indexing has Google made true headway in being able to use AI to approach a qualitative understanding of an entity, and what does that mean for search going forward if it has?
What you have to understand is Google is translating Google Knowledge Graphs to fit a user’s preferred language settings regardless of the language the query is in or what country the user is in or any of those factors. So this shows that Knowledge Graphs are considered universal, but if you search for a query that’s more relevant in Spain and you search in Germany that doesn’t mean the ranking will be the same or the Knowledge Graph will show at all.
To test this out, we tested Google’s understanding of idioms because we wanted to know if they are translating word for word or translating in groups. So idioms translated literally can make no sense.
When the language is part of the Cloud Natural Language API, which covers ten or eleven languages, Google can detect an idiom and interpret it before translating it. For example, the idiom ‘put yourself in my shoes’ is translated to ‘put yourself in my place’. So when Google knows the language natively it will translate the idiom correctly to ‘put yourself in my place’, but if it doesn’t know the language it will translate it literally as ‘put yourself in my shoes’. So we played with that a lot to see whether it understands the language.
Similarly, we tested movies and songs. Meaning, media references that have the same name but are more culturally relevant in one place than another and tested those things.
So the Knowledge Graph items still exist; they’re just not in the top ranking because of factors like the time, geography, and the GPS location of your phone.
I have a friend who has a Ph.D. in linguistics from Columbia University and he’s fond of saying that language is the best tool we have to express what is inexpressible. Meaning, the core concepts that drive human activity are very hard (if not impossible) to define. Things like love and so forth are impossible to quantify conceptually as it’s almost precognitive. If Google is using language as the basis for its understanding how can it really understand an entity? Just to give you a poor example, I’m a big Pittsburgh Steelers fan. If I talk to my uncle about the season, the language we use is simply a formality. We’ve shared this experience of a disappointing season and someone who has not shared that experience won’t truly grasp our "language.” Since Google cannot share experiences how does it gain understanding?
The answer is media. Google is doubling down on media, especially videos, podcasts, and images. To understand experience by text only is very difficult, which is why Google is working hard on indexing and ranking media. And that content is especially good for voice-only assets, where text is very bad.
You can also ask this question: If you needed to do an experiential search to help someone understand, how will you do that? Google is working on that. They’re working on image mapping where you can submit an image and match it. I think Google is working on audio as well. But they also used this question as an explainer on entities and they said, "I can show you this picture of a train and you know it’s a train. I can show you the word train and you know it’s a train. I can play you this audio of a train and you know it’s a train.” But all of these are very different things.
And perhaps in the future, we will further improve search. For example, if in the future there are augmented reality searches, how will they work? If I searched for “by the sea” what AR experience would I receive? Day, night, stormy, sunshine?
I think Glenn Gabe came out recently talking about how the video carousels that Google was running were all over the place with commerce queries. Do you think part of the reason why the video carousel was not as focused is that Google was trying to bolster its entity indexing and it hit a wall?
Yes. I’ve written on this as well where a lot of this content is mapped in entity-indexing and they also have Google Play, and Google Play is a bunch of media that is organized by genre, topic, cast, and characters. Things that are organized vertically and laterally. So if you search for J.Lo in Google Play you’re going to get videos, audio-only songs, albums, and movies.
So do you think that there are areas where Google has a gap of content in Google Play?
The funny thing is that they launched books and audiobooks recently because they have all this text in regular search, but it’s organized by URLs, one language at a time.
When I dug into the Medic Update last August it seemed to me as if Google was profiling sites according to what I call its “core intent profile.” Meaning, if a site presents itself as an informative/information-centric site but has a latent e-commerce profile (i.e., overuse of ads, e-commerce-like buttons, etc.) Google is aware that there is a conflicting “intent” profile. That’s inherently problematic from a ranking perspective since the site is not fulfilling what it says it is intent on doing. In the case of YMYL sites, a subtle/latent profile is quite concerning as how can a site’s informative content be trustworthy/safe if under the surface they’re really trying to push a product?
And that’s hard to index for entity understanding if you have a conflict of interests.
Yeah. That’s a big problem. So do you think an entity-first index naturally results in Google profiling a site via the lens of intent and demoting sites that have "ulterior” profiles that do not align to its core?
I do agree. Just think of how many sites from the early days of Google are still indexed that Google would have wanted to exclude (e.g., Viagra). And there are a lot of legacy sites with a lot of links that Google wants to exclude. Imagine how corrupted the Knowledge Graph would be if it were infected with very financially motivated “facts.”
And how do you think it plays out with domain authority? If Google can look at a site and see what is this entity and how it relates to other entities. What would that mean for Google looking at it from an authority perspective?
I think it will still relate to having links. Authoritative mentions and external references even without links or followed links also help. So if the NY Times writes about your brand but doesn’t add a link maybe that will count more if from it a lot of people start talking about your brand. Just understand that Google, every time it advances itself, is trying to steer away from links.
It is interesting to see how Google has shifted to a more powerful parsing of intent from more filters in Featured Snippets to the notion of search as a journey. The timing is even more interesting as it aligns with mobile-first indexing. Do you see the two as being related? Do you see it as fitting into one larger picture?
Yes, of course! I mean all these new features weren’t here before mobile-first indexing so it’s not really a surprise. They’ve been waiting to put this in, but mobile-first indexing was a necessary element. And intent is so important in voice-only, and this is where Google has been putting all of its money: getting the AI to get it right on the first try, because with voice-only you can only get it right on the first try.
So, I was always curious why Google dropped the idea of going with 2 indexes, one for desktop and one for mobile. And the truth is, I never saw a satisfactory answer until I came upon your theory of entity-first indexing. Do you think Google’s dropping the notion of two indexes is explained by this shift towards entity-first indexing?
Maybe. For a long time now Google has been crawling the web as a mobile smartphone and they also had the desktop crawler which was their primary, but they said for a long time the mobile smartphone was their primary crawler so why need mobile-first indexing?
So I see entity-first indexing as the solution to the problem, but I don’t know how it came about. What was untenable about desktop and mobile having separate indexes was that it was resource intensive. Crawling and indexing duplicate versions of the web and doing AI separately is double the work. So I think it was more pragmatic and less visionary, but it might be rewritten as a visionary story.
So why don’t they just come out and say it? Why not say mobile-first indexing is much more than we’re saying it is?
We believe the reason is that the system is ripe for abuse. I can do things like say I’m related to Jennifer Lopez and I’ll be in the J. Lo Knowledge Panel as a related person.
Before we go I want to talk about fraggles. I’ve been noticing more ‘jump links’ in the search results where Google will have a blue link title tag and underneath it will say “Jump To” and in the meta description it will show the piece that answers your question and not the meta description itself. So fraggle is a word I made up that takes fragment and handle and puts them together. So now Google can single out your content and what it wants to index. I think mobile-first indexing is entity-first indexing but entities don’t need URLs so they can just take fraggles. Now imagine the Google crawler crawling around taking just fraggles. It crawls the same, but differently. So they might save the URL with a location modifier because it jumps straight to it on the page. This also includes API indexing and database indexing where these things don’t have URLs. They have a unique locator in the file so they can get just what they want and index that to the Knowledge Graphs and leave the rest.
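As a rough illustration of "the URL with a location modifier," the sketch below treats a fraggle as a page URL plus a fragment identifier (the part after the #) and stores a passage under that pair. The function names, the example.com URLs, and the passage store are hypothetical and only show the shape of the idea, not how Google stores anything.

```python
from typing import Dict, Tuple
from urllib.parse import urldefrag


def split_fraggle(url: str) -> Tuple[str, str]:
    """Split a jump-link URL into (page URL, handle); the handle is the fragment."""
    page, handle = urldefrag(url)
    return page, handle


def build_jump_url(page_url: str, handle: str) -> str:
    """Rebuild a URL that jumps straight to the handle on the page."""
    base, _ = urldefrag(page_url)
    return f"{base}#{handle}" if handle else base


# Hypothetical store: one passage per (page, handle) instead of one entry per URL.
passages: Dict[Tuple[str, str], str] = {}


def index_passage(url: str, text: str) -> None:
    passages[split_fraggle(url)] = text


index_passage(
    "https://example.com/ice-cream-guide#chocolate",
    "Chocolate ice cream is made by blending cocoa into the base.",
)
index_passage(
    "https://example.com/ice-cream-guide#vanilla",
    "Vanilla ice cream is flavored with vanilla extract or beans.",
)
```

Both passages live on the same page, but each one is retrievable on its own through its handle, which is the distinction between indexing a page and indexing a fraggle.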
So this is similar to Google using AMP URLs inside of Featured Snippets.
Exactly. That is an illustration of how fraggles work. That highlighting isn’t from the AMP page. So what they do when it’s not a Google page in the fraggle is they create a handle in their brain about scrolling and scroll straight to it. And sometimes there are jump links on the page but they don’t have to be.
Do you think people should use more jump links when creating their content? Or to rephrase, is it more important to put in a jump link than a header now?
I think jump links and headers should go together. You need both in your strategy. Think about this... All the answers from a People Also Ask box are on the same page.
So does this mean, in theory, that Google doesn’t have to index your entire page, or crawl your entire page. If it finds what it wants it will just take it and leave the rest behind?
I think Google will crawl your entire page. And it’s not that Google is looking for one specific answer, but things that look like answers.
Very interesting. So you think more Featured Snippets are heading our way?
Absolutely. Everything is going hosted. So the more of your content it hosts, such as media (podcasts, audio, video), the more likely it is to show because they are "better at understanding it”.
That is fascinating and disturbing at the same time.
For people who don’t know, if you have an AMP URL in the Featured Snippet and you click on the URL, it doesn't take you to the top of the page but drops you where the content that was in the Featured Snippet came from, and it’s highlighted for you. Is that going to change the nature of the Featured Snippet game? Is there a concern that users won’t see the full article and so won’t treat you as an authority on the subject?
Think algorithmically: Google has been overfitting to people like you, people who sit at the computer all day and like to read the entire article. But most people aren’t like you. Most people don’t want to read technical stuff and would rather hear a conversation about it. So having a video or audio really helps. I use these for my own blogs because how hard would it be to fake a podcast, videos, and audio summaries?
So a year from now, will Featured Snippets be as important as they are now?
Bigger. I think it will all be Featured Snippets. I believe Google is alleviating their guilt for stealing your content by adding features like Google Assistant’s ‘Save this for later’.
But it seems like what Google really wants is to keep you in their universe clicking on their hosted inclusions and only when there’s a hole in their Knowledge Graph does someone get an organic result.
Yes, I find it brilliant! People use search because of all the options, because of all the different links. So what did Google do? They show you multiple features with multiple options so the users get their result diversity all while staying in Google’s own properties.
Optimize It or Disavow It!
If you could only do one would you create a set of content that goes deep into one entity or content that hits on a series of related entities?
I will go deep into one entity because everyone has already done the shallow stuff. I mean if everyone’s written a post on chocolate ice cream we really don’t need another one. Because then it can get indexed into the Knowledge Graph just for you. I mean it’s hard without being unique to be the best. It’s easier to be the best at being unique than being the best of the best of everybody. Just be the best at something small rather than trying to be the best at everything.
Well, thank you so much, Cindy, for coming on our show!
You're welcome. Thank you for having me.
SEO News [1:04:08 - 1:06:43]
YouTube Expands Test to Explore: YouTube has expanded a test to its Explore feature. The test shows cards to users with each card reflecting a topic they may enjoy watching videos about. The format seems very similar to the Knowledge Panel and seems as if the Knowledge Graph has come to YouTube.
Google Event Rich Snippets on Desktop:
Google has been testing a deep and rich “events” center on desktop. The deep events center came to mobile last July so perhaps now it will come to desktop.
UTM Tracking Drops on GMB:
Local SEOs have been reporting some odd behavior inside Search Console. Profiles that use UTM tracking have seen impressions fall off a cliff.
GSC to Consolidate Performance Report Data to Canonical URL:
Google offered a view of its consolidated Search Console data that is coming in March. Specifically, the Performance Report test showed data for the canonical URL and not the exact URL.
The Fun SEO Send-Off Question [1:06:43 - 1:08:20]
Where does Google shop for pants?
According to Mordy, Google doesn’t buy pants. Google just borrows pants much the same way it borrows content for the Featured Snippet...
Kim, though sick, emailed her answer to Mordy. She said, "I think Google has left all those decisions to his or her personal shopper and has complete faith in him or her as Google is very busy.”
That’s probably Kim’s way of nicely telling him that these questions are absurd!
Thank you for joining us, catch a new episode of The In Search SEO Podcast