Breaking Down Google's BERT-Powered News Carousel [Case Study]
May 20, 2020
Back in December 2019, Google's BERT algorithm made its way to the Top Stories Carousel. Ever since then I've been dying to get my hands on the newly segmented carousels to see what makes them tick, where they hit the mark, and where they fall a bit short. It's only taken me a few months, but here's a bit of thematic analysis on what's happening with the Top Stories carousel now that BERT is part of the picture.
BERT Comes to Google's News Carousel
Before we dig into the nitty-gritty details, here's a bit on what has changed with Google's news carousel as driven by BERT. On December 11, 2019, Google clued us into a new reality for its news carousel. Specifically, Google announced that some queries would now produce multiple news carousels.
Until this announcement, Google would show only a single carousel within the News Box, as is still the case on desktop:
However, in some cases the mobile News Box may now contain multiple carousels, as can be seen below:
Oddly enough, the multi-carousel News Box is implemented very sparingly. Only about 5% of all mobile News Boxes contain more than one carousel of news results. Still, for certain query formats the multi-carousel layout is quite common, and either way it represents a shift in Google's abilities (and perhaps makes tracking the News Box a bit more complex).
To fuel this orgiastic feast of news diversity, Google is using machine learning, specifically... BERT (or, to be more accurate, BERT as one of several technologies). To this end, Google has said it has "developed a new story-understanding technology to map the people, places and things involved in a news story, and then draw connections between them."
BERT's role appears to be related to understanding where a topic begins and ends. This makes a great deal of sense, since contextualization is BERT's specialty. Google has not said specifically how it is using BERT here, but there are a few logical possibilities. For one, BERT can be fine-tuned for named entity recognition (NER), with the upshot being more accurate entity disambiguation. This would be helpful in the news context for knowing which articles are still discussing the same entities, and in what context. Of course, Google could be using BERT for news results in an entirely different way; the above is just one possibility that jumps out at me based on Google's own statement.
The idea of all this, as should be relatively obvious and in the words of Google, is to feature "key information, such as notable queries and related opinion pieces, in the top stories carousel within Search. These different content types provide people a more well-rounded view of a news story..."
So, what does this "more well-rounded view" look like and does it really provide a more worldly perspective on today's news?
A Case by Case Look at Google's Multi-Carousel News Box
As you'll probably notice, the news results I'm about to explore are a bit outdated. That's because I did the initial research for this post, made my notes, and then, instead of writing this post, got caught up with who knows what. Add on an already full content calendar, and here we are. That said, have no fear: if you've forgotten what these storylines were about, I will fill in the context. You'll notice there is neither rhyme nor reason to the order of the cases below. They are simply a collection of news carousels that reflect some of the patterns I've seen in the new multi-carousel format.
Context and excuses out of the way... let's have some fun with these news carousels. Some of them are simply "incredible."
Case 1: News Story Specificity Is Hit or Miss
A day after the 2020 Academy Awards, I decided to check up on the latest fallout from the Trump impeachment process. It just so happened that at the awards show, Brad Pitt had made a comment about the acquittal of the US President. I'm not sure exactly what he said, as I did not watch the Oscars, but I am sure I don't care. My personal feelings toward celebrities and their general vapidity aside, the segmented news carousel that came with the keyword "impeachment news" was one of the more telling cases I came across.
The multi-carousel News Box for "impeachment news" led off with a carousel on Mr. Pitt's commentary under a header that read "Oscar's 2020: Brad Pitt wins Best Supporting Actor." There's a lot to chew on here. First off, the fact that Pitt's comments make up the first carousel within the News Box shows how specific the search engine can get. Pitt's speech is one very narrow sliver of what was happening with the impeachment storyline at the time.
Oddly enough, the heading Google used is hardly specific to the storyline at all; it makes no mention of the impeachment. I would have thought that a heading like "Pitt Comments on Impeachment with Oscar Win" would have been a more appropriate fit. The highly specific segment of the story combined with an overgeneralized heading shows how "hot and cold" the multi-carousel News Box can be.
For a further illustration of Google's fickle specificity, look no further than the second carousel within the News Box. Here, Google shows a series of articles under the heading "Trump News." So we went from a very specific sliver of the overall story arc to a collection of anything and everything under the broad heading "Trump News."
Perhaps most alarming about the structure here is that more substantial news content was pushed down to the second carousel in favor of the flavor of the day. In fact, the second card within the "Trump News" carousel is USA Today's article about West Virginia Senator Joe Manchin's take on the impeachment acquittal. In other words, a carousel with a sitting Senator's take on an issue of national, if not global, importance falls beneath a carousel of stories about what the guy who voiced Sinbad: Legend of the Seven Seas thinks about US politics.
A look at the desktop, unsegmented version of the Top Stories carousel corroborates my griping, as a Fox News story on Senator Manchin's reaction is the very first card in the carousel:
If you look back at the first card in the third carousel within the mobile News Box, you'll find the very same story. It took Google two whole carousels to get to the lead card of the desktop version of the Top Stories SERP feature.
Specificity within the multi-carousel format is hit or miss, and serious news content can play second fiddle to peripheral storylines.
Case 2: Entity-Oriented Segmentation at the Expense of a News-First Focus
Here too, I ran the query close to the 2020 Academy Awards. The awards were of particular importance to Netflix, which had two of its original movies up for Best Picture.
This is a 'classic' example of what the segmentation of the news carousel is meant to do. The carousels here take the entity, i.e., Netflix, and break it down by what is important vis-a-vis the entity. Specifically, Google created a carousel dedicated to covering the single most important storyline related to the entity: its performance at the Oscars (as the nomination of its films resulted in a serious boost to the entity's stock).
Of course, as any good Netflix viewer knows, if you're going to discuss the streaming service from a news perspective, you have to hit on what content is heading to a screen near you. By offering a look at what was, at the time, Netflix's latest series renewal, Google gives the user a well-balanced look at the goings-on of the entity.
That said... an entity-first focus may not bode well for news coverage per se. In this particular case, covering the Oscars makes sense from an entity-identity perspective. In terms of pure news, however, I would imagine that Netflix seeing its stock soar with the Oscar nominations is a bit more "news-centric." Put differently, the Oscars and the renewal of a specific show speak to Netflix as an entity, since they are closely tied to its identity as an entertainment source, whereas the company's stock performance is more closely tied to news content.
So, what is more newsworthy: the renewal of Sex Education for a third season, or the company's soaring stock (at the time of the query)?
The multi-carousel format seems to be more entity-focused than strictly news-focused. Now, I do want to point out that at times this scheme works perfectly. For example, when I ran a query for "mlb news," there was nothing bigger in baseball than a trade for a former league MVP and the ongoing cheating scandal surrounding the Houston Astros. When breaking down the entity known as Major League Baseball, Google got it exactly right:
While Google's entity focus synced perfectly with the news of the day above, missing the surge in Netflix's stock does show some limitations of the entity construct in the news setting.
Google can do a nice job breaking an entity into its most relevant parts using news carousel segmentation. At the same time, the segmentation seems to be more entity-oriented than strictly news-focused.
Case 3: Multiple News Carousels for Entity Connectivity
Tom Brady News
Tom Brady is arguably the best American football player ever (I begrudgingly admit that, as he has burned my team on numerous occasions). Oddly enough, 2020 was the first time Brady, who had played 20 years in the NFL, was a free agent (i.e., he was free to sign a contract with a new team). In the sports world, this was huge news. In fact, months after I ran this query, Brady's free agency and subsequent signing with a new team are still among the hottest topics in the American sports world.
Coincidentally, 2020 has been (perhaps) the biggest year ever in quarterback (the person who throws the ball) free agency. Why am I telling you this? Look who is listed in the first header in the News Box below, because it's not Tom Brady.
No, your eyes do not deceive you, and no, Google is not stupid: that is a header that reads "Philip Rivers news" at the top of the News Box for a query that specifically mentions a different NFL player.
Allow me to explain (and yeah, we're going down a sports wormhole here, but purely for SEO purposes).
Philip Rivers is, like Tom Brady, an elite veteran NFL quarterback (though personally, I think he's lost his mojo). Like Brady, Mr. Rivers was also a free agent when I ran this query.
Do you get what's happening here? It's glorious.
When I typed in "Tom Brady news," Google didn't see me asking for news about the supermodel-marrying, perfect-smile-wearing, Super Bowl-winning, 42-years-old-and-still-playing Tom Brady.
As in the Netflix example above, Google went "entity" on me. Google took my query and said, "he was asking for news about an aging veteran quarterback looking to take his elite skills to another team via free agency." Google profiled the entity and then matched it to multiple instances of that profile (i.e., Tom Brady AND Philip Rivers).
With this profiling in hand, Google then said, "Let's show a news carousel that reflects the underlying identity of the entity at hand (Tom Brady as an older free-agent NFL quarterback) by showing news about the connection between Mr. Brady and a similar entity (i.e., Philip Rivers)." The only flub here is that the header should have read "Philip Rivers & Tom Brady news," not just "Philip Rivers news."
As with the Netflix case I outlined above, this entity-based way of structuring the news carousel provides an expanded look at a given topic. On the other hand, it crowds out the top news within the Top Stories feature. Brady's connection to Rivers, his reaction to Rivers, and so forth (which is what the stories within that sub-carousel discuss) were not reflective of the most pressing news surrounding the greatest quarterback to play the game.
Google is profiling news queries from an entity perspective and creating news carousels that make connections between various entities who share the same profile.
Case 4: Abstract Entity Analysis Leaves the Named Entity out of News Results
The NFL (National Football League) is America's most popular sports league. Its season runs from September through early February. When I searched for "NFL news," I did so just after the season had ended. This matters because a startup football league, the XFL, was set to debut a few days after I ran this query. The XFL is meant to fill the "gap" that exists once the NFL's season is over. It doesn't compete directly with the NFL, nor does it employ the same players. It exists simply to capitalize on the sports vacuum in the US between the end of the NFL season and the start of the Major League Baseball season.
Why do you need to know this? As in the previous case, have a look at the first header here:
As you may have noticed, this particular multi-carousel News Box does not start off with anything related to the entity named in the query, the NFL. Instead, the initial news offering Google presents us with is entirely related to the "filler league" I mentioned a few sentences ago, the XFL.
What's happening here is a manifestation of the concept I discussed in the previous case about "Tom Brady news." There, Google analyzed the underlying facets that make up the entity's identity and connected the entity named in the query to a similar entity. Here... no attempt at a connection was made. Google correctly profiled the entity I named, the NFL, as a football league and gave me news on what it considered the most relevant football league at the moment (the XFL, as the NFL's season was over at this point).
Personally, I applaud Google's entity abstraction, but it 100% does not work here. Qualitatively speaking, the leagues are nowhere near each other. NFL-specific news is relevant all year long; it dominates the American sports conversation well after the season has ended. I have little to no interest in the XFL, and the carousel of XFL-oriented results was totally useless to me.
To me, this is a clear example of where an entity focus and a news focus clash. That's not to say the construct doesn't work at all: the second carousel highlights news on the NFL's most valuable property, Tom Brady. So it's not that an entity focus can't work; I just think Google perhaps needs to scale it back a bit.
Google is showing news content within the multiple carousel News Box that is not directly related to the entity named in the query.
Case 5: Topical Accuracy within the Multiple News Carousels
Bernie Sanders News & Buttigieg and Sanders News
At the time of these queries, there was no clear frontrunner in the Democratic presidential primaries. Bernie Sanders, Joe Biden, and Pete Buttigieg were all in strong contention for the nomination.
Let me explain the progression of events here. I started off by running the query "bernie sanders news." Right off the bat, something was amiss. The top header read "Joe Biden and Bernie Sanders news," yet the carousel was full of cards about Bernie Sanders alongside Pete Buttigieg, Michael Bloomberg, and others. Clearly, there was an accuracy problem here for some reason.
Seeing Buttigieg enter the carousel meant to pair Sanders and Biden made me wonder what would happen if I searched for "Buttigieg and Sanders news." Would I find more of the same inaccuracy?
The answer was no. Here, Google did a wonderful job. The carousel under the header "Bernie Sanders and Pete Buttigieg news" was filled with... Bernie Sanders and Pete Buttigieg news!
What's really peculiar is that the second carousel was again for "Joe Biden and Bernie Sanders news." However, this time around Google got it 100% right. Go figure.
Google is pretty good at getting the right content for the right carousel, but at times it still has some work to do.
News to Me?
Let me put an odd question out there: Do I like the multiple carousel format? I know it's not a very technical way to go about analyzing the diversified format of the Top Stories SERP feature... but indulge me. In a lot of the cases I looked at (yes, I looked at a lot more than just these five News Boxes), the answer was no, not at all. The information was interesting, just not what I was after. For example, way back in February when I searched for "Facebook news," I got some interesting stuff on something Elon Musk had said, but it was a little too narrowly focused for what I wanted.
Remember newspapers? When I was a kid, I used to love flipping through the sports (and movies) sections and just perusing the headlines. I wanted to get a sense of what was happening, of what was most relevant to my teams. That's what newspaper headlines are for!
What I feel I get with the segmented news carousels is a lot of hyper-focused content that doesn't really meet my intent. It's not that I don't like the content. Rather, for Google to meet my demand to know the news, it's too specific and too far off the topic at times.
My general sense with the multiple carousels is that Google is missing the mark from a news perspective. The multiple carousels are offering great content and doing some really cool things, but from a news perspective, there is something misaligned at times.
There's a simple solution to all of this... the one carousel I did not discuss. Every multi-carousel News Box contains a carousel called "Also in the news." This is where you get an overview of the biggest and latest news for the query. This carousel should be renamed (perhaps "Top News") and should precede the entity-like profiling and segmentation generally reflected in the other carousels within a given News Box.
I'm OK with a fuller, perhaps offbeat, topical view of an entity or storyline; I just think the actual news should come first. Do you?