Under the Hood: The Science of Online Content Recommendations

Have you ever wondered how online providers of goods and services decide which products, songs, books or movies to suggest to you? It may seem like magic, but behind the scenes a lot of data is being mined, complicated algorithms are at work, and many researchers have been striving since the early days of the web to predict more accurately what you’ll like.

The first thing to bear in mind is that organisations like Amazon.com, Pandora Radio and Netflix can only guess what you might like based on the data available to them, typically from your use of their services and any other data shared with them by partners such as ad networks.

The recommendation system used by Amazon.com is called affinity-based item-to-item collaborative filtering. Rather than searching for customers similar to you, it compares products with each other: items that tend to be bought by the same customers are treated as related, so each item in your history points towards others you may want. Recently, moments after I loaded the Amazon.com website, a message appeared on screen stating that Terry Pratchett’s newest book was about to be released in paperback. Amazon.com knows I’m a Terry Pratchett fan from my history of buying his books in print and Kindle ebook formats, so its recommendation engine assumes I’ll want to buy his new books.
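To make the item-to-item idea concrete, here’s a toy sketch in Python. It is not Amazon’s implementation, which isn’t described here; it assumes simple yes/no purchase data, uses cosine similarity between items as the measure of affinity, and the customer names and book IDs are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase histories: customer -> set of item IDs (illustrative only).
purchases = {
    "alice": {"pratchett_mort", "pratchett_guards", "gaiman_stardust"},
    "bob": {"pratchett_mort", "pratchett_guards"},
    "carol": {"gaiman_stardust", "tolkien_hobbit"},
    "dave": {"pratchett_guards", "tolkien_hobbit"},
}


def item_similarities(purchases):
    """Cosine similarity between items, treating each item as a 0/1 vector over customers."""
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        overlap = len(buyers[a] & buyers[b])
        if overlap:  # only store pairs that share at least one buyer
            score = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
            sims[a][b] = score
            sims[b][a] = score
    return sims


def recommend(history, sims, top_n=3):
    """Score unseen items by summing their similarity to each item already in the history."""
    scores = defaultdict(float)
    for owned in history:
        for other, score in sims.get(owned, {}).items():
            if other not in history:
                scores[other] += score
    # Highest score first; ties broken alphabetically so results are deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


sims = item_similarities(purchases)
print(recommend({"pratchett_mort"}, sims))  # ['pratchett_guards', 'gaiman_stardust']
```

With this toy data, a customer who has bought only `pratchett_mort` is offered `pratchett_guards` first, because both Pratchett titles were bought by the same two customers. One reason the item-to-item approach scales well is that the item similarities can be computed ahead of time, so serving a page only needs a lookup over the customer’s own history.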

Suggestions for items like books, music and TV shows will be most accurate if you’re the only person using the account in question and you have given the organisation whose service you’re using plenty of interest “signals”: many products browsed and purchased, music tracks given a thumbs up, and TV shows or movies liked.

As a general rule, suggestions become less accurate the more you share a login to a site with people whose interests differ from yours, such as your partner and children. Another factor that can lead to wildly inaccurate suggestions is buying varied birthday gifts for friends and family through an online retailer, because this sends the site confusing signals about which products you’re interested in.
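Here’s a deliberately crude sketch of why shared logins and gift purchases muddy the picture. It assumes, hypothetically, that a site summarises your interests as the share of your purchases in each genre; real retailers use far richer models, and the genres and counts below are invented.

```python
from collections import Counter


def interest_profile(purchased_genres):
    """Share of purchases in each genre: a crude, hypothetical stand-in for interest signals."""
    counts = Counter(purchased_genres)
    total = sum(counts.values())
    return {genre: count / total for genre, count in counts.items()}


def top_genres(profile, n=2):
    """Genres with the largest share, ties broken alphabetically."""
    return sorted(profile, key=lambda g: (-profile[g], g))[:n]


# Invented purchase histories for illustration.
my_purchases = ["fantasy"] * 8 + ["science"] * 2
gifts = ["toys", "toys", "toys", "gardening", "cookery"]

solo = interest_profile(my_purchases)
mixed = interest_profile(my_purchases + gifts)

print(top_genres(solo), round(solo["fantasy"], 2))    # ['fantasy', 'science'] 0.8
print(top_genres(mixed), round(mixed["fantasy"], 2))  # ['fantasy', 'toys'] 0.53
```

On my own purchases the top two genres are fantasy and science; once five gifts are mixed in, fantasy’s share drops from 80% to about 53% and toys overtakes science in second place, so the site starts suggesting toys to someone who isn’t interested in them.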

A particular challenge for organisations that sell or offer digital content by subscription is that they often have millions of items available, while their customers increasingly browse and search for new content on the small screen of a mobile device, which can’t display many items at once.

Personally, the most effective suggestion system I’ve experienced is the Music Genome Project (MGP) used by the freemium online music service Pandora Radio. Pandora has more than 200 million registered users across the USA, Australia and New Zealand, who collectively listened to more than 1.49 billion hours of music in March 2013. Since the company started in January 2000, users have personalised the music they listen to by making more than 25 billion thumb ratings. These ratings are used in conjunction with each song’s MGP metadata to make suggestions that are unique to each user.

I had coffee with Jane Huxley, Pandora’s Managing Director for Australia and New Zealand, at a cafe next to the new Guardian Australia HQ to learn more about how Pandora’s suggestion system works.

Huxley explained that Pandora is based on the belief that “each individual has a unique relationship with music – no one else has tastes exactly like yours. So delivering a great radio experience to each and every listener requires an incredibly broad and deep understanding of music”.

The Music Genome Project has been built over 10 years by a trained team; a typical member is a professional musician who has completed several years of tertiary education in music theory, composition or performance. Each song added to the Pandora catalogue is analysed against up to 450 distinct musical characteristics by a member of this team, not by a machine or other automated process.

Pandora is great at surfacing surprising “unknown unknowns”: songs that delight you once you hear them but that you would probably never have discovered had the MGP not suggested them because they share underlying characteristics with other songs you like a lot.
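Pandora hasn’t published how the MGP scores songs, so the following is only a generic content-based filtering sketch in the same spirit. It assumes each song is described by a vector of attribute scores between 0 and 1 (four invented attributes here, rather than up to 450), turns thumbs up and thumbs down into +1 and -1 weights, and ranks unheard songs by cosine similarity to the resulting taste vector.

```python
from math import sqrt

# Four invented attributes; the real MGP uses up to 450 characteristics per song.
ATTRIBUTES = ["minor_key", "acoustic_guitar", "syncopation", "breathy_vocals"]

# Hypothetical songs scored 0..1 on each attribute, in ATTRIBUTES order.
songs = {
    "song_a": [0.9, 0.8, 0.1, 0.7],
    "song_b": [0.8, 0.9, 0.2, 0.6],
    "song_c": [0.1, 0.0, 0.9, 0.2],
    "song_d": [0.7, 0.6, 0.3, 0.9],
}


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def taste_vector(thumbs):
    """Sum song attribute vectors, weighted +1 for a thumbs up and -1 for a thumbs down."""
    taste = [0.0] * len(ATTRIBUTES)
    for song, rating in thumbs.items():
        for i, value in enumerate(songs[song]):
            taste[i] += rating * value
    return taste


def next_songs(thumbs, top_n=2):
    """Rank songs the listener hasn't rated by similarity to their taste vector."""
    taste = taste_vector(thumbs)
    candidates = [s for s in songs if s not in thumbs]
    return sorted(candidates, key=lambda s: cosine(taste, songs[s]), reverse=True)[:top_n]


print(next_songs({"song_a": 1, "song_c": -1}))  # ['song_b', 'song_d']
```

Giving `song_a` a thumbs up and `song_c` a thumbs down ranks `song_b` ahead of `song_d`, because `song_b` sits closer to `song_a`’s attributes and has less of `song_c`’s syncopation. Because the ranking works from the songs’ own attributes rather than other listeners’ behaviour, it can surface obscure tracks that few people have heard.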

I’m not a musician who can analyse a song and identify its nuances, so on many occasions I’d never have guessed that several songs shared common attributes; I realised I liked them only after Pandora recommended them to me.

Michael Tamblyn, Chief Content Officer at the international device and eBook vendor Kobo, told me that in his view “recommendations aren’t just nice to have. They are critical to help a reader to find books that will interest them”.

In Tamblyn’s opinion, suggestions can and should be much better, more insightful and more surprising than “People Who Bought This Also Bought That”. In his vision of an ideal future, the Kobo store would shape itself to the reader, constantly learning about their tastes, adjusting, and surfacing new titles based on what they’ve liked.

Tamblyn says that “great recommendations, whether they be in person or on a device, are always the result of sifting huge amounts of information about which books have sold, when and to whom. A great human bookseller does this intuitively – thinking about previous purchases, trends, what’s selling, what friends are reading, what you like and don’t like. We do the same thing using data, analytics and algorithms, social activity, ratings and other data to suggest your next great book”.
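One common way to combine several signals like these is a weighted blend. The sketch below is hypothetical and not Kobo’s method: the four signals, their 0 to 1 scaling and the weights are all assumptions chosen for illustration, and a real store would learn or tune them from data.

```python
from dataclasses import dataclass


@dataclass
class BookSignals:
    """Hypothetical per-book signals for one reader, each already scaled to 0..1."""
    title: str
    sales_trend: float      # how strongly the book is selling right now
    friends_reading: float  # share of the reader's friends reading it
    taste_match: float      # similarity to books this reader liked
    avg_rating: float       # community rating rescaled to 0..1


# Illustrative weights only (they sum to 1); a real store would tune them on data.
WEIGHTS = {"sales_trend": 0.15, "friends_reading": 0.25, "taste_match": 0.45, "avg_rating": 0.15}


def blended_score(book):
    return sum(weight * getattr(book, name) for name, weight in WEIGHTS.items())


def rank(books, already_read):
    """Order unread books by blended score, best first."""
    unread = [b for b in books if b.title not in already_read]
    return sorted(unread, key=blended_score, reverse=True)


books = [
    BookSignals("bestseller", 0.9, 0.1, 0.2, 0.6),
    BookSignals("friends_pick", 0.3, 0.6, 0.8, 0.8),
    BookSignals("similar_style", 0.5, 0.2, 0.6, 0.7),
    BookSignals("finished_last_week", 0.4, 0.5, 0.9, 0.9),
]
print([b.title for b in rank(books, {"finished_last_week"})])
# ['friends_pick', 'similar_style', 'bestseller']
```

Here `friends_pick` comes out on top even though `bestseller` is selling better, because in this made-up weighting the reader’s own tastes and friends count for more than raw sales.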

As an aside, he made the interesting point that, “unlike a human bookseller who may know everything about politics but turns his nose up at romance (or vice versa!)”, suggestion systems can connect you to the next book you’ll like without making disparaging judgements about your preferences. This, along with the privacy of ebook readers, which, unlike printed books, have no front cover title or image broadcasting what you’re reading when you’re in public, is one reason why romance novels are so popular with eBook purchasers.

What about the future? cXense is a multinational organisation that has created a context-aware, software-as-a-service (SaaS) recommendation system which integrates into its customers’ platforms. cXense goes beyond recommending similar types of product to examine nuanced factors such as the time of day a website is visited, the device used to access it, and any prior searches or other hints about the user’s intentions that relate to the page they are viewing now.
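As a rough illustration of context-aware ranking, here’s a small Python sketch that adjusts an article’s ordinary relevance score using the visitor’s device, the hour of the visit and their recent searches. The boost rules and their sizes are invented for illustration; they don’t describe how cXense actually weighs these factors.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    base_score: float  # relevance from ordinary, non-contextual signals
    reading_minutes: int
    tags: set = field(default_factory=set)


@dataclass
class Context:
    hour: int    # local hour of the visit, 0-23
    device: str  # e.g. "mobile" or "desktop"
    recent_searches: set = field(default_factory=set)


def contextual_score(article, ctx):
    score = article.base_score
    # Invented rule: on a phone, favour articles that are quick to read.
    if ctx.device == "mobile" and article.reading_minutes <= 3:
        score += 0.2
    # Invented rule: in the morning (assumed 7-10am), favour news.
    if 7 <= ctx.hour < 10 and "news" in article.tags:
        score += 0.1
    # Boost each tag that matches something the user searched for recently.
    score += 0.15 * len(article.tags & ctx.recent_searches)
    return score


def rank(articles, ctx):
    return sorted(articles, key=lambda a: contextual_score(a, ctx), reverse=True)


articles = [
    Article("long_read", 0.60, 15, {"politics", "analysis"}),
    Article("quick_update", 0.50, 2, {"politics", "news"}),
    Article("recipe", 0.52, 3, {"food"}),
]
commute = Context(hour=8, device="mobile", recent_searches={"politics"})
evening = Context(hour=20, device="desktop")
print([a.title for a in rank(articles, commute)])  # ['quick_update', 'long_read', 'recipe']
print([a.title for a in rank(articles, evening)])  # ['long_read', 'recipe', 'quick_update']
```

The same three articles come out in a different order for a politics-minded reader on a phone at 8am than for a desktop visitor in the evening: the catalogue hasn’t changed, only the situation.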

According to Mark Pritchard, Senior Vice President of Engineering at cXense, if a news website reader were reading an article about David Cameron, cXense could recommend other articles about entities mentioned in it, taking into account whether the reader had a deep interest in British politics, the EU or a particular industry affected by government policy.
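The entity idea can be sketched in a few lines of Python. The articles, their entity tags and the reader’s interest weights are hypothetical, and the tags are assumed to have been extracted already, a step that in practice needs named-entity recognition.

```python
# Hypothetical entity tags, assumed already extracted from each article's text.
articles = {
    "cameron_eu_speech": {"David Cameron", "EU", "Conservative Party"},
    "eu_budget_talks": {"EU", "Brussels"},
    "tory_conference": {"Conservative Party", "David Cameron"},
    "steel_tariffs": {"EU", "steel industry"},
    "cricket_report": {"England", "cricket"},
}


def related(current, reader_interests, top_n=3):
    """Rank other articles by the entities they share with the current one,
    adding extra weight for entities this reader is especially interested in."""
    current_entities = articles[current]
    scores = {}
    for title, entities in articles.items():
        if title == current:
            continue
        shared = entities & current_entities
        if shared:
            scores[title] = sum(1.0 + reader_interests.get(e, 0.0) for e in shared)
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


print(related("cameron_eu_speech", {"EU": 2.0}))
# ['eu_budget_talks', 'steel_tariffs', 'tory_conference']
print(related("cameron_eu_speech", {"Conservative Party": 2.0}))
# ['tory_conference', 'eu_budget_talks', 'steel_tariffs']
```

A reader with a strong interest in the EU sees `eu_budget_talks` and `steel_tariffs` first, while a reader following party politics gets `tory_conference` first, even though both started from the same David Cameron article.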

When asked about the future of recommendation systems, Pritchard thought that sentiment-based analysis, such as a mood-based music player, would be fairly easy to build if music service users opted in to data mining of the words and phrases they use on social media platforms like Twitter and Facebook. For example, if a user had tweeted “going out for a run” in the last 15 minutes, the suggested music could have a faster tempo.
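A mood-aware player along the lines Pritchard describes could start as simply as the sketch below. Everything in it is an assumption for illustration: real sentiment analysis would go well beyond matching a few fixed phrases, and the 140 BPM cut-off for “faster” music is arbitrary. Posts are taken to be (timestamp, text) pairs the user has opted in to sharing.

```python
from datetime import datetime, timedelta

# Hypothetical phrases; real sentiment analysis would be far richer than this.
ACTIVE_PHRASES = ("going out for a run", "off for a jog", "hitting the gym")
WINDOW = timedelta(minutes=15)


def wants_upbeat(posts, now):
    """True if an opted-in post from the last 15 minutes mentions exercise."""
    return any(
        timedelta(0) <= now - posted_at <= WINDOW
        and any(phrase in text.lower() for phrase in ACTIVE_PHRASES)
        for posted_at, text in posts
    )


def pick_tracks(tracks, posts, now, min_bpm=140):
    """Keep only fast (title, bpm) tracks when the user seems to be exercising."""
    if wants_upbeat(posts, now):
        return [title for title, bpm in tracks if bpm >= min_bpm]
    return [title for title, _ in tracks]


now = datetime(2013, 5, 31, 7, 0)
tracks = [("slow_ballad", 70), ("running_anthem", 160), ("mid_tempo", 110)]
recent = [(now - timedelta(minutes=5), "Going out for a run!")]
stale = [(now - timedelta(hours=2), "Going out for a run!")]
print(pick_tracks(tracks, recent, now))  # ['running_anthem']
print(pick_tracks(tracks, stale, now))   # ['slow_ballad', 'running_anthem', 'mid_tempo']
```

A run announced five minutes ago narrows the playlist to `running_anthem`; the same post from two hours earlier falls outside the 15-minute window, so the full playlist is returned.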

This is the extended version of an article I wrote for The Guardian Media Network, published there on 31st May 2013.
