Global online sales crossed $4.2 trillion in 2020, and a quarter of the world's population now shops online. Shopping websites like Amazon, Walmart, and eBay keep getting better. How did they improve over time? With the help of data. When every click and search on a website is recorded, there is ample scope to put that data to good use. This is why data science and data analysis are so widely used in eCommerce today.
As an end-user, you can see this data at work on these websites in several ways:
- Product recommendations, shown on the website or sent by email, are one of the most important applications of using data smartly.
- Product prices change at different times of the day and depend on the user's profile. How does this work? Through price optimization.
- After buying a product, the user is encouraged to leave feedback and a review. These reviews feed customer sentiment analysis of the product.
- Personalized discount coupons are provided to users depending on past purchases.
All of these activities make use of data and some mathematics in the background to make these features available to the users. A few of the above use cases will be discussed below in detail.
When a user searches for a particular product on a shopping website, say Amazon, they see a horizontal strip of 5-10 recommended products below the searched product. These products are selected by product recommendation algorithms. Recommendations have improved over time and now suggest near-perfect matches to customers. They are produced by recommendation engines that use machine learning techniques such as collaborative filtering or content-based filtering. By surfacing relevant products, recommendation engines make it easier for the customer to buy, which increases the retailer's sales and also shapes trends. Take Amazon: based on your purchase history, a recommendation is made to you, along with bonus discounts that prompt you to buy it.
35% of Amazon's income is generated by its recommendation engine strategy. Amazon uses recommendation engines as a marketing tool both in emails and on its web pages, drawing on a user's browsing history to recommend relevant products. Amazon's main goal is to increase the average order value, i.e. to up-sell to customers and to cross-sell based on the items in their shopping carts and the products they are searching for.
Three important types of recommendation engines are collaborative filtering, content-based filtering, and hybrid systems.
Collaborative filtering requires historical data, i.e. the user's activities and preferences on the shopping website. It then predicts what the user will want to buy next based on that user's similarity to other users. It rests on the idea that people who agreed in the past will also agree in the future, i.e. people will buy products similar to those bought by like-minded users. It does not require any information about or understanding of the item itself and focuses solely on past behavior. For example, if user P likes items A1, A2, and A3 and user Q likes A2, A3, and A4, they have similar interests, so P should like item A4 and Q should like item A1.
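The P and Q example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `recommend` function and its `min_overlap` threshold are hypothetical, and a real system would use a proper similarity measure rather than a raw count of shared items.

```python
# Likes taken from the example above.
likes = {
    "P": {"A1", "A2", "A3"},
    "Q": {"A2", "A3", "A4"},
}

def recommend(user, likes, min_overlap=2):
    """Suggest items liked by users who share at least `min_overlap` items with `user`.

    `min_overlap` is a hypothetical threshold chosen for this illustration.
    """
    own = likes[user]
    suggestions = set()
    for other, items in likes.items():
        if other == user:
            continue
        # Treat the two users as "similar" when they share enough liked items.
        if len(own & items) >= min_overlap:
            suggestions |= items - own
    return sorted(suggestions)

print(recommend("P", likes))  # ['A4']
print(recommend("Q", likes))  # ['A1']
```

Note that the code never looks at what A1 or A4 actually are; it works purely from past behavior.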
There are two different approaches to collaborative filtering: memory-based and model-based.
Memory-based collaborative filtering uses the user rating data to recommend products based on similarity. Unlike model-based techniques, it involves no learning of model parameters (for example, by gradient descent). It is generally divided into two kinds: User-Item Filtering and Item-Item Filtering. User-Item filtering takes a particular user, finds users similar to that user based on the similarity of their ratings, and recommends items that those similar users liked. For example, you will often see a heading such as "Users similar to you also liked…". Item-Item filtering takes a particular item, finds users who liked it, and finds other items that those users or similar users also liked. This closeness is generally measured with cosine similarity or Pearson's correlation coefficient. These are similarity measures that tell you how close two rating vectors are.
Cosine similarity can be used to measure how close two users are when the ratings are arranged in a matrix where every user is a row and every item is a column. To find a similar user, the cosine of the angle between the two user vectors is calculated; the closer it is to 1, the more similar the users.
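As a minimal sketch of this calculation, the snippet below computes the cosine similarity between user rows and picks the closest user. The rating matrix and user names are made up for illustration, and treating 0 as "not rated" is a simplifying assumption.

```python
import math

# Hypothetical rating matrix: one row per user, one column per item; 0 means "not rated".
ratings = {
    "alice": [5, 4, 0, 1],
    "bob":   [4, 5, 1, 0],
    "carol": [1, 0, 5, 4],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_u * norm_v)

def most_similar_user(target, ratings):
    """Return the other user whose rating vector is closest to the target's."""
    others = (name for name in ratings if name != target)
    return max(others, key=lambda name: cosine_similarity(ratings[target], ratings[name]))

print(most_similar_user("alice", ratings))  # bob
```

In User-Item filtering, the items that the most similar user rated highly and the target user has not rated yet would then become the recommendations.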
Advantages – It is an easy-to-use approach.
Disadvantages – Performance degrades as the rating matrix grows sparser.
Model-based collaborative filtering methods use machine learning to predict user ratings for items. Models used for this purpose include Matrix Factorization, Singular Value Decomposition, KNN, and neural networks.
In recommender systems, we typically work with very sparse matrices as the item universe is very large while a single user typically interacts with a very small subset of the item universe. Take YouTube for example — a user typically watches hundreds if not thousands of videos, compared to the millions of videos YouTube has in its corpus, resulting in sparsity of >99%.
This means that when we represent the users (as rows) and items (as columns) in a matrix, the result is an extremely sparse matrix consisting mostly of zero values.
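To make the idea of sparsity concrete, here is a small sketch that stores only the observed user-item pairs and computes the fraction of empty cells. The interaction data and the matrix dimensions are hypothetical.

```python
# Hypothetical interaction log: only observed (user, item) pairs are stored;
# every missing pair corresponds to a zero in the full matrix.
interactions = {("u1", "v1"), ("u1", "v3"), ("u2", "v2"), ("u3", "v5")}
n_users, n_items = 4, 5

def sparsity(interactions, n_users, n_items):
    """Fraction of cells in the user-item matrix that have no interaction."""
    return 1 - len(interactions) / (n_users * n_items)

# 4 of the 20 cells are filled, so most of the matrix is empty.
print(sparsity(interactions, n_users, n_items))
```

Storing only the observed pairs, as above, is also why sparse matrices are cheap to keep in memory even when the full matrix would be enormous.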
To overcome the problems of scalability and sparsity, Matrix Factorization is used to reduce the high-dimensional sparse matrix to low-dimensional matrices of latent features. For example, if a person has rated the three movies Inception, E.T., and Gravity highly, Matrix Factorization can infer that the user is interested in Sci-Fi movies, since Sci-Fi is a characteristic common to all three. This hidden feature (Sci-Fi in our example) can then be used to recommend other movies that score highly on it. These hidden features are referred to as embeddings, and the system learns them for every user and every item.
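One common way to learn such embeddings is stochastic gradient descent on the observed ratings only. The sketch below is a simplified illustration, not a production recipe: the ratings, the number of latent features `k`, and the learning rate and regularization values are all assumptions chosen for this toy example.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=3000, lr=0.02, reg=0.02, seed=0):
    """Learn user and item embeddings from observed (user, item, rating) triples."""
    rng = random.Random(seed)
    # Small random starting embeddings: P holds one row per user, Q one row per item.
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the dot product of the user and item embeddings."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Hypothetical ratings: only 6 of the 9 user-item cells are observed.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(predict(P, Q, 0, 2), 2))  # estimate for a cell that was never rated
```

The unobserved cells are never used in training; the model fills them in from the learned embeddings, which is what makes the approach workable on very sparse data.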
To understand how matrix factorization works, the first thing to understand is Singular Value Decomposition (SVD). It is the factorisation of a matrix into a product of three matrices: any real matrix A can be decomposed as A = UΣVᵀ. Continuing the movie example, with n users and m movies, U is an n × k user-latent feature matrix and Vᵀ is a k × m latent feature-movie matrix. Σ is a k × k diagonal matrix containing the singular values of the original matrix, which indicate how important each latent feature is for predicting user preference.
SVD is closely related to Principal Component Analysis. Non-negative matrix factorization breaks a non-negative matrix, such as the ratings in the movie example, into non-negative factors. In a KNN-based approach, the similarities are computed by an unsupervised learning model rather than by cosine similarity or Pearson's correlation, and the number of similar users considered is limited to k. Another approach uses neural network models to find similar items for users. With SVD, the factor matrices U and V are required to be orthogonal; deep learning models have no such constraint and learn the values of the embedding matrices on their own during training. Some algorithms, such as Variational Autoencoders, have been used to implement recommender systems. There are fewer research papers available on collaborative filtering with deep learning. These systems have changed massively over the years.
Content-based filtering methods focus on the properties of the item rather than on other users. They rely on a description of the item and a profile of the user's preferences. The task is modeled as a user-specific classification problem: a classifier learns the user's likes and dislikes from item properties. Items are described by keywords, and the user profile records the user's liking for particular products. Recommendations are based on items the user has liked in the past, on the assumption that the user will also like similar items.
These systems build a user profile from two major components: a model for recommending items based on item properties, and historical data about the user's past purchases, browsing history, and interactions with the recommender system. These methods use an item profile (i.e., a set of discrete attributes and features) that characterizes the item within the system. To abstract the features of the items, an item representation algorithm is applied. A widely used one is the TF-IDF (Term Frequency-Inverse Document Frequency) representation, also called the vector space representation. The system creates a content-based profile of each user as a weighted vector of item features. The weights denote the importance of each feature to the user and can be computed from individually rated content vectors using a variety of techniques. Simple approaches use the average values of the rated item vectors, while more sophisticated methods use machine learning techniques such as Bayesian classifiers, cluster analysis, decision trees, and artificial neural networks to estimate the probability that the user will like the item.
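The simple averaging approach can be sketched as follows. This is an illustration only: the item descriptions are invented, and the IDF formula used here, log(N / document frequency), is one of several common variants rather than the one any particular system uses.

```python
import math
from collections import Counter

# Hypothetical item descriptions (keywords per item).
items = {
    "phone_case":  "phone case leather black",
    "phone_cover": "phone cover silicone black",
    "desk_lamp":   "desk lamp led white",
}

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for every item description."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def user_profile(liked, vectors):
    """Average the TF-IDF vectors of the items the user liked."""
    profile = Counter()
    for name in liked:
        for term, w in vectors[name].items():
            profile[term] += w / len(liked)
    return dict(profile)

vectors = tfidf_vectors(items)
profile = user_profile(["phone_case"], vectors)
candidates = [name for name in items if name != "phone_case"]
best = max(candidates, key=lambda name: cosine(profile, vectors[name]))
print(best)  # phone_cover
```

The desk lamp shares no keywords with the user's profile, so its similarity is zero, while the phone cover shares "phone" and "black" and is recommended.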
The major issue with this approach is whether preferences learned from a user's behavior on one content source can be applied to other content sources. For example, can preferences the system has learned from a user's movies and songs be used to recommend news articles to that user? Content-based recommendation systems also make use of opinions in the form of feedback. The feedback collected from users is given as input to a sentiment analyzer, which assigns a sentiment to each comment or review.
Different experiments have shown that hybrid systems, which use collaborative filtering and content-based filtering together, give much better recommendations than standalone implementations. The two can either be implemented separately and have their outputs combined, or one approach can be built into the other. A comparison between a pure collaborative filtering approach and a hybrid system shows that the hybrid system performs better. Hybrid recommendation systems also help alleviate the cold start problem, i.e. what should be recommended to a new user who has just arrived on the platform.
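One simple way to combine the outputs of separately implemented recommenders is a weighted blend. The sketch below is a hypothetical illustration: the function name, the `full_trust_at` parameter, and the rule of trusting collaborative filtering more as a user's history grows are all assumptions, not a description of any particular system.

```python
def hybrid_score(cf_score, content_score, n_interactions, full_trust_at=20):
    """Blend a collaborative filtering score with a content-based score.

    Users with little history get mostly the content-based score; the weight of
    the collaborative score grows linearly until `full_trust_at` interactions.
    """
    weight_cf = min(n_interactions / full_trust_at, 1.0)
    return weight_cf * cf_score + (1 - weight_cf) * content_score

print(hybrid_score(0.9, 0.2, n_interactions=0))   # 0.2, new user: content-based only
print(hybrid_score(0.9, 0.2, n_interactions=20))  # 0.9, established user: collaborative only
```

For a brand-new user, the content-based score would itself have to come from something other than purchase history, for example preferences stated at sign-up.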
Netflix is a very good example of a hybrid recommendation system. It uses the user's watch history and the search trends of similar users, which is the collaborative filtering approach, and it also recommends movies similar to those the user has rated highly in the past, which is the content-based approach. Every time you press play and spend some time watching a TV show or a movie, Netflix collects data that informs and refreshes the algorithm. The more you watch on Netflix, the better its recommendations become.
Often when you are looking to buy a mobile phone, a charger is shown as a suggested product below it. This uses the Market Basket Analysis technique, which works as follows: if a customer buys one group of items, the customer is more (or less) likely to buy another set of related items. Here, the group of items bought by the customer is known as the itemset, and the conditional probability that the customer will buy a charger given that they bought a mobile phone is the confidence. Market Basket Analysis estimates both how likely a customer is to buy and which item they are likely to buy.
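The confidence of the rule "phone → charger" can be computed directly from a list of past transactions. The transactions below are invented for illustration, and `support` (the share of transactions containing an itemset) is a helper introduced here because confidence is defined in terms of it.

```python
# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"phone", "charger"},
    {"phone", "charger", "case"},
    {"phone", "case"},
    {"charger"},
    {"laptop"},
]

def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of buying `consequent` given that `antecedent` was bought."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Of the 3 baskets with a phone, 2 also contain a charger.
print(round(confidence({"phone"}, {"charger"}, transactions), 2))  # 0.67
```

A retailer would typically keep only rules whose support and confidence exceed chosen thresholds, so that rare coincidences are not treated as patterns.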
It is a statistical technique that works on items that are frequently bought together. It does not require advanced mathematics or machine learning and relies mainly on association rules. A basic example is that bread and butter are often bought together, so shopkeepers place these products close to each other. Amazon uses this technique to cross-sell products to users. It is not a personalized recommendation algorithm, as it does not incorporate the user's preferences and works only with information about the items. Retailers benefit greatly from this analysis, as they can improve their sales through cross-selling, changes to the store layout, customized emails, and catalog design.
The methods used for product recommendations have been discussed above with examples. These recommendations are generated to drive higher product sales. To learn more about how your website is performing, have a look at the Website Analytics offered by Smartlook. Combining web analytics data with recommendations will make your customer experience much better.