After having presented non-personalized recommenders and corresponding algorithms in part 2 of our blog series on recommendation systems, we will now focus on personalized recommender systems and more advanced machine learning methods. Moreover, the important topic of evaluation is considered in greater detail.
Are recommendation systems a completely new area for you? Then you are invited to start with part 1 of our blog series which explains for whom a recommendation system is relevant, which basic variants are available and how complex their implementation is.
Algorithms for personalized recommendations
Users do not always leave behind enough personalized information along their customer journey. For instance, new customers can be acquired or existing customers might browse an e-commerce website without being logged in. Non-personalized recommendation systems, such as those based on proposals for products frequently purchased together, still offer recommendation opportunities for companies in this case. However, the more individually these are tailored to the customer, the better. Therefore, in the following, personalized approaches are presented which learn the preferences of customers. To understand these methods, it is helpful to consider the recommendation problem at hand as a sparsely populated matrix. The rows represent the users, the columns represent the items. Whenever a user performs an action for an item, a respective entry is recorded in the matrix. Otherwise, values remain absent and do not need to be explicitly stored. This is illustrated best in the following diagram:
The rows represent different users who have expressed their opinions of various products as a like or a dislike. However, many of the users' interests remain unknown in practice (symbolized by a question mark in each case). Recommendation systems can be used to predict the answers to precisely these questions, i.e. to determine yet unknown preferences for items. This purpose is achieved by the following algorithms, among others.
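The sparse matrix described above can be sketched in a few lines of Python. This is only a minimal illustration: the user names, item names and the helper `lookup` are hypothetical, and a nested dictionary stands in for a dedicated sparse matrix format.

```python
from collections import defaultdict

# Hypothetical feedback log: (user, item, value), with 1 = like and 0 = dislike.
feedback = [
    ("user_1", "camera_a", 1),
    ("user_1", "laptop", 0),
    ("user_2", "camera_b", 1),
    ("user_3", "camera_a", 1),
    ("user_3", "camera_b", 1),
]

# Only observed entries are stored; absent keys are the "question marks".
matrix = defaultdict(dict)
for user, item, value in feedback:
    matrix[user][item] = value


def lookup(user, item):
    """Return the stored feedback, or None if the preference is unknown."""
    return matrix.get(user, {}).get(item)


# All user-item pairs without feedback: the cells a recommender tries to predict.
items = sorted({item for _, item, _ in feedback})
unknown = [(u, i) for u in sorted(matrix) for i in items if lookup(u, i) is None]
```

The list `unknown` contains exactly the cells for which a recommendation system would have to predict a preference.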
- Collaborative filtering
User-user and item-item collaborative filtering are traditional methods of inferring user preferences. In user-user collaborative filtering, a vector is formed for each user, in which each component corresponds to an item. When a user rates an item, the corresponding rating is entered; no entries are written for items which were not rated. If ratings are absent or too few, implicit feedback such as purchases or other actions can be used instead, with 1 entered as the value for an action. Because a user usually interacts with only a small number of items on a page, most of the values are missing (with implicit feedback, missing values are treated as 0). A measure such as the cosine similarity can then be used to determine similarities between the vectors of users. If this is done pairwise for all users, a user-user similarity matrix is obtained. With the help of this matrix, users similar to a given user can be found, and their preferred items can be recommended to the given user if that user is not yet aware of them.
In the diagram below, for example, we see that the preferences of the boy (number 1) seem to be very similar to those of the girl (number 3). This would suggest that the girl might also like the second camera (number 5 of the products).
This is a nearest-neighbor method, the neighbors here being similar users. Item-item collaborative filtering is comparable but is based on similarities between items. In practice, however, it is usually much less intensive in terms of computation and memory, because most applications involve more users than items. In addition, analogous techniques have been developed for other use cases, such as at the session level. More precisely, the term “collaborative filtering” describes the general concept that relevant information can be filtered based on several interacting users. This is the case also for the following, newer methods.
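As a rough sketch of user-user collaborative filtering with implicit feedback, the following Python example computes cosine similarities and recommends items from the nearest neighbor. The data, the function names `cosine` and `recommend`, and the choice of scoring items by the summed similarity of the neighbors are assumptions made for illustration, not a reference implementation.

```python
import math

# Hypothetical implicit feedback: 1 = the user interacted with the item.
interactions = {
    "user_1": {"camera_a": 1, "camera_b": 1, "tripod": 1},
    "user_2": {"laptop": 1, "mouse": 1},
    "user_3": {"camera_a": 1, "tripod": 1},
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (missing = 0)."""
    dot = sum(value * v[item] for item, value in u.items() if item in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target, k=1):
    """Rank items the target has not seen by the similarity of the k nearest users."""
    sims = {
        other: cosine(interactions[target], vec)
        for other, vec in interactions.items()
        if other != target
    }
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = {}
    for other in neighbours:
        for item in interactions[other]:
            if item not in interactions[target]:
                scores[item] = scores.get(item, 0.0) + sims[other]
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("user_3"))
```

Here `user_3` shares two items with `user_1`, so the one item of `user_1` that `user_3` has not interacted with becomes the recommendation. For item-item collaborative filtering, the same similarity function would be applied to item vectors instead.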
- Factorization models based on machine learning
Newer methods of machine learning and artificial intelligence are natural successors and aim to improve on traditional similarity-based collaborative filtering. In the so-called “Netflix Prize”, a competition in which 1 million dollars were awarded, matrix factorization in particular proved to be a very successful method. The basic concept of this method is to learn "latent preferences" of users and "latent attributes" of items. In the case of music, for example, a latent preference could be the preference for a particular genre; the corresponding attribute of a song describes how strongly it is associated with this genre. These preferences and attributes are not based on assumptions, however, but are learned automatically by the algorithm from the data. For a chosen number of latent preferences, each user and each item is represented by a low-dimensional vector of that size, and the match between a user and an item can be calculated as the dot product of their vectors. A high match between a user and a new item indicates that the item will please the user and is therefore suitable as a recommendation. Matrix factorization can be trained very efficiently with algorithms based on stochastic gradient descent. It is also comparatively robust when only small amounts of data are available, because it represents the original data in fewer dimensions. Factorization machines are an extension of matrix factorization and offer the benefit of advanced feature engineering capabilities. Additional variables such as context information, user and product master data, as well as time-dependent information, can be easily integrated into the model through an appropriate encoding.
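To make the idea of matrix factorization more concrete, here is a minimal sketch of training with stochastic gradient descent in plain Python. The ratings, the hyperparameters (`n_factors`, `lr`, `reg`, `epochs`) and the function names are assumptions chosen for illustration; real projects would typically rely on an established library.

```python
import random


def factorize(ratings, n_users, n_items, n_factors=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Learn user and item vectors whose dot product approximates observed ratings."""
    rng = random.Random(seed)
    # Small random starting values for the latent user (P) and item (Q) vectors.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    ratings = list(ratings)
    for _ in range(epochs):
        rng.shuffle(ratings)
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted match between user u and item i: dot product of their vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


# Hypothetical observed ratings as (user index, item index, rating).
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (1, 2, 1),
           (2, 2, 5), (2, 3, 4), (3, 2, 4), (3, 3, 5)]
P, Q = factorize(ratings, n_users=4, n_items=4)

# Mean squared error on the observed ratings after training.
mse = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)
```

After training, `predict` can also be called for user-item pairs without an observed rating, which is how the unknown cells of the matrix are filled.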
- Artificial neural networks
Recently, several methods for recommendations have been developed which are based on artificial neural networks. These include embeddings, which were originally developed for text data and generate a latent representation, similar to the factorization models described above. Embeddings can be integrated into other neural networks and are also suitable for mapping certain behavioral characteristics of customers, but are often more complex to train. There are also other methods based on deep learning, such as RNN approaches with gated recurrent units. A common use case of such methods is session-based recommendation. The aim here is to model sequence attributes, such as chronologically sequential views within a session on a website, in order to predict the next clicks. Such approaches often differ from classic RNN applications in the type of objective function. Ranking-based loss functions are commonly used in this case (often pairwise losses like Bayesian Personalized Ranking), because the goal is to generate a list of recommendations with the best possible sorting. The form of the batch-wise input training data also differs.
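The pairwise ranking idea behind Bayesian Personalized Ranking can be illustrated without any neural network code. The sketch below computes the loss for a single pair of scores, one for an item the user interacted with and one for an item the user did not; the function name `bpr_loss` and the example scores are hypothetical, and the model that produces the scores is left out.

```python
import math


def bpr_loss(score_pos, score_neg):
    """Pairwise BPR loss: -log(sigmoid(score_pos - score_neg))."""
    diff = score_pos - score_neg
    # Numerically stable form of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical model scores for a clicked item and a non-clicked item.
well_ranked = bpr_loss(score_pos=2.0, score_neg=0.0)
badly_ranked = bpr_loss(score_pos=0.0, score_neg=2.0)
```

The loss is small when the clicked item is scored above the other item and grows when the order is reversed, so minimizing it pushes relevant items towards the top of the list.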
There are also other types such as hybrid recommender systems which are a combination of several approaches, as well as knowledge-based recommenders which, for example, rely on coded knowledge and provide the user with a step-by-step, rule-based assistance in making decisions.
Evaluation
A live evaluation and, if necessary, a user survey are most effective for assessing a recommender's success. However, it is very helpful to evaluate a recommendation system offline before deployment. The best measure for such an evaluation depends on the use case. Often, the accuracy of the resulting recommendation list is of interest. Precision and recall are among the most important measures here. Precision specifies the proportion of recommended items that are relevant to the user (for example the fraction of products purchased after recommendation), averaged over all users. Recall indicates the proportion of the items relevant to the user that were actually recommended. Both measures can be determined for different cutoffs, i.e. for the top k recommendations. A relatively high precision value can be achieved by recommending the most popular items overall. However, this can ignore many more relevant items for specific users which depend on their special interests. Recall reflects this: it is only high if these user-specific items are recommended as well.
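Precision and recall for the top k recommendations can be computed as in the following sketch. The function names and the two example users are hypothetical; "relevant" here simply means the set of items a user actually interacted with in held-out data.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision, recall) for the top-k items of one recommendation list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(recommendations, relevant_items, k):
    """Average precision and recall at k over all users."""
    results = [
        precision_recall_at_k(recommendations[user], relevant_items[user], k)
        for user in recommendations
    ]
    n = len(results)
    return sum(p for p, _ in results) / n, sum(r for _, r in results) / n


# Hypothetical recommendation lists and held-out relevant items per user.
recommendations = {"user_1": ["a", "b", "c", "d"], "user_2": ["b", "a", "e", "f"]}
relevant_items = {"user_1": {"a", "d", "e"}, "user_2": {"b", "a"}}

mean_precision, mean_recall = mean_precision_recall(recommendations, relevant_items, k=2)
```

Varying `k` shows the usual trade-off: longer lists tend to raise recall while lowering precision.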
Another important metric is the AUC, the area under a curve: either the curve of the true positive rate against the false positive rate (the ROC curve) or the curve of precision against recall. It can be considered a combination of the concepts of both precision and recall in one metric.
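For the variant based on the true positive rate and the false positive rate, the AUC equals the probability that a randomly chosen relevant item receives a higher score than a randomly chosen irrelevant one. The sketch below uses this pairwise formulation directly; the function name `roc_auc` and the example scores are hypothetical, and the quadratic loop is only meant for small illustrations.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four items; label 1 marks the relevant ones.
auc = roc_auc(scores=[0.9, 0.8, 0.3, 0.1], labels=[1, 0, 1, 0])
```

A value of 0.5 corresponds to a random ordering and 1.0 to an ordering in which every relevant item is ranked above every irrelevant one.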
For use cases in which recommendations are not presented side by side, but as a list sorted in descending order, ranking measures such as the mean reciprocal rank and normalized discounted cumulative gain are also available. The idea here is that the topmost output items need to have a higher weight because they are perceived more readily by users.
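Both ranking measures can be sketched briefly. The example assumes binary relevance (an item is either relevant or not) and the common logarithmic discount for the discounted cumulative gain; the function names and example data are hypothetical.

```python
import math


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant item, or 0 if none is recommended."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(recommended, relevant, k):
    """Normalized discounted cumulative gain at k with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    # Ideal ordering: all relevant items (at most k) at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical list for one user; the mean reciprocal rank averages this over users.
rr = reciprocal_rank(["a", "b", "c"], {"b", "c"})
ndcg = ndcg_at_k(["a", "b", "c"], {"b", "c"}, k=3)
```

In both measures a relevant item contributes more the higher it appears in the list, which reflects the stronger attention users pay to the topmost positions.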
Customization
In addition to the aforementioned metrics, it is often useful to define an evaluation function tailored to the problem and, if possible, also a target function to optimize. This way, the requirements of a given use case can be taken into consideration more thoroughly, e.g., so that profit and revenue targets can be directly taken into account by the resulting recommendations.
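As one possible example of such a tailored evaluation function, the sketch below sums the revenue of those recommended items that were actually purchased. The function name `revenue_at_k`, the prices and the purchase data are purely hypothetical; a real use case would define the measure according to its own business targets.

```python
def revenue_at_k(recommended, purchased, prices, k):
    """Revenue from the top-k recommended items that the user actually purchased."""
    return sum(prices[item] for item in recommended[:k] if item in purchased)


# Hypothetical prices and held-out purchases for one user.
prices = {"a": 10.0, "b": 20.0, "c": 5.0}
revenue = revenue_at_k(["a", "b", "c"], purchased={"b", "c"}, prices=prices, k=2)
```

Unlike precision, this measure rewards a recommender for placing higher-priced purchased items within the top k positions.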
Recommended Reading
Would you like to know more about recommender systems? As an introduction to the basics and concepts of recommenders, the book “Recommender Systems: An Introduction” by Jannach, Zanker, Felfernig and Friedrich is a good reference. More detailed descriptions of algorithmic approaches and the newest developments in the field can be found in scientific publications, for example in the articles published at the yearly RecSys conference, which specifically addresses recommendation methods and applications: https://recsys.acm.org/.
Conclusion
This blog post served as an overview of the different methodologies used in recommender systems. It is advisable to first implement a simple approach in a short project. This can already deliver good results in practice, and also enables a better understanding. Depending on the outcome, advanced machine learning methods can be used subsequently.