Recommender Systems¶
Data Scenarios¶

- Rich user data
  - Rich item data
  - Explicit rating data
- Rich item data
  - Explicit rating data
- Explicit rating data
Problem Dimensions¶
Problem Settings
- Users
  - with/without features
  - context information
    - attributes
    - metadata
    - ...
- Items
  - with/without features
  - context information
    - attributes
    - metadata
    - ...
- Interaction
  - Explicit feedback (ratings)
  - Implicit feedback (positive only)
Problem Statement
Given a user u, which new items i (not previously rated, bought, seen, ... by u) should be recommended to u, so that u is likely to interact (buy, view, rate, ...) with i?
Challenges¶
Cold Start¶
- User cold start: what to recommend to a new user, for whom there is no (feature, feedback) data?
- Item cold start: to whom to recommend a new item that no one has bought before?
Serendipity¶
From Oxford English Dictionary:
The faculty of making happy and unexpected discoveries by accident. Also, the fact or an instance of such a discovery
Example Usage:
“Columbus and Cabot..(by the greatest serendipity of history) discovered America instead of reaching the Indies.”
It is hard for recommender systems to recommend something unexpected.
Problem: Boring Recommendations¶

Problem: Information Filter Bubbles¶
Wikipedia:
A filter bubble [...] is a state of intellectual isolation that allegedly can result from personalized searches when a website algorithm selectively guesses what information a user would like to see based on information about the user, such as location, past click-behavior and search history. As a result, users become separated from information that disagrees with their viewpoints, effectively isolating them in their own cultural or ideological bubbles. [...]

From https://en.wikipedia.org/wiki/Filter_bubble
Netflix Prize¶

https://www.netflixprize.com/rules.html
- A big driver for research in recommender technology
 
Content-based Recommendation¶
We assume that some content is available:
- here, content describing the items
 
Item Content, Explicit Feedback¶

Scenario:
- Item features available
 - No user features
 - Explicit feedback
 
Note: the picture only shows one user; there are still many users, but we treat them one at a time!
User Classifier¶
Items rated by user u are described by feature vectors:

Recommendation: predict ratings of new items based on the item feature vector.
- Standard machine learning regression (numeric label) or classification (categorical label) task
 - Can build a standard prediction model (Naive Bayes, decision tree, ...) for each user (see the sketch below)
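A minimal sketch of this per-user setup (the feature matrix, ratings, and classifier choice below are made-up illustrations, not data from the lecture):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature vectors of the items rated by user u (rows = items).
X_rated = np.array([
    [1, 0, 1, 0],   # item 1
    [0, 1, 1, 0],   # item 2
    [1, 0, 0, 1],   # item 3
    [0, 1, 0, 1],   # item 4
])
# Explicit feedback of user u for these items (categorical labels "+" / "-").
y_rated = np.array(["+", "-", "+", "-"])

# One standard prediction model per user (here a decision tree; Naive Bayes etc. work too).
clf_u = DecisionTreeClassifier().fit(X_rated, y_rated)

# Predict u's rating of a new item from the new item's feature vector alone.
x_new = np.array([[1, 1, 0, 0]])
print(clf_u.predict(x_new))   # e.g. ['+']
```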
 
Naive Bayes Classifier¶
Notation¶
- Assume qualitative rating labels: rating of user u for item i: r_{u,i} \in \{+,-\}
 - f_i denotes the feature vector of item i, and \bold f a particular value for this feature vector
 
In example from previous table:

Bayesian Classification¶
Bayes Rule:
P_u(r_{u,i}=+ \mid f_i=\bold f) = \frac{P_u(f_i=\bold f \mid r_{u,i}=+)\, P_u(r_{u,i}=+)}{P_u(f_i=\bold f)}
Same for P_u(r_{u,i}= - \mid f_i=\bold f)
Both conditional probabilities have the factor 1/P_u(f_i=\bold f) in common
- For classification, we can ignore this factor and write:
P_u(r_{u,i}=+ \mid f_i=\bold f) \approx P_u(f_i=\bold f \mid r_{u,i}=+)\, P_u(r_{u,i}=+)

(“=” in equation (4.6) in Ch.4 of Rec.Sys. Handbook should also be “\approx”)
Key Question
What is:

Naive Bayes assumption:
P_u(f_i=\bold f \mid r_{u,i}) \approx \prod_{m=1}^{M} P_u(f_{i,m}=\bold f_m \mid r_{u,i})
where M is the number of components of f_i and f_{i,m} denotes its m-th component.
Example

- Large number of term feature factors (= size of vocabulary) may dominate this product
 - mitigated by: for most terms t, P_u(t=0 \mid r_{u,i}=+) and P_u(t=0 \mid r_{u,i}=-) will be very similar, and therefore have little impact on the ratio P_u(f_i=\bold f \mid r_{u,i}=+)/P_u(f_i=\bold f \mid r_{u,i}=-)
 - May need to make some adjustments to handle "hybrid" item feature data as in this example
 
Still to determine
For a single term t what is:

Bernoulli model
- Use term occurrence features: k \in \{0,1\}
- P_u(t=1 \mid r_{u,i} = +): probability that term t occurs in the text (review) for an item i that u has rated positively
  - = relative document frequency of term t in the "corpus" of items rated positively by u (cf. slide 3.9)
Multinomial model
- Use term frequency features (bag of words): k = F[i,t] \in \N
- P_u(t=k \mid r_{u,i} = +) = P_u(t \mid r_{u,i} = +)^k, where
  - P_u(t \mid r_{u,i} = +) is the relative frequency of the term t in items rated positively by u
  - = relative collection frequency of term t in the "corpus" of items rated positively by u (cf. slide 3.9)
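A minimal sketch of the difference between the two estimates, assuming a made-up term-frequency matrix F[i, t] for the items that u rated positively:

```python
import numpy as np

# Term-frequency matrix F[i, t] restricted to the items rated "+" by user u
# (rows = positively rated items, columns = vocabulary terms; made-up counts).
F_pos = np.array([
    [2, 0, 1],
    [1, 1, 0],
    [3, 0, 0],
])

# Bernoulli model: relative *document* frequency of term t,
# i.e. the fraction of positively rated items whose text contains t at all.
p_bernoulli = (F_pos > 0).mean(axis=0)

# Multinomial model: relative *collection* frequency of term t,
# i.e. term t's share of all term occurrences in the positive "corpus".
p_multinomial = F_pos.sum(axis=0) / F_pos.sum()

print(p_bernoulli)    # document fractions: 1, 1/3, 1/3
print(p_multinomial)  # occurrence shares:  6/8, 1/8, 1/8
```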
 
User Classifier Pros and Cons¶
Pros
- Makes use of item features
 - Can handle item cold start (assuming features of new items known)
 
Cons
- Requires explicit feedback
 - Does not handle user cold start
- ... or even users with relatively small amounts of rating data
 
 
Item Content, Implicit Feedback¶

Scenario
- Item features available
 - No user features
 - Implicit feedback
 
A partial analogy to Information Retrieval (IR):

User Profile¶
Idea
Represent the user by a vector in the same space as the item feature vectors, by summarizing the feature vectors of the items for which there is implicit feedback.
Example

Then: rank candidate items according to similarity with user profile. Similarity can be defined as (weighted) sum of component-specific similarity measures:

Illustration (figure):
- All items
- Items with implicit rating by user u
- User profile of u (= item prototype)
- Ranking of new (gray) items for recommendation
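A minimal sketch of this pipeline, assuming binary item feature vectors, the mean of the positively interacted items as the user profile, and plain (unweighted) cosine similarity; all data is made up for illustration:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Made-up feature vectors of all items (rows = items, columns = binary features).
items = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

# Items with implicit (positive-only) feedback from user u.
clicked = [0, 2]

# User profile = summary (here: mean) of the feature vectors of the clicked items.
profile_u = items[clicked].mean(axis=0)

# Rank the remaining ("new") items by similarity to the user profile.
candidates = [i for i in range(len(items)) if i not in clicked]
ranking = sorted(candidates, key=lambda i: cosine(profile_u, items[i]), reverse=True)
print(ranking)   # most similar new items first
```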
Evaluation¶

How good is the recommender we have designed?
Explicit Feedback¶
- Split data into training and test set:
 

- Design/learn/configure recommender using the training data
 - Compare true and predicted ratings on test cases:
 

Quantify performance using e.g. \text{accuracy} = \frac{\#\text{correct predictions}}{\#\text{all predictions}} (or the root mean squared error, RMSE, for numeric predictions).
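A small sketch of both metrics on made-up held-out test ratings (the train/test split itself is omitted):

```python
import numpy as np

# True and predicted ratings on the held-out test cases (made-up numbers).
r_true = np.array([5, 3, 4, 1, 2])
r_pred = np.array([4, 3, 4, 2, 2])

# Accuracy: fraction of exactly correct predictions (categorical view).
accuracy = np.mean(r_true == r_pred)

# RMSE: root mean squared error (numeric view).
rmse = np.sqrt(np.mean((r_true - r_pred) ** 2))

print(accuracy, rmse)
```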
Implicit Feedback¶
- Split data into training and test set:
 

- Build/learn/configure recommender using the training data
 - Determine positions of test items in ranked list of all items (or: test items plus random selection of some other items):
 

- Quantify performance
 
Mean reciprocal rank:
\text{MRR} = \frac{1}{K}\sum_{i=1}^{K}\frac{1}{\text{rank}(i)}
where K is the number of test items and rank(i) is the rank of the i-th test item.
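A minimal sketch of this metric, assuming we already know the rank of each held-out test item in the recommender's ordering (ranks made up):

```python
import numpy as np

# Rank (1 = top of the list) of each of the K held-out test items (made up).
ranks = np.array([1, 3, 10, 2])

# Mean reciprocal rank: average of 1/rank over the K test items.
mrr = np.mean(1.0 / ranks)
print(mrr)
```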

- All metrics make the implicit assumption that non-test items are not relevant for the user
 - To go beyond the limitations of this implicit assumption: need user studies
 
Collaborative Filtering¶
Pure interaction data


- Very sparse matrices
 
Some key techniques
- Neighborhood methods (this session)
 - Random walk based (next session)
 - Matrix factorization (next session)
 
Neighborhood Methods¶
Scenario
- No item features
 - No user features
 - Explicit feedback
 
Example

Here: explicit numeric feedback; could also be explicit categorical (+, −) feedback
User Based¶
To predict Eric's rating for Titanic:
- Find users similar to Eric who have rated Titanic
 - Predict by taking average of similar users’ ratings
 
Item Based¶
To predict Eric's rating for Titanic:
- Find items that Eric has rated which are similar to Titanic
 - Predict by taking average of Eric’s ratings for similar items
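A small sketch of both variants on a toy ratings matrix (rows = items, columns = users, np.nan = not rated); the "similar" users and items are picked by hand here, whereas in practice they come from a similarity measure such as the cosine discussed below:

```python
import numpy as np

# Toy ratings matrix: rows = items (movies), columns = users, np.nan = not rated.
R = np.array([
    [5., 4., 5., 1.],        # item 0
    [4., 4., 5., 2.],        # item 1
    [np.nan, 2., 1., 5.],    # item 2  <- predict user 0's rating for this item
    [1., 1., 2., 4.],        # item 3 (rating pattern similar to item 2)
])

def mean_rating(values):
    """Average over the non-missing ratings."""
    return float(np.mean([v for v in values if not np.isnan(v)]))

# User-based: average item 2's ratings by users similar to user 0 (users 1 and 2).
print(mean_rating([R[2, 1], R[2, 2]]))   # 1.5

# Item-based: average user 0's ratings of items similar to item 2 (item 3 only).
print(mean_rating([R[3, 0]]))            # 1.0
```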
 
User vs Item Based¶
User- and item-based approaches are completely analogous: just transpose the matrix.
Differences due to rating distribution in matrix:
User cold start: for a user u with very few ratings (suppose exactly one rating):
- User-based
  - Many other users are equally (and highly) similar to u.
  - The prediction for r_{u,i} is close to the global average of the ratings for item i.
- Item-based
  - There is only one candidate similar item.
  - The prediction for r_{u,i} equals the only previous rating of u.
Similarly for item cold start (everything transposed ...)
Normalization¶
Suppose

are user vectors.
Two different users may use somewhat different semantics for their ratings: user 1 gives 5 stars whenever he likes a movie, user 2 gives 5 stars only once in a lifetime.
Mean Centering¶
Let \overline r_u denote the mean value of all ratings of user u.
Define the user mean-centered ratings as r_{u,i} - \overline r_u.
Example:
- \overline r_u = 3, \quad \overline r_v = 2.25
 

(note the 0s!)
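A small sketch of mean centering on two made-up user vectors (np.nan = not rated); note that a rating equal to the user's own mean becomes 0:

```python
import numpy as np

# Made-up rating vectors of two users over the same items (np.nan = not rated).
r_u = np.array([1., 5., 4., 3., 2.])
r_v = np.array([1., 4., 4., np.nan, np.nan])

def mean_center(r):
    """Subtract the user's mean rating from every entry (missing entries stay NaN)."""
    return r - np.nanmean(r)

print(mean_center(r_u))   # mean 3.0 -> [-2.  2.  1.  0. -1.]  (the 3 becomes 0)
print(mean_center(r_v))   # mean 3.0 -> [-2.  1.  1.  nan nan]
```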
Similarity¶
Problem: measure similarity between two partial integer vectors:

- these can be either user (column) or item (row) vectors.
 - in a sparse vector, the “blank” entries would be equal to 0.
 
Only consider the components that are present in both vectors:

- (1, 5, 4, 3, 2) \quad (1, 4, 4, 1, 1)
 
Then what...
- Dot product?
 - Cosine similarity?
 
User-User Similarity¶
- Center full user vectors
 

- Calculate similarity w_{u,v} as the cosine of the sub-vectors of commonly rated items:
 
(this is equivalent to Equation (2.19) in Rec. Sys. Handbook, Ch. 2)
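A sketch of this computation, assuming mean centering over each user's full vector followed by a cosine over the commonly rated components (the two rating vectors are made up):

```python
import numpy as np

def user_similarity(r_u, r_v):
    """Cosine of the mean-centered sub-vectors on the commonly rated items."""
    # Center each full user vector by its own mean (ignoring missing entries).
    c_u = r_u - np.nanmean(r_u)
    c_v = r_v - np.nanmean(r_v)
    # Keep only the items rated by both users.
    common = ~np.isnan(r_u) & ~np.isnan(r_v)
    a, b = c_u[common], c_v[common]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

r_u = np.array([1., 5., 4., 3., np.nan, 2.])
r_v = np.array([1., 4., np.nan, 4., 1., 1.])
print(user_similarity(r_u, r_v))
```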
Significance¶
Problem with w_{u,v}: may obtain large similarity values from very small sets of common ratings:
- \cos((1),(1)) = 1
 - \cos((1, −1, 0, 2, 2, 0, −1, −2), (1, −1, 0, 1, 2, 0, −1, −2)) < 1
 
We can apply a penalty for 'short vectors':
- e.g. \gamma =25
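One common form of such a penalty (shown here as an assumed choice, not necessarily the exact formula on the slide) shrinks w_{u,v} whenever the number of commonly rated items falls below the threshold \gamma:

```python
def significance_weight(w_uv, n_common, gamma=25):
    """Shrink the similarity when the users share fewer than `gamma` rated items.

    w'_{u,v} = min(n_common, gamma) / gamma * w_{u,v}
    (one common choice of penalty; gamma = 25 as in the example above).
    """
    return min(n_common, gamma) / gamma * w_uv

print(significance_weight(1.0, n_common=1))    # 0.04  -> heavily penalized
print(significance_weight(0.97, n_common=40))  # 0.97  -> unchanged
```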
 
Putting Things Together¶
User-based prediction of r_{u,i}:
- Let \mathcal N_i be the set of users that have rated i
 - Predict
 

- Or use w'_{u,v}
 - Instead of summing over all users in \mathcal N_i, we may sum over only the k users that are most similar to u, denoted \mathcal N_i(u) (see the sketch below)
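A sketch of the whole pipeline, assuming a standard formulation in which the prediction is the user's mean rating plus a similarity-weighted average of the neighbors' mean-centered ratings (the exact equation from the slide is not reproduced above, so treat this as one reasonable instantiation):

```python
import numpy as np

def predict_rating(R, u, i, k=2, gamma=25):
    """User-based prediction of r_{u,i} from a ratings matrix R (np.nan = missing).

    Rows of R are users and columns are items (a layout chosen for this sketch).
    """
    means = np.nanmean(R, axis=1)
    centered = R - means[:, None]

    # N_i: users (other than u) that have rated item i, with their similarity to u.
    sims = []
    for v in range(R.shape[0]):
        if v == u or np.isnan(R[v, i]):
            continue
        common = ~np.isnan(R[u]) & ~np.isnan(R[v])
        if not common.any():
            continue
        a, b = centered[u, common], centered[v, common]
        w = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        w *= min(int(common.sum()), gamma) / gamma    # significance weighting w'
        sims.append((w, v))

    # Keep only the k users most similar to u, i.e. N_i(u).
    sims.sort(key=lambda wv: wv[0], reverse=True)
    sims = sims[:k]

    num = sum(w * centered[v, i] for w, v in sims)
    den = sum(abs(w) for w, _ in sims) + 1e-12
    return means[u] + num / den

# Toy data: 4 users x 4 items, np.nan = not rated.
R = np.array([
    [5., 4., np.nan, 1.],
    [4., 5., 2., 1.],
    [5., 5., 1., 2.],
    [1., 2., 5., 4.],
])
print(predict_rating(R, u=0, i=2))
```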