Showing posts with label Similarity.

Thursday, December 13, 2007

Friends Problems, Take 2

This weekend we will release some fixes that will solve this problem for many of you. Unfortunately, they will not solve it for all of you. Here's the deal: when you go to the Friends page, the website loads up those sliders with the movies your friends have rated. It pulls in your notes. It calculates your sim% to each friend. And it does a few other things. To calculate the sim%, it analyzes your ratings and compares them to those of each of your friends. We give the page a fixed time period to get all this done. If it doesn't finish its tasks in that period, it "times out" and you get some kind of error message.

The problem appears to be that there are cases where a member has a lot of friends. Not one, not 3. Not 15. But lots. Say, 30. (Only a tiny % of members have this many friends, btw.) But lots of friends alone isn't enough to cause an issue; we test with lots of friends. The trouble comes when a significant number of those friends are ALSO heavy raters, with many thousands of movie ratings each (say, 5K or 7K!). Then pulling in their rating data, doing the sim% calculation, and handling a few other things takes too long, and the page runs out of time. Hence the error message.
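To make the failure mode concrete, here is a minimal Python sketch (not our actual code). It assumes a simplified model in which the page's time budget is counted in units of work, one per friend rating scanned, and it uses a hypothetical toy `sim_percent` in place of the real calculation. Thirty light raters fit comfortably; thirty friends with 7K ratings each blow through the same budget.

```python
class PageTimeout(Exception):
    """Raised when the work needed to build the page exceeds its budget."""


def sim_percent(mine, theirs):
    """Hypothetical toy stand-in for the real sim% calculation: the percent of
    commonly rated titles where the two star ratings differ by at most one."""
    shared = [title for title in theirs if title in mine]
    if not shared:
        return 0.0
    close = sum(1 for title in shared if abs(mine[title] - theirs[title]) <= 1)
    return 100.0 * close / len(shared)


def build_friends_page(mine, friends, budget):
    """Compute sim% for every friend, charging one unit of work per friend
    rating scanned. If the running total exceeds the budget, time out."""
    spent = 0
    sims = {}
    for name, theirs in friends.items():
        spent += len(theirs)
        if spent > budget:
            raise PageTimeout(f"budget of {budget} exceeded at friend {name!r}")
        sims[name] = sim_percent(mine, theirs)
    return sims


# 30 friends with 200 ratings each (6,000 units) vs. 30 friends with 7,000 each (210,000 units).
mine = {f"m{i}": 4 for i in range(500)}
light = {f"f{i}": {f"m{j}": 4 for j in range(200)} for i in range(30)}
heavy = {f"f{i}": {f"m{j}": 4 for j in range(7000)} for i in range(30)}

build_friends_page(mine, light, budget=50_000)    # finishes
try:
    build_friends_page(mine, heavy, budget=50_000)
except PageTimeout as err:
    print("timed out:", err)
```

Raising `budget` is the analogue of widening the time window: it lets the heavy case finish, but the work still grows with friends times ratings.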

It's not that Friends is down. Friends is NOT down. More than 99% of you are not having a problem. But ironically, the few who have this problem are people we like a lot: you have lots of friends, you write notes, you rate lots of movies; you usually write reviews too, and make lists. You're all-round good Netflix Samaritans. So we are taking this seriously. (For the record, this problem only affects a few hundred of you.)

The problem starts and stops for some people because other factors are involved: for instance, the time of day you try it (because server loads change), which server you get when you log in, and possibly which pages you view in which order. We're still looking at these components.

The quick fix is to widen the time window, which will hopefully give these tasks enough time to finish before a timeout is triggered. This will go into effect this weekend, and it will improve things but not eliminate the problem for some. It is a somewhat inelegant solution, and there are clearly smarter ways to handle this. We could change the way we calculate and store sim%, for instance, or change the way we deal with notes. There are many possible ways to address this, but all are long-term, and none can happen immediately. We are looking at a range of solutions and will keep you posted on our progress.
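One of the smarter approaches mentioned above, changing how sim% is stored, could look something like the sketch below. It is a hypothetical Python illustration, not our design: each computed sim% is kept along with a version counter for each member's ratings, and the expensive calculation reruns only when one of the two members has rated something new since. The `SimCache` class, its methods, and the toy `title_overlap` score are made-up names.

```python
class SimCache:
    """Hypothetical sketch: store each computed sim% and recompute it only when
    either member's ratings have changed since it was stored."""

    def __init__(self, compute):
        self._compute = compute   # compute(ratings_a, ratings_b) -> sim%
        self._store = {}          # (a, b) -> (version_a, version_b, sim%)
        self._versions = {}       # member -> ratings version counter
        self.computations = 0     # how often the expensive work actually ran

    def ratings_changed(self, member):
        """Call whenever `member` adds or changes a rating."""
        self._versions[member] = self._versions.get(member, 0) + 1

    def get(self, a, b, ratings):
        """Return sim% of `a` to `b`, reusing the stored value if still fresh."""
        va, vb = self._versions.get(a, 0), self._versions.get(b, 0)
        cached = self._store.get((a, b))
        if cached is not None and cached[:2] == (va, vb):
            return cached[2]
        sim = self._compute(ratings[a], ratings[b])
        self.computations += 1
        self._store[(a, b)] = (va, vb, sim)
        return sim


def title_overlap(mine, theirs):
    """Toy similarity (assumption): percent of my titles the other person rated."""
    return 100.0 * len(mine.keys() & theirs.keys()) / len(mine) if mine else 0.0


ratings = {"me": {"A": 5, "B": 4}, "pal": {"A": 5, "C": 3}}
cache = SimCache(title_overlap)
cache.get("me", "pal", ratings)   # computed
cache.get("me", "pal", ratings)   # reused, no recomputation
ratings["pal"]["B"] = 4
cache.ratings_changed("pal")
cache.get("me", "pal", ratings)   # recomputed because pal's ratings changed
```

Entries are keyed by the ordered pair (a, b) rather than an unordered one because, with Friends, sim% can differ depending on direction.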

For those affected, sorry for the trouble.

Wednesday, August 1, 2007

Members Similar To You

The question has been raised: "How are these people chosen?" and "Why do some of my Friends have higher sim% than these people?"

The calculation to determine similarity is extremely complex: it takes into consideration all the movies you've seen and rated, looks at how other people rate those movies to decide how much weight each one gets, and then examines the various kinds of overlap with another person. Comparing you to every other subscriber would be CPU-prohibitive. Comparing you to every reviewer is slightly easier, but still takes many minutes. We use this complex algorithm for you and your Friends since you have (relatively) few Friends. But for the larger Community we had to simplify the calculation a little, so it's not quite as refined as the version we use on your Friends. The result is that, for the moment, a "very similar" stranger is going to be in the 70-80% range (I've not seen a 90+ yet), about 10 points below what I get with my most similar Friends. We will continue improving the speed and accuracy of the similarity calculations.

The block of four selected here is not necessarily the four "most similar" people to you. We use a number of quick assumptions to narrow down the field of reviewers we will compare you to, then run the faster similarity calculation on them, and finally randomly select four from the group at the top. This way, the block will change a bit from time to time. If we find that people like this block and enjoy exploring people who are similar, we will expand this kind of feature (maybe a whole page of similar people, the ability to search for folks, or other interesting things); but first I'd like to see if a lot of folks are intrigued. (And as we know from blog comments here, a lot of you simply don't care about strangers, so it's by no means certain to be a valuable feature.) But we're listening. Whatcha think?
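The three steps above (narrow the field, score the survivors with the faster calculation, pick four at random from the top) can be sketched in Python. The cheap filter (at least three titles in common), the Jaccard overlap used as the "faster" score, and the top-group size of 20 are all assumptions for illustration, not the real criteria.

```python
import random


def quick_filter(mine, theirs, min_shared=3):
    """Cheap first pass (assumption): keep reviewers with a few titles in common."""
    return len(mine & theirs) >= min_shared


def fast_sim(mine, theirs):
    """Faster, less refined score (assumption): Jaccard overlap as a percent."""
    union = mine | theirs
    return 100.0 * len(mine & theirs) / len(union) if union else 0.0


def similar_block(mine, reviewers, top_n=20, pick=4, rng=random):
    """Return `pick` reviewer names drawn at random from the top `top_n`."""
    # 1. Narrow the field with a quick, cheap test.
    candidates = {name: titles for name, titles in reviewers.items()
                  if quick_filter(mine, titles)}
    # 2. Run the faster similarity calculation only on the survivors.
    ranked = sorted(candidates, key=lambda name: fast_sim(mine, candidates[name]),
                    reverse=True)
    # 3. Randomly choose from the top group so the block varies between visits.
    top_group = ranked[:top_n]
    return rng.sample(top_group, min(pick, len(top_group)))


mine = {"A", "B", "C", "D", "E"}
reviewers = {f"r{i}": {"A", "B", "C", f"x{i}"} for i in range(30)}
reviewers["stranger"] = {"Z"}
print(similar_block(mine, reviewers, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the block reproducible for testing; the default module-level generator gives a different four on each call, which is the "changes from time to time" behavior.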

Thursday, June 28, 2007

Similar Reviewers

I really shouldn't say this, and I may regret it, but YES YES, we are building ways for you to (a) see the people who are most similar to you, and (b) save people once you've found them, no matter how you've found them. I won't go into the details here, or the precise timeline, but it's a big priority for our team and I expect it will be available in the near term (however you choose to interpret that).

As I've mentioned here before, expect that the community features are going to be moving and growing continually for some time. This means cool new features and pages, and also the unfortunate side-effect of short bursts of instability and downtime (not for the Netflix site in general, but potentially for some of these new features). We'll keep the bad parts to a minimum, but, you know, I'd like to be candid here. So keep those cards and letters coming, we ARE listening (we've actually always been listening too, but it's a lot more obvious these days, no?) Thanks for the attention. Now go watch some movies.

PS: We're LOVING your avatars.

Friday, June 8, 2007

Connecting to People Who Are "Similar"

That's a great idea, folks. You're browsing around and you come across someone who is really really similar... you know, something amazing like 90% or more. You read their reviews and you clearly have similar tastes and even share some pretty obscure favorite movies. You don't want to lose this person. They don't really have to be your Friend; I mean, you don't want to bug them. They may or may not be interested in you. You just want to hear when they find a new movie that's great, or read their latest review... you want to "subscribe" to a feed of their ratings and reviews.

Let's say there is a big button by their avatar image, and if you clicked it, you'd be able to keep an eye on them (but not in a creepy way). What's the button say on it?

Are you Subscribing to this person? If you saw that would you understand what that meant? What about Bookmarking them? That's often understood to mean 'holding' onto this page, although that misses the passive nature of this. You could be Adding them to your Favorites list. Like being a Friend, there could be another class -- a Favorite. Is Adding a Favorite better than Subscribing? And then there is a simple Save this Reviewer.

Can any of you propose a label for this button that is immediately understandable, clearly describes what this activity is, and doesn't require a paragraph of explanation?

A Guide to Similarity %

We're seeing more and more avatars on movie pages -- and that's fun: the reviews have somehow changed for us, from something like "content" into something more human, more interesting. You put a photo on your reviews and they become as much about you as they do the movie. And people now seem a lot more interested in clicking on those photos to see what else the reviewer likes and has reviewed.

But for me, the key is that Similarity %. There's no scale, though; it's just a relative value. Is 50% similar good? What does it mean? So here are some comments on Sim%.

How is it calculated? Netflix uses algorithms comparable to those employed in the "Cinematch" engine that recommends movies, but turns them around. The computers take all the movies you've connected with (rented is the most heavily weighted, but also rated or even just put in your queue) to get a signal about your taste. Then we compare those movies to the same set from each reviewer and generate a number. But it's not an absolute value. Sometimes there is little direct overlap of titles, but there is overlap in "similar" titles or, more importantly, an overlap of genres. You might not have seen (or rated) the same set of movies I did, but if we are interested in the same kind of movies, that makes us similar. We do this very quickly to get a general sense of similarity.
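Here is a minimal Python sketch of the "overlap of genres" idea: build a taste profile per member by summing connection weights per genre, then compare profiles with cosine similarity. The specific weights (3 for rented, 2 for rated, 1 for queued), the one-genre-per-title mapping, and cosine similarity itself are assumptions; the post only says rentals weigh most and that genre overlap matters. This is not the Cinematch algorithm.

```python
import math
from collections import Counter

# Assumed weights: rentals count most, then ratings, then queue adds.
CONNECTION_WEIGHT = {"rented": 3.0, "rated": 2.0, "queued": 1.0}


def taste_vector(connections, genre_of):
    """Sum connection weights per genre. `connections` is a list of
    (title, kind) pairs; `genre_of` maps each title to one genre."""
    vec = Counter()
    for title, kind in connections:
        vec[genre_of[title]] += CONNECTION_WEIGHT[kind]
    return vec


def genre_similarity(a, b):
    """Cosine similarity of two genre vectors, expressed as a percent."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 100.0 * dot / norm if norm else 0.0


genre_of = {"Alien": "sci-fi", "Solaris": "sci-fi", "Heat": "crime", "Ronin": "crime"}
me = taste_vector([("Alien", "rented"), ("Heat", "rated")], genre_of)
you = taste_vector([("Solaris", "rented"), ("Ronin", "rated")], genre_of)
print(genre_similarity(me, you))  # about 100, with no titles in common
```

The two members share no titles at all, yet their genre profiles match, which is the "interested in the same kind of movies" case.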

What is a "good" match? Like I said, it's all relative: if you and I are 60% similar, I may not know precisely what that means, but it suggests we're more similar than someone I'm 55% similar to. The wisdom around here is that if you are 70% similar to someone, that's pretty darn similar. 80% is dead on. My very best friends, the ones who could get me to see ANYTHING they liked most of the time, are in the high 80s with me. And I'm not 90% similar to anyone I know (although I sometimes find reviewers who share that much taste with me). Below 50%, I tend to check carefully whether I agree with their Favorite movies...
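Since sim% is relative, any labels are only rules of thumb. This tiny Python helper just encodes the impressions from the paragraph above (70% is pretty darn similar, 80% is dead on, below 50% deserves a careful look); the label for the 50-70% band is a made-up placeholder, since the post doesn't name one.

```python
def describe_sim(pct):
    """Rough, unofficial reading of a sim% based on the rules of thumb above."""
    if pct >= 80:
        return "dead on"
    if pct >= 70:
        return "pretty darn similar"
    if pct >= 50:
        return "some shared taste"   # placeholder label for this band
    return "check their Favorites carefully"


for pct in (88, 72, 60, 45):
    print(pct, describe_sim(pct))
```

Within a band, the raw number still carries the useful information: 60% and 55% get the same label, but 60% is the closer match.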

With your Friends list, we add a few more passes through the algorithm, to get an even subtler taste similarity, where we push up the emphasis on how you and I rate movies, and how common that kind of rating for a movie is (if you and I love a movie that the whole world loves, that doesn't really make us all that similar, but if you and I love a movie that everyone hates, well then, that's worth noting. So we do.)
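The "love a movie everyone hates" idea resembles inverse-frequency weighting. Below is a minimal Python sketch under that assumption: each shared opinion counts -log(p), where p is the (hypothetical) share of all raters who hold that opinion of the title, so agreeing with the crowd adds little and agreeing against it adds a lot. The star buckets and the `population_share` table are made up for illustration, and the result is a raw evidence score, not a sim%.

```python
import math


def opinion(stars):
    """Bucket a 1-5 star rating into a coarse opinion (assumed cutoffs)."""
    if stars >= 4:
        return "love"
    if stars <= 2:
        return "hate"
    return "meh"


def agreement_score(mine, theirs, population_share):
    """Sum -log(p) over titles where we hold the same opinion, where
    p = population_share[(title, opinion)] is the fraction (0 < p <= 1)
    of all raters who hold that opinion of the title."""
    score = 0.0
    for title in mine.keys() & theirs.keys():
        op = opinion(mine[title])
        if op == opinion(theirs[title]):
            score += -math.log(population_share[(title, op)])
    return score


share = {("Crowd Pleaser", "love"): 0.90, ("Cult Oddity", "love"): 0.05}
print(agreement_score({"Crowd Pleaser": 5}, {"Crowd Pleaser": 4}, share))  # ~0.11
print(agreement_score({"Cult Oddity": 5}, {"Cult Oddity": 5}, share))      # ~3.0
```

Loving the crowd pleaser together barely moves the score, while loving the cult oddity together counts almost thirty times as much.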

One note: with Friends, the Sim% is asymmetric; that is, I can be more similar to you than you are to me. Say you have seen 10 movies and I have seen 100, including all 10 of yours. Due to some intricacies in the formula, it shows a (small) difference between us: you, with 10 movies, will be MORE similar to me than I am to you, since I've seen so many you haven't and our viewing histories are so disproportionate. The presumption is that if you've only seen 10 and I've seen 100, I may have a far wider range of interests than you. If you watch (or rate) 90 more and there is still good overlap in interest, that pretty much eliminates the difference, but there is a lot of uncertainty with your smaller dataset. (We actually don't like this asymmetry very much, and are exploring that part of the equation even as we speak.) I know I was disappointed to learn that my very best (most similar) friend, who was 89% similar to me, didn't hold me in a comparable position: I was only 80% similar to him. That was a bit of a letdown. (I'm rating more movies and the difference is shrinking.)
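A toy Python example can reproduce the direction of the 10-movie vs. 100-movie effect. It uses a hypothetical directional measure (the percent of one person's titles the other has also seen), which is not our formula and greatly exaggerates the gap; in the real calculation the difference is small. It also shows the gap closing once the lighter viewer catches up with good overlap.

```python
def directional_sim(source, target):
    """Toy asymmetric measure (hypothetical, not the production formula):
    the percent of `source`'s titles that `target` has also seen."""
    if not source:
        return 0.0
    return 100.0 * len(source & target) / len(source)


me = {f"movie{i}" for i in range(100)}
you = {f"movie{i}" for i in range(10)}        # your 10 are all among my 100

print(directional_sim(you, me))   # 100.0: you look very similar to me
print(directional_sim(me, you))   # 10.0: I look much less similar to you

you |= {f"movie{i}" for i in range(10, 100)}  # you see 90 more, all overlapping
print(directional_sim(you, me), directional_sim(me, you))  # 100.0 100.0
```

A real formula would temper the raw ratio (for example, by accounting for how little a 10-movie history tells us), which is why the actual gap is a few points rather than 90.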

Like the recommendation engine at Netflix, we continually improve these mathematical formulas (see the Netflix Prize). The only (somewhat cryptic) thing I'd add is that we're only scratching the surface of the cool things we can do once we have calculated this Sim%, and you will be seeing more use of the tool throughout the year. Here's my question of the week: besides finding and saving other people who are very similar to you, and sorting reviews based on (among other things) how similar the reviewer is to you, what ways can you imagine applying the Sim%?
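As an illustration of one application mentioned above, sorting reviews by how similar the reviewer is to you, here is a short Python sketch. The `Review` fields and the use of helpful votes as the tiebreaker are hypothetical stand-ins for the "other things".

```python
from dataclasses import dataclass


@dataclass
class Review:
    reviewer: str
    text: str
    helpful_votes: int   # hypothetical secondary signal


def sort_reviews(reviews, sim_to_me):
    """Order reviews by the reviewer's sim% to the reader (0 if unknown),
    breaking ties by helpful votes, highest first."""
    return sorted(
        reviews,
        key=lambda r: (sim_to_me.get(r.reviewer, 0.0), r.helpful_votes),
        reverse=True,
    )


reviews = [Review("ann", "Loved it", 10), Review("bo", "Meh", 2), Review("cy", "Great", 50)]
for r in sort_reviews(reviews, {"ann": 72.0, "bo": 88.0, "cy": 72.0}):
    print(r.reviewer, r.text)
```

Here "bo" comes first on sim% alone, and "cy" edges out "ann" at the same sim% thanks to more helpful votes.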

Do you find it useful? Interesting?