Thursday, December 13, 2007

Friends Problems, Take 2

This weekend we will release some fixes that will solve this problem for many of you. Unfortunately, they will not solve it for all of you. Here's the deal: when you go to the Friends page, the website loads up those sliders with the movies your friends have rated. It pulls in your notes. It calculates your sim% with each friend. And it does a few other things. To calculate sim%, it analyzes your ratings and compares them with each friend's ratings. We give the page a fixed amount of time to get all this done. If it doesn't finish its tasks in that time, it "times out" and you get an error message.

The problem appears to be that some members have a lot of friends. Not one, not 3, not 15, but lots. Say, 30. (Only a tiny % of members have this many friends, btw.) But lots of friends alone isn't enough to cause an issue; we test with lots of friends. The trouble starts when a significant number of those friends are ALSO heavy raters, with many thousands of movie ratings each (say, 5K or 7K!). Then pulling in their rating data, doing the sim% calculation, and handling a few other things takes too long, the page runs out of time, and you get the error message.
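For the technically curious, here is a rough sketch of why the page runs out of time. It is a minimal illustration under stated assumptions, not the actual Friends page code: the `sim_percent` formula (100 × (1 − mean star difference / 4) over movies both members rated on a 1 to 5 star scale), the `build_friends_page` function, the `PageTimeout` exception and the `budget_seconds` parameter are all hypothetical. The point it shows is that each friend's whole rating history has to be pulled in and compared, so total work grows with (number of friends) × (ratings per friend), while the time budget stays fixed.

```python
import time
from typing import Dict, Optional

Ratings = Dict[str, int]  # movie id -> stars (1-5); hypothetical data shape


def sim_percent(mine: Ratings, theirs: Ratings) -> Optional[float]:
    """Hypothetical sim%: 100 * (1 - mean |star difference| / 4) over shared movies."""
    common = mine.keys() & theirs.keys()
    if not common:
        return None  # no movies in common, so no sim% to show
    mean_diff = sum(abs(mine[m] - theirs[m]) for m in common) / len(common)
    return 100.0 * (1.0 - mean_diff / 4.0)


class PageTimeout(Exception):
    """Raised when building the page exceeds its time budget (hypothetical)."""


def build_friends_page(my_ratings: Ratings,
                       friends_ratings: Dict[str, Ratings],
                       budget_seconds: float) -> Dict[str, Optional[float]]:
    deadline = time.monotonic() + budget_seconds
    results: Dict[str, Optional[float]] = {}
    for friend, ratings in friends_ratings.items():
        # The deadline is checked between friends. Comparing one friend's
        # ratings takes time proportional to the size of the histories, so
        # many friends with thousands of ratings each can exhaust the budget.
        if time.monotonic() > deadline:
            raise PageTimeout(f"ran out of time before reaching {friend}")
        results[friend] = sim_percent(my_ratings, ratings)
    return results
```

Under this sketch, 30 friends with about 6,000 ratings each means roughly 180,000 ratings to fetch and compare on a single page load, versus a few hundred for a member with three light-rating friends.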

It's not that Friends is down. Friends is NOT down. >99% of you are not having a problem. But ironically, the few who do have this problem are people we like a lot: you have lots of friends, you write notes, you rate lots of movies, you usually write reviews too, and you make lists. You're all-round good Netflix Samaritans. So we are taking this seriously. (For the record, this problem affects only a few hundred of you.)

The problem comes and goes for some people because other factors are involved: for instance, the time of day you try (server loads change throughout the day), which server you get when you log in, and possibly the order in which you view pages. We're still looking into these factors.

The quick fix is to widen the time window, which will hopefully give these tasks enough time to finish before a timeout is triggered. This goes into effect this weekend; it will improve things, but it will not eliminate the problem for everyone. It is a somewhat inelegant solution, and there are clearly smarter ways to handle this. We could change the way we calculate and store sim%, for instance, or change the way we deal with notes. There are many possible ways to address this, but all of them are long-term and none can happen immediately. We are looking at a range of solutions and will keep you posted on our progress.
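To illustrate what "changing the way we store sim%" could mean in general terms, here is a minimal sketch of one common approach: keep a precomputed sim% for each pair of members and recompute it only when one of them rates something new. This is an assumption for illustration, not a description of what Netflix is building; the `SimCache` class, its `record_rating` and `get` methods, and the per-member "version" counter are all hypothetical. It assumes sim% is symmetric, so (A, B) and (B, A) share one entry.

```python
from typing import Callable, Dict, Optional, Tuple


class SimCache:
    """Hypothetical store of precomputed sim% values, one per unordered member pair.

    Each entry remembers both members' rating versions at the time it was
    computed, so a new rating by either member makes the entry stale.
    """

    def __init__(self, compute: Callable[[str, str], Optional[float]]):
        self._compute = compute  # the expensive full comparison of two histories
        self._versions: Dict[str, int] = {}
        self._store: Dict[Tuple[str, str], Tuple[int, int, Optional[float]]] = {}

    def record_rating(self, member: str) -> None:
        # Bump the member's version whenever they rate a movie.
        self._versions[member] = self._versions.get(member, 0) + 1

    def get(self, a: str, b: str) -> Optional[float]:
        key = (a, b) if a <= b else (b, a)  # assumes sim% is symmetric
        va = self._versions.get(key[0], 0)
        vb = self._versions.get(key[1], 0)
        cached = self._store.get(key)
        if cached is not None and cached[0] == va and cached[1] == vb:
            return cached[2]  # cheap lookup, no rating comparison needed
        value = self._compute(key[0], key[1])
        self._store[key] = (va, vb, value)
        return value
```

With something like this, loading the Friends page becomes a series of lookups, and the heavy comparison happens only for pairs whose ratings changed since the last visit.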

For those affected, sorry for the trouble.