CSE 228: Lecture 12
Video On Demand
Scalability
Let's extend the idea of the digital video store proposed in the last lecture. If we want to turn it into a scalable service that provides video on demand to a very large number of users, how should we proceed?
A study of movie rentals found that 90% of all rentals are for the top 10% of movies. This is very good news for our system if we can take advantage of it. The complication is that, although many users want the same movie, most of them want to watch it at different times of the day.
Architecture
Suppose we have a main storage server that holds every movie in one place, and a set of users who want to watch some of those movies at various times during the day.

With enough users, this single server cannot possibly support all of them. How can we exploit the locality implied by 90% of users wanting the top 10% of movies? In computer architecture, we use caching to exploit locality when accessing memory and the hard disk; let's try something similar here.
To build the cache, we introduce a storage provider that sits between the main server and the users. It caches movies so that it can efficiently provide video on demand to the users below it.

How do we decide on the best way to cache our videos? We assign costs to transmitting and storing video, and then look for the caching scheme with the smallest overall cost.
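One way to write this objective compactly (the notation below is ours, not the lecture's) is as a minimization over caching plans:

```latex
\min_{\text{caching plan}} \;\; \sum_{(i,j) \in T} c_{ij} \;+\; \sum_{k} s_k \, h_k
```

Here T is the list of transmissions the plan performs (each over one link between adjacent servers, counted once per transmission), c_ij is the cost of one transmission from server i to server j, s_k is the storage cost per hour at server k, and h_k is the total number of copy-hours the plan holds at server k. The plan is only valid if a copy reaches the server nearest the users whenever a request arrives.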
An Example
Assume that transmitting a movie from S1 to S2 costs $1.20 and transmitting it from S2 to S3 costs $0.40. Also assume that storing one video costs $0.10 per hour at S1, $0.20 per hour at S2, and $0.50 per hour at S3. Here S1 is the main server and S3 is the storage provider closest to the users; the calculations below treat delivery from S3 to a user as free.
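The cost parameters above can be encoded as a small table. This is a minimal sketch; the names `SERVERS`, `LINK_COST`, `STORAGE_PER_HOUR`, `path_cost`, and `storage_cost` are hypothetical, amounts are kept in cents so sums are exact, and delivery from S3 to a user is assumed to be free, as the lecture's arithmetic implies.

```python
# Hypothetical encoding of the example chain S1 (main server) -> S2 -> S3 -> users.
# Amounts are in cents so sums stay exact; delivery from S3 to a user is treated as free.
SERVERS = ["S1", "S2", "S3"]
LINK_COST = {("S1", "S2"): 120, ("S2", "S3"): 40}  # one transmission of one movie
STORAGE_PER_HOUR = {"S1": 10, "S2": 20, "S3": 50}  # one copy for one hour


def path_cost(src, dst):
    """Cost to send one movie down the chain from src to dst (dst at or below src)."""
    i, j = SERVERS.index(src), SERVERS.index(dst)
    if j < i:
        raise ValueError("this sketch only models transmissions toward the users")
    return sum(LINK_COST[(SERVERS[k], SERVERS[k + 1])] for k in range(i, j))


def storage_cost(server, hours):
    """Cost to keep one copy of a movie at `server` for `hours` hours."""
    return STORAGE_PER_HOUR[server] * hours


print(f"S1 -> S3: ${path_cost('S1', 'S3') / 100:.2f}")   # $1.60
print(f"9 h at S2: ${storage_cost('S2', 9) / 100:.2f}")  # $1.80
```

The S1 to S3 figure is the $1.60 paid for User1's first transmission.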

We can now find the optimal caching scheme by treating this as an optimization problem. As an example, suppose User1 wants to watch a video at 2:00, User2 wants to watch the same video at 3:00 (one hour later), and User3 wants to watch it at 12:00 (9 hours after User2).
The first time we transmit, we will need to send the movie from the Main Server all the way to User1. This incurs a cost of $1.60 ($1.20 + $0.40).
Now consider User2, who wants the same movie one hour later. We can compare the cost of keeping the video at each of the servers for that hour. If we keep it at S1 and send it all the way to User2 again, the cost is $1.70 ($0.10 + $1.20 + $0.40). If we store the movie at S2 for the hour and then transmit it to User2, the cost is $0.60 ($0.20 to store and $0.40 to transmit). Finally, if we store the movie at S3, the cost is only $0.50 for one hour of storage, since no further transmission is needed. Considered on its own, this is the cheapest option, so we should use it.
Now consider all three users. Because User3 wants the video 9 hours later, we have to reconsider our choices. Keeping the movie at S1 and retransmitting it all the way to User3 costs $2.50 (9 × $0.10 + $1.20 + $0.40). Storing it at S2 for 9 hours and retransmitting costs $2.20 (9 × $0.20 + $0.40). Storing it at S3 for 9 hours costs $4.50 (9 × $0.50). The cheapest option here is to store the movie at S2.
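The per-user comparisons for User2 and User3 follow the same pattern: hold a copy at some server for the waiting period, then send it down to S3. A minimal sketch, using hypothetical names (`SEND_DOWN`, `STORE_PER_HOUR`, `hold_then_deliver`), cents for exact arithmetic, and free delivery from S3 to the user:

```python
# Example costs in cents. SEND_DOWN is the cost to get a copy from a server to S3
# (S1 -> S2 -> S3 is $1.20 + $0.40); delivery from S3 to the user is treated as free.
SEND_DOWN = {"S1": 120 + 40, "S2": 40, "S3": 0}
STORE_PER_HOUR = {"S1": 10, "S2": 20, "S3": 50}


def hold_then_deliver(server, hours):
    """Cost of keeping one copy at `server` for `hours`, then sending it down to S3."""
    return STORE_PER_HOUR[server] * hours + SEND_DOWN[server]


for user, hours in [("User2", 1), ("User3", 9)]:
    options = {s: hold_then_deliver(s, hours) for s in STORE_PER_HOUR}
    best = min(options, key=options.get)
    summary = ", ".join(f"{s} ${c / 100:.2f}" for s, c in options.items())
    print(f"{user} after {hours} h: {summary}; cheapest is {best}")
```

This prints the same figures as the text ($1.70, $0.60, $0.50 for User2 and $2.50, $2.20, $4.50 for User3). Note that it evaluates each request in isolation, which the lecture goes on to show is not sufficient.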
But what about User2? If we cache the video at S3 for User2, then at 3:00 the copy is at S3 rather than at S2, so it cannot simply be kept at S2 for the following 9 hours. That plan must therefore also pay either for storing a copy at S2 during the first hour ($0.20) or for transmitting the movie back from S3 to S2. Since storing at S2 for the extra hour is cheaper than transmitting from S3 to S2, we would store it at S2 for that hour.
Now we should reconsider our plan for User2. Since the video has to be stored at S2 anyway, is it still cost-effective to also store it at S3 for that hour? If we do, the total cost is $4.50 ($1.60 for User1 + $0.50 to store at S3 for User2 + $0.20 to also store at S2 during that hour + $2.20 to store at S2 for the remaining 9 hours and retransmit to User3). If we instead store the movie only at S2 for both users, the total drops to $4.40 ($1.60 + $0.60 to store at S2 and transmit to User2 + $2.20 to store at S2 and retransmit to User3).
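Both whole plans can be totaled mechanically by listing every transmission and every storage interval. A minimal sketch with hypothetical names (`SEND`, `STORE_PER_HOUR`, `plan_cost`), in cents, assuming delivery from S3 to a user is free:

```python
# Example costs in cents so the sums are exact.
SEND = {("S1", "S2"): 120, ("S2", "S3"): 40}
STORE_PER_HOUR = {"S1": 10, "S2": 20, "S3": 50}


def plan_cost(sends, stores):
    """Total cost of a plan: (src, dst) transmissions plus (server, hours) storage."""
    return sum(SEND[s] for s in sends) + sum(STORE_PER_HOUR[srv] * h for srv, h in stores)


# Plan A: cache at S3 for User2, at S2 for User3 (S2 must also hold a copy in hour 1).
plan_a = plan_cost(
    sends=[("S1", "S2"), ("S2", "S3"),  # 2:00 first delivery to User1
           ("S2", "S3")],               # 12:00 delivery to User3
    stores=[("S3", 1), ("S2", 1), ("S2", 9)],
)
# Plan B: keep a single copy at S2 for the whole 10 hours.
plan_b = plan_cost(
    sends=[("S1", "S2"), ("S2", "S3"),  # 2:00 User1
           ("S2", "S3"),                # 3:00 User2
           ("S2", "S3")],               # 12:00 User3
    stores=[("S2", 1), ("S2", 9)],
)
print(f"Plan A: ${plan_a / 100:.2f}, Plan B: ${plan_b / 100:.2f}")
```

This prints `Plan A: $4.50, Plan B: $4.40`, matching the totals above.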
This shows that finding the optimal solution can be quite complex, because we must minimize the total cost over all requests rather than optimize for each user independently. However, Professors Rangan and Papadimitriou have developed and patented a dynamic programming method that solves this problem efficiently, which makes the approach practical.
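The lecture does not describe the Rangan and Papadimitriou algorithm, so the following is not that method. It is a minimal dynamic-programming sketch of our own for this three-server example, under these assumptions: requests arrive at sorted whole hours; the movie starts only at S1; transmissions happen instantly at request times and may go in either direction along the chain at the same per-link cost (the S3 to S2 cost is not given in the notes); at least one copy must exist between requests, and following the notes, holding a copy at S1 is also charged; delivery from S3 to a user is free. The DP state is the set of servers holding a copy during each gap. All names are hypothetical.

```python
from itertools import combinations

# Example costs in cents (exact integers). Index 0 = S1 (main server), 1 = S2, 2 = S3.
LINK_COST = [120, 40]        # LINK_COST[k]: one transmission over link k <-> k+1 (assumed symmetric)
STORAGE_COST = [10, 20, 50]  # one copy for one hour at each server
NUM_SERVERS = len(STORAGE_COST)
EDGE_SERVER = NUM_SERVERS - 1  # users are served from S3; that last hop is treated as free


def nonempty_subsets(n):
    """All non-empty sets of server indices (candidate copy locations between requests)."""
    return [frozenset(c) for r in range(1, n + 1) for c in combinations(range(n), r)]


def transfer_cost(sources, targets):
    """Cheapest set of links that connects every target server to some source server."""
    best = None
    for mask in range(1 << len(LINK_COST)):
        # Label connected components along the chain for this choice of links.
        comp = [0] * NUM_SERVERS
        for k in range(1, NUM_SERVERS):
            link_used = (mask >> (k - 1)) & 1
            comp[k] = comp[k - 1] if link_used else comp[k - 1] + 1
        reached = {comp[s] for s in sources}
        if all(comp[t] in reached for t in targets):
            cost = sum(c for k, c in enumerate(LINK_COST) if (mask >> k) & 1)
            best = cost if best is None else min(best, cost)
    return best


def min_total_cost(request_hours):
    """Minimum total cost (cents) to serve requests arriving at the given sorted hours."""
    states = {frozenset([0]): 0}  # copy locations -> cheapest cost; initially only S1
    prev_time = request_hours[0]
    for i, t in enumerate(request_hours):
        gap, prev_time = t - prev_time, t
        last = i == len(request_hours) - 1
        choices = [frozenset()] if last else nonempty_subsets(NUM_SERVERS)
        new_states = {}
        for held, cost in states.items():
            cost += gap * sum(STORAGE_COST[k] for k in held)  # hold copies since last request
            for keep in choices:
                # Deliver to S3 now and place the copies kept for the next gap.
                total = cost + transfer_cost(held, keep | {EDGE_SERVER})
                if total < new_states.get(keep, float("inf")):
                    new_states[keep] = total
        states = new_states
    return min(states.values())


print(f"${min_total_cost([2, 3, 12]) / 100:.2f}")
```

For the requests at 2:00, 3:00, and 12:00 this prints `$4.40`, agreeing with the hand analysis that keeping a single copy at S2 is best. Each step considers every pair of copy sets, so the work grows linearly with the number of requests rather than exponentially.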
Please read the paper entitled "Architectures for Personalized Multimedia" as a supplement to this material.