Synchronization is a central problem in multimedia streaming. Multimedia objects, in general, may be composed of multiple media streams such as audio and video, whose retrieval must proceed so as to not only maintain continuity of playback of each of the constituent media streams, but also preserve the temporal relationships among them. The design of mechanisms and protocols for providing synchronous access to multimedia services over integrated networks constitutes the subject matter of this paper. It is therefore important to guarantee that playback of a stream proceeds synchronously at the different users' sites.
In general, the different media streams constituting a multimedia object may be captured or played back at different sites (such as telephones, videophones, cameras, digital HDTVs, audio speakers, etc., generically referred to as mediaphones in the rest of this paper) on the network. For example, video may be played back at HDTV display sites and audio at CD-quality speakers, each of which may be connected directly to the network via digitizers (see Figure 1). When network delays are deterministic or constant (as in analog transmission over cable TV networks), and recording/playback rates of all the users’ media capture and display sites are perfectly matched, synchronous playback is easily ensured; all that is required of a multimedia server is that it (1) instruct the mediaphones to commence playback after preset delays following the reception of the first media unit, and then (2) transmit media units to those mediaphones at their playback rate. However, in future integrated networks, factors such as congestion and queueing at network nodes are expected to introduce non-deterministic delays. Media objects may be recorded at one set of mediaphones (such as cameras belonging to video publishing and distribution houses), then stored at the multimedia server, and later played back at a different set of mediaphones (such as digital HDTVs belonging to residential consumers); and there may not be any commonality in the time of existence of connections to media recording and playback sites, rendering synchronization between clocks virtually impossible among those sites. In all such environments, additional mechanisms are essential for enforcing synchronization between media.
For example, as shown in Figure 1, several mediaphones may request the same stream, and synchronization should guarantee that all of them receive the same information at any given time.

A multimedia on-demand service over an integrated metropolitan area network is provided by a multimedia server that stores multimedia objects on a large array of high-capacity disks. Subscribers to the service can retrieve media units (such as video frames and audio samples) belonging to multimedia objects in real time from the multimedia server over the network, and play the media units back at their mediaphones (see Figure 2). The integrated network that interconnects the server and subscribers’ mediaphones is assumed to impose delays bounded between delta_min and delta_max for each media or feedback unit transmitted. Whereas delta_min is close to the smallest propagation delay of the network, delta_max must not exceed a few hundred milliseconds if the network is to support real-time, interactive multimedia applications. Bounds on network delays can be guaranteed via resource reservation when playback of a multimedia object starts, and in particular, by employing admission control, real-time scheduling and buffer reservation schemes at network nodes, at the multimedia server and at the mediaphones.
The mediaphones are simple display sites that are capable of receiving and playing back media units as well as transmitting feedback units (which are replicas of media units, except that they are devoid of data). Since the mediaphones are assumed not to possess globally synchronized clocks, there may be variations in their playback rates, with the maximum fractional drift in the playback period, theta, of a media unit at any mediaphone being bounded by +/- p. That is, the actual playback period at any mediaphone lies between theta*(1-p) and theta*(1+p).
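To make the drift bound concrete, the following Python sketch computes the shortest and longest playback periods allowed by the bound and the worst-case spread in elapsed playback time after n media units. The function names and the sample values used in the usage note (a nominal period of 1/30 s and p = 0.001) are illustrative assumptions, not values taken from this paper.

```python
def period_bounds(theta, p):
    """Return (shortest, longest) playback period for a nominal
    period theta and a maximum fractional drift p."""
    return theta * (1 - p), theta * (1 + p)


def drift_spread(theta, p, n):
    """Worst-case difference in elapsed playback time between the
    fastest and the slowest mediaphone after n media units.
    Algebraically this equals 2 * theta * p * n."""
    shortest, longest = period_bounds(theta, p)
    return n * (longest - shortest)
```

For instance, calling drift_spread(1/30, 0.001, n) shows that the spread grows linearly with n, which is why drift cannot be ignored over long playback sessions even when p is small.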
Even though retrieval of media streams for playback at different mediaphones commences simultaneously, the slowest mediaphone (e.g., video playback: call it the slave device) may lag behind the fastest mediaphone (e.g., audio playback: call it the master device). The maximum extent of this lag at the instant the master mediaphone has played back media unit mu_m is given in Figure 3.
Proof: Let mu_m be the media unit being played back at the master (fastest) mediaphone at the same instant that mu_s is being played back at the slave (slowest) mediaphone.
The difference mu_m - mu_s is calculated as follows:
1. The time at which playback of mu_m starts at the master is the delay delta_min plus mu_m times theta*(1-p), since the master is the fastest device (hence has the shortest period).
2. The time at which playback of mu_s starts at the slave is the delay delta_max plus mu_s times theta*(1+p), since the slave is the slowest device (hence has the longest period).
Equating the two times gives
delta_min + mu_m*theta*(1-p) = delta_max + mu_s*theta*(1+p)
Solving for mu_m-mu_s, first in terms of mu_m and then in terms of mu_s, yields:
mu_m-mu_s = ((delta_max - delta_min)+2*theta*p*mu_m)/(theta*(1+p))
mu_m-mu_s = ((delta_max - delta_min)+2*theta*p*mu_s)/(theta*(1-p))
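The two closed forms above can be checked numerically. The following Python sketch is a minimal illustration, assuming all delays and periods are expressed in seconds; the function names are hypothetical and do not come from this paper. Note that the lag it returns is measured in media units and is generally fractional.

```python
def playback_start(mu, delay, theta, drift):
    """Time at which media unit number mu starts playing at a mediaphone
    with network delay `delay` and fractional period drift `drift`
    (-p for the fastest device, +p for the slowest)."""
    return delay + mu * theta * (1 + drift)


def lag_from_master(mu_m, delta_min, delta_max, theta, p):
    """Maximum lag mu_m - mu_s expressed in terms of the master's unit mu_m."""
    return ((delta_max - delta_min) + 2 * theta * p * mu_m) / (theta * (1 + p))


def lag_from_slave(mu_s, delta_min, delta_max, theta, p):
    """Maximum lag mu_m - mu_s expressed in terms of the slave's unit mu_s."""
    return ((delta_max - delta_min) + 2 * theta * p * mu_s) / (theta * (1 - p))
```

Given a lag computed by lag_from_master for some mu_m, setting mu_s = mu_m - lag should make playback_start(mu_m, delta_min, theta, -p) equal to playback_start(mu_s, delta_max, theta, p) up to floating-point error, and lag_from_slave(mu_s, ...) should return the same lag.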