[FFmpeg-devel] Microsoft Smooth Streaming

Marcus Nascimento marcus.cps at gmail.com
Wed Oct 26 15:35:16 CEST 2011

On Wed, Oct 26, 2011 at 6:50 AM, Nicolas George <
nicolas.george at normalesup.org> wrote:

> On quartidi 4 Brumaire, year CCXX, Marcus Nascimento wrote:
> > Please, check the answers below.
> That was more than perfect. Thanks.
> > First of all, the basic idea of Microsoft Smooth Streaming is to encode
> > the same video in multiple bitrates. The client can decide which bitrate
> > to use. At any time it is possible to switch to another bitrate based on
> > bandwidth availability and other measurements.
> > Each encoding bitrate produces an independent ISMV file (IIS Smooth
> > Media Video, I suppose).
> > The encoding relies on the fragmented structure that ISOFF (ISO File
> > Format, the MP4 file format) offers. Keyframes are generated regularly
> > and equally spaced in all ISMV files (2s).
> > This is more restrictive than regular encoding procedures, which allow
> > some flexibility in keyframe intervals (I believe so, though I'm not a
> > specialist on that).
> > It is important to say that all fragments always start with a keyframe.
> > Each ISOFF fragment is perfectly aligned between different bitrates (in
> > terms of time, of course. Data size may vary drastically). That alignment
> > allows the client to request one bitrate for one fragment and switch
> > to another bitrate in the next fragment.
> >
> > The ISMV file format is called PIFF and is based on ISOFF with a few
> > additions. There are 3 uuid box types that are dedicated to DRM purposes
> > (I won't touch them here). Hence the meaning of PIFF: Protected
> > Interoperable File Format. The PIFF brand (ftyp box value) is "piff".
> > More on the PIFF format here: http://go.microsoft.com/?linkid=9682897
> >
> > The server side (in the MS implementation) is just an extension to what
> > is called IIS Media Services.
> > That is just a web service that accepts HTTP requests with a custom
> > formatted URL.
> > The base URL is something like http://domain.com/video.ism (note that it
> > is not ISMV), which is never requested directly.
> >
> > When the client wants to play a video, it first requests a Manifest
> > file. The URL is <baseUrl>/Manifest.
> For now, it sounds quite straightforward.
> > The Manifest is just an XML file that provides information about the
> > different streams, among other things.
> > Here is a basic example (modified parts of the original found here:
> > http://playready.directtaps.net/smoothstreaming/SSWSS720H264/SuperSpeedway_720.ism/Manifest
> > ):
> Do you know how much of the features of XML the manifest is allowed to use?
> Writing a parser for well-balanced-tags-with-quoted-attributes is an easy
> task, while supporting namespaces, external entities, processing
> instructions, etc., is not.
I have to check this out.
For simplicity, I'll stick with a simple XML parser without namespaces,
external entities and other stuff.
It may be improved in the future to be more correct about that.
Something like: let's make it work first.
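
To show how little of XML the first version would need, here is a minimal
sketch in Python rather than C. It uses the standard xml.etree.ElementTree
module as a stand-in for the simple parser. The sample manifest is
hypothetical and heavily reduced: the element and attribute names and all
values are assumptions for illustration, not a copy of the real Manifest.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced manifest. Names and values are illustrative
# assumptions, not taken from the real SuperSpeedway Manifest.
SAMPLE_MANIFEST = """\
<SmoothStreamingMedia MajorVersion="2" MinorVersion="0" Duration="80080000">
  <StreamIndex Type="video" Chunks="4" QualityLevels="2">
    <QualityLevel Index="0" Bitrate="2000000" FourCC="H264" CodecPrivateData="00"/>
    <QualityLevel Index="1" Bitrate="1000000" FourCC="H264" CodecPrivateData="00"/>
    <c d="20020000"/>
    <c d="20020000"/>
    <c d="20020000"/>
    <c d="20020000"/>
  </StreamIndex>
</SmoothStreamingMedia>
"""


def parse_manifest(text):
    """Return (total duration in ticks, list of per-stream descriptions)."""
    root = ET.fromstring(text)
    streams = []
    for stream in root.findall("StreamIndex"):
        streams.append({
            "type": stream.get("Type"),
            # One entry per quality level, in document order.
            "bitrates": [int(q.get("Bitrate"))
                         for q in stream.findall("QualityLevel")],
            # One entry per fragment ("chunk").
            "durations": [int(c.get("d")) for c in stream.findall("c")],
        })
    return int(root.get("Duration")), streams


duration, streams = parse_manifest(SAMPLE_MANIFEST)
print(duration, streams[0]["type"], streams[0]["bitrates"])
```

Only elements and quoted attributes are read here, which is the subset I
intend to support at first.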

> > We can see it gives the version of the smooth streaming media and the
> > duration (measured in units of 1 / 10,000,000 seconds).
> > Next we see the video section, which says each quality level has 4 chunks
> > (fragments), with 2 quality levels available. It also gives the video
> > dimensions and the URL format.
> > Next it gives information about each bitrate, with codec information and
> > codec private data (I believe it is used to configure the codec in an
> > opaque way).
> > Next it lists each fragment size. The first fragment would be referenced
> > as 0 (zero), and the others as the sum of the previous fragments' sizes.
> > I'm not sure exactly what those values mean.
> > Next we have the same structure for the audio description.
> Ok.
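
To make the fragment numbering concrete, here is a minimal Python sketch. It
assumes the per-fragment values are durations in the same 1 / 10,000,000 s
unit as the total duration, which I have not confirmed. The duration value
20020000 is also an assumption, picked because it makes the fourth fragment
land on 60060000.

```python
from itertools import accumulate

TICKS_PER_SECOND = 10_000_000  # manifest time unit: 1 / 10,000,000 s


def fragment_start_times(durations):
    """Start of each fragment: 0 for the first, then the running sum of
    the durations of all previous fragments."""
    if not durations:
        return []
    return [0] + list(accumulate(durations))[:-1]


def ticks_to_seconds(ticks):
    """Convert manifest ticks to seconds."""
    return ticks / TICKS_PER_SECOND


durations = [20020000] * 4  # assumed per-fragment values
starts = fragment_start_times(durations)
print(starts)                       # [0, 20020000, 40040000, 60060000]
print(ticks_to_seconds(starts[3]))  # start of the fourth fragment, in seconds
```
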
> > After getting the Manifest file, the client must decide which quality
> > level is best suited for the device and its resources.
> > It is not clear to me on what parameters it bases its decisions. I heard
> > about size of the screen and its resolution, computing power, download
> > bandwidth, etc.
> I do not think you need to concern yourself with the heuristics for that:
> that is for the application to decide, not the library implementing the
> protocol. The library only needs to provide the information necessary to
> make the decision.
My concern here is how the application would know, for example, how long it
took to get a fragment.
That would require a lot of interaction between ffmpeg and the application
during playback.
As with everything else related to ffmpeg, I need to study a little first, but
I'll keep that in mind.
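
Roughly what I have in mind, as a Python sketch rather than anything tied to
the ffmpeg API: the library side would only report how many bytes a fragment
had and how long the download took, and the application would decide. The
names timed_fetch, throughput_bps and pick_bitrate are hypothetical, and the
0.8 safety factor is an arbitrary assumption.

```python
import time


def timed_fetch(fetch, url, clock=time.monotonic):
    """Call fetch(url) and return (data, elapsed seconds)."""
    start = clock()
    data = fetch(url)
    return data, clock() - start


def throughput_bps(num_bytes, seconds):
    """Measured throughput in bits per second."""
    return 8 * num_bytes / seconds if seconds > 0 else float("inf")


def pick_bitrate(bitrates, measured_bps, safety=0.8):
    """Application-side heuristic (an assumption, not part of the protocol):
    pick the highest bitrate that fits within a fraction of the measured
    throughput, falling back to the lowest one."""
    usable = [b for b in bitrates if b <= measured_bps * safety]
    return max(usable) if usable else min(bitrates)


# Usage with a stand-in fetch function instead of a real HTTP request.
data, elapsed = timed_fetch(lambda url: b"\x00" * 250_000, "fragment-url")
print(len(data), pick_bitrate([1_000_000, 2_000_000], 3_000_000))
```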

> Others may disagree, but I believe that if you manage to implement anything
> at all (for example reading the first, or the best stream of each type, or
> maybe reading all streams while honoring the discard flag), that would be a
> very good starting point.
Perfect. Reading a single stream would be huge progress. I'll aim for that.

> > As soon as the quality level is chosen, I suppose the decoder has to be
> > configured in a suitable way, using the CodecPrivateData information
> > provided.
> > The client will then start requesting fragments following the URL pattern
> > given in the Manifest.
> > To request the first fragment for the first quality level, it would
> > use <baseUrl>/QualityLevels(0)/Fragments(video=0).
> > To request the fourth fragment for the second quality level, it would
> > use <baseUrl>/QualityLevels(1)/Fragments(video=60060000).
> > It is also possible to request just the audio following the same idea.
> > For instance: <baseUrl>/QualityLevels(0)/Fragments(audio=20201360).
> >
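
The request URLs above can be put together with a few lines of code. This is
a minimal Python sketch that follows the pattern exactly as I wrote it in the
examples, with the QualityLevels spelling from the audio example. The helper
names are hypothetical, and the real pattern should be taken from the
Manifest rather than hard-coded like this.

```python
def manifest_url(base_url):
    """URL of the Manifest for a given base URL."""
    return f"{base_url}/Manifest"


def fragment_url(base_url, level, stream, start_time):
    """URL of one fragment of one quality level.

    level is the value placed inside QualityLevels(...), stream is "video"
    or "audio", and start_time is the fragment start in manifest time units.
    """
    return f"{base_url}/QualityLevels({level})/Fragments({stream}={start_time})"


base = "http://domain.com/video.ism"
print(manifest_url(base))
print(fragment_url(base, 0, "video", 0))
print(fragment_url(base, 1, "video", 60060000))
print(fragment_url(base, 0, "audio", 20201360))
```
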
> > Each fragment received is arranged in the PIFF wire format. In other
> > words, it contains exactly one moof box and exactly one mdat box and
> > nothing more (check the MP4 specs for more info).
> > Of course there are boxes nested inside those where applicable. It may
> > contain custom uuid boxes designed to allow DRM protection. Let's not
> > consider them here.
> > I'm not sure which information I can get from the moof boxes, but I
> > assume it would be relevant for the demuxer only, since the codec would
> > only work on the opaque data contained in the mdat. Correct me if I'm
> > wrong, please.
> >
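
The "one moof, one mdat" layout is easy to check, since every ISO box starts
with a 32-bit big-endian size followed by a four-character type. Here is a
minimal Python sketch of that check. It only walks the top-level boxes and
does not look inside moof; the function names and the make_box helper are
hypothetical and exist only for this example.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each top-level ISO box in data."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if len(data) - offset < 16:
                raise ValueError("truncated 64-bit box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("invalid box size")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def is_single_fragment(data):
    """True if data is exactly one moof box followed by one mdat box."""
    return [box_type for box_type, _ in iter_boxes(data)] == ["moof", "mdat"]


def make_box(box_type, payload):
    """Build a box with a 32-bit size field (test helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


fragment = make_box(b"moof", b"") + make_box(b"mdat", b"\x00\x01")
print(is_single_fragment(fragment))
```
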
> > The client would apply some heuristics while requesting fragments, and
> > at some point it may decide to switch to another quality level. I suppose
> > it would have to reconfigure the decoder, and repeat this over and over
> > until the end of the stream.
> >
> > I'm not sure how a decoder works, but I believe there is a way to
> > configure it in order to receive future "injected" data.
> >
> > If you got all the way here, I really thank you!
> > I wonder how to fit all this into the ffmpeg structure.
> I will elaborate slightly on top of what Michael wrote.
> The "standard" scheme for ffmpeg has three completely separate layers:
>        protocol -> demuxer -> codecs
> The protocol takes a string (a URL of some kind) and outputs a stream of
> bytes. The most basic protocol is the file protocol, which takes a file
> name and just reads that file. Protocols can be nested (for example mmsh
> internally uses http which internally uses TCP), but that is an
> implementation detail that is not seen in the API (yet; there are plans to
> do something for complex multistream protocols).
> The demuxer reads a stream of bytes and first populates a global data
> structure, including one or several streams. Then it outputs a series of
> packets. Packets are a sequence of bytes with a few simple pieces of
> information attached: size, timestamp, and the stream they belong to.
> The codecs decode the packets. There is normally one codec per stream,
> except if that stream is ignored. The codec initializes itself with the
> data in the stream data structure, then accepts packets and possibly outputs
> video frames, audio PCM data or anything else (subtitles).
> AFAIK, in ffmpeg, the separation between demuxers and codecs has no real
> exception, which means that you should be able to completely ignore the
> problem of codecs.
> On the other hand, protocols and demuxers sometimes need to work hand in
> hand.
> In your particular case, the problem may be as simple as getting your
> protocol handler to resynthesize proper ISOM headers and concatenate the
> data to obtain a valid non-seekable ISOM stream.
> At a later time, the ISOM demuxer could be adapted to be able to use the
> seek-by-timestamp (read_seek) method that protocols can provide.
> But those are just random thoughts, and I do not know enough of the ISOM
> particulars to know if that is workable.
That helps a lot. Now I have a good idea of how things work. I'll dig into
the code.
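
Just to check my understanding of the protocol layer, here is a minimal
Python sketch of a "protocol" that hides the individual fragment requests and
presents them as one forward-only byte stream. It is not the ffmpeg protocol
API, and it leaves out the ISOM header resynthesis you mention. FragmentStream
and the fetch callable are hypothetical names.

```python
class FragmentStream:
    """Present a sequence of fetched fragments as one non-seekable stream."""

    def __init__(self, fetch, urls):
        self._fetch = fetch      # callable: url -> bytes (e.g. an HTTP GET)
        self._urls = iter(urls)  # fragment URLs, in playback order
        self._buffer = b""

    def read(self, size):
        """Return up to size bytes; b"" signals the end of the stream."""
        # Fetch further fragments only when the buffer cannot satisfy the read.
        while len(self._buffer) < size:
            url = next(self._urls, None)
            if url is None:
                break
            self._buffer += self._fetch(url)
        chunk, self._buffer = self._buffer[:size], self._buffer[size:]
        return chunk


# Usage with an in-memory stand-in for the HTTP layer.
fake_server = {"frag0": b"abc", "frag1": b"defg"}
stream = FragmentStream(fake_server.__getitem__, ["frag0", "frag1"])
print(stream.read(5), stream.read(5), stream.read(5))
```

The demuxer on top would then only see bytes, without knowing where one
fragment ends and the next begins.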

> > I'm not that familiar with RTP but from what I've read in the past few
> > minutes it sounds similar.
> From what you described, RTP and SDP files are too simple to be of any use
> by comparison.
> > Yes. I've seen something about it. It looks suitable for the case.
> > It may be my starting point for studying.
> I believe that you can use the HTTP protocol handler directly as a backend,
> like mmsh does.
I'll check that.
Thank you very much.

> Good luck.
> Regards,
> --
>  Nicolas George

Marcus Nascimento