[FFmpeg-devel] Microsoft Smooth Streaming

Wed Oct 26 01:25:22 CEST 2011

Please check the answers below.

Thank you very much in advance.

On Tue, Oct 25, 2011 at 3:54 PM, Nicolas George <
nicolas.george at normalesup.org> wrote:

> Le quartidi 4 brumaire, an CCXX, Marcus Nascimento a écrit :
> > I'd like to extend FFMpeg to support Microsoft Smooth Streaming (streaming
> > playback), the same way it has been done by all the available Silverlight
> > players.
>
> Contributions are always welcome on principle.
>
> > By now I do not intend to dump data to a file to be played locally or
> > anything like that. And probably will never intend to do that. I just want
> > to play it.
>
> If it can play it, then it can also dump it to a file. I hope you were not
> counting otherwise.
>
>
Definitely not. I was only worried about legal issues. I don't want to cause
trouble for FFmpeg or anything like that.

> > I did some research in this mail list and find out some posts that talked
> > about that before.
> > However I couldn't find in depth information or anything beyond the point
> > I'm stuck.
> >
> > I've done a lot of research on MS Smooth Streaming theory of operation,
> > studied some ISOFF (and PIFF) and some more.
> > It is pretty clear to me how MS Smooth Streaming works. Now it is time to
> > focus on how to do that in the FFMpeg way.
> >
> > First things first, I'd like to know how a streaming should be processed in
> > order to be played by FFMpeg.
>
> I believe you would receive more relevant replies faster if you took a few
> minutes to describe an overview of how the protocol works.
>
>
Right away. I'll give as many details as necessary here. Prepare yourself
for some reading!

First of all, the basic idea of Microsoft Smooth Streaming is to encode the
same video at multiple bitrates. The client decides which bitrate to use, and
at any time it can switch to another bitrate based on bandwidth availability
and other measurements.
Each encoded bitrate produces an independent ISMV file (IIS Smooth Media
Video, I suppose).
The encoding relies on the fragmented structure that ISOFF (ISO File Format,
the MP4 file format) offers. Keyframes are generated at regular, equally
spaced intervals in all ISMV files (2s).
This is more restrictive than regular encoding procedures, which allow some
flexibility in keyframe intervals (I believe so, but I'm not a specialist on
that).
It is important to note that every fragment starts with a keyframe.
Each ISOFF fragment is perfectly aligned across the different bitrates (in
terms of time, of course; data size may vary drastically). That alignment
allows the client to request one fragment at one bitrate and switch to
another bitrate for the next fragment.

The ISMV file format is called PIFF and is based on ISOFF with a few
additions. There are 3 uuid box types dedicated to DRM purposes (I won't
touch them here), hence the name PIFF: Protected Interoperable File Format.
The PIFF brand (ftyp box value) is "piff".
More on PIFF format here: http://go.microsoft.com/?linkid=9682897

The server side (in the MS implementation) is just an extension to IIS
called IIS Media Services.
It is just a web service that accepts HTTP requests with a custom URL
format.
The base URL is something like http://domain.com/video.ism (note that it is
.ism, not .ismv), and it is never requested by itself.

When the client wants to play a video, it first requests a Manifest file.
The URL is <baseUrl>/Manifest.
The Manifest is just an XML file that describes the available streams and
their quality levels.
Here is a basic example (modified parts of the original found here:
http://playready.directtaps.net/smoothstreaming/SSWSS720H264/SuperSpeedway_720.ism/Manifest
):

<SmoothStreamingMedia MajorVersion="2" MinorVersion="1"
Duration="1209510000">
<StreamIndex Type="video" Name="video" Chunks="4" QualityLevels="2"
MaxWidth="1280" MaxHeight="720" DisplayWidth="1280" DisplayHeight="720"
Url="QualityLevels({bitrate})/Fragments(video={start time})">
<QualityLevel Index="0" Bitrate="2962000" FourCC="H264" MaxWidth="1280"
MaxHeight="720"
CodecPrivateData="000000016764001FAC2CA5014016EFFC100010014808080A000007D200017700C100005A648000B4C9FE31C6080002D3240005A64FF18E1DA12251600000000168E9093525"/>
<QualityLevel Index="1" Bitrate="2056000" FourCC="H264" MaxWidth="992"
MaxHeight="560"
CodecPrivateData="000000016764001FAC2CA503E047BFF040003FC52020202800001F480005DC03030003EBE8000FAFAFE31C6060007D7D0001F5F5FC6387684894580000000168E9093525"/>
<c d="20020000"/>
<c d="20020000"/>
<c d="20020000"/>
<c d="6670001"/>
</StreamIndex>
<StreamIndex Type="audio" Index="0" Name="audio" Chunks="4"
QualityLevels="1" Url="QualityLevels({bitrate})/Fragments(audio={start
time})">
<QualityLevel FourCC="AACL" Bitrate="128000" SamplingRate="44100"
Channels="2" BitsPerSample="16" PacketSize="4" AudioTag="255"
CodecPrivateData="1210"/>
<c d="20201360"/>
<c d="19969161"/>
<c d="19969161"/>
<c d="8126985"/>
</StreamIndex>
</SmoothStreamingMedia>

We can see it gives the version of the Smooth Streaming media and the
duration (measured in units of 1/10,000,000 of a second, so 1209510000 is
about 121 seconds).
Next we see the video section, which says each quality level has 4 chunks
(fragments), with 2 quality levels available. It also gives the video
dimensions and the URL format.
Next it gives information about each bitrate, with codec information and
codec private data (I believe it is used to configure the codec in an opaque
way).
Next it lists each fragment's size (the d attribute). The first fragment is
referenced as 0 (zero), and each later one by the sum of the sizes of the
fragments before it. I'm not sure exactly what those values mean, but since
their running sums fill the {start time} placeholder, they look like
durations in the same time unit.
Next we have the same structure for the audio description.
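
To make the manifest description concrete, here is a minimal Python sketch
(illustration only, not FFmpeg code). It assumes the d values are fragment
durations in the same 1/10,000,000 s unit as Duration, and that a fragment is
addressed by the sum of the d values before it. MANIFEST is a trimmed copy of
the example above, and parse_stream is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example manifest (CodecPrivateData omitted for brevity).
MANIFEST = """<SmoothStreamingMedia MajorVersion="2" MinorVersion="1" Duration="1209510000">
<StreamIndex Type="video" Name="video" Chunks="4" QualityLevels="2"
 Url="QualityLevels({bitrate})/Fragments(video={start time})">
<QualityLevel Index="0" Bitrate="2962000" FourCC="H264" MaxWidth="1280" MaxHeight="720"/>
<QualityLevel Index="1" Bitrate="2056000" FourCC="H264" MaxWidth="992" MaxHeight="560"/>
<c d="20020000"/><c d="20020000"/><c d="20020000"/><c d="6670001"/>
</StreamIndex>
</SmoothStreamingMedia>"""

TICKS_PER_SECOND = 10_000_000  # the 1/10,000,000 s unit used by Duration


def parse_stream(stream):
    """Return (bitrates, start_times) for one StreamIndex element."""
    bitrates = [int(q.get("Bitrate")) for q in stream.findall("QualityLevel")]
    starts, t = [], 0
    for c in stream.findall("c"):
        starts.append(t)       # start time = sum of all previous d values
        t += int(c.get("d"))   # assumption: d is the fragment duration
    return bitrates, starts


root = ET.fromstring(MANIFEST)
video = root.find("StreamIndex[@Type='video']")
bitrates, starts = parse_stream(video)
duration_seconds = int(root.get("Duration")) / TICKS_PER_SECOND
```

With this manifest, starts comes out as 0, 20020000, 40040000 and 60060000,
so the fourth video fragment is addressed as video=60060000.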

After getting the Manifest file, the client must decide which quality level
is best suited for the device and its resources.
It is not clear to me which parameters it bases its decisions on. I have
heard about screen size and resolution, computing power, download bandwidth,
etc.
As soon as the quality level is chosen, I suppose the decoder has to be
configured in a suitable way, using the CodecPrivateData information
provided.
The client will then start requesting fragments following the URL pattern
given in the Manifest, filling {bitrate} with the Bitrate of the chosen
quality level and {start time} with the fragment's start time.
To request the first fragment of the first quality level, it would request
<baseUrl>/QualityLevels(2962000)/Fragments(video=0).
To request the fourth fragment of the second quality level, it would request
<baseUrl>/QualityLevels(2056000)/Fragments(video=60060000).
It is also possible to request just the audio following the same idea. For
instance: <baseUrl>/QualityLevels(128000)/Fragments(audio=20201360).
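
As a rough illustration of the URL construction, here is a small Python
sketch. It assumes the {bitrate} placeholder is filled with the Bitrate
attribute of the chosen QualityLevel, as the Url template in the manifest
suggests, and {start time} with the fragment start time. fragment_url is a
hypothetical helper, and the base URL is the placeholder one used above.

```python
def fragment_url(base_url, url_template, bitrate, start_time):
    """Fill the manifest's Url template and append it to the base URL."""
    path = (url_template
            .replace("{bitrate}", str(bitrate))
            .replace("{start time}", str(start_time)))
    return base_url.rstrip("/") + "/" + path


BASE_URL = "http://domain.com/video.ism"  # placeholder base URL
VIDEO_TEMPLATE = "QualityLevels({bitrate})/Fragments(video={start time})"

# Fourth fragment of the second quality level in the example manifest.
url = fragment_url(BASE_URL, VIDEO_TEMPLATE, 2056000, 60060000)
```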

Each fragment received is arranged in the PIFF wire format. In other words,
it contains exactly one moof box and exactly one mdat box and nothing more
(check the MP4 specs for more info).
Of course those boxes contain internal boxes where applicable. A fragment
may also contain custom uuid boxes designed to allow DRM protection. Let's
not consider them here.
I'm not sure which information I can get from the moof box, but I assume it
would be relevant for the demuxer only, since the codec would only work on
the opaque data contained in the mdat box. Correct me if I'm wrong, please.
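
To show what walking such a fragment could look like, here is a minimal
Python sketch of a top-level box reader. It relies only on the generic ISO
base media box header (32-bit big-endian size, 4-character type, with size 1
meaning a 64-bit size follows and size 0 meaning "to the end of the data").
iter_boxes and make_box are hypothetical helpers, and the fragment bytes are
synthetic, not real PIFF data.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each top-level box found in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:    # a 64-bit "largesize" follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("truncated or malformed box at offset %d" % offset)
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def make_box(box_type, payload):
    """Build a synthetic box, only used to exercise iter_boxes."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic fragment: one moof box followed by one mdat box.
fragment = make_box(b"moof", b"\x00" * 16) + make_box(b"mdat", b"frame-bytes")
boxes = list(iter_boxes(fragment))
```

The same function can be applied again to the moof payload to list its
child boxes, since they use the same header layout.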

The client would apply some heuristics while requesting fragments and at
some point may decide to switch to another quality level. I suppose it would
then have to reconfigure the decoder, and repeat this over and over until
the end of the stream.
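
I don't know what heuristics the real players use. Purely as an
illustration, one simple possibility is to pick the highest bitrate that
fits within a fraction of the measured download bandwidth. In the Python
sketch below, pick_bitrate and the 0.8 safety factor are my own assumptions,
not something taken from the Smooth Streaming documentation.

```python
def pick_bitrate(bitrates, measured_bps, safety=0.8):
    """Pick the highest bitrate that fits in a fraction of the bandwidth.

    Falls back to the lowest available bitrate when none fits.
    """
    budget = measured_bps * safety
    affordable = [b for b in bitrates if b <= budget]
    return max(affordable) if affordable else min(bitrates)


LEVELS = [2962000, 2056000]  # video bitrates from the example manifest

fast = pick_bitrate(LEVELS, 5_000_000)    # budget 4,000,000 bit/s
medium = pick_bitrate(LEVELS, 3_000_000)  # budget 2,400,000 bit/s
slow = pick_bitrate(LEVELS, 1_000_000)    # nothing fits, use the lowest
```

The choice would be re-evaluated before each fragment request, since the
aligned fragments allow switching at every fragment boundary.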

I'm not sure how a decoder works, but I believe there is a way to configure
that in order to receive future "injected" data.
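
On configuring the decoder: the CodecPrivateData strings in the manifest
look like ordinary codec headers in hex. The Python sketch below assumes
(this is my reading, please correct me) that the H264 value holds NAL units
separated by 00 00 00 01 start codes, and that the AACL value is an MPEG-4
AudioSpecificConfig. split_nal_units and parse_aac_config are hypothetical
helper names.

```python
# CodecPrivateData of the first video QualityLevel in the example manifest.
H264_PRIVATE = (
    "000000016764001FAC2CA5014016EFFC100010014808080A000007D200017700C1"
    "00005A648000B4C9FE31C6080002D3240005A64FF18E1DA1225160"
    "0000000168E9093525"
)
AAC_PRIVATE = "1210"  # CodecPrivateData of the audio QualityLevel

# Sampling rates indexed by the 4-bit frequency index of AudioSpecificConfig.
AAC_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
             24000, 22050, 16000, 12000, 11025, 8000, 7350]


def split_nal_units(hex_string):
    """Split start-code separated data into NAL units (assumed layout)."""
    data = bytes.fromhex(hex_string)
    return [nal for nal in data.split(b"\x00\x00\x00\x01") if nal]


def parse_aac_config(hex_string):
    """Read object type, sampling rate and channel count from the header."""
    bits = int(hex_string, 16)
    total = len(hex_string) * 4
    object_type = (bits >> (total - 5)) & 0x1F   # first 5 bits
    freq_index = (bits >> (total - 9)) & 0x0F    # next 4 bits
    channels = (bits >> (total - 13)) & 0x0F     # next 4 bits
    return object_type, AAC_RATES[freq_index], channels


# The low 5 bits of the first NAL byte give the type: 7 = SPS, 8 = PPS.
nal_types = [nal[0] & 0x1F for nal in split_nal_units(H264_PRIVATE)]
aac = parse_aac_config(AAC_PRIVATE)
```

Under these assumptions the video value splits into one SPS and one PPS,
and the audio value decodes to AAC LC (object type 2), 44100 Hz, 2
channels, which agrees with the SamplingRate and Channels attributes in the
manifest.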

If you got all the way here, I really thank you!
I wonder how to fit all this into the FFmpeg structure.
If anyone can point me in some direction, I'd be very thankful.
There are still a few comments below...

> For the rest, I am just shooting in the dark, as I know nothing of the
> protocol.
>
> > I see two possible scenarios:
> >
> > 1 - An external code make all HTTP requests to obtain the manifest XML file,
> > use that to configure the decoder. Then makes further HTTP requests to
> > obtain fragments that will be parsed by the demuxer (probably a custom one
> > based on the ISOM already available).
>
> This looks like the manifest XML file has a role similar to the SDP file
> with RTP streams. You could look at how that works to see if that suits
> you.
>
>
I'm not that familiar with RTP, but from what I've read in the past few
minutes it sounds similar.

> > 2 - A very simple external code just request FFMpeg to play a smooth
> > streaming media. FFMpeg will detect this is a HTTP based media and will
> > request the manifest file for that (I believe I'd have to create a custom
> > HTTP based solution for that). By the time the manifest is available, ffmpeg
> > would configure the decoder. Then makes further HTTP requests same way as in
> > 1.
>
> There is already HTTP client code, as surely you know.
>
>
Yes. I've seen something about it. It looks suitable for this case.
It may be my starting point for studying. But I still feel I need a
big-picture view of how FFmpeg works in general.

> Regards,
>
> --
>  Nicolas George
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>

-- 
Marcus Nascimento