[FFmpeg-devel] Microsoft Smooth Streaming

Wed Oct 26 15:38:00 CEST 2011

Good point. I read a little about RTMP some time ago.
This approach seems to be quite suitable for my needs.
I'll study that.

Thanks very much.

On Wed, Oct 26, 2011 at 9:20 AM, Ratin <ratin3 at gmail.com> wrote:

> On Tue, Oct 25, 2011 at 4:25 PM, Marcus Nascimento <marcus.cps at gmail.com>
> wrote:
> > Please check the answers below.
> >
> > Thank you very much in advance.
> >
> >
> > On Tue, Oct 25, 2011 at 3:54 PM, Nicolas George <
> > nicolas.george at normalesup.org> wrote:
> >
> >> On quartidi 4 Brumaire, year CCXX, Marcus Nascimento wrote:
> >> > I'd like to extend FFMpeg to support Microsoft Smooth Streaming
> >> > (streaming playback), the same way it has been done by all the
> >> > available Silverlight players.
> >>
> >> Contributions are always welcome on principle.
> >>
> >> > For now I do not intend to dump data to a file to be played locally
> >> > or anything like that, and I probably never will. I just want to
> >> > play it.
> >>
> >> If it can play it, then it can also dump it to a file. I hope you were
> >> not expecting otherwise.
> >>
> >>
> > Definitely not. I was only worried about legal issues. I don't want to
> > cause trouble for FFMpeg or anything like that.
> >
> >
> >> > I did some research in this mailing list and found some posts that
> >> > talked about this before.
> >> > However, I couldn't find in-depth information or anything beyond the
> >> > point where I'm stuck.
> >> >
> >> > I've done a lot of research on the MS Smooth Streaming theory of
> >> > operation, and studied some ISOFF (and PIFF) and more.
> >> > It is pretty clear to me how MS Smooth Streaming works. Now it is time
> >> > to focus on how to do that the FFMpeg way.
> >> >
> >> > First things first, I'd like to know how a stream should be processed
> >> > in order to be played by FFMpeg.
> >>
> >> I believe you would receive more relevant replies faster if you took a
> >> few minutes to describe an overview of how the protocol works.
> >>
> >>
> > Right away. I'll give as many details as necessary here. Prepare yourself
> > for some reading!
> >
> > First of all, the basic idea of Microsoft Smooth Streaming is to encode
> > the same video at multiple bitrates. The client can decide which bitrate
> > to use, and at any time it can switch to another bitrate based on
> > available bandwidth and other measurements.
> > Each encoding bitrate produces an independent ISMV file (IIS Smooth
> > Media Video, I suppose).
> > The encoding relies on the fragmented structure that ISOFF (ISO File
> > Format, the MP4 file format) offers. Keyframes are generated regularly
> > and equally spaced in all ISMV files (every 2 s).
> > This is more restrictive than regular encoding procedures, which allow
> > some flexibility in keyframe intervals (I believe so, though I'm not a
> > specialist on that).
> > It is important to note that every fragment starts with a keyframe.
> > Each ISOFF fragment is perfectly aligned across the different bitrates
> > (in terms of time, of course; data size may vary drastically). That
> > alignment allows the client to request one fragment at one bitrate and
> > switch to another bitrate for the next fragment.
> >
> > The ISMV file format is called PIFF and is based on ISOFF with a few
> > additions. There are 3 uuid box types dedicated to DRM purposes (I
> > won't touch them here), hence the name PIFF: Protected Interoperable
> > File Format. The PIFF brand (ftyp box value) is "piff".
> > More on the PIFF format here: http://go.microsoft.com/?linkid=9682897
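A minimal Python sketch of the brand check described above. It assumes the data begins with an `ftyp` box that uses a plain 32-bit size field; since a file might list "piff" among its compatible brands rather than as the major brand, the hypothetical `looks_like_piff` helper (not an FFmpeg API) accepts either.

```python
import struct


def read_ftyp_brands(data: bytes):
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box.

    Assumption: the box uses a 32-bit size field (no 64-bit largesize).
    """
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"ftyp" or size < 16 or size > len(data):
        raise ValueError("data does not start with a complete ftyp box")
    major = data[8:12].decode("ascii")
    # Bytes 12..15 hold minor_version; compatible brands follow in 4-byte units.
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major, compatible


def looks_like_piff(data: bytes) -> bool:
    """Hypothetical helper: accept 'piff' as the major or a compatible brand."""
    major, compatible = read_ftyp_brands(data)
    return major == "piff" or "piff" in compatible


# Synthetic 20-byte ftyp box: major brand 'piff', minor version 1, one compatible brand.
sample = struct.pack(">I4s4sI4s", 20, b"ftyp", b"piff", 1, b"iso2")
print(read_ftyp_brands(sample))  # ('piff', ['iso2'])
print(looks_like_piff(sample))   # True
```

In FFmpeg itself this kind of check would naturally live in a demuxer's probe step; the sketch only shows the byte layout involved.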
> >
> > The server side (in the MS implementation) is just an extension to IIS
> > called IIS Media Services.
> > That is just a web service that accepts HTTP requests with a custom
> > formatted URL.
> > The base URL is something like http://domain.com/video.ism (note that it
> > is not ISMV), which is never requested by itself.
> >
> > When the client wants to play a video, it requests a Manifest file.
> > The URL is <baseUrl>/Manifest.
> > The Manifest is just an XML file that provides information about the
> > different streams, among other things.
> > Here is a basic example (modified parts of the original found at
> > http://playready.directtaps.net/smoothstreaming/SSWSS720H264/SuperSpeedway_720.ism/Manifest ):
> >
> > <SmoothStreamingMedia MajorVersion="2" MinorVersion="1"
> >     Duration="1209510000">
> >   <StreamIndex Type="video" Name="video" Chunks="4" QualityLevels="2"
> >       MaxWidth="1280" MaxHeight="720" DisplayWidth="1280" DisplayHeight="720"
> >       Url="QualityLevels({bitrate})/Fragments(video={start time})">
> >     <QualityLevel Index="0" Bitrate="2962000" FourCC="H264" MaxWidth="1280"
> >         MaxHeight="720"
> >         CodecPrivateData="000000016764001FAC2CA5014016EFFC100010014808080A000007D200017700C100005A648000B4C9FE31C6080002D3240005A64FF18E1DA12251600000000168E9093525"/>
> >     <QualityLevel Index="1" Bitrate="2056000" FourCC="H264" MaxWidth="992"
> >         MaxHeight="560"
> >         CodecPrivateData="000000016764001FAC2CA503E047BFF040003FC52020202800001F480005DC03030003EBE8000FAFAFE31C6060007D7D0001F5F5FC6387684894580000000168E9093525"/>
> >     <c d="20020000"/>
> >     <c d="20020000"/>
> >     <c d="20020000"/>
> >     <c d="6670001"/>
> >   </StreamIndex>
> >   <StreamIndex Type="audio" Index="0" Name="audio" Chunks="4"
> >       QualityLevels="1" Url="QualityLevels({bitrate})/Fragments(audio={start time})">
> >     <QualityLevel FourCC="AACL" Bitrate="128000" SamplingRate="44100"
> >         Channels="2" BitsPerSample="16" PacketSize="4" AudioTag="255"
> >         CodecPrivateData="1210"/>
> >     <c d="20201360"/>
> >     <c d="19969161"/>
> >     <c d="19969161"/>
> >     <c d="8126985"/>
> >   </StreamIndex>
> > </SmoothStreamingMedia>
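For illustration, the manifest above can be read with Python's standard `xml.etree.ElementTree` module. This sketch uses a trimmed copy of the example (CodecPrivateData and some size attributes omitted), and the dictionary layout returned by the hypothetical `parse_manifest` helper is an assumption. Real manifests may also carry a TimeScale attribute or explicit `t` start times on `c` elements, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example manifest (CodecPrivateData omitted for brevity).
MANIFEST = """<SmoothStreamingMedia MajorVersion="2" MinorVersion="1" Duration="1209510000">
  <StreamIndex Type="video" Name="video" Chunks="4" QualityLevels="2"
      Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="2962000" FourCC="H264" MaxWidth="1280" MaxHeight="720"/>
    <QualityLevel Index="1" Bitrate="2056000" FourCC="H264" MaxWidth="992" MaxHeight="560"/>
    <c d="20020000"/><c d="20020000"/><c d="20020000"/><c d="6670001"/>
  </StreamIndex>
  <StreamIndex Type="audio" Index="0" Name="audio" Chunks="4" QualityLevels="1"
      Url="QualityLevels({bitrate})/Fragments(audio={start time})">
    <QualityLevel FourCC="AACL" Bitrate="128000" SamplingRate="44100" Channels="2"/>
    <c d="20201360"/><c d="19969161"/><c d="19969161"/><c d="8126985"/>
  </StreamIndex>
</SmoothStreamingMedia>"""


def parse_manifest(xml_text):
    """Return (duration_in_100ns_ticks, list_of_stream_dicts)."""
    root = ET.fromstring(xml_text)
    streams = []
    for index in root.findall("StreamIndex"):
        levels = index.findall("QualityLevel")
        streams.append({
            "type": index.get("Type"),
            "url": index.get("Url"),
            "bitrates": [int(q.get("Bitrate")) for q in levels],
            "fourcc": [q.get("FourCC") for q in levels],
            "durations": [int(c.get("d")) for c in index.findall("c")],
        })
    return int(root.get("Duration")), streams


duration, streams = parse_manifest(MANIFEST)
print([s["type"] for s in streams])  # ['video', 'audio']
print(streams[0]["bitrates"])         # [2962000, 2056000]
```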
> >
> > We can see it gives the version of the Smooth Streaming media and the
> > duration (measured in units of 1/10,000,000 second, i.e. 100 ns, so
> > 1209510000 is 120.951 s).
> > Next we see the video section, which says each quality level has 4 chunks
> > (fragments), with 2 quality levels available. It also gives the video
> > dimensions and the URL format.
> > Next it gives information about each bitrate, with codec information and
> > codec private data (I believe it is used to configure the codec in an
> > opaque way).
> > Next it lists each fragment's duration (the d attribute of each c
> > element, in the same 100 ns units, so 20020000 is 2.002 s). The first
> > fragment starts at time 0 (zero), and each following one starts at the
> > sum of the durations of the fragments before it.
> > Next we have the same structure for the audio description.
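The start-time rule for fragments is a running sum of the `c` durations. A short Python sketch, assuming the default timescale of 10,000,000 ticks per second used in the example; the helper names are illustrative only.

```python
from itertools import accumulate

TICKS_PER_SECOND = 10_000_000  # manifest times are in 100 ns units


def fragment_start_times(durations):
    """Start time of each fragment = sum of the durations of the ones before it."""
    if not durations:
        return []
    return [0] + list(accumulate(durations))[:-1]


def ticks_to_seconds(ticks):
    return ticks / TICKS_PER_SECOND


video_durations = [20020000, 20020000, 20020000, 6670001]
print(fragment_start_times(video_durations))  # [0, 20020000, 40040000, 60060000]
print(ticks_to_seconds(20020000))             # 2.002
```

The fourth start time, 60060000, is the value that appears in the fragment request URL for the fourth video fragment.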
> >
> > After getting the Manifest file, the client must decide which quality
> > level is best suited for the device and its resources.
> > It is not clear to me what parameters it bases its decisions on. I have
> > heard about screen size and resolution, computing power, download
> > bandwidth, etc.
> > As soon as the quality level is chosen, I suppose the decoder has to be
> > configured accordingly, using the CodecPrivateData information
> > provided.
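The example CodecPrivateData values look like an H.264 Annex B byte stream (start code 00 00 00 01, then SPS and PPS) for video and an AAC AudioSpecificConfig for "AACL" audio. Assuming that is what they are, a Python sketch can decode them as below; the helper names are hypothetical. In FFmpeg terms these bytes would presumably end up as the decoder's `AVCodecContext.extradata`.

```python
# Example value copied from the first video QualityLevel in the manifest.
H264_CPD = ("000000016764001FAC2CA5014016EFFC100010014808080A000007D200017700"
            "C100005A648000B4C9FE31C6080002D3240005A64FF18E1DA12251600000000168"
            "E9093525")

AAC_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                    22050, 16000, 12000, 11025, 8000, 7350]


def split_annexb(hex_str):
    """Split H.264 Annex B CodecPrivateData into (nal_unit_type, bytes) pairs.

    Assumption: NAL units are separated by 4-byte start codes (00 00 00 01).
    """
    data = bytes.fromhex(hex_str)
    return [(nal[0] & 0x1F, nal) for nal in data.split(b"\x00\x00\x00\x01") if nal]


def parse_aac_config(hex_str):
    """Decode object type, sample rate and channel count from an AAC
    AudioSpecificConfig (assumes object type < 31 and a rate index < 13)."""
    b = bytes.fromhex(hex_str)
    object_type = b[0] >> 3                          # 5 bits
    freq_index = ((b[0] & 0x07) << 1) | (b[1] >> 7)  # 4 bits
    channels = (b[1] >> 3) & 0x0F                    # 4 bits
    return object_type, AAC_SAMPLE_RATES[freq_index], channels


nals = split_annexb(H264_CPD)
print([t for t, _ in nals])      # [7, 8] -> SPS, PPS
sps = nals[0][1]
print(sps[1], sps[3])            # 100 31 -> profile_idc (High), level_idc (3.1)
print(parse_aac_config("1210"))  # (2, 44100, 2) -> AAC LC, 44.1 kHz, stereo
```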
> > The client then starts requesting fragments following the URL pattern
> > given in the Manifest, substituting the quality level's Bitrate value
> > for {bitrate} and the fragment's start time for {start time}.
> > To request the first fragment of the first quality level, it would use
> > <baseUrl>/QualityLevels(2962000)/Fragments(video=0).
> > To request the fourth fragment of the second quality level, it would use
> > <baseUrl>/QualityLevels(2056000)/Fragments(video=60060000).
> > Audio fragments are requested separately, following the same idea. For
> > instance, the second audio fragment is
> > <baseUrl>/QualityLevels(128000)/Fragments(audio=20201360).
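Filling in the Url template is plain string substitution. A Python sketch, assuming `{bitrate}` takes the QualityLevel's Bitrate attribute and `{start time}` the fragment start time in 100 ns units; `manifest_url` and `fragment_url` are hypothetical helpers.

```python
def manifest_url(base_url):
    """<baseUrl>/Manifest, as described in the message."""
    return base_url.rstrip("/") + "/Manifest"


def fragment_url(base_url, url_template, bitrate, start_time):
    """Fill a StreamIndex Url template.

    Assumption: {bitrate} takes the QualityLevel's Bitrate attribute and
    {start time} the fragment start time in 100 ns units.
    """
    path = (url_template.replace("{bitrate}", str(bitrate))
                        .replace("{start time}", str(start_time)))
    return base_url.rstrip("/") + "/" + path


base = "http://domain.com/video.ism"
video_tmpl = "QualityLevels({bitrate})/Fragments(video={start time})"
print(manifest_url(base))
print(fragment_url(base, video_tmpl, 2056000, 60060000))
# http://domain.com/video.ism/QualityLevels(2056000)/Fragments(video=60060000)
```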
> >
> > Each fragment received is arranged in PIFF wire format. In other words,
> > it contains exactly one moof box and exactly one mdat box and nothing
> > more (check the MP4 specs for more info).
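Checking that a downloaded fragment has this shape only requires walking the top-level ISO BMFF boxes (32-bit size, 4-character type, with size 1 meaning a 64-bit size follows and size 0 meaning "to the end"). A Python sketch; `is_piff_fragment` is a hypothetical helper.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for ISO BMFF boxes in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:    # 64-bit largesize follows the type field
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("truncated or malformed box")
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def is_piff_fragment(data):
    """True if data holds exactly one moof box followed by one mdat box."""
    return [t for t, _, _ in iter_boxes(data)] == ["moof", "mdat"]


# Synthetic fragment: empty moof plus an mdat carrying 4 payload bytes.
frag = struct.pack(">I4s", 8, b"moof") + struct.pack(">I4s", 12, b"mdat") + b"abcd"
print(is_piff_fragment(frag))  # True
```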
> > Of course, those boxes contain child boxes where applicable. A fragment
> > may also contain custom uuid boxes designed to allow DRM protection.
> > Let's not consider them here.
> > I'm not sure which information I can get from the moof box, but I assume
> > it is relevant only to the demuxer, since the codec only works on the
> > opaque data contained in the mdat. Correct me if I'm wrong, please.
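The moof box is what lets a demuxer cut the mdat payload into individual frames for the codec. A rough Python sketch, assuming a single-track fragment whose `trun` box carries explicit per-sample sizes and whose samples sit back to back from the start of the mdat payload; sizes taken from `tfhd` defaults, real `data_offset` handling, and 64-bit box sizes are deliberately not covered. The flag bits follow the `trun` definition in ISO/IEC 14496-12; the helper names are hypothetical.

```python
import struct


def iter_boxes(data, offset, end):
    """Yield (type, payload_start, box_end); 32-bit box sizes only."""
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > end:
            raise ValueError("unsupported or malformed box size")
        yield box_type.decode("latin-1"), offset + 8, offset + size
        offset += size


def trun_sample_sizes(data, start):
    """Per-sample sizes from a 'trun' payload starting at data[start]."""
    version_flags, sample_count = struct.unpack_from(">II", data, start)
    flags = version_flags & 0xFFFFFF
    pos = start + 8
    if flags & 0x000001:  # data_offset present (skipped here)
        pos += 4
    if flags & 0x000004:  # first_sample_flags present
        pos += 4
    if not flags & 0x000200:
        raise ValueError("sizes come from tfhd defaults; not handled here")
    sizes = []
    for _ in range(sample_count):
        if flags & 0x000100:  # sample_duration
            pos += 4
        sizes.append(struct.unpack_from(">I", data, pos)[0])  # sample_size
        pos += 4
        if flags & 0x000400:  # sample_flags
            pos += 4
        if flags & 0x000800:  # sample_composition_time_offset
            pos += 4
    return sizes


def split_samples(fragment):
    """Cut the mdat payload of a single-track moof+mdat fragment into samples."""
    sizes, mdat = [], None
    for box, start, end in iter_boxes(fragment, 0, len(fragment)):
        if box == "moof":
            for child, c_start, c_end in iter_boxes(fragment, start, end):
                if child != "traf":
                    continue
                for leaf, l_start, _ in iter_boxes(fragment, c_start, c_end):
                    if leaf == "trun":
                        sizes.extend(trun_sample_sizes(fragment, l_start))
        elif box == "mdat":
            mdat = fragment[start:end]
    if mdat is None:
        raise ValueError("no mdat box found")
    samples, pos = [], 0
    for size in sizes:
        samples.append(mdat[pos:pos + size])
        pos += size
    return samples


# Synthetic fragment: one trun (data_offset + sample_size flags) with 2 samples.
trun = struct.pack(">I4sIIiII", 28, b"trun", 0x000201, 2, 0, 3, 2)
traf = struct.pack(">I4s", 8 + len(trun), b"traf") + trun
moof = struct.pack(">I4s", 8 + len(traf), b"moof") + traf
mdat = struct.pack(">I4s", 13, b"mdat") + b"abcde"
print(split_samples(moof + mdat))  # [b'abc', b'de']
```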
> >
> > The client applies some heuristics while requesting fragments, and at
> > some point it may decide to switch to another quality level. I suppose
> > it would have to reconfigure the decoder, and repeat this over and over
> > until the end of the stream.
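The thread does not say which heuristic the Silverlight client actually uses. As a stand-in, here is a Python sketch of a simple throughput-based rule (a common baseline, not Microsoft's algorithm): measure how fast the last fragment downloaded, then pick the highest advertised bitrate that fits within a safety margin. The 0.8 factor and the helper names are arbitrary assumptions.

```python
def throughput_bps(fragment_bytes, download_seconds):
    """Observed download rate of one fragment, in bits per second."""
    return fragment_bytes * 8 / download_seconds


def choose_bitrate(bitrates, measured_bps, safety=0.8):
    """Highest bitrate within safety * measured throughput, else the lowest one."""
    budget = measured_bps * safety
    fitting = [b for b in bitrates if b <= budget]
    return max(fitting) if fitting else min(bitrates)


levels = [2962000, 2056000]  # Bitrate values from the example manifest
rate = throughput_bps(750_000, 2.0)  # 750 kB fetched in 2 s -> 3 Mbit/s
print(choose_bitrate(levels, rate))  # 2056000 (budget is 2.4 Mbit/s)
```

Because fragments are time-aligned across quality levels, the chosen bitrate can simply be used for the next fragment's URL.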
> >
> > I'm not sure how a decoder works, but I believe there is a way to
> > configure it to receive future "injected" data.
> >
> > If you got all the way here, I really thank you!
> > I wonder how to fit all this into the ffmpeg structure.
> > If anyone can point me in some direction, I'd be very thankful.
> > There are still a few comments below...
> >
> >
> >> For the rest, I am just shooting in the dark, as I know nothing of the
> >> protocol.
> >>
> >> > I see two possible scenarios:
> >> >
> >> > 1 - External code makes all the HTTP requests to obtain the manifest
> >> > XML file and uses it to configure the decoder. Then it makes further
> >> > HTTP requests to obtain fragments that will be parsed by the demuxer
> >> > (probably a custom one based on the ISOM one already available).
> >>
> >> This looks like the manifest XML file has a role similar to the SDP file
> >> with RTP streams. You could look at how that works to see if that suits
> >> you.
> >>
> >>
> > I'm not that familiar with RTP, but from what I've read in the past few
> > minutes it sounds similar.
> >
> >
> >> > 2 - Very simple external code just asks FFMpeg to play a Smooth
> >> > Streaming media. FFMpeg detects that this is HTTP-based media and
> >> > requests its manifest file (I believe I'd have to create a custom
> >> > HTTP-based solution for that). Once the manifest is available, ffmpeg
> >> > configures the decoder, then makes further HTTP requests the same way
> >> > as in 1.
> >>
> >> There is already HTTP client code, as surely you know.
> >>
> >>
> > Yes. I've seen something about it. It looks suitable for this case.
> > It may be my starting point for studying. But I still feel in need of
> > a big picture of how ffmpeg works in general.
> >
> >
> >> Regards,
> >>
>
> I think a close match for this would be RTMP support in FFmpeg: the
> complexity of negotiating with the server is handled by an external
> library, librtmp (with FFmpeg compiled with librtmp support enabled),
> which feeds data in chunks of a particular size for the decoder to
> decode. The RTMP protocol supports negotiating a bandwidth as part of its
> handshake between the server and the client (based on network load),
> and servers usually serve three different quality levels, but they
> are not as scalable (that would be H.264 SVC AFAIK) as you
> described. Negotiations are handled within librtmp; FFmpeg's glue code
> for this external library is in libavformat/librtmp.c, and the FLV data
> it delivers is demuxed by libavformat/flvdec.c.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

-- 
Marcus Nascimento