[FFmpeg-devel] Integrating the mod engine into FFmpeg - what is the best design approach?

Fri Jul 16 23:14:06 CEST 2010

On 07/16/2010 08:50 PM, Michael Niedermayer wrote:
> On Fri, Jul 16, 2010 at 04:15:26PM +0200, Vitor Sessak wrote:
>> On 07/16/2010 03:46 PM, Sebastian Vater wrote:
>>> Hello dears!
>>>
>>> I had a discussion with Vitor and Stefano about the best way to
>>> integrate the mod engine into FFmpeg.
>>>
>>> Vitor's idea doing this was (quoting him from the mail):
>>> Note that in the way we are suggesting, the MOD decoder decodes the bulk
>>> of the file to a format-independent Big Sound Struct (BSS). With our
>>> approach:
>>>
>>> 1- The MOD demuxer will do three things:
>>>       a) Probe if a file is a .mod
>>>       b) Extract metadata
>>>       c) Pass the rest of the file in an AVPacket.
>>> 2- The MOD decoder does just one thing: decode an AVPacket to a BSS. It
>>> does not know anything about the player (it doesn't even know _if_ it
>>> will be played or converted to another format or fed to visualization
>>> code).
>>> 3- Libavsequencer does just one thing: transforming a BSS into PCM audio.
>>> It knows nothing about file formats (it doesn't care or know if the BSS
>>> was made from a MOD file or recorded from a MIDI keyboard).
>>>
>>> That's why we insist on starting with the implementation of MOD -> XM
>>> conversion: it is much simpler than MOD -> PCM conversion, since it
>>> doesn't need an implementation of libavsequencer.
>>>
>>>                          mod file - metadata
>>> MOD file --> MOD demuxer --------------------> MOD decoder
>>>
>>>              BSS + sequencer SAMPLES
>>> MOD decoder ------------------------> application
>>>
>>> Vitor summarized the advantages of his approach as follows:
>>> 1- Good coding practice enforcing code modularity
>>> 2- Allows for conversion from a format with more features to one with
>>> fewer, without doing any mixing or sampling
>>> 3- Makes each file format very modular (just reading the bitstream and
>>> filling up BSS)
>>> 4- Better integration with the way FFmpeg works ATM
>>> 5- No libavsequencer needed to do conversion or visualization or editing
>>> 6- No need for new API calls for applications that want to access the
>>> BSS (just plain old avcodec_decode_frame())
>>> 7- Lastly, it gives the SoC a simple goal: getting all the code
>>> besides lavs/ committed.
>>>
>>> Since I mostly agree with his approach now, I decided to take it as the
>>> sole starting point for discussion.
>>
>> That is a pretty good description of what I initially suggested. I'd just
>> like to add that in this approach, the decoder would output the BSS as some
>> new SAMPLE_FMT_SEQUENCER.
>>
>>> I see just one disadvantage here: extracting the metadata, removing it,
>>> and passing the rest in an AVPacket requires parsing every module file
>>> twice and also manipulating it (correcting the offsets, etc.). This
>>> would require duplicate code in the demuxer and decoder, which I would
>>> like to avoid, if possible.
>>
>> And your first idea did not have this problem. Since it is not a bad idea
>> either, I'd like to explain it to see what the rest of the community thinks.
>> In Sebastian's original approach, the demuxer would decode the file to a
>> BSS and output it in an AVPacket. It would then define a CODEC_ID_SEQUENCER,
>> and the decoder would be just a wrapper around libavsequencer to do the BSS
>> -> PCM conversion.
>>
>> The advantage of this approach is that the demuxer/decoder split does not
>> make much sense for these formats, so it avoids the artificial
>> distinction. Moreover, it cleanly separates transcoding from one MOD
>> format to another (with -acodec copy) from decoding it to PCM. The
>> disadvantage is that API-wise it's less clear how external applications
>> would get the BSS data (by reading the AVPacket payload). Besides, all the
>> bit-reading API is part of lavc, not lavf, where the demuxer lives.
>
> don't forget that we would like to be able to store mod in avi/mp4/nut
> together with a video stream. If AVPacket.data is some kind of struct,
> this is not cleanly possible anymore
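
To spell out the concern: if the BSS holds pointers, its raw bytes mean
nothing once they are written into avi/mp4/nut, so the payload would have to
be flattened first. Below is a minimal sketch of what that implies, assuming
a made-up ToyBSS with a single sample array. ToyBSS, toy_bss_serialize() and
toy_bss_deserialize() are hypothetical names used only for illustration, not
existing FFmpeg API, and the little-endian layout is just my assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for the BSS. */
typedef struct ToyBSS {
    uint32_t nb_samples;
    int16_t *samples;   /* a pointer: meaningless once stored in a file */
} ToyBSS;

/* Flatten the struct into a self-contained little-endian byte buffer:
 * 4 bytes sample count, then 2 bytes per sample. Caller frees the result. */
static uint8_t *toy_bss_serialize(const ToyBSS *bss, size_t *size)
{
    size_t n, i;
    uint8_t *buf, *p;

    assert(bss && size);
    n   = 4 + (size_t)bss->nb_samples * 2;
    buf = malloc(n);
    if (!buf)
        return NULL;
    p = buf;
    for (i = 0; i < 4; i++)
        *p++ = (uint8_t)(bss->nb_samples >> (8 * i));
    for (i = 0; i < bss->nb_samples; i++) {
        uint16_t v = (uint16_t)bss->samples[i];
        *p++ = (uint8_t)(v & 0xff);
        *p++ = (uint8_t)(v >> 8);
    }
    *size = n;
    return buf;
}

/* Rebuild a ToyBSS (with freshly allocated samples) from the flat buffer.
 * Returns 0 on success, -1 on truncated input or allocation failure. */
static int toy_bss_deserialize(ToyBSS *bss, const uint8_t *buf, size_t size)
{
    uint32_t n = 0;
    size_t i;

    assert(bss && buf);
    if (size < 4)
        return -1;
    for (i = 0; i < 4; i++)
        n |= (uint32_t)buf[i] << (8 * i);
    if ((size - 4) / 2 < n)
        return -1;
    bss->samples = malloc(n ? n * sizeof(*bss->samples) : 1);
    if (!bss->samples)
        return -1;
    for (i = 0; i < n; i++) {
        /* reinterpret the 16-bit pattern as signed (two's complement) */
        uint16_t v = (uint16_t)(buf[4 + 2 * i] | (buf[5 + 2 * i] << 8));
        bss->samples[i] = (int16_t)v;
    }
    bss->nb_samples = n;
    return 0;
}
```

The flat buffer contains no addresses, so a muxer could copy it as-is; the
price is an extra (de)serialization step on every packet. That cost only
matters if MOD really is stored next to a video stream in such a container.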

Is that really a sane idea?

And crazy ideas aside, do you see any other technical merits in the
different approaches?
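
To keep the comparison concrete, here is a toy sketch of the two layerings
described above. Everything in it is hypothetical: the ToyBSS struct, the
"file format" (one length byte followed by that many note bytes) and the
a_*/b_* function names are mine, and none of it is FFmpeg or libavsequencer
API. It only shows where the parsing step sits in each design.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_NOTES 16

/* Hypothetical, tiny stand-in for the BSS. */
typedef struct ToyBSS {
    int     nb_notes;
    uint8_t notes[TOY_MAX_NOTES];
} ToyBSS;

/* Bitstream reading: toy file bytes -> BSS. Returns 0 or -1 on bad input. */
static int toy_parse(const uint8_t *data, int size, ToyBSS *bss)
{
    if (size < 1 || data[0] > TOY_MAX_NOTES || size < 1 + data[0])
        return -1;
    bss->nb_notes = data[0];
    memcpy(bss->notes, data + 1, data[0]);
    return 0;
}

/* Stand-in for the sequencer: BSS -> PCM, one sample per note.
 * Returns the number of samples written. */
static int toy_render(const ToyBSS *bss, int16_t *pcm, int max_samples)
{
    int i;
    assert(bss && pcm);
    for (i = 0; i < bss->nb_notes && i < max_samples; i++)
        pcm[i] = (int16_t)((bss->notes[i] - 128) * 256);
    return i;
}

/* Approach A: the demuxer forwards raw bytes as the packet and the
 * decoder parses them into a BSS; PCM is a separate sequencer step. */
static int a_demux(const uint8_t *file, int size,
                   const uint8_t **pkt, int *pkt_size)
{
    *pkt      = file;
    *pkt_size = size;
    return 0;
}

static int a_decode(const uint8_t *pkt, int pkt_size, ToyBSS *out)
{
    return toy_parse(pkt, pkt_size, out);
}

/* Approach B: the demuxer already parses into a BSS (the packet payload),
 * and the "decoder" is only a wrapper around the sequencer. */
static int b_demux(const uint8_t *file, int size, ToyBSS *pkt)
{
    return toy_parse(file, size, pkt);
}

static int b_decode(const ToyBSS *pkt, int16_t *pcm, int max_samples)
{
    return toy_render(pkt, pcm, max_samples);
}
```

Both paths end up with the same PCM; the difference is only what the packet
carries: raw file bytes in (A), an already-parsed BSS in (B).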

-Vitor