[FFmpeg-devel] Integrating the mod engine into FFmpeg - what is the best design approach?

Fri Jul 16 23:53:24 CEST 2010

On Fri, Jul 16, 2010 at 11:14:06PM +0200, Vitor Sessak wrote:
> On 07/16/2010 08:50 PM, Michael Niedermayer wrote:
>> On Fri, Jul 16, 2010 at 04:15:26PM +0200, Vitor Sessak wrote:
>>> On 07/16/2010 03:46 PM, Sebastian Vater wrote:
>>>> Hello dears!
>>>>
>>>> I had a discussion with Vitor and Stefano about the best way to
>>>> integrate the mod engine into FFmpeg.
>>>>
>>>> Vitor's idea for doing this was (quoting him from the mail):
>>>> Note that in the way we are suggesting, the MOD decoder decodes the bulk
>>>> of the file to a format-independent Big Sound Struct (BSS). With our
>>>> approach:
>>>>
>>>> 1- The MOD demuxer will do three things:
>>>>       a) Probe if a file is a .mod
>>>>       b) Extract metadata
>>>>       c) Pass the rest of the file in an AVPacket.
>>>> 2- The MOD decoder does just one thing: decode an AVPacket to a BSS. It
>>>> does not know anything about the player (it doesn't even know _if_ it
>>>> will be played or converted to another format or fed to visualization
>>>> code).
>>>> 3- Libavsequencer does just one thing: transforming a BSS into PCM audio.
>>>> It knows nothing about file formats (it doesn't care or know if the BSS
>>>> was made from a MOD file or recorded from a MIDI keyboard).
>>>>
>>>> That's why we insist on starting with the implementation of MOD -> XM
>>>> conversion: it is much simpler than MOD -> PCM conversion, and it doesn't
>>>> need an implementation of libavsequencer.
>>>>
>>>>                           mod file - metadata              BSS + sequencer SAMPLES
>>>> MOD file --> MOD demuxer --------------------> MOD decoder ------------------------> application
>>>>
>>>> Vitor summarized the advantages of his approach as follows:
>>>> 1- Good coding practice enforcing code modularity
>>>> 2- Allows for conversion from a format with more features to one with
>>>> less doing no mixing or sampling
>>>> 3- Makes each file format very modular (just reading the bitstream and
>>>> filling up BSS)
>>>> 4- Better integration with the way FFmpeg works ATM
>>>> 5- No libavsequencer needed to do conversion or visualization or editing
>>>> 6- No need for new API calls for applications that want to access the
>>>> BSS (just plain old avcodec_decode_frame())
>>>> 7- At last, giving a simple goal to the SoC: getting all the code
>>>> besides lavs/ committed.
>>>>
>>>> Since I mostly agree with his approach now, I decided to take that solely
>>>> as a starting point for discussion.
>>>
>>> That is a pretty good description of what I initially suggested. I'd like
>>> just to add that in this approach, the decoder would output the BSS as some
>>> new SAMPLE_FMT_SEQUENCER.
>>>
>>>> I see just one disadvantage here: simply extracting the metadata and
>>>> removing it, then passing the rest in an AVPacket, requires parsing all
>>>> module files twice and also manipulating them (correcting the offsets,
>>>> etc.). This would require duplicate code in the demuxer and decoder, which
>>>> I would like to avoid, if possible.
>>>
>>> And your first idea did not have this problem. Since it is not a bad idea
>>> either, I'd like to explain it to see what the rest of the community thinks.
>>> In Sebastian's original approach, the demuxer would decode the file to a
>>> BSS and output it in an AVPacket. It would then define a CODEC_ID_SEQUENCER,
>>> and the decoder would be just a wrapper around libavsequencer to do the BSS
>>> -> PCM conversion.
>>>
>>> The advantage of this approach is that the concept of demuxing/decoding does
>>> not make much sense for these formats, so this avoids the artificial
>>> distinction. Moreover, it makes a nice distinction between transcoding from
>>> one MOD format to another (with -acodec copy) and decoding it to PCM. The
>>> disadvantage is that API-wise it's less clear for external applications how
>>> to get the BSS data (reading the AVPacket payload). Besides, all the
>>> bit-reading API is part of lavc.
>>
>> don't forget that we would like to be able to store mod in avi/mp4/nut
>> together with a video stream. If AVPacket.data is some kind of struct
>> this is not cleanly possible anymore
>
> Is that really a sane idea?

saner than putting vorbis in ogg

mod/s3m/... are sound formats. technically the way they compress would
fall in the class of matching pursuit coders. For video at least these
can achieve good PSNR per bitrate but they are nontrivial to encode,
the same applies to audio, just think of turning a .wav into a .mod.
but that said, a module-like format combined with mdct based amplitude/
pitch coding and residual mdct coding should perform quite well in terms
of rate distortion performance, that is if one manages to solve the
encoding problem.
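
For readers who have not met matching pursuit, here is a minimal sketch of
one greedy step, assuming a tiny dictionary of unit-norm atoms (think of
them as the module's instrument samples). Every name in it (MP_LEN, mp_dot,
mp_step) is made up for illustration and is not FFmpeg API. It is only meant
to show why encoding is the hard part: each step searches the whole
dictionary, and a real encoder would also have to search over pitch and
time offset.

```c
#include <assert.h>
#include <stddef.h>

#define MP_LEN 4  /* samples per signal and per atom (illustrative size) */

/* Inner product of two MP_LEN-sample vectors. */
static double mp_dot(const double *a, const double *b)
{
    double sum = 0.0;
    for (size_t k = 0; k < MP_LEN; k++)
        sum += a[k] * b[k];
    return sum;
}

/*
 * One greedy matching pursuit step.
 * Finds the atom whose correlation with the residual has the largest
 * magnitude, subtracts its projection from the residual in place, stores
 * the correlation in *gain and returns the index of the chosen atom.
 * Atoms are assumed to have unit norm.
 */
static size_t mp_step(double residual[MP_LEN],
                      const double atoms[][MP_LEN], size_t n_atoms,
                      double *gain)
{
    assert(n_atoms > 0);

    size_t best   = 0;
    double best_c = mp_dot(residual, atoms[0]);

    for (size_t i = 1; i < n_atoms; i++) {
        double c = mp_dot(residual, atoms[i]);
        if (c * c > best_c * best_c) {  /* compare magnitudes */
            best   = i;
            best_c = c;
        }
    }

    for (size_t k = 0; k < MP_LEN; k++)
        residual[k] -= best_c * atoms[best][k];

    *gain = best_c;
    return best;
}
```

Calling mp_step() repeatedly until the residual is small yields a list of
(atom, gain) pairs, which plays roughly the role of the note list in a
module; whatever residual is left is what the residual mdct coding would
have to cover.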

and why should we not allow storing that in containers? we allow mdct based
codecs, we allow *CELP, we allow adpcm. modules may currently be unsuitable
as generic sound compression but is that an argument to treat them as
a special case?
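
To make the container point concrete, here is a rough sketch, assuming a
hypothetical DemoPattern struct standing in for a piece of the BSS (none of
these names exist in FFmpeg). A struct that holds pointers cannot be written
into avi/mp4/nut as-is, because the pointer values mean nothing to whoever
reads the file back; a flat byte layout with explicit sizes and endianness
can be. The layout used here is invented purely for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form: it holds a pointer, so its raw bytes are
 * only meaningful inside the process that allocated it. */
typedef struct DemoPattern {
    uint16_t rows;   /* number of rows in the pattern */
    uint8_t *notes;  /* one note value per row, heap-allocated */
} DemoPattern;

/* Flat form: 2-byte big-endian row count followed by the note bytes.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t demo_pattern_serialize(const DemoPattern *p,
                                     uint8_t *buf, size_t buf_size)
{
    assert(p && buf);

    size_t need = 2 + (size_t)p->rows;
    if (buf_size < need)
        return 0;

    buf[0] = (uint8_t)(p->rows >> 8);
    buf[1] = (uint8_t)(p->rows & 0xFF);
    if (p->rows > 0)
        memcpy(buf + 2, p->notes, p->rows);
    return need;
}

/* Rebuilds a DemoPattern from the flat form.
 * Returns 0 on success, -1 on truncated input or allocation failure.
 * On failure *out is left untouched. */
static int demo_pattern_parse(const uint8_t *buf, size_t size,
                              DemoPattern *out)
{
    assert(buf && out);

    if (size < 2)
        return -1;

    uint16_t rows = (uint16_t)((buf[0] << 8) | buf[1]);
    if (size < 2 + (size_t)rows)
        return -1;

    uint8_t *notes = malloc(rows ? rows : 1);
    if (!notes)
        return -1;
    if (rows > 0)
        memcpy(notes, buf + 2, rows);

    out->rows  = rows;
    out->notes = notes;
    return 0;
}
```

Only the flat form could sensibly go into AVPacket.data if the packet is
supposed to survive being muxed next to a video stream; the struct form
works only while everything stays inside one process.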

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The misfortune of the wise is better than the prosperity of the fool.
-- Epicurus