[FFmpeg-devel] [PATCH 0/7] RFC: complete rework of s337m support

Thu Dec 5 16:29:36 EET 2024

>From: Kieran Kunhya <kieran618 at googlemail.com>
>Sent: Wednesday, December 4, 2024 23:06
>
>> - a s337m decoder: it includes a resampler: output and input sample_rate
>> are the same, sync is always correct. It would be possible to implement
>> a full pcm fallback, but currently only a silence/pcm fallback is
>> provided. A 'passthrough' option is also provided and would make it
>> possible to mux again into wav, mxf or whatever. I guess one could
>> imagine a bitstream filter to fix the s337m phase to a clean, fixed
>> offset value (as expected by the current s337m demuxer for example).
>
>I don't understand how you can resample non-PCM data.
>
>Also the rest of the changes seem to avoid the actual issue which is
>you want the Dolby E decoder in FFmpeg to output a 1601/1602 cadence
>after resampling back to 48000.
>I also have seen a lot of Dolby E streams in the wild where the
>Dolby E packet crosses a video frame. There is an ambiguity in PTS in
>this case, do you go forwards or go backwards (FYI in Upipe we go
>backwards)
>
>Kieran

I resample the decoded data. The s337m decoder requires the dolby_e decoder
(the same way the current ftr decoder requires the aac decoder, so this should not be an issue).
The resampler is used to cope with all kinds of sync situations.
Input and output frames are not bound directly: it is even possible to have a
decoded input audio frame cross the video frame boundary at the output...
The byte position of the syncword determines its expected pts and this fits into swr_next_pts().
There is one exception: for the first frame to be decoded, the guard band is ignored because
we expect the decoded audio to be synced with the video (with 'video' sync assumed at byte 0).
So: for compliant streams, there should be one input video frame + one input s337m frame that
match/sync exactly* to one output video frame + one output pcm frame.
[*not fully exactly due to the change of the sampling rate].
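
For illustration only (this is not code from the patch series), the mapping from syncword byte position to expected pts could be sketched as below. The helper names, and the assumption that the s337m burst is carried in interleaved PCM with a fixed number of channels and bytes per sample, are hypothetical. The result is expressed in units of 1/sample_rate; swr_next_pts() itself expects timestamps in 1/(in_sample_rate * out_sample_rate) units, so a real caller would have to rescale before passing the value on.

```c
#include <stdint.h>

/* Hypothetical helper: convert the byte offset of the s337m syncword inside
 * an interleaved PCM packet into an offset counted in sample frames.
 * Assumes byte_pos is measured from the start of the packet. */
static int64_t syncword_offset_samples(int64_t byte_pos, int channels,
                                       int bytes_per_sample)
{
    return byte_pos / ((int64_t)channels * bytes_per_sample);
}

/* Hypothetical helper: expected pts of the decoded audio, in units of
 * 1/sample_rate, given the pts of the packet that carries the burst. */
static int64_t expected_pts(int64_t packet_pts, int64_t byte_pos,
                            int channels, int bytes_per_sample)
{
    return packet_pts + syncword_offset_samples(byte_pos, channels,
                                                bytes_per_sample);
}
```

With a 24-bit stereo pair (6 bytes per sample frame), a syncword found 960 bytes into a packet sits 160 samples after the packet start.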
For non-compliant streams, there is indeed no way to know whether a dolby_e frame belongs
to the current or to the next video frame.
Since no assumption is made on the s337m rate etc., the logic to determine if the guard band is to be
ignored (conformant stream) or not (100% taken into account) is very basic:
    if (s1->aes_start_position >= dectx->frame_size) ... [s337m.c/line 240]
This could be tuned differently, but any threshold will be a mix of arbitrary and empirical choices.
The typical use case (from my experience) is a "negative phase": something like pts at 0.035 for a 25fps stream.
In this case, one would certainly prefer to interpret it as "-0.005"
(i.e. insert 0.040s of silence, then a decoded audio frame @0s, as the 0.005 is ignored)
rather than "+0.035", but this would require computing the 0.040, i.e. being aware of the frame rate...
This is possible, and certainly better from a user point of view, but I feel that this additional code,
used solely for better frame sync on broken streams, would not look very nice in the end: the
maintenance burden does not seem worth it.
I chose the simpler code option, which is +0.035: this is not perfect, but the sync error will still be very
low. The downside is that, as described above, there is no 1:1 relationship between an input audio
frame and an output audio frame, which is indeed unusual for DolbyE users.
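
For illustration only, the frame-rate-aware interpretation discussed above (the one that is not implemented) could look roughly like the sketch below. The function names and the half-frame threshold are assumptions made for the example, not something taken from the patch series. Values are counted in samples at the audio sample rate.

```c
#include <stdint.h>

/* Duration of one video frame in audio samples (truncating division).
 * Exact for 48000 Hz at 25 fps (1920); for rates such as 30000/1001 the
 * true duration is fractional and this simple version truncates it. */
static int64_t frame_duration_samples(int sample_rate, int fps_num,
                                      int fps_den)
{
    return (int64_t)sample_rate * fps_den / fps_num;
}

/* Hypothetical folding rule: a phase later than half a frame is read as a
 * small negative phase relative to the next frame boundary. The half-frame
 * threshold is an arbitrary choice made for this example. */
static int64_t fold_phase(int64_t phase, int64_t frame_dur)
{
    return phase > frame_dur / 2 ? phase - frame_dur : phase;
}
```

At 48 kHz and 25 fps, a phase of 0.035s is 1680 samples and one frame is 1920 samples, so the folded phase is -240 samples, i.e. the "-0.005" reading. A phase of 240 samples (0.005s) is left unchanged.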

Thank you for your comments, these are exactly the kind of questions I expected for this RFC.
BTW: I noticed that mxfdec reports some edit unit sync loss. I have to look into it, but I don't feel
there would be any major impediment to having everything cleaned up. Again, I would first like
to get an overall feeling about the patch series. So thanks again!

Nicolas