[FFmpeg-devel] [PATCH] libavcodec: Do not return encoding errors when -sub_charenc_mode is do_nothing

Eelco Lempsink eelco at lempsink.nl
Thu Sep 5 16:28:45 CEST 2013


On 30 aug. 2013, at 11:06, Nicolas George <nicolas.george at normalesup.org> wrote:
> Le tridi 13 fructidor, an CCXXI, Eelco Lempsink a écrit :
>> Hmm, sorry, that’s not the solution I’m looking for.  I want something so
>> that I can just pass in an SRT file and FFmpeg will figure out the
>> encoding.
>
>> It’s very important to realize that character encoding detection is not
>> something that can be done in an exact manner.
> 
> I believe you are contradicting yourself: "I want X... X is not possible".

Ah, but this is where it gets interesting.  The fact that there is no perfect solution means there is an opportunity for design.  That is where user feedback, the naming of options, and the choice of the right defaults matter.  And that is the kind of discussion I wish we could have.  I’ll start.

When no character encoding is specified, I think FFmpeg should try to detect it, using a library that applies a heuristic based on statistics.  As with analyzing video streams, FFmpeg can analyze a bit of the subtitles and report the most probable encoding.  If the encoding of a subtitle stream cannot be detected with enough confidence, and none was specified by hand, FFmpeg exits with an error.
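
To make the proposed behaviour concrete, here is a minimal sketch in C.  Everything in it is an assumption for illustration: `CharencGuess`, `guess_charenc`, `choose_charenc` and `MIN_CONFIDENCE` are hypothetical names, not FFmpeg API, and the "detector" is only a structural UTF-8 check standing in for a real statistical library.  The point is the policy: a user-specified encoding wins, a confident guess is accepted, and anything else is an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of a detection pass; not an FFmpeg type. */
typedef struct {
    const char *charenc;    /* best guess, or NULL if there is none */
    double      confidence; /* 0.0 .. 1.0 */
} CharencGuess;

/* Arbitrary threshold chosen for this sketch. */
#define MIN_CONFIDENCE 0.5

/* Structural UTF-8 check only: overlong forms and surrogates are not rejected. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra, k;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return 0;                      /* stray continuation or invalid lead byte */
        if (extra > len - i - 1) return 0;  /* sequence truncated by end of sample */
        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Stand-in for a statistical detector.  The confidence values are placeholders. */
static CharencGuess guess_charenc(const unsigned char *buf, size_t len)
{
    CharencGuess g = { NULL, 0.0 };
    if (len == 0)
        return g;
    if (is_valid_utf8(buf, len)) {
        g.charenc    = "UTF-8";
        g.confidence = 0.9;
    } else {
        /* Any byte string is valid ISO-8859-1, so this is only a weak guess. */
        g.charenc    = "ISO-8859-1";
        g.confidence = 0.3;
    }
    return g;
}

/* Returns 0 and sets *charenc on success, -1 if the user has to specify one. */
static int choose_charenc(const char *user_charenc,
                          const unsigned char *sample, size_t len,
                          const char **charenc)
{
    CharencGuess g;
    assert(charenc != NULL);
    *charenc = NULL;
    if (user_charenc) {            /* an explicit choice always wins */
        *charenc = user_charenc;
        return 0;
    }
    g = guess_charenc(sample, len);
    if (!g.charenc || g.confidence < MIN_CONFIDENCE)
        return -1;                 /* not confident enough: fail loudly */
    *charenc = g.charenc;
    return 0;
}
```

With this sketch, a sample containing the bytes `c a f 0xC3 0xA9` is accepted as UTF-8, while `c a f 0xE9` with no encoding given makes `choose_charenc` return -1, which is the case where FFmpeg would exit and ask for an explicit encoding.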

> I have no objection to someone implementing optional support for smarter
> heuristics, possibly using external libraries.
>
> The default behaviour needs to be simple enough to be predictable and
> controllable by the users, though.

Fully agreed.  

Talking about simplification, would it be useful to simplify the current situation first before introducing new stuff?  I reviewed the code and I fail to see the need for the ‘sub_charenc_mode’ option:

- do_nothing is never used unless you also specify a character encoding, and in that case the two options are in conflict.  It seems do_nothing only makes sense if set by the demuxer or decoder.

- ‘auto’ only sets the option to pre_decoder if there is a charset specified.
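
The two points above can be modelled with a toy sketch.  This is not FFmpeg's code: the enum and function names are hypothetical, and the logic only encodes the behaviour as I described it, namely that ‘auto’ turns into pre_decoder exactly when a charset is given.  It compares that with a simplified rule in which the charset alone decides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the -sub_charenc_mode values; not FFmpeg's identifiers. */
enum CharencMode { MODE_DO_NOTHING, MODE_AUTO, MODE_PRE_DECODER };

/* 'auto' becomes pre_decoder only when a charset was given; other modes are kept. */
static enum CharencMode resolve_mode(enum CharencMode mode, const char *charenc)
{
    if (mode == MODE_AUTO && charenc)
        return MODE_PRE_DECODER;
    return mode;
}

/* Current situation (as modelled here): is the input recoded before decoding? */
static int recodes_input(enum CharencMode mode, const char *charenc)
{
    return charenc != NULL && resolve_mode(mode, charenc) == MODE_PRE_DECODER;
}

/* Simplified rule without the mode option: the charset alone decides. */
static int recodes_input_simplified(const char *charenc)
{
    return charenc != NULL;
}

/* Walks the user-visible combinations and checks where the two rules agree. */
static void check_equivalence(void)
{
    const char *cs = "CP1252"; /* example charset name, chosen arbitrarily */

    /* auto and pre_decoder behave the same with or without a charset. */
    assert(recodes_input(MODE_AUTO, NULL)       == recodes_input_simplified(NULL));
    assert(recodes_input(MODE_AUTO, cs)         == recodes_input_simplified(cs));
    assert(recodes_input(MODE_PRE_DECODER, NULL) == recodes_input_simplified(NULL));
    assert(recodes_input(MODE_PRE_DECODER, cs)   == recodes_input_simplified(cs));

    /* do_nothing without a charset changes nothing either. */
    assert(recodes_input(MODE_DO_NOTHING, NULL) == recodes_input_simplified(NULL));

    /* The only disagreement is do_nothing plus a charset: the conflicting case. */
    assert(recodes_input(MODE_DO_NOTHING, cs)   != recodes_input_simplified(cs));
}
```

In this model the only combination where the mode makes a difference is do_nothing together with an explicit charset, which is exactly the pair of options that contradict each other.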

I would therefore argue that the -sub_charenc_mode option should be removed (which would also prevent confusion over what the options mean exactly).  I’m unsure what the exact policies are for demuxers/decoders touching the options, but the do_nothing mode could be replaced either by setting the charenc to whatever the demuxer/decoder outputs or by a more internal mechanism.

I’d be happy to make a patch for this.

Regards,

Eelco Lempsink