[FFmpeg-devel] [PATCH] libavcodec: Do not return encoding errors when -sub_charenc_mode is do_nothing

Nicolas George nicolas.george at normalesup.org
Thu Aug 29 22:16:45 CEST 2013


On Duodi 12 Fructidor, year CCXXI, Eelco Lempsink wrote:
> Thanks for your explanation.  Now I understand the underlying idea, I
> would prefer that FFmpeg would exit with an error state, though, since now
> it’s unclear that data is missing when using FFmpeg in a larger workflow
> where warnings might get lost in the noise.

I agree that ffmpeg (the command-line tool) should be stricter with this
kind of error. You can use -xerror to tell it to be.
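
For a larger workflow, a minimal sketch of a wrapper that relies on -xerror
and on the exit status rather than on warnings in the log. This is
illustrative Python, not part of FFmpeg; the function names and the bare
"input, output" command line are assumptions, and real invocations will need
their own stream-selection options.

```python
import subprocess


def build_strict_command(src, dst):
    """Return an ffmpeg command line that exits on error (-xerror)."""
    # Hypothetical helper: only -xerror and -i come from ffmpeg itself.
    return ["ffmpeg", "-xerror", "-i", src, dst]


def run_strict(src, dst):
    """Run ffmpeg and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(build_strict_command(src, dst), check=True)
```

With check=True a failed run raises an exception in the caller instead of
leaving a warning that can get lost in the noise.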

> I’m also curious to hear how you plan to handle the encoding detection
> (e.g. for an SRT file) or if you think that’s the responsibility of the
> user.

My plan is mostly to imitate Vim's behaviour: let the user specify a list of
encodings, try them each until one works, and recognize obvious signs such
as byte order marks.
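
As an illustration only (this is not FFmpeg code, and the eventual
implementation may differ), that plan could be sketched like this in Python.
The function name and the default candidate list are hypothetical; a byte
order mark is treated as decisive, otherwise the user's encodings are tried
in order with strict decoding.

```python
import codecs

# UTF-32 BOMs are checked first: the UTF-32-LE BOM begins with the
# UTF-16-LE BOM, so the longer match has to win.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_subtitles(data, candidates=("utf-8", "latin-1")):
    """Return (encoding, text) for raw subtitle bytes.

    Raises ValueError if no candidate decodes the data (a
    UnicodeDecodeError, which is a ValueError, if a BOM is present
    but the rest of the data does not match it).
    """
    # Obvious sign: a byte order mark settles the question.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):].decode(name)
    # Otherwise try the user-supplied list until one works.
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding decodes the data")
```

The order of the list matters: a permissive encoding such as latin-1 accepts
any byte sequence, so it only makes sense as the last entry.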

> Hmm, you might be correct.  We’re using FFmpeg for two things: extracting
> embedded text-based subtitles as SRT and for normalizing SRTs.
> 
> For the normalizing (basically using the FFmpeg SRT parser to filter
> problems in the SRT) it would be possible to do the encoding detection on
> the input rather than the output.  That way we can ensure UTF8 goes in and
> comes out, so that should be no problem.

Yes, I would advise that.

> As far as the extracting goes, I suppose the encoding information is
> either embedded in the format or defined in the format’s specification.

I do not know of a format that does not specify the encoding. Multimedia
formats capable of holding text subtitles are rather recent; they were
designed at a time when people understood that Unicode is the only sane way
to go.

> I’m not entirely sure that all formats and tools can be trusted though.

It is probably a dangerous assumption indeed, but I believe you should not
spend time on how to handle the situation until it actually occurs for you;
just be sure you can detect it.
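
Detecting it can be as simple as a strict UTF-8 decode of the extracted
subtitle data. A minimal sketch in Python, with a hypothetical function name;
it only reports the problem and makes no attempt to repair it.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```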

That makes me realize: disabling the check would allow ffmpeg to produce
exactly that kind of invalid file: S_TEXT in Matroska is specified as UTF-8,
while ffmpeg would just copy the encoding of the input file. That is, IMHO,
a very good reason not to disable it.

Regards,

-- 
  Nicolas George