[FFmpeg-devel] [PATCH] avcodec/htmlsubtitles: Be a bit more picky on syntax
Michael Niedermayer
michael at niedermayer.cc
Mon Jul 3 13:44:57 EEST 2017
On Mon, Jul 03, 2017 at 11:44:26AM +0200, Paul B Mahol wrote:
> On 7/2/17, Michael Niedermayer <michael at niedermayer.cc> wrote:
> > On Sun, Jul 02, 2017 at 01:14:00PM +0200, wm4 wrote:
> >> On Sun, 2 Jul 2017 00:09:42 +0200
> >> Michael Niedermayer <michael at niedermayer.cc> wrote:
> >>
> >> > This reduces the number of strstr() calls per byte
> >> > This diasalows empty tags like '< >' as well as '<' in tags like
> >> > '<ab<cd<<ef>'
> >> >
> >> > Fixes timeout
> >> > Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712
> >> >
> >> > Found-by: continuous fuzzing process
> >> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >> > Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> >> > ---
> >> > libavcodec/htmlsubtitles.c | 4 ++--
> >> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
> >> > index be5c9316ca..67abc94085 100644
> >> > --- a/libavcodec/htmlsubtitles.c
> >> > +++ b/libavcodec/htmlsubtitles.c
> >> > @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint
> >> > *dst, const char *in)
> >> > case '<':
> >> > tag_close = in[1] == '/';
> >> > len = 0;
> >> > - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >=
> >> > 1 && len > 0) {
> >> > + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >=
> >> > 1 && len > 0) {
> >> > const char *tagname = buffer;
> >> > while (*tagname == ' ')
> >> > tagname++;
> >> > if ((param = strchr(tagname, ' ')))
> >> > *param++ = 0;
> >> > - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) ||
> >> > + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) &&
> >> > *tagname) ||
> >> > ( tag_close && sptr > 0 &&
> >> > !strcmp(stack[sptr-1].tag, tagname))) {
> >> > int i, j, unknown = 0;
> >> > in += len + tag_close;
> >>
> >> Invalid syntax is not unusual in SRT files. Are you sure this doesn't
> >> make the output worse in files that do not use the syntax correctly?
> >
> > I do not know if this makes the output worse for files with invalid syntax
> > I also do not know if this makes the output better for files with invalid
> > syntax
> >
> > i dont have a large library of (real world invalid syntax) srt files
> > whith which to find cases where it makes a difference.
> >
> > You seem to know alot about srt, maybe you can pick some srt files
> > and add fate tests, so we have better coverage of odd and nasty cases
> >
> > about this patch specifically, what do you suggest ?
> > should i try to fix this while maintaining existing behavior for
> > invalid syntax exactly? (i dont know how that code would look)
>
> WHat's so wrong with code that you want it changed in bad way?
IIRC
If you have tags like <<<<<<<<<<<<<<<<<<<<<<a> you end up checking
<<<<<<<<<<<<<<<<<<<<<<a>
<<<<<<<<<<<<<<<<<<<<<a>
<<<<<<<<<<<<<<<<<<<<a>
...
then form the closing tag
</<<<<<<<<<<<<<<<<<<<<<a>
</<<<<<<<<<<<<<<<<<<<<a>
</<<<<<<<<<<<<<<<<<<<a>
...
and then search the whole input for each closing tag
On top of this there are special cases and limitations like
fixed size buffers and a fixed size stack, so it will not always do
all the steps above.
But in practice this is slow. The same issue should happen on real
files if they contain syntax issues and conatin a lot of unescaped '<'
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Never trust a computer, one day, it may think you are the virus. -- Compn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170703/58a64352/attachment.sig>
More information about the ffmpeg-devel
mailing list