[FFmpeg-devel] [PATCH] JPEG2000: SSE optimisation of DWT decoding

maxime taisant maximetaisant at hotmail.fr
Thu Aug 10 23:03:43 EEST 2017



> From: Clément Bœsch <u at pkh.me>
> 
> On Tue, Aug 08, 2017 at 09:09:44AM +0000, maxime taisant wrote:
> > From: Maxime Taisant <maximetaisant at hotmail.fr>
> >
> > Hi,
> >
> > Here are some SSE optimisations for the DWT function used to decode
> > JPEG2000.
> > I tested this code by using the time command while reading a
> > JPEG2000-encoded video with ffmpeg and, on average, I observed a
> > 4.05% overall improvement, and a 12.67% improvement on the DWT
> > decoding part alone.
> > In the nasm code, you will notice that the SR1DFLOAT macro appears
> > twice. One version is called in the nasm code by the HORSD macro
> > and the other is called from the C code of the dwt function; I
> > couldn't figure out a way to use a single macro.
> > I also couldn't figure out a good way to optimize the VER_SD part,
> > which is why I left it unchanged, with just an SSE-optimized version
> > of the SR_1D_FLOAT function.
> >
> > Regards.
> >
> > ---
> >  libavcodec/jpeg2000dwt.c          |  21 +-
> >  libavcodec/jpeg2000dwt.h          |   6 +
> >  libavcodec/x86/jpeg2000dsp.asm    | 794 ++++++++++++++++++++++++++++++++++++++
> >  libavcodec/x86/jpeg2000dsp_init.c |  55 +++
> >  4 files changed, 863 insertions(+), 13 deletions(-)
> >
> > diff --git a/libavcodec/jpeg2000dwt.c b/libavcodec/jpeg2000dwt.c
> > index 55dd5e89b5..69c935980d 100644
> > --- a/libavcodec/jpeg2000dwt.c
> > +++ b/libavcodec/jpeg2000dwt.c
> > @@ -558,16 +558,19 @@ int ff_jpeg2000_dwt_init(DWTContext *s, int border[2][2],
> >          }
> >      switch (type) {
> >      case FF_DWT97:
> > +        dwt_decode = dwt_decode97_float;
> >          s->f_linebuf = av_malloc_array((maxlen + 12), sizeof(*s->f_linebuf));
> >          if (!s->f_linebuf)
> >              return AVERROR(ENOMEM);
> >          break;
> >       case FF_DWT97_INT:
> > +        dwt_decode = dwt_decode97_int;
> >          s->i_linebuf = av_malloc_array((maxlen + 12), sizeof(*s->i_linebuf));
> >          if (!s->i_linebuf)
> >              return AVERROR(ENOMEM);
> >          break;
> >      case FF_DWT53:
> > +        dwt_decode = dwt_decode53;
> >          s->i_linebuf = av_malloc_array((maxlen +  6), sizeof(*s->i_linebuf));
> >          if (!s->i_linebuf)
> >              return AVERROR(ENOMEM);
> 
> Using globals is not acceptable, you need to fix that.
> 

Yeah, I can't even remember why I did that... I will fix it.
Thank you.
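
For illustration, one common way to avoid the global is to keep the selected
function pointer inside the context itself, so that each DWTContext carries
its own choice. The snippet below is only a minimal sketch of that idea, not
the revised patch: every name in it (SketchDWTContext, sketch_dwt_init,
sketch_decode53, and so on) is hypothetical, and the decode bodies are
placeholders that return a tag instead of performing a transform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real FFmpeg types. */
enum SketchDWTType { SKETCH_DWT97, SKETCH_DWT97_INT, SKETCH_DWT53 };

typedef struct SketchDWTContext SketchDWTContext;
struct SketchDWTContext {
    enum SketchDWTType type;
    /* Per-context function pointer, used instead of a file-scope global. */
    int (*decode)(SketchDWTContext *s, void *data);
};

/* Placeholder transforms: each returns a distinct tag so the dispatch
 * can be observed. They do not implement any wavelet transform. */
static int sketch_decode97_float(SketchDWTContext *s, void *data)
{
    (void)s; (void)data;
    return 97;
}

static int sketch_decode97_int(SketchDWTContext *s, void *data)
{
    (void)s; (void)data;
    return 9700;
}

static int sketch_decode53(SketchDWTContext *s, void *data)
{
    (void)s; (void)data;
    return 53;
}

/* Select the decode function once at init time and store it in the context. */
static int sketch_dwt_init(SketchDWTContext *s, enum SketchDWTType type)
{
    s->type = type;
    switch (type) {
    case SKETCH_DWT97:     s->decode = sketch_decode97_float; break;
    case SKETCH_DWT97_INT: s->decode = sketch_decode97_int;   break;
    case SKETCH_DWT53:     s->decode = sketch_decode53;       break;
    default:               s->decode = NULL;                  return -1;
    }
    return 0;
}

/* Dispatch through the context; no shared mutable state is involved. */
static int sketch_dwt_decode(SketchDWTContext *s, void *data)
{
    assert(s->decode);
    return s->decode(s, data);
}
```

With this layout, two contexts initialised with different transform types can
be used side by side (for example from different threads) without one
overwriting the other's selection. An x86 init function could presumably
override the pointer in the same struct when SSE is available, but that part
is not shown here.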


