[FFmpeg-devel] [PATCH 3/3] avcodec/scpr: optimize shift loop.
Michael Niedermayer
michael at niedermayer.cc
Sat Sep 9 02:49:59 EEST 2017
On Fri, Sep 08, 2017 at 06:43:06PM -0300, James Almer wrote:
> On 9/8/2017 6:29 PM, Michael Niedermayer wrote:
> > Speeds code up from 50sec to 15sec
> >
> > Fixes Timeout
> > Fixes: 3242/clusterfuzz-testcase-5811951672229888
> >
> > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> > Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> > ---
> > libavcodec/scpr.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
> > index 37fbe7a106..2ef63a7bf8 100644
> > --- a/libavcodec/scpr.c
> > +++ b/libavcodec/scpr.c
> > @@ -827,7 +827,16 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
> > return ret;
> >
> > for (y = 0; y < avctx->height; y++) {
> > - for (x = 0; x < avctx->width * 4; x++) {
> > + if (!(((uintptr_t)dst) & 7)) {
> > + uint64_t *dst64 = (uint64_t *)dst;
> > + int w = avctx->width>>1;
> > + for (x = 0; x < w; x++) {
> > + dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
>
> Shouldn't this be used only if HAVE_FAST_64BIT is true, and a version
> shifting four bytes at a time used otherwise? That's how we do almost
> everywhere else.
The compiler would break the 64bit into two 32bit automatically.
I can write an explicit version if that is wanted ?
it seemed overkill for this here though
>
> The chances for anyone bothering writing simd for this decoder are
> almost none, so adding C optimized loops is ok in this case.
>
> > + }
> > + x *= 8;
> > + } else
> > + x = 0;
>
> How does this fix the timeout if the new code is only run if the pointer
> is eight byte aligned? (or four once you add that).
The timeout is IIRC currently defined as 30sec or so on the platform
the fuzzer runs on, data is aligned in that case.
You could imagine a platform where data is not aligned, yes
on that patform the patch would not improve the time decoding takes.
Simiarly you could imagine a CPU that supports only 8bit operations,
or just a slower CPU.
This patch adds some optimizations that makes decoding faster which
gets us over the threshold for this particular sample and a normal
modern desktop platform.
Its quite possible another scpr file will still hit the threshold
and certainly another threshold or other hw could trigger it
still with this sample
I would very much prefer a more universal fix ...
But i didnt see one and making that loop 3 times as fast is great on
its own.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170909/2b79750b/attachment.sig>
More information about the ffmpeg-devel
mailing list