[FFmpeg-devel] lavc/aarch64: add simple idct neon functions

Matthieu Bouron matthieu.bouron at gmail.com
Wed Mar 15 14:14:42 EET 2017


On Mon, Mar 06, 2017 at 03:48:57PM +0100, Matthieu Bouron wrote:
> On Thu, Feb 23, 2017 at 04:59:16PM +0100, Matthieu Bouron wrote:
> > Hello,
> > 
> > The following patchset adds the ff_simple_idct NEON functions for the
> > aarch64 platform. It's ported from the armv7 simple_idct_neon with some improvements:
> >  * the source idct blocks are now loaded once and kept in v24-v31
> >  * the source idct blocks are no longer overwritten in idct_col4_top
> >  * the destination is now written in one pass at the end of
> >    ff_simple_idct{,_put,_add}_neon
> > 
> > It is bitexact with the armv7 neon implementation.
> > 
> > Here are some results (reported by {START,STOP}_TIMER) on an Odroid-C2
> > (Cortex-A53):
> > 
> > Functions             IDCT: simple       IDCT: simpleneon
> > ff_simple_idct_put     9795 units        3170 units
> > ff_simple_idct_add    10227 units        3302 units
> > 
> 
> Ping.

I'd like to push the patch tomorrow if there are no objections.

In case it helps, here is the output of mjpegdec with the simple and
simpleneon idct methods.

Original: http://0x5c.me/idct/original.jpg
Simple: http://0x5c.me/idct/simplec.png
Simpleneon: http://0x5c.me/idct/simpleneon.png

The diff between simple and simpleneon shows some off-by-one
differences: http://0x5c.me/idct/diff.png (the aarch64 simpleneon is
bitexact with its armv7 counterpart, though).
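
For anyone who wants to quantify such differences rather than eyeball an
image, here is a minimal sketch. It assumes the two decoded outputs are
available as raw 8-bit sample buffers of equal size. compare_planes and
DiffStats are hypothetical names for illustration, not FFmpeg API, and this
is not necessarily how the diff image above was produced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of FFmpeg: summary of the per-sample
 * differences between two decoded 8-bit planes of equal size. */
typedef struct DiffStats {
    size_t differing;   /* number of samples whose values differ */
    int    max_abs;     /* largest absolute difference observed */
} DiffStats;

static DiffStats compare_planes(const uint8_t *a, const uint8_t *b, size_t n)
{
    DiffStats s = { 0, 0 };

    /* Both buffers must be valid unless there is nothing to compare. */
    assert(n == 0 || (a != NULL && b != NULL));

    for (size_t i = 0; i < n; i++) {
        /* Widen to int before subtracting to avoid unsigned wraparound. */
        int d = abs((int)a[i] - (int)b[i]);

        if (d != 0)
            s.differing++;
        if (d > s.max_abs)
            s.max_abs = d;
    }
    return s;
}
```

A max_abs of 1 would match the off-by-one behaviour described above, and a
differing count of 0 would indicate that two implementations are bitexact
on that input.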

Matthieu