[FFmpeg-devel] [PATCH 7/9] sbcenc: add MMX optimizations
Aurelien Jacobs
aurel at gnuage.org
Sat Dec 23 23:21:31 EET 2017
On Sat, Dec 23, 2017 at 09:52:11PM +0100, Aurelien Jacobs wrote:
> On Sat, Dec 23, 2017 at 05:47:04PM -0300, James Almer wrote:
> > On 12/23/2017 5:44 PM, Aurelien Jacobs wrote:
> > > On Sat, Dec 23, 2017 at 03:35:28PM -0300, James Almer wrote:
> > >> On 12/23/2017 3:01 PM, Aurelien Jacobs wrote:
> > >>> This was originally based on libsbc, and was fully integrated into ffmpeg.
> > >>>
> > >>> Rough speed test:
> > >>> C version: speed= 592x
> > >>> MMX version: speed= 785x
> > >>> ---
> > >>> libavcodec/sbcdsp.c | 3 +
> > >>> libavcodec/sbcdsp.h | 2 +
> > >>> libavcodec/x86/Makefile | 2 +
> > >>> libavcodec/x86/sbcdsp.asm | 284 +++++++++++++++++++++++++++++++++++++++++++
> > >>> libavcodec/x86/sbcdsp_init.c | 51 ++++++++
> > >>> 5 files changed, 342 insertions(+)
> > >>> create mode 100644 libavcodec/x86/sbcdsp.asm
> > >>> create mode 100644 libavcodec/x86/sbcdsp_init.c
> > >>
> > >> [...]
> > >>
> > >>> +;*******************************************************************
> > >>> +;void ff_sbc_calc_scalefactors(int32_t sb_sample_f[16][2][8],
> > >>> +; uint32_t scale_factor[2][8],
> > >>> +; int blocks, int channels, int subbands)
> > >>> +;*******************************************************************
> > >>> +INIT_MMX mmx
> > >>> +cglobal sbc_calc_scalefactors, 5, 7, 3, sb_sample_f, scale_factor, blocks, channels, subbands, ptr, blk
> > >>> + ; subbands = 4 * subbands * channels
> > >>> + shl subbandsd, 2
> > >>> + cmp channelsd, 2
> > >>> + jl .loop_1
> > >>> + shl subbandsd, 1
> > >>> +
> > >>> +.loop_1:
> > >>> + sub subbandsq, 8
> > >>> + lea ptrq, [sb_sample_fq + subbandsq]
> > >>> +
> > >>> + ; blk = (blocks - 1) * 64;
> > >>> + lea blkq, [blocksq - 1]
> > >>> + shl blkd, 6
> > >>> +
> > >>> + movq m0, [scale_mask]
> > >>
> > >> I insist, this can be easily loaded outside the loop. You have enough
> > >> spare regs to store a copy.
> > >
> > > Oh, I forgot to reply to this. There isn't any register left available
> > > on x86_32, which is why I kept those loads inside the loop.
> >
> > You're not using a GPR to store the mask, nor do you need to. You're using MMX
> > regs and have 5 left.
>
> Oh, indeed! Not sure why it didn't even cross my mind...
> I will have a look at this.
Here it is with the scale_mask load out of the loop.
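
For readers following the thread without the attachment, here is a rough scalar C sketch of the idea being discussed: the mask is loop-invariant, so it can be read once before the loops and kept in a spare register instead of being reloaded from memory on every outer iteration. This is not the code from the patch. The names sketch_calc_scalefactors, sketch_scale_mask and SKETCH_SCALE_OUT_BITS are hypothetical, the value 15 is an assumption made for illustration, and the scale factor formula is only loosely modelled on a scalar version of this routine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, for illustration only. */
#define SKETCH_SCALE_OUT_BITS 15

/* Stands in for the scale_mask constant that the asm loads from memory. */
static const uint32_t sketch_scale_mask = 1u << SKETCH_SCALE_OUT_BITS;

/* Hypothetical scalar model; same argument layout as the prototype
 * quoted in the patch comment. */
static void sketch_calc_scalefactors(int32_t sb_sample_f[16][2][8],
                                     uint32_t scale_factor[2][8],
                                     int blocks, int channels, int subbands)
{
    assert(blocks >= 1 && blocks <= 16);
    assert(channels >= 1 && channels <= 2);
    assert(subbands >= 1 && subbands <= 8);

    /* Loop-invariant: read the mask once, before any loop starts. */
    const uint32_t mask = sketch_scale_mask;

    for (int ch = 0; ch < channels; ch++) {
        for (int sb = 0; sb < subbands; sb++) {
            /* Start each accumulator from the saved copy,
             * the scalar analogue of a register-to-register move. */
            uint32_t x = mask;

            for (int blk = 0; blk < blocks; blk++) {
                int32_t s = sb_sample_f[blk][ch][sb];
                /* Magnitude computed in unsigned arithmetic to avoid
                 * signed overflow on INT32_MIN. */
                uint32_t mag = s < 0 ? 0u - (uint32_t)s : (uint32_t)s;
                if (mag)
                    x |= mag - 1;
            }

            /* Index of the highest set bit; x always has the mask bit
             * set, so this loop terminates. */
            int msb = 31;
            while (!(x >> msb))
                msb--;

            scale_factor[ch][sb] = (uint32_t)(msb - SKETCH_SCALE_OUT_BITS);
        }
    }
}
```

A C compiler would normally hoist such a load on its own; in hand-written asm it has to be done explicitly, which is what the review comment asks for. Since the function only uses a few of the eight MMX registers, keeping a copy of the mask in one of the spare ones costs nothing on x86_32.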
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0007-sbcenc-add-MMX-optimizations.patch
Type: text/x-diff
Size: 13692 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171223/231d05c5/attachment.patch>