[Ffmpeg-devel] Re: [PATCH] SIMD accelerated SNOW decoding

Guillaume POIRIER poirierg
Mon Nov 28 00:12:52 CET 2005


Hi,

On 11/27/05, Guillaume POIRIER <poirierg at gmail.com> wrote:
> Hi,
>
> On 11/27/05, Guillaume POIRIER <poirierg at gmail.com> wrote:
> > Hi there,
> >
> > A long time ago (6 months), yartrebo wrote two routines to speed up
> > SNOW decoding (30-40% faster). They were never committed because neither
> > of them worked on AMD64.
> >
> > Six months later, I suspect more talented people can look at it.
> >
> > Attached is the work-in-progress patch yartrebo sent me before
> > leaving for the summer break (never to return, it seems).
> >
> > See below for the gdb backtrace of one of the routines (both trigger a
> > segfault). Unfortunately, it doesn't give the exact line of the asm
> > that fails (maybe because the program never actually reaches the
> > asm but fails when calling it?).
>
> Hmm, a closer look at the asm shows hardcoded IA-32 style registers
> rather than the REG_xx macros used throughout the rest of the
> ffmpeg code. No wonder it could not work! :)
>
> I'll fix that and see what happens.

Well, it was a bit more complicated (for me) than it looked.
Apparently, the clobber list had to be fixed too.

Please find attached a "fixed" version with hardcoded AMD64 registers
(instead of the #define REG_a "rax" style of macro, which does not
seem to work well with this code at present).
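For reference, the REG_xx idea is that one asm template assembles on both IA-32 and AMD64: a macro expands to "eax" or "rax" depending on the target, and every register the template names directly is listed in the clobber list so GCC never assigns that register to an input operand. Below is a minimal sketch, assuming GCC or Clang extended inline asm on x86; REG_a/REG_d are defined locally here in the spirit of ffmpeg's macros (not copied from its headers), and add_i32_asm() is a hypothetical helper, not code from the patch. Other targets fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for ffmpeg-style register-name macros (hypothetical). */
#if defined(__x86_64__)
#  define REG_a "rax"
#  define REG_d "rdx"
#elif defined(__i386__)
#  define REG_a "eax"
#  define REG_d "edx"
#endif

/* Hypothetical helper: dst[i] += src[i] for 0 <= i < n. */
static void add_i32_asm(int32_t *dst, const int32_t *src, long n)
{
    assert(n >= 0);
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        "xor  %%" REG_a ", %%" REG_a "        \n\t" /* i = 0              */
        "1:                                   \n\t"
        "cmp  %2, %%" REG_a "                 \n\t" /* stop when i >= n   */
        "jge  2f                              \n\t"
        "movl (%1, %%" REG_a ", 4), %%edx     \n\t" /* edx = src[i]       */
        "addl %%edx, (%0, %%" REG_a ", 4)     \n\t" /* dst[i] += edx      */
        "inc  %%" REG_a "                     \n\t" /* i++                */
        "jmp  1b                              \n\t"
        "2:                                   \n\t"
        : /* no outputs */
        : "r"(dst), "r"(src), "r"(n)
        /* Registers used directly by the template are clobbered, so the
           compiler will not place dst, src or n in them. "memory" covers
           the stores through dst, "cc" the flags changed by cmp/add/inc. */
        : "%" REG_a, "%" REG_d, "memory", "cc");
#else
    for (long i = 0; i < n; i++)
        dst[i] += src[i];
#endif
}
```

Dropping "%" REG_d from the clobber list would allow the compiler to put src, dst or n in %edx/%rdx, and the movl inside the loop would then silently overwrite that operand.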

I still get a segfault though:

0x00000000006a0d11 in ff_spatial_idwt_buffered_slice (cs=0x0,
slice_buf=0x2aaaad3f3720, width=720, height=576, stride_line=1,
type=0,
    decomposition_count=-1384826608, y=0) at snow.c:1736
1736            vertical_compose97i_asm
(gdb) bt
#0  0x00000000006a0d11 in ff_spatial_idwt_buffered_slice (cs=0x0,
slice_buf=0x2aaaad3f3720, width=720, height=576, stride_line=1,
type=0,
    decomposition_count=-1384826608, y=0) at snow.c:1736
#1  0x00000000006af150 in decode_frame (avctx=0xadb7c0, data=0xadb6a0,
data_size=0x7fffff9b51cc, buf=0x50 <Address 0x50 out of bounds>,
    buf_size=-1384832368) at snow.c:4208
#2  0x000000000056a8d3 in avcodec_decode_video (avctx=0xadb7c0,
picture=0xadb6a0, got_picture_ptr=0x7fffff9b51cc,
    buf=0xab8910 <binary frame data>,
buf_size=106372) at utils.c:905
#3  0x0000000000452955 in decode (sh=0xab41d0, data=0xab8910,
len=106372, flags=0) at vd_ffmpeg.c:818
#4  0x000000000044f2ac in decode_video (sh_video=0xab41d0,
    start=0xab8910 <binary frame data>,
in_size=106372, drop_frame=0) at dec_video.c:316
#5  0x000000000040fd9e in main (argc=11223504, argv=0xffffffff) at
mplayer.c:2659
(gdb) disass $pc-32,$pc+32
Dump of assembler code for function ff_spatial_idwt_buffered_slice:


The corresponding code snippet is here:

0x00000000006a0cf2 <ff_spatial_idwt_buffered_slice+1474>:       movdqa %xmm1,(%rax,%rdx,4)
0x00000000006a0cf7 <ff_spatial_idwt_buffered_slice+1479>:       movdqa %xmm3,0x10(%rax,%rdx,4)
0x00000000006a0cfd <ff_spatial_idwt_buffered_slice+1485>:       movdqa %xmm5,0x20(%rax,%rdx,4)
0x00000000006a0d03 <ff_spatial_idwt_buffered_slice+1491>:       movdqa %xmm7,0x30(%rax,%rdx,4)
0x00000000006a0d09 <ff_spatial_idwt_buffered_slice+1497>:       mov    0xa0(%rsp),%rdx
0x00000000006a0d11 <ff_spatial_idwt_buffered_slice+1505>:       paddd  (%rdx,%rdx,4),%xmm1
0x00000000006a0d16 <ff_spatial_idwt_buffered_slice+1510>:       paddd  0x10(%rdx,%rdx,4),%xmm3
0x00000000006a0d1c <ff_spatial_idwt_buffered_slice+1516>:       paddd  0x20(%rdx,%rdx,4),%xmm5
0x00000000006a0d22 <ff_spatial_idwt_buffered_slice+1522>:       paddd  0x30(%rdx,%rdx,4),%xmm7
0x00000000006a0d28 <ff_spatial_idwt_buffered_slice+1528>:       movdqa (%rbx,%rdx,4),%xmm0
0x00000000006a0d2d <ff_spatial_idwt_buffered_slice+1533>:       movdqa 0x10(%rbx,%rdx,4),%xmm2
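In the dump, the loop body stores four XMM registers with movdqa and then applies paddd to four consecutive 16-byte chunks, i.e. 16 packed 32-bit additions per step. As a hedged illustration of that pattern (not the patch's actual code), here is a sketch with SSE2 intrinsics; add_rows_sse2() is hypothetical and assumes both buffers are 16-byte aligned. Legacy SSE instructions with a 128-bit memory operand (movdqa, or paddd with a memory source) fault on unaligned addresses, which Linux reports as SIGSEGV. That said, the faulting paddd uses %rdx as both base and index right after %rdx is reloaded from 0xa0(%rsp), which looks more like an operand landing in a register the template also uses directly than like a pure alignment problem.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from snow.c): dst[i] += src[i] for
 * 0 <= i < width, 16 int32 values (four 128-bit vectors) per iteration. */
static void add_rows_sse2(int32_t *dst, const int32_t *src, int width)
{
    int i = 0;
    assert(width >= 0);
#if defined(__SSE2__)
    /* Aligned loads/stores (movdqa) need 16-byte aligned addresses. */
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
    for (; i + 16 <= width; i += 16) {
        __m128i d0 = _mm_load_si128((const __m128i *)(dst + i));
        __m128i d1 = _mm_load_si128((const __m128i *)(dst + i + 4));
        __m128i d2 = _mm_load_si128((const __m128i *)(dst + i + 8));
        __m128i d3 = _mm_load_si128((const __m128i *)(dst + i + 12));
        /* paddd: four packed 32-bit additions per register */
        d0 = _mm_add_epi32(d0, _mm_load_si128((const __m128i *)(src + i)));
        d1 = _mm_add_epi32(d1, _mm_load_si128((const __m128i *)(src + i + 4)));
        d2 = _mm_add_epi32(d2, _mm_load_si128((const __m128i *)(src + i + 8)));
        d3 = _mm_add_epi32(d3, _mm_load_si128((const __m128i *)(src + i + 12)));
        _mm_store_si128((__m128i *)(dst + i), d0);
        _mm_store_si128((__m128i *)(dst + i + 4), d1);
        _mm_store_si128((__m128i *)(dst + i + 8), d2);
        _mm_store_si128((__m128i *)(dst + i + 12), d3);
    }
#endif
    /* Scalar tail, or the whole row when SSE2 is unavailable. */
    for (; i < width; i++)
        dst[i] += src[i];
}
```

Unrolling by four registers mirrors the xmm1/xmm3/xmm5/xmm7 usage in the dump; the scalar tail handles widths that are not a multiple of 16.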

Thoughts?


Guillaume
--
MPlayer's docs are offline. Find a fresh copy here:
http://tuxrip.free.fr//MPlayer-DOCS-HTML/en/
http://tuxrip.free.fr//MPlayer-DOCS-HTML/fr/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snow_mmx_sse2.h
Type: text/x-chdr
Size: 45574 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20051128/b34a39aa/attachment.h>


