[FFmpeg-devel] [PATCH] vp9/x86: 16x16 iadst_idct, idct_iadst and iadst_iadst (ssse3+avx).

Clément Bœsch u at pkh.me
Thu Jan 16 13:17:36 CET 2014


On Wed, Jan 15, 2014 at 09:04:41PM -0500, Ronald S. Bultje wrote:
> Sample timings on ped1080p.webm (of the ssse3 functions):
> iadst_idct:  4672 -> 1175 cycles
> idct_iadst:  4736 -> 1263 cycles
> iadst_iadst: 4924 -> 1438 cycles
> Total decoding time changed from 6.565s to 6.413s.
> ---
>  libavcodec/x86/vp9dsp_init.c |  34 ++++--
>  libavcodec/x86/vp9itxfm.asm  | 272 ++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 293 insertions(+), 13 deletions(-)
> 
[...]
> +%macro VP9_IADST16_1D 2 ; src, pass
> +%assign %%str 16*%2
> +    mova                m0, [%1+ 0*32]  ; in0
> +    mova                m1, [%1+15*32]  ; in15
> +    mova                m8, [%1+ 7*32]  ; in7
> +    mova                m9, [%1+ 8*32]  ; in8
> +
> +    VP9_UNPACK_MULSUB_2D_4X  1,  0,  2,  3, 16364,   804    ; m1/2=t1[d], m0/3=t0[d]
> +    VP9_UNPACK_MULSUB_2D_4X  8,  9, 11, 10, 11003, 12140    ; m8/11=t9[d], m9/10=t8[d]
> +    VP9_RND_SH_SUMSUB_BA     9,  0, 10,  3,  4, [pd_8192]   ; m9=t0[w], m0=t8[w]
> +    VP9_RND_SH_SUMSUB_BA     8,  1, 11,  2,  4, [pd_8192]   ; m8=t1[w], m1=t9[w]
> +
> +    mova               m11, [%1+ 2*32]  ; in2
> +    mova               m10, [%1+13*32]  ; in13
> +    mova                m3, [%1+ 5*32]  ; in5
> +    mova                m2, [%1+10*32]  ; in10
> +

> +    VP9_UNPACK_MULSUB_2D_4X 10, 11,  6,  7, 15893,  3981    ; m3/6=t3[d], m2/7=t2[d]
> +    VP9_UNPACK_MULSUB_2D_4X  3,  2,  4,  5,  8423, 14053    ; m10/4=t11[d], m11/5=t10[d]

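The comments look entangled here: the register names seem swapped between
the two lines. The first call operates on m10/m11 (with m6/m7) but its
comment names m3/m2, and the second operates on m3/m2 (with m4/m5) but its
comment names m10/m11.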
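
For anyone cross-checking the t-number annotations, here is a rough scalar
sketch of what I understand the first quoted block (in0/in15 and in7/in8) to
compute. The function and struct names are hypothetical, and the operand
order and signs are my assumption based on the usual scalar IADST16 form, so
please check them against the C reference rather than trusting this. The
[d] values are the 32-bit products and the [w] values are what is left after
rounding with pd_8192 and shifting right by 14.

```c
#include <stdint.h>

/* Hypothetical container for the four word-sized outputs of this stage. */
typedef struct {
    int32_t t0, t1, t8, t9;
} Iadst16Stage1;

/* Round a 32-bit intermediate ("[d]") down to word range ("[w]"):
 * add 8192 (pd_8192 in the patch) and shift right by 14.
 * Assumes >> on a negative int32_t is an arithmetic shift, which is
 * implementation-defined in C but holds on common compilers. */
static int32_t round_shift14(int32_t v)
{
    return (v + 8192) >> 14;
}

/* Assumed scalar equivalent of the two VP9_UNPACK_MULSUB_2D_4X calls
 * followed by the two VP9_RND_SH_SUMSUB_BA calls for in0/in15/in7/in8. */
static Iadst16Stage1 iadst16_stage1_pair(int32_t in0, int32_t in15,
                                         int32_t in7, int32_t in8)
{
    /* 32-bit rotations, i.e. the t0[d], t1[d], t8[d], t9[d] values. */
    int32_t t0d = in15 * 16364 + in0 * 804;
    int32_t t1d = in15 * 804   - in0 * 16364;
    int32_t t8d = in7  * 11003 + in8 * 12140;
    int32_t t9d = in7  * 12140 - in8 * 11003;

    /* Sum/difference butterflies with rounding back to words. */
    Iadst16Stage1 r;
    r.t0 = round_shift14(t0d + t8d);
    r.t8 = round_shift14(t0d - t8d);
    r.t1 = round_shift14(t1d + t9d);
    r.t9 = round_shift14(t1d - t9d);
    return r;
}
```

The in2/in13 and in5/in10 lines should follow the same pattern with
15893/3981 and 8423/14053, which is why each comment ought to name the
registers that its own macro call actually touches.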

[...]

Rest LGTM

-- 
Clément B.