[FFmpeg-devel] [Patch 2/5]Fix h264 on POWER LE: libavcodec/ppc/h264dsp.c

Thu Nov 27 12:28:15 CET 2014

On Thu, Nov 27, 2014 at 02:35:24PM +0800, rongyan wrote:
> Hi,
> We present 5 patches to fix h264 bugs for POWER8 little endian, which are sent in 5 seperate emails.
> This is the second, to fix the functions  
> h264_idct8_add_altivec();
>  
> h264_idct_dc_add_internal();
>  
> h264_loop_filter_luma_altivec();
>  
> write16x4() VEC_1D_DCT();
>  
> weight_h264_W_altivec();
>  
> biweight_h264_W_altivec();
>  
> VEC_LOAD_U8_ADD_S16_STORE_U8();
>  
> ALTIVEC_STORE_SUM_CLIP();
>  
> And add marcos  GET_2PERM(), dstv_load(),vdst_load(), dest_unligned_store().
> 
> The fate test result after merge these 5 patches can be found on website by searching "ibmcrl", also attached in the below to facilitate the review. The passed test cases change from  2017/2243 to 2209/2245.
>  
> Thanks.
>  Rong Yan
>   
>   ------------------
>   The world has enough for everyone's need, but not enough for everyone's greed.


>  h264dsp.c |  374 +++++++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 205 insertions(+), 169 deletions(-)
> dcaccec4338f960704148c933e1ec454dd4dc6a2  0002-libavcodec-ppc-h264dsp.c-fix-h264_idct8_add_altivec.patch
> From 130b20e650a2d83a4c66cd23c10fe943742339f8 Mon Sep 17 00:00:00 2001
> From: Rong Yan <rongyan236 at gmail.com>
> Date: Thu, 27 Nov 2014 05:49:53 +0000
> Subject: [PATCH 2/5] libavcodec/ppc/h264dsp.c : fix h264_idct8_add_altivec()
>  h264_idct_dc_add_internal() h264_loop_filter_luma_altivec() write16x4()
>  VEC_1D_DCT() weight_h264_W_altivec() biweight_h264_W_altivec()
>  VEC_LOAD_U8_ADD_S16_STORE_U8() ALTIVEC_STORE_SUM_CLIP() add marcos 
>  GET_2PERM() dstv_load() vdst_load() dest_unligned_store() for POWER LE
> 
> ---
>  libavcodec/ppc/h264dsp.c | 374 ++++++++++++++++++++++++++---------------------
>  1 file changed, 205 insertions(+), 169 deletions(-)
> 
> diff --git a/libavcodec/ppc/h264dsp.c b/libavcodec/ppc/h264dsp.c
> index 7fc7e0b..cfce32d 100644
> --- a/libavcodec/ppc/h264dsp.c
> +++ b/libavcodec/ppc/h264dsp.c
> @@ -34,7 +34,7 @@
>   * IDCT transform:
>   ****************************************************************************/
>  
> -#define VEC_1D_DCT(vb0,vb1,vb2,vb3,va0,va1,va2,va3)               \
> +#define VEC_1D_DCT(vb0,vb1,vb2,vb3,va0,va1,va2,va3) {\
>      /* 1st stage */                                               \
>      vz0 = vec_add(vb0,vb2);       /* temp[0] = Y[0] + Y[2] */     \
>      vz1 = vec_sub(vb0,vb2);       /* temp[1] = Y[0] - Y[2] */     \
> @@ -46,7 +46,8 @@
>      va0 = vec_add(vz0,vz3);       /* x[0] = temp[0] + temp[3] */  \
>      va1 = vec_add(vz1,vz2);       /* x[1] = temp[1] + temp[2] */  \
>      va2 = vec_sub(vz1,vz2);       /* x[2] = temp[1] - temp[2] */  \
> -    va3 = vec_sub(vz0,vz3)        /* x[3] = temp[0] - temp[3] */
> +    va3 = vec_sub(vz0,vz3);        /* x[3] = temp[0] - temp[3] */\
> +}
>  
>  #define VEC_TRANSPOSE_4(a0,a1,a2,a3,b0,b1,b2,b3) \
>      b0 = vec_mergeh( a0, a0 ); \
> @@ -62,14 +63,23 @@
>      b2 = vec_mergeh( a1, a3 ); \
>      b3 = vec_mergel( a1, a3 )
>  
> -#define VEC_LOAD_U8_ADD_S16_STORE_U8(va)                      \
> -    vdst_orig = vec_ld(0, dst);                               \
> -    vdst = vec_perm(vdst_orig, zero_u8v, vdst_mask);          \
> -    vdst_ss = (vec_s16) vec_mergeh(zero_u8v, vdst);         \
> -    va = vec_add(va, vdst_ss);                                \
> -    va_u8 = vec_packsu(va, zero_s16v);                        \
> -    va_u32 = vec_splat((vec_u32)va_u8, 0);                  \
> -    vec_ste(va_u32, element, (uint32_t*)dst);
> +#if HAVE_BIGENDIAN
> +#define vdst_load(d)\
> +    vdst_orig = vec_ld(0, dst); \
> +    vdst = vec_perm(vdst_orig, zero_u8v, vdst_mask)
> +#else
> +#define vdst_load(d)\
> +    vdst = vec_vsx_ld(0, dst)
> +#endif
> +
> +#define VEC_LOAD_U8_ADD_S16_STORE_U8(va) {\
> +    vdst_load();\
> +    vdst_ss = (vec_s16) VEC_MERGEH(zero_u8v, vdst);\
> +    va = vec_add(va, vdst_ss);\
> +    va_u8 = vec_packsu(va, zero_s16v);\
> +    va_u32 = vec_splat((vec_u32)va_u8, 0);\
> +    vec_ste(va_u32, element, (uint32_t*)dst);\
> +}

please dont mix whitespace changes with functional changes
this makes the patch and commit unreadable
it also can cause problems for other developers as rebasing their
work becomes harder if the code changed alot
please leave the whitespaces in place

git show HEAD^^^ -w --stat
 libavcodec/ppc/h264dsp.c |  106 +++++++++++++++++++++++++++++++---------------
 1 file changed, 71 insertions(+), 35 deletions(-)


git show HEAD^^^  --stat
 libavcodec/ppc/h264dsp.c |  374 +++++++++++++++++++++++++---------------------
 1 file changed, 205 insertions(+), 169 deletions(-)

 [...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Dictatorship naturally arises out of democracy, and the most aggravated
form of tyranny and slavery out of the most extreme liberty. -- Plato
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20141127/6cd38a6c/attachment.asc>