[FFmpeg-devel] [PATCH 3/3] avfilter/vf_convolution: add X86 SIMD for filter_column()
Song, Ruiling
ruiling.song at intel.com
Wed Dec 4 02:59:08 EET 2019
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of chen
> Sent: Tuesday, December 3, 2019 4:59 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 3/3] avfilter/vf_convolution: add X86 SIMD for filter_column()
>
> comments inline in code
>
>
> At 2019-12-03 15:52:07, xujunzz at sjtu.edu.cn wrote:
> >From: Xu Jun <xujunzz at sjtu.edu.cn>
[...]
> >+
> >+ cvtdq2ps m4, m4
> >+ mulps m4, m0 ; sum *= rdiv
> >+ addps m4, m1 ; sum += bias
>
> >+ addps m4, m5 ; sum += 0.5
> I am not sure whether pre-computing (bias + 0.5) would cause a precision mismatch.
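I think it is hard to prove that pre-computing (bias + 0.5) is safe. Float addition is not associative, so the result could round differently than when bias and 0.5 are added separately.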
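To illustrate the concern, here is a minimal C sketch (not part of the patch) that compares the two orders of addition. The function names and the input values are hypothetical; the inputs are hand-picked to land on rounding ties and are not values the filter is expected to see in practice. It assumes IEEE-754 single precision with round-to-nearest-even and no fast-math reassociation.

```c
#include <assert.h>

/* Order used by the patch: (sum * rdiv + bias) + 0.5, then truncate. */
static int round_separate(float scaled, float bias)
{
    volatile float t = scaled + bias;   /* volatile forces rounding to float */
    t = t + 0.5f;
    return (int)t;                      /* truncating conversion, like cvttps2dq */
}

/* Suggested order: sum * rdiv + (bias + 0.5), then truncate. */
static int round_precomputed(float scaled, float bias)
{
    volatile float bias_half = bias + 0.5f;   /* pre-computed once */
    volatile float t = scaled + bias_half;
    return (int)t;
}

/* Hand-picked (hypothetical) inputs on which the two orders disagree. */
static void demo_mismatch(void)
{
    const float scaled = 0x1.fffffep-2f;  /* largest float below 0.5 */
    const float bias   = -0x1p-26f;       /* tiny negative bias */

    assert(round_separate(scaled, bias) != round_precomputed(scaled, bias));
}
```

With these inputs the separate additions stay just below 1.0 and truncate to 0, while the pre-computed constant rounds up to exactly 0.5 and the final sum truncates to 1. This only shows that the two forms are not bit-identical in general; it does not show that a mismatch occurs for the rdiv and bias values used in practice.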
>
>
> >+ cvttps2dq m4, m4
> >+ packssdw m4, m4
> >+ packuswb m4, m4
> >+ movss [dstq + dst_offq], m4
> >+ add c_offq, mmsize/4
> >+ add dst_offq, mmsize/4
> >+
> >+ add off16q, mmsize/4
> >+ cmp off16q, widthq
> >+ jl .loop16
> >+
> >+ add widthq, rq
> >+ cmp off16q, widthq
> >+ jge .paraend
> >+
>
> >+ .loopr:
> I am not sure this loop is needed: if we are allowed to read beyond the
> end of the row, we can reuse the SIMD code above.
Reusing the SIMD code above may write to memory that does not belong to this slice thread.
IMO, the code that handles the remainder columns is still necessary.
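A rough C sketch of the issue (not the patch itself): the names `process_row`, `process_row_overwrite` and `CHUNK` are made up for illustration, and the per-pixel work is replaced by a plain byte copy. The inner loop over `k` stands in for one 4-pixel SIMD store.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK 4  /* pixels per SIMD iteration; models mmsize/4 with SSE */

/* Whole chunks in the main loop, then a scalar remainder loop. */
static void process_row(uint8_t *dst, const uint8_t *src, int width)
{
    int x = 0;

    assert(width >= 0);

    /* Main loop: full chunks only, so no store goes past dst[width - 1]. */
    for (; x + CHUNK <= width; x += CHUNK)
        for (int k = 0; k < CHUNK; k++)   /* models one 4-pixel store */
            dst[x + k] = src[x + k];

    /* Remainder loop: the last width % CHUNK pixels, one at a time. */
    for (; x < width; x++)
        dst[x] = src[x];
}

/* Reusing the chunked body for the tail effectively rounds width up. */
static void process_row_overwrite(uint8_t *dst, const uint8_t *src, int width)
{
    assert(width >= 0);

    for (int x = 0; x < width; x += CHUNK)
        for (int k = 0; k < CHUNK; k++)
            dst[x + k] = src[x + k];      /* may store at or past dst[width] */
}
```

With width = 6, `process_row` leaves dst[6] and dst[7] untouched, while `process_row_overwrite` stores to both. In the filter those extra bytes may belong to another slice thread, which is why the scalar tail is kept.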
Ruiling
>
>
> >+ xor sumd, sumd
> >+ xor iq, iq
> >+ .loopr_i:
> >+ mov ciq, [ptrq + iq * gprsize]
> >+ movzx rd, byte [ciq + c_offq]
> >+ imul rd, [matrixq + 4*iq]
> >+ add sumd, rd
> >+
> >+ add iq, 1
> >+ cmp iq, radq
> >+ jl .loopr_i
> >+
> >+ pxor m4, m4
> >+ cvtsi2ss m4, sumd
> >+ mulss m4, m0 ; sum *= rdiv
> >+ addss m4, m1 ; sum += bias
> >+ addss m4, m5 ; sum += 0.5
> >+ cvttps2dq m4, m4
> >+ packssdw m4, m4
> >+ packuswb m4, m4
> >+ movd sumd, m4
> >+ mov [dstq + dst_offq], sumb
> >+ add c_offq, 1
> >+ add dst_offq, 1
> >+ add off16q, 1
> >+ cmp off16q, widthq
> >+ jl .loopr
> >+
> >+ .paraend:
> >+ sub c_offq, widthq
> >+ sub dst_offq, widthq
> >+ add c_offq, strideq
> >+ add dst_offq, dstrideq
> >+
> >+ sub heightq, 1
> >+ cmp heightq, 0
> >+ jg .loopy
> >+
> >+.end:
> >+ RET
>