[FFmpeg-devel] [PATCH 2/2] lavc/aarch64: h264, add idct for 10bit
Martin Storsjö
martin at martin.st
Fri Sep 3 14:35:59 EEST 2021
On Fri, 20 Aug 2021, Mikhail Nitenko wrote:
> Benchmarks: A53 A72
> h264_idct4_add_10bpp_c: 187.7 115.2
> h264_idct4_add_10bpp_neon: 72.5 45.0
> h264_idct4_add_dc_10bpp_c: 96.0 61.2
> h264_idct4_add_dc_10bpp_neon: 36.0 19.5
> h264_idct8_add4_10bpp_c: 2115.5 1424.2
> h264_idct8_add4_10bpp_neon: 734.0 459.5
> h264_idct8_add_10bpp_c: 1017.5 709.0
> h264_idct8_add_10bpp_neon: 345.5 216.5
> h264_idct8_add_dc_10bpp_c: 316.0 235.5
> h264_idct8_add_dc_10bpp_neon: 69.7 44.0
> h264_idct_add16_10bpp_c: 2540.2 1498.5
> h264_idct_add16_10bpp_neon: 1080.5 616.0
> h264_idct_add16intra_10bpp_c: 784.7 439.5
> h264_idct_add16intra_10bpp_neon: 641.0 462.2
>
> Signed-off-by: Mikhail Nitenko <mnitenko at gmail.com>
> ---
>
> there is a function that is not covered by tests, but I tested it with
> sample videos, not sure what to do with it
It would be really good to add a checkasm test for it, because assembly
without checkasm tests can have lots of hidden bugs (although it seems
fairly straightforward here) that only get uncovered by later compiler
updates. Not saying that it is the case here, but without a checkasm test
we don't know.
Overall the patch seems fine, the code is fairly 1:1 copy of the existing
but with wider SIMD elements, and I presume that the range of values don't
allow keeping anything in more narrow form.
> +function ff_h264_idct_add8_neon_10, export=1 // NO TESTS but test video
> looks fine (did not look fine before the fixes so it is definitely
> working somehow)
I'm not quite sure what you mean here - did it look wrong before
implementing this function - then we'd have a bug in the C code? Or did it
look wrong with a broken version of this assembly function, and look right
after getting it right?
// Martin
More information about the ffmpeg-devel
mailing list