[FFmpeg-devel] [PATCH 2/2] swscale/aarch64: Add rgb24 to yuv implementation

Mon Jun 3 11:07:29 EEST 2024

On Mon, 3 Jun 2024, Zhao Zhili wrote:

> diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
> new file mode 100644
> index 0000000000..0a46475723
> --- /dev/null
> +++ b/libswscale/aarch64/input.S
> @@ -0,0 +1,229 @@
> +/*
> + * Copyright (c) 2024 Zhao Zhili <quinkblack at foxmail.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/aarch64/asm.S"
> +
> +.macro rgb24_to_yuv_load_rgb, src
> +        ld3.16b         { v16, v17, v18 }, [\src]
> +        ushll.8h        v19, v16, #0         // v19: r
> +        ushll.8h        v20, v17, #0         // v20: g
> +        ushll.8h        v21, v18, #0         // v21: b
> +        ushll2.8h       v22, v16, #0         // v22: r
> +        ushll2.8h       v23, v17, #0         // v23: g
> +        ushll2.8h       v24, v18, #0         // v24: b

Don't use this nonstandard, Apple-specific aarch64 syntax. It was used
by Apple tools at the start, when the proper standardized aarch64 syntax
wasn't quite settled yet, and it is still accepted. (And apparently it
is still the preferred form for disassembling things on Apple
platforms.)

With this syntax, the assembly is rejected by GNU binutils and MSVC.
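
For illustration, the same load macro in the standard syntax might look
roughly like this. This is an untested sketch; the register allocation and
comments are copied from the quoted patch. In the standard form, the
arrangement specifier goes on each register operand instead of on the
mnemonic:

        ld3             { v16.16b, v17.16b, v18.16b }, [\src]
        ushll           v19.8h, v16.8b, #0      // v19: r
        ushll           v20.8h, v17.8b, #0      // v20: g
        ushll           v21.8h, v18.8b, #0      // v21: b
        ushll2          v22.8h, v16.16b, #0     // v22: r
        ushll2          v23.8h, v17.16b, #0     // v23: g
        ushll2          v24.8h, v18.16b, #0     // v24: b

(ushll/ushll2 with a #0 shift can equivalently be written with the
uxtl/uxtl2 aliases.)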

> +function ff_rgb24ToY_neon, export=1
> +        cmp             w4, #0                  // check width > 0
> +        b.le            4f
> +
> +        ldp             w10, w11, [x5], #8       // w10: ry, w11: gy
> +        dup             v0.8H, w10
> +        dup             v1.8H, w11
> +        ldr             w12, [x5]               // w12: by
> +        dup             v2.8H, w12

Don't use uppercase .8H for field layout configurations; we prefer to
stick to all lowercase here - see
184103b3105f02f1189fa0047af4269e027dfbd6. The same goes for a number of
other places in this patch.
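
As a sketch of the intended result, the three dup instructions quoted above
would then read as follows (only the case of the layout suffix changes):

        dup             v0.8h, w10
        dup             v1.8h, w11
        dup             v2.8h, w12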

> +        add             w9, w9, #1              // i++
> +        add             x3, x3, #6              // src += 6
> +3:
> +        cmp		w9, w5
> +        b.lt		2b
> +4:

Incorrect indentation for the cmp/b.lt instructions here.
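
Assuming the problem is the tab characters between the mnemonic and the
operands, the two lines would look like this when indented with spaces and
aligned with the rest of the file:

        cmp             w9, w5
        b.lt            2b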

I have set up a bunch of GitHub Actions for testing aarch64 assembly - see
https://github.com/mstorsjo/ffmpeg/commits/gha-aarch64. If you have a
GitHub account, grab a copy of this branch into your repo, add your own
commits on top, and push to your fork (and if necessary, activate running
the actions); then you should get broad testing of your patches.

See https://github.com/mstorsjo/FFmpeg/actions/runs/9346228714 for one 
example run of these actions with your patches.

// Martin