[FFmpeg-devel] Re: [PATCH v3] avcodec/h264_mb: Fix tmp buffer overlap in mc_part_weighted

Tue Dec 24 09:53:40 EET 2024

Sure, changed in PATCH v4.

________________________________
From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> on behalf of Michael Niedermayer <michael at niedermayer.cc>
Sent: December 23, 2024 5:41
To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH v3] avcodec/h264_mb: Fix tmp buffer overlap in mc_part_weighted

On Fri, Dec 20, 2024 at 01:26:37PM +0800, Bin Peng wrote:
> When decoding a bitstream with weighted-bipred enabled,
> the results on ARM and x86 platforms may differ.
>
> The reason for the inconsistency is that the value of
> STRIDE_ALIGN differs between platforms, and the stride of
> the temporary buffers for the U and V components in
> mc_part_weighted is set to STRIDE_ALIGN.
>
> If the buffer stride is 32 or 64 (as on x86 platforms),
> the U and V pixels can be interleaved row by row without
> overlapping, resulting in correct output.
> However, on ARM platforms where the stride is 16,
> the V component will overwrite part of the U component's pixels,
> leading to incorrect predicted pixels.
>
> Fixes: ticket 11357
>
> Signed-off-by: Bin Peng <pengbin at visionular.com>
> ---
>  libavcodec/h264_mb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/h264_mb.c b/libavcodec/h264_mb.c
> index 4e94136313..b480cd312b 100644
> --- a/libavcodec/h264_mb.c
> +++ b/libavcodec/h264_mb.c
> @@ -407,8 +407,8 @@ static av_always_inline void mc_part_weighted(const H264Context *h, H264SliceCon
>          /* don't optimize for luma-only case, since B-frames usually
>           * use implicit weights => chroma too. */
>          uint8_t *tmp_cb = sl->bipred_scratchpad;
> -        uint8_t *tmp_cr = sl->bipred_scratchpad + (16 << pixel_shift);
> -        uint8_t *tmp_y  = sl->bipred_scratchpad + 16 * sl->mb_uvlinesize;
> +        uint8_t *tmp_cr = sl->bipred_scratchpad + (16 * sl->mb_uvlinesize);
> +        uint8_t *tmp_y  = sl->bipred_scratchpad + (32 * sl->mb_uvlinesize);

larger separation will decrease cache utilization and worsen speed.
can't we make sure stride is at least 32 without forcing alignment by 32?
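
For reference, the overlap described in the commit message can be checked with a small standalone sketch. This is not code from libavcodec: the helper name chroma_tmp_overlaps and its parameters (stride, cr_offset, width, rows, all in bytes or rows) are hypothetical, and the block sizes used in the examples afterwards are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of FFmpeg.
 * Models two temporary planes that share one scratch buffer:
 *   Cb row r occupies bytes [r * stride, r * stride + width)
 *   Cr row s occupies bytes [cr_offset + s * stride,
 *                            cr_offset + s * stride + width)
 * Returns true if any Cb row shares at least one byte with any Cr row. */
static bool chroma_tmp_overlaps(size_t stride, size_t cr_offset,
                                size_t width, size_t rows)
{
    for (size_t r = 0; r < rows; r++) {
        size_t cb_start = r * stride;
        size_t cb_end   = cb_start + width;
        for (size_t s = 0; s < rows; s++) {
            size_t cr_start = cr_offset + s * stride;
            size_t cr_end   = cr_start + width;
            /* Two half-open intervals intersect iff each one starts
             * before the other one ends. */
            if (cb_start < cr_end && cr_start < cb_end)
                return true;
        }
    }
    return false;
}
```

Assuming an 8-byte-wide, 8-row chroma block, chroma_tmp_overlaps(16, 16, 8, 8) reports an overlap (Cr row 0 lands on Cb row 1), while chroma_tmp_overlaps(32, 16, 8, 8) does not. With the v3 layout, where the Cr offset is 16 * stride, chroma_tmp_overlaps(16, 16 * 16, 16, 16) also reports no overlap, but the two planes then sit farther apart in memory, which is the cache concern raised above.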

thx

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Republics decline into democracies and democracies degenerate into
despotisms. -- Aristotle