[FFmpeg-devel] [PATCH] VP8 MMX optimizations (MC and IDCT dc_add)
Fri Jun 25 19:02:35 CEST 2010
"Ronald S. Bultje" <rsbultje at gmail.com> writes:
> On Wed, Jun 23, 2010 at 3:38 AM, Jason Garrett-Glaser
> <darkshikari at gmail.com> wrote:
>> There are some... issues... with the current code that prevent commit.
>> I will be bringing these up with Ronald soon ;)
> The main issue was VLA because I didn't distinguish
> srcstride/deststride where relevant, leading to huge stack
> allocations. Attached patch fixes that in a manner similar to
The scratch buffers look sane now.
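
For anyone reading along, the stack issue can be illustrated with a rough
sketch. This is not the code from the patch: put_hv_avg, MAX_BLOCK and
TMP_STRIDE are hypothetical names, and a simple 2-tap average stands in
for the real subpel filters. The point is only that the intermediate
buffer gets its own small, fixed stride instead of being sized from the
frame stride.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_BLOCK 16  /* largest block edge handled by this sketch (assumption) */

/* Hypothetical two-pass averaging filter, for illustration only.
 * Reads (w + 1) x (h + 1) source pixels, writes w x h destination pixels. */
static void put_hv_avg(uint8_t *dst, ptrdiff_t dststride,
                       const uint8_t *src, ptrdiff_t srcstride,
                       int w, int h)
{
    /* Fixed-size scratch with its own stride, independent of
     * srcstride/dststride. The problematic pattern would be a VLA
     * such as: uint8_t tmp[srcstride * (h + 1)]; */
    enum { TMP_STRIDE = MAX_BLOCK };
    uint8_t tmp[TMP_STRIDE * (MAX_BLOCK + 1)];
    int x, y;

    /* Horizontal pass: src -> tmp, one extra row for the vertical pass. */
    for (y = 0; y < h + 1; y++)
        for (x = 0; x < w; x++)
            tmp[y * TMP_STRIDE + x] =
                (src[y * srcstride + x] + src[y * srcstride + x + 1] + 1) >> 1;

    /* Vertical pass: tmp -> dst. */
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++)
            dst[y * dststride + x] =
                (tmp[y * TMP_STRIDE + x] + tmp[(y + 1) * TMP_STRIDE + x] + 1) >> 1;
}
```

With this layout the scratch buffer is 16 * 17 bytes regardless of how
wide the frame is.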
> Still only MMX/EXT here, if this is OK I'll apply it and work on
> adapting and integrating Jason's SSE2/SSSE3 work, too (or he can do
> that himself).
> The other issue Jason brought up is the fact that splitmv (4x4
> subblocks of 4x4 pixels each in a 16x16 macroblock) is handled as
> actually 16 4x4 blocks, whereas usually several of them have the same
> MV (4x8, 8x4, 8x8, etc.). I intend to address this, but in a separate
> patch because the issue is unrelated to the VLA one, touches different
> code and doesn't affect the optimizations themselves (it'll just make
> heavier use of the SSE2/SSSE3 functions once done correctly).
> Please comment, I'd love to apply these patches to get my tree a
> little more aligned with upstream.
If the x86 gurus are happy with the asm, I think this should be
committed. There's no harm in doing it incrementally.
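
As a side note on the splitmv point quoted above, the check for merging
4x4 sub-blocks into larger partitions could look roughly like the sketch
below. The MV type and the function names are made up for illustration
and are not taken from the decoder; the 16 vectors are assumed to be
stored in raster order, and only the 8x8 case is shown.

```c
#include <stdint.h>

typedef struct { int16_t x, y; } MV;  /* hypothetical motion vector type */

static int mv_equal(MV a, MV b)
{
    return a.x == b.x && a.y == b.y;
}

/* Returns a 4-bit mask. Bit q is set when all four 4x4 sub-blocks of
 * 8x8 quadrant q (quadrants in raster order) share the same MV, so the
 * quadrant could be predicted with a single 8x8 MC call. */
static unsigned uniform_quadrants(const MV mv[16])
{
    unsigned mask = 0;
    for (int q = 0; q < 4; q++) {
        /* Index of the quadrant's top-left 4x4 block in the 4x4 grid. */
        int base = (q >> 1) * 8 + (q & 1) * 2;
        if (mv_equal(mv[base], mv[base + 1]) &&
            mv_equal(mv[base], mv[base + 4]) &&
            mv_equal(mv[base], mv[base + 5]))
            mask |= 1u << q;
    }
    return mask;
}
```

The same comparison extends to 4x8 and 8x4 pairs; a quadrant whose bit
is clear would still fall back to four separate 4x4 calls.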
mans at mansr.com