[FFmpeg-devel] [PATCH] VP8: avoid conditional and division for chroma MV

Wed Jun 23 23:58:41 CEST 2010

On 06/23/2010 08:51 PM, Ronald S. Bultje wrote:
> Hi,
>
> 2010/6/23 M?ns Rullg?rd<mans at mansr.com>:
>> Stefan Gehrer<stefan.gehrer at gmx.de>  writes:
>>> On 06/23/2010 07:45 PM, Ronald S. Bultje wrote:
>>>> On Wed, Jun 23, 2010 at 1:40 PM, Stefan Gehrer<stefan.gehrer at gmx.de>    wrote:
>>>>> On 06/23/2010 06:51 PM, Ronald S. Bultje wrote:
>>>>>> On Wed, Jun 23, 2010 at 12:44 PM, Stefan Gehrer<stefan.gehrer at gmx.de>
>>>>>>    wrote:
>>>>>>> Are there any recommended samples to test against?
>>>>>>
>>>>>> http://code.google.com/p/webm/downloads/detail?name=vp8-test-vectors-r1.zip&can=2&q=
>>>>>
>>>>> Okay, I tested some clips from the test vector and
>>>>> the frame CRCs stay the same with the patch.
>>>>
>>>> You can test all of them using a small script (use untested to get ref.md5s):
>>>>
>>>> rm -f test.md5s
>>>> for files in 000 001 002 003 004 005 006 007 008 009 \
>>>>                010 011 012 013 014 015 016 017; do \
>>>>       ./ffmpeg -i ~/Desktop/vp8-test-vectors-r1/vp80-00-comprehensive-${files}.ivf \
>>>>               -v 0 -y -an -vcodec rawvideo -f md5 - 2>&1 | grep MD5>>    test.md5s
>>>> done
>>>> diff -u test.md5s ref.md5s&&    echo "Results identical"
>>>>
>>>>> But when I compile on my machine with gcc 4.4.3 amd -O3
>>>>> it seems clever enough to avoid conditional and division
>>>>> anyway.
>>>>> So now I believe in the correctness of the patch, I am
>>>>> just not so sure about the usefulness.
>>>>
>>>> Well, does it make it faster or lead to better assembly?
>>>
>>> A little bit of both:
>>>
>>> int uvmv_test_old (int mv)
>>> {
>>>      return (mv + (mv<  0 ? -2 : 2)) / 4;
>>>        90:       8b 54 24 04             mov    0x4(%esp),%edx
>>>        94:       89 d0                   mov    %edx,%eax
>>>        96:       c1 f8 1f                sar    $0x1f,%eax
>>>        99:       83 e0 fc                and    $0xfffffffc,%eax
>>>        9c:       8d 54 10 02             lea    0x2(%eax,%edx,1),%edx
>>>        a0:       89 d0                   mov    %edx,%eax
>>>        a2:       c1 f8 1f                sar    $0x1f,%eax
>>>        a5:       c1 e8 1e                shr    $0x1e,%eax
>>>        a8:       01 d0                   add    %edx,%eax
>>>        aa:       c1 f8 02                sar    $0x2,%eax
>>>        ad:       c3                      ret
>>> }
>>>
>>> int uvmv_test_new (int mv)
>>> {
>>>      return (mv + 2 + (mv>>  (INT_BIT-1)))>>  2;
>>>        b0:       8b 44 24 04             mov    0x4(%esp),%eax
>>>        b4:       89 c2                   mov    %eax,%edx
>>>        b6:       c1 fa 1f                sar    $0x1f,%edx
>>>        b9:       8d 44 10 02             lea    0x2(%eax,%edx,1),%eax
>>>        bd:       c1 f8 02                sar    $0x2,%eax
>>>        c0:       c3                      ret
>>> }
>>>
>>> Putting START/STOP_TIMER just around those two lines
>>> in inter_predict() funtion and decoding
>>> vp80-00-comprehensive-002.ivf
>>>
>>> without patch
>>>
>>> 680 dezicycles in uvmv, 2048 runs, 0 skips
>>>
>>> with patch
>>>
>>> 649 dezicycles in uvmv, 2048 runs, 0 skips
>>
>> Smaller _and_ faster: the choice is obvious.
>
> Indeed, patch OK. You have SVN access right?

Yes, patch applied.

It took me a while to convince svn to ask me for the password,
it always wanted to consult the GNOME keyring for it instead.
A command line tool ignores user input and prefers to ask
some desktop component? Whoever is responsible for that should
be slapped.
I am sure that wouldn't happen with git :)

Stefan