[FFmpeg-devel] [PATCH] use a 64-bit read in filter_mb_dir

Alexander Strange astrange
Sun Jan 24 12:00:24 CET 2010


On Jan 24, 2010, at 5:22 AM, Michael Niedermayer wrote:

> On Sun, Jan 24, 2010 at 01:05:26AM -0500, Alexander Strange wrote:
>> As in subject.
>> 
> 
>> h264_loopfilter.c |    2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 966858b5c64362599cbd9396fcade8da7532d3e1  0001-Use-64-bit-read-for-a-check-in-filter_mb_dir.patch
>> From 5a652e9d8c5b81eecbbb0a8d0492a9644801818c Mon Sep 17 00:00:00 2001
>> From: Alexander Strange <astrange at ithinksw.com>
>> Date: Sat, 23 Jan 2010 19:41:53 -0500
>> Subject: [PATCH 1/2] Use 64-bit read for a check in filter_mb_dir().
>> 
>> ---
>> libavcodec/h264_loopfilter.c |    2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>> 
>> diff --git a/libavcodec/h264_loopfilter.c b/libavcodec/h264_loopfilter.c
>> index 84a3464..3b80d4a 100644
>> --- a/libavcodec/h264_loopfilter.c
>> +++ b/libavcodec/h264_loopfilter.c
>> @@ -572,7 +572,7 @@ static av_always_inline void filter_mb_dir(H264Context *h, int mb_x, int mb_y, u
>>                 }
>>             }
>> 
>> -            if(bS[0]+bS[1]+bS[2]+bS[3] == 0)
>> +            if(*(uint64_t*)bS == 0)
> 
> This can cause a partial memory stall; also, I've already tried that without
> seeing a speed gain. Is it faster for you?

Yes, but only by a little (not that I expected much).
Around the "Calculate bS" loop on x86-64 core2:
before:
7304 dezicycles in calculate bS, 4090 runs, 6 skips
7289 dezicycles in calculate bS, 8186 runs, 6 skips
7429 dezicycles in calculate bS, 16376 runs, 8 skips
7370 dezicycles in calculate bS, 32756 runs, 12 skips  
6892 dezicycles in calculate bS, 65514 runs, 22 skips  
6961 dezicycles in calculate bS, 131038 runs, 34 skips   

after:
7239 dezicycles in calculate bS, 4090 runs, 6 skips
7210 dezicycles in calculate bS, 8182 runs, 10 skips
7359 dezicycles in calculate bS, 16371 runs, 13 skips
7300 dezicycles in calculate bS, 32751 runs, 17 skips    
6828 dezicycles in calculate bS, 65509 runs, 27 skips  
6901 dezicycles in calculate bS, 131035 runs, 37 skips 
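
For reference, the two forms of the check can be compared in isolation. This is only a minimal sketch, not the code from h264_loopfilter.c: it assumes bS is an array of four 16-bit values (8 bytes in total), and the helper names bs_is_zero_sum and bs_is_zero_u64 are made up for illustration. It uses memcpy instead of the pointer cast from the patch so that it stays within the C aliasing rules; compilers typically turn that into a single 64-bit load.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: four 16-bit boundary-strength values, 8 bytes in total. */

/* Original form: add the four entries and test the sum. */
static int bs_is_zero_sum(const int16_t bS[4])
{
    return bS[0] + bS[1] + bS[2] + bS[3] == 0;
}

/* Patched form: test all 8 bytes with one 64-bit comparison.
 * memcpy replaces the *(uint64_t*)bS cast to avoid aliasing problems. */
static int bs_is_zero_u64(const int16_t bS[4])
{
    uint64_t v;
    memcpy(&v, bS, sizeof(v));
    return v == 0;
}
```

The two helpers agree only as long as no entry is negative; with values such as {1, -1, 0, 0} the sum is zero while the 64-bit word is not.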



