[FFmpeg-devel] [PATCH] FFV1 rectangular slice multithreading

Jason Garrett-Glaser darkshikari
Thu Oct 14 23:02:19 CEST 2010


On Thu, Oct 14, 2010 at 1:58 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Thu, Oct 14, 2010 at 01:16:06PM -0700, Jason Garrett-Glaser wrote:
>> On Thu, Oct 14, 2010 at 8:09 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Thu, Oct 14, 2010 at 06:33:08AM -0700, Jason Garrett-Glaser wrote:
>> >> On Thu, Oct 14, 2010 at 5:59 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> > Hi
>> >> >
>> >> > The following patchset makes ffv1.2 much faster on multiprocessor systems
>> >> > (it requires version to be set to 2, which means editing the source if you
>> >> > want to try it, as the 1.2 bitstream is not finalized yet).
>> >> >
>> >> > Compression-wise, 4 slices with foreman and large GOPs (300 frames) perform
>> >> > slightly better (0.05% IIRC) than 1 slice.
>> >> > With small GOPs (50 frames), compression is 0.8% worse with the range coder
>> >> > and the large context model; otherwise it is better there too.
>> >> > (It's quite obvious why it's worse in that case, and I'll be working on that ...)
>> >> >
>> >> > Comments welcome, bikesheds not, and I'll apply this soon.
>> >>
>> >> >+    if(f->num_h_slices > 256U || f->num_v_slices > 256U){
>> >>
>> >> The maximum slice count is 256, but this allows for up to 65,536, which
>> >> doesn't seem right.
>> >
>> > oops, fixed locally with
>> > +    if(f->num_h_slices > 256U || f->num_v_slices > 256U || f->num_h_slices*f->num_v_slices > MAX_SLICES){
>> > -    if(f->num_h_slices > 256U || f->num_v_slices > 256U){
>>
>> Isn't the former check unnecessary?  If either num_h_slices or
>> num_v_slices exceeds 256, the latter check will also be triggered,
>> unless one of the slice counts is 0 (which is equally invalid).
>
> Integer overflow ... now I could have cast to uint64_t, but I wanted to throw
> this out and store x/y/w/h per slice. Not that I am planning to do anything
> with that, but the overhead is small and it's more flexible for the bitstream.
>
>
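Just to spell out the uint64_t variant for the archive -- a standalone,
untested sketch; the helper name and its shape are mine, not from your patch:

#include <stdint.h>

/* untested sketch: returns 0 if the slice grid is valid, -1 otherwise.
 * The uint64_t product avoids the 32-bit overflow mentioned above, and
 * the zero checks cover the "slice count of 0" case from earlier in the
 * thread. */
static int slice_grid_valid(unsigned num_h_slices, unsigned num_v_slices,
                            unsigned max_slices)
{
    if (!num_h_slices || !num_v_slices)
        return -1;
    if (num_h_slices > 256 || num_v_slices > 256)
        return -1;
    if ((uint64_t)num_h_slices * num_v_slices > max_slices)
        return -1;
    return 0;
}

Storing x/y/w/h per slice sidesteps the question anyway, of course.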
>>
>> Also, while you're playing with FFV1, I mucked with the contexts a
>> while back.  I was never able to get a very large improvement, but
>> here are some of the ideas I tried:
>>
>> 1.  Base some (or all) of the contexts on previous residual values
>> instead of neighboring pixels (it was almost equivalent in
>> compression, but I'm curious how much a combination of that and the
>> current approach could help).  The bonus of this method is you can
>> combine it with FFV2's pixel ordering to allow decode/encode SIMD of
>> the median prediction.
>
>
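To make 1. more concrete, the context selection could look roughly like this
(completely untested sketch; quantize() and the 11-level split are
placeholders I just made up, not what ffv1.c does today):

#include <stdlib.h>

/* untested sketch of idea 1: pick the context from quantized previous
 * residuals instead of from neighboring pixel differences. */
static int quantize(int v)
{
    /* crude symmetric quantizer to 11 levels (-5..5) */
    int a = abs(v);
    int q = a < 1 ? 0 : a < 3 ? 1 : a < 7 ? 2 : a < 15 ? 3 : a < 31 ? 4 : 5;
    return v < 0 ? -q : q;
}

static int residual_context(int res_left, int res_top)
{
    /* two quantized residuals -> 11*11 = 121 contexts */
    return (quantize(res_left) + 5) * 11 + (quantize(res_top) + 5);
}

The entropy coding then only needs previously decoded residuals, so the
reconstruction (and the median prediction) can run as a separate,
vectorizable pass.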
>>
>> 2.  Base some of the contexts on the actual pixel values, e.g. an [8]
>> based on a quantized luma range of the average neighbor range.
>
> that should be easy to try.
>
>
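For 2., the kind of thing I had in mind (untested sketch, 8-bit samples
assumed; luma_bucket() and the neighbor names are placeholders):

/* untested sketch of idea 2: extend an existing context index with an
 * extra [8] dimension derived from the neighboring pixel values. */
static int luma_bucket(int left, int top, int topleft, int topright)
{
    int avg = (left + top + topleft + topright + 2) >> 2;
    return avg >> 5;               /* 256 levels -> 8 buckets */
}

static int extended_context(int base_context, int left, int top,
                            int topleft, int topright)
{
    return base_context * 8 + luma_bucket(left, top, topleft, topright);
}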
>>
>> 3. ?"Blended" contexts -- in addition to reading the relevant context,
>> read all the "neighboring" contexts too, and do a weighted average of
>> some sort. ?This is equivalent to blending on context update, IIRC.
>
> In the large context model we have 5 quantized inputs; naive bilinear-style
> interpolation in that many dimensions means interpolating in a 5D hypercube
> with 32 points, and I am not sure how fast that would be.
> Also, if we just consider a 1D context covering 10..0 and the next covering
> 1..11, and 99% of the actual values are 0, then the simple blending I was
> thinking of really could behave poorly if 1..11 was quite different from 10..0.
> But I don't know what you had in mind exactly ...

Hmm, then blended reading probably needs to be combined with blended
updates of some sort... or some way of ignoring contexts which haven't
been used yet.
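Something along these lines, for one quantized input axis only (untested
sketch; ContextState and its fields are placeholders, not ffv1.c's actual
state layout):

/* average the zero-probability estimate with the two 1-D neighbors,
 * skipping neighbors that have never been used, so a wildly different
 * but unused neighbor cannot drag the estimate around. */
typedef struct ContextState {
    unsigned count;     /* how many symbols were coded in this context */
    unsigned zero_prob; /* scaled probability of the residual being 0 */
} ContextState;

static unsigned blended_zero_prob(const ContextState *ctx, int i, int n)
{
    unsigned sum = 2 * ctx[i].zero_prob, weight = 2;
    if (i > 0     && ctx[i - 1].count) { sum += ctx[i - 1].zero_prob; weight++; }
    if (i < n - 1 && ctx[i + 1].count) { sum += ctx[i + 1].zero_prob; weight++; }
    return sum / weight;
}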

For that matter, do we have any statistics on how often the contexts
are used?  If a very large portion of contexts are never used, a
hash-based context table might actually be sane and let us get away
with even more contexts.
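By "hash-based" I mean something like this (untested sketch; HASH_BITS, the
multiplier and the 32-byte state size are arbitrary placeholders):

#include <stdint.h>

/* hash the full (sparse) context id into a much smaller state array, so
 * rarely-used contexts share slots instead of each reserving memory. */
#define HASH_BITS 16

typedef struct HashedContexts {
    uint8_t state[1 << HASH_BITS][32];   /* coder state per slot */
} HashedContexts;

static uint8_t *get_context_state(HashedContexts *h, uint32_t context_id)
{
    /* multiplicative hashing; collisions just merge two contexts' statistics */
    uint32_t slot = (context_id * 2654435761u) >> (32 - HASH_BITS);
    return h->state[slot];
}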

Dark Shikari


