[FFmpeg-devel] [PATCH] cpu: Limit the number of auto threads in 32 bit builds

Martin Storsjö martin at martin.st
Mon Sep 5 14:50:21 EEST 2022


On Mon, 5 Sep 2022, Andreas Rheinhardt wrote:

> Martin Storsjö:
>> Limit the returned value from av_cpu_count to sensible amounts
>> in 32 bit builds.
>> 
>> This chosen limit, 64, is somewhat arbitrary - a 32 bit process
>> is capable of creating much more than 64 threads. But in many
>> cases, multiple parts of the encoding pipeline (decoder, filters,
>> encoders) all create a pool of threads, auto sized according to the
>> number of cores.
>> 
>> In one failing test, the process had managed to create 506 threads
>> before a pthread_create call failed.
>> 
>> In the current set of fate tests, the filter-lavd-scalenorm test
>> seems to be the limiting factor; in a 32 bit build (arm linux,
>> running on an aarch64 kernel), it starts failing with an auto
>> thread count somewhere around 85. Therefore, pick the maximum
>> with some margin below this.
>> 
>> This fixes running fate without any manually set number of threads
>> in 32 bit builds on machines with huge numbers of cores.
>> ---
>>  libavutil/cpu.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
>> index 0035e927a5..094bd71d3d 100644
>> --- a/libavutil/cpu.c
>> +++ b/libavutil/cpu.c
>> @@ -233,6 +233,12 @@ int av_cpu_count(void)
>>      nb_cpus = sysinfo.dwNumberOfProcessors;
>>  #endif
>> 
>> +#if SIZE_MAX <= UINT32_MAX
>> +    // Avoid running out of memory/address space in 32 bit builds, by
>> +    // limiting the number of auto threads.
>> +    nb_cpus = FFMIN(nb_cpus, 64);
>> +#endif
>> +
>>      if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
>>          av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n", nb_cpus);
>> 
>
> I don't think we should be lying in libavutil/cpu.c.
>
> We should instead limit the number of threads in the functions that
> actually create threads based upon the return value of av_cpu_count(); 
> e.g. both frame_thread_encoder.c (limit 64) as well as pthread_frame.c 
> and pthread_slice.c (limit 16) already limit these numbers. 
> lavu/slicethread.c doesn't seem to do so.

Right, that's probably the more sensible thing to do.

For something like lavu/slicethread.c, a low-ish limit such as 16 or 32
probably makes sense - splitting an individual frame into more slices
than that is probably not useful as an automatic default, right?
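
As a rough illustration of the idea (not the actual change), clamping at the
point where the pool is sized could look something like the sketch below.
The helper name resolve_auto_threads and the constant MAX_AUTO_SLICE_THREADS
are hypothetical and do not exist in FFmpeg; the value 16 is just one of the
candidates mentioned above, and nb_cpus stands in for whatever
av_cpu_count() returns.

```c
/* Hypothetical cap for automatically sized slice thread pools. The value 16
 * is only one of the candidates discussed in this thread, not a constant
 * taken from the FFmpeg source. */
#define MAX_AUTO_SLICE_THREADS 16

/* Hypothetical helper: decide how many threads a pool should create.
 *
 * requested > 0  : explicit choice by the caller, returned unchanged.
 * requested <= 0 : "auto"; derive the count from the reported CPU count
 *                  and clamp it to MAX_AUTO_SLICE_THREADS.
 *
 * nb_cpus is assumed to be the value reported by the CPU detection code;
 * values below 1 are treated as 1 so the result is always at least 1. */
static int resolve_auto_threads(int requested, int nb_cpus)
{
    if (requested > 0)
        return requested;
    if (nb_cpus < 1)
        nb_cpus = 1;
    return nb_cpus < MAX_AUTO_SLICE_THREADS ? nb_cpus
                                            : MAX_AUTO_SLICE_THREADS;
}
```

With this shape, av_cpu_count() keeps reporting the real core count, only
the automatic default is capped, and a caller explicitly asking for more
threads still gets them.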

// Martin

