[FFmpeg-devel] [PATCH] What is missing for working AVFMT_FLAG_NONBLOCK?

Tue Mar 3 19:06:25 CET 2009

Hi.

On Tridi 13 Ventôse, year CCXVII, Michael Niedermayer wrote:
> are you saying that sched_yield() will blow up and cause actual undefined
> behaviour and that we should use usleep(0) or what else?
> I really feel that sched_yield() is correct and usleep() is not.
> thats IMHO from reading POSIX and the man pages

I remember a message on lkml where someone really good explained that
sched_yield is very rarely useful, and that most of the time people use it
wrongly.

What sched_yield does is the following:

If several processes are competing to hog one CPU and have the same
priority, each in turn gets some CPU time; when that time is exhausted,
the process is preempted and the CPU goes to the next one.

If process #1 calls sched_yield, it has the same effect as if it were
preempted after exhausting its time slice: the CPU goes to the next process,
and comes back to process #1 when all the other processes have exhausted
their time slices.

If process #1 is busy-waiting for something, and there are other
CPU-intensive processes with the same priority, then sched_yield has the
desired effect: process #1 will run just as long as necessary to see if
something new happened and let other processes run most of the time.

But the "correct" behaviour of sched_yield stops here.

If there are no other CPU-intensive processes, or if all other such
processes have a lower priority, then sched_yield is just a no-operation,
and the process hogs the CPU.
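
To see that last case in practice, here is a minimal sketch (the helper name
count_yields_for_ms is mine, purely for illustration) that spins on
sched_yield() for a fixed wall-clock time and counts how often the call
returns. On an otherwise idle machine I would expect the count to be very
large, because each call comes straight back and the loop never sleeps; the
exact figure depends on the scheduler and the hardware.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for clock_gettime() */
#endif
#include <sched.h>
#include <time.h>

/* Hypothetical helper: spin on sched_yield() for roughly `ms` milliseconds
 * and return how many times the call came back to us. */
long count_yields_for_ms(long ms)
{
    struct timespec start, now;
    long yields = 0;
    long elapsed_ms;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        /* Gives up the CPU only if another process of the same priority
         * is runnable; otherwise it returns immediately. */
        sched_yield();
        yields++;
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
                     (now.tv_nsec - start.tv_nsec) / 1000000L;
    } while (elapsed_ms < ms);

    return yields;
}
```

Running it under "top" while nothing else is competing should show the
process using a full CPU for the duration of the call.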

The man page of the Linux implementation gives a hint at the real purpose of
sched_yield:

       Strategic calls to sched_yield()  can  improve  performance  by  giving
       other  threads  or  processes  a chance to run when (heavily) contended
       resources (e.g., mutexes) have been  released  by  the  caller.

As for the solutions to the current problem:

usleep(0) is useless: Single Unix says "If the value of useconds is 0, then
the call has no effect."

usleep(1000) will mostly work, but will have two opposite problems:

- 1000 may be too big. For example, the ALSA capture device sets the period
  time of the ALSA PCM device to the lowest possible value, to allow
  low-latency programs to work. But with some devices, that may be really
  low: with my Intel HDA, the sample rate can be 192kHz and the period is 32
  samples, which means reads every 1/6000 of a second.

- 1000 may be too low: in practice, most devices return far fewer than
  1000 packets per second, which means a lot of useless wakeups. On embedded
  devices, that means power consumption, for example.
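
The arithmetic behind the first point can be written down as a small sketch
(the function names are mine, not from any library). With the figures above,
192000 Hz and 32 samples per period, one period lasts about 166 microseconds,
so six whole periods go by during a single usleep(1000).

```c
#include <stdint.h>

/* Duration of one capture period in microseconds, rounded down. */
uint64_t period_us(uint64_t rate_hz, uint64_t samples_per_period)
{
    return samples_per_period * 1000000u / rate_hz;
}

/* Number of whole periods that elapse during one sleep of sleep_us. */
uint64_t periods_per_sleep(uint64_t rate_hz, uint64_t samples_per_period,
                           uint64_t sleep_us)
{
    return sleep_us * rate_hz / (samples_per_period * 1000000u);
}
```

Whether missing six periods matters depends on how much the device buffers,
which this sketch does not model.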

One possible solution could be to keep one device (preferably the fastest)
in blocking mode, and poll the other devices in non-blocking mode when it
returns data. This is an ugly hack, but it would probably work quite well.

Another, much cleaner solution would be to get real Unix file descriptors
for the devices and poll() them.
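
A minimal sketch of what I mean, assuming each device can hand over a plain
file descriptor that becomes readable when a packet is available (which is
exactly the assumption questioned below). The helper wait_for_any and the
MAX_DEVICES limit are made up for the example.

```c
#include <poll.h>

#define MAX_DEVICES 16   /* arbitrary limit for this sketch */

/* Hypothetical helper: sleep until one of the descriptors is readable.
 * Returns the index of the first readable descriptor, or -1 on timeout
 * or error. A negative timeout_ms waits forever. */
int wait_for_any(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[MAX_DEVICES];
    int i;

    if (nfds <= 0 || nfds > MAX_DEVICES)
        return -1;

    for (i = 0; i < nfds; i++) {
        pfds[i].fd      = fds[i];
        pfds[i].events  = POLLIN;
        pfds[i].revents = 0;
    }

    /* The process sleeps in the kernel here: no busy loop, no fixed delay. */
    if (poll(pfds, nfds, timeout_ms) <= 0)
        return -1;

    for (i = 0; i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}
```

The caller would then read one packet from the device at the returned index
and call the helper again.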

But I am not sure that all devices can actually be tested that way. For
example, mmapped V4L1, as far as I remember, uses a blocking ioctl, which
cannot be poll()ed. In fact, I am not sure it can be set to non-blocking mode
either.

Furthermore, this is really less portable: I would be very surprised if VFW
capture could be poll()ed, for example.

The only real solution to the problem of simultaneous capture from several
devices is using threads.

This is not a very complex solution either: each device gets its own thread,
in blocking mode; whenever a packet is read, it is added to an asynchronous
message queue, and the main program blocks waiting on that message queue.
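
A rough sketch of that design with POSIX threads follows. Everything in it
(Packet, PacketQueue, Reader, reader_thread) is hypothetical and not existing
libavformat code, and the blocking device read is only simulated by a loop
that fabricates packets.

```c
#include <pthread.h>

#define QUEUE_SIZE 64

typedef struct Packet { int device; int seq; } Packet;

typedef struct PacketQueue {
    Packet items[QUEUE_SIZE];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} PacketQueue;

void queue_init(PacketQueue *q)
{
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Called by the reader threads; blocks while the queue is full. */
void queue_push(PacketQueue *q, Packet p)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SIZE)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_SIZE] = p;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called by the main program; sleeps until some device delivered a packet. */
Packet queue_pop(PacketQueue *q)
{
    Packet p;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    p = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return p;
}

typedef struct Reader { PacketQueue *q; int device; int npackets; } Reader;

/* Thread body, one per device. */
void *reader_thread(void *arg)
{
    Reader *r = arg;
    int seq;

    for (seq = 0; seq < r->npackets; seq++) {
        /* A real reader would block here in the device's read call. */
        Packet p = { r->device, seq };
        queue_push(r->q, p);
    }
    return NULL;
}
```

The main program starts one reader_thread per device with pthread_create()
and then loops on queue_pop(). Neither side spins or sleeps for a fixed
delay, and packets from a given device come out in the order they were read.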

Regards,

-- 
  Nicolas George