[FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions
Lynne
dev at lynne.ee
Mon Jul 17 20:48:40 EEST 2023
Jul 17, 2023, 07:18 by remi at remlab.net:
> On Sunday, 16 July 2023, 23:32:21 EEST, Lynne wrote:
>
>> Introducing additional overhead in the form of a dereference is a point
>> where instability can creep in. Can you guarantee that a context will
>> always remain in L1D cache,
>>
>
> L1D is not involved here. In version 2, the pointers are cached locally.
>
>> as opposed to just reading the raw CPU timing
>> directly where that's supported.
>>
>
> Of course not. Raw CPU timing is subject to noise from interrupts (and
> whatever those interrupts trigger). And that's not just theoretical. I've
> experienced it and it sucks. Raw CPU timing is much noisier than Linux perf.
>
> And because it has also been proven vastly insecure, it's been disabled on Arm
> for a long time, and is being disabled on RISC-V too now.
>
>> > But I still argue that that is, either way, completely negligible compared
>> > to the *existing* overhead. Each loop is making 4 system calls, and each
>> > of those system calls requires a direct call (to PLT) and an indirect
>> > branch (from GOT). If you have a problem with the two additional function
>> > calls, then you can't be using Linux perf in the first place.
>>
>> You don't want to ever use linux perf in the first place, it's second class.
>>
>
> No it isn't. The interface is more involved than just reading a CSR; and sure
> I'd prefer the simple interface that RDCYCLE is, all other things being equal.
> But other things are not equal. Linux perf is in fact *more* accurate by
> virtue of not *wrongly* counting other things. And it does not threaten the
> security of the entire system, so it will work inside a rented VM or an
> unprivileged process.
>
Threaten? This is a development tool first and foremost.
If anyone doesn't want to use rdcycle, they can use Linux perf; it still works,
with or without the patch.
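
For reference, a minimal sketch of the Linux perf path under discussion,
assuming a Linux system where perf_event_open(2) is available. The helper
names (cycles_open, cycles_measure, spin) are hypothetical and this is not
checkasm's actual code. It only shows one way to arrive at four system calls
per measurement: two ioctls to reset and enable the counter, one ioctl to
disable it, and one read to fetch the value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a user-space-only CPU cycle counter for the calling thread.
 * Returns a file descriptor, or -1 if the kernel or hardware refuses
 * (for example inside a VM without a PMU). */
static int cycles_open(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type           = PERF_TYPE_HARDWARE;
    attr.size           = sizeof(attr);
    attr.config         = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled       = 1; /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1; /* do not count kernel-mode cycles */
    attr.exclude_hv     = 1; /* do not count hypervisor cycles */

    /* pid = 0 (this thread), cpu = -1 (any), no group, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical workload: a short busy loop of *arg iterations. */
static void spin(void *arg)
{
    volatile unsigned sink = 0;

    for (unsigned i = 0; i < *(const unsigned *)arg; i++)
        sink += i;
}

/* Count the cycles spent in fn(arg). Four system calls in total:
 * RESET, ENABLE, DISABLE and read. Returns 0 on success, -1 on error. */
static int cycles_measure(int fd, void (*fn)(void *), void *arg,
                          uint64_t *cycles)
{
    assert(fn && cycles);

    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0 ||
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        return -1;

    fn(arg);

    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
        return -1;

    return read(fd, cycles, sizeof(*cycles)) == (ssize_t)sizeof(*cycles)
           ? 0 : -1;
}
```

A caller would check the result of cycles_open() and fall back to another
timer when it returns -1, since opening the counter can fail depending on
the kernel's perf_event_paranoid setting and on the hardware.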
>> I don't think it's worth changing the direct inlining we had before. You're
>> not interested in whether or not the same exact code is run between
>> platforms,
>>
>
> Err, I am definitely interested in doing exactly that. I don't want to have to
> reconfigure and recompile the entire FFmpeg just to switch between Linux perf
> and raw cycle counter. A contrario, I *do* want to compare performance between
> vendors once the hardware is available.
>
That's a weak reason to compromise the accuracy of a development tool.
>> just that the code that's measuring timing is as efficient and
>> low overhead as possible.
>>
>
> Of course not. Low overhead is irrelevant here. The measurement overhead is
> known and is subtracted. What we need is stable/reproducible overhead, and
> accurate measurements.
>
Which is what TSC or the equivalent gets you. It's noisy, but that's because
it's more accurate than a round trip through the kernel.
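
To make the overhead subtraction concrete, here is a minimal sketch, assuming
a timer backend selected at run time through a pair of function pointers. All
names (BenchTimer, measure_overhead, measure, the fake_* helpers) are
hypothetical and not taken from checkasm. The fake backend advances a counter
by a fixed 5 ticks per read, so the start/stop overhead is exactly
reproducible and can be subtracted cleanly; a real backend would need the
same calibration, with noise on top.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical timer backend: start/stop are reached through pointers,
 * so switching backends does not require a rebuild. */
typedef struct BenchTimer {
    uint64_t (*start)(void);
    uint64_t (*stop)(void);
} BenchTimer;

/* Backend 1: POSIX monotonic clock, in nanoseconds. */
static uint64_t clock_read(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Backend 2: deterministic fake counter, 5 ticks per read. */
static uint64_t fake_ticks;

static uint64_t fake_read(void)
{
    fake_ticks += 5;
    return fake_ticks;
}

/* Fake workload that "costs" *arg ticks on the fake counter. */
static void fake_work(void *arg)
{
    fake_ticks += *(const uint64_t *)arg;
}

static const BenchTimer clock_timer = { clock_read, clock_read };
static const BenchTimer fake_timer  = { fake_read,  fake_read  };

/* Smallest start/stop delta over `runs` empty measurements. */
static uint64_t measure_overhead(const BenchTimer *t, int runs)
{
    uint64_t best = UINT64_MAX;

    assert(t && runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = t->start();
        uint64_t t1 = t->stop();

        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Time fn(arg) and subtract the calibrated overhead, clamping at zero. */
static uint64_t measure(const BenchTimer *t, void (*fn)(void *), void *arg,
                        uint64_t overhead)
{
    uint64_t t0 = t->start();

    fn(arg);

    uint64_t delta = t->stop() - t0;

    return delta > overhead ? delta - overhead : 0;
}
```

With fake_timer, measure_overhead() returns 5 and a 100-tick workload
measures as exactly 100 once the overhead is subtracted. Passing
&clock_timer instead runs the same code against the real clock, where the
result depends on the machine.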
> And that's assuming the stuff works at all. You can argue that we should use
> Arm PMU and RISC-V RDCYCLE, and that Linux perf sucks, all you want. PMU
> access will just throw a SIGILL and end the checkasm process with zero
> measurements. The rest of the industry wants to use system calls for informed
> reasons. I don't think you, or even the whole FFmpeg project, can win that
> argument against OS and CPU vendors.
>
Either way, I don't agree with this patch and am not accepting it.