SerialQueue profiling

Well, I was curious about where IO latency comes from (STM32H7 USB).
I’ve spent some time trying to find a tool which would allow to profile mutex locks within serialqueue reactor. (doesn’t work, uprobes is not avaliable in the RPI kernel).

Well, anyway, I made something guffy to find if there is latency here. The idea is:

int pthread_mutex_lock_wrp (pthread_mutex_t *__mutex, int line) {
    struct serialqueue *sq = container_of(__mutex, struct serialqueue, lock);
    double start = get_monotonic();
    pthread_mutex_lock(__mutex);
    double end = get_monotonic();
    if (end - start > 0.000010) {
        fprintf(stderr, "line: %i, hold by: %i, mutex_wait: %f\n", line, sq->owner, end - start);
    }
    sq->owner = line;
    return 0;
}

(I assume that occasional printing to journald is “free” (50us)).

I was trying to find out why io (like debug_ping), the send-receive time can be large >500us. It can be kernel scheduling, IO, that stuff, but who knows.
At least this is not a mutex lock here.

If I try to run some “printing”, there are occasional jumps. Yes, but generally they are within tens of microseconds.

Probably, the command_event(), can be made to release the lock more often,
But generally, it seems that the current locking scheme is working great.

Sep 18 00:03:26 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000022
Sep 18 00:04:32 v-core-3 python[17822]: line: 232, hold by: 857, mutex_wait: 0.000011
Sep 18 00:07:25 v-core-3 python[17822]: line: 232, hold by: 857, mutex_wait: 0.000024
Sep 18 00:09:27 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000011
Sep 18 00:09:33 v-core-3 python[17822]: line: 810, hold by: 857, mutex_wait: 0.000042
Sep 18 00:13:53 v-core-3 python[17822]: line: 232, hold by: 232, mutex_wait: 0.000013
Sep 18 00:13:55 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000012
Sep 18 00:13:55 v-core-3 python[17822]: line: 810, hold by: 857, mutex_wait: 0.000381
Sep 18 00:13:56 v-core-3 python[17822]: line: 810, hold by: 857, mutex_wait: 0.000033
Sep 18 00:13:58 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:13:58 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:13:58 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:13:59 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:13:59 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:13:59 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000015
Sep 18 00:13:59 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:01 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:01 v-core-3 python[17822]: line: 783, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:01 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000019
Sep 18 00:14:01 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000028
Sep 18 00:14:01 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000176
Sep 18 00:14:01 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:01 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:01 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:02 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:02 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:02 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:02 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:03 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000027
Sep 18 00:14:03 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000476
Sep 18 00:14:03 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:03 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:04 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:04 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000029
Sep 18 00:14:04 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000096
Sep 18 00:14:04 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000016
Sep 18 00:14:04 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:04 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:05 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:05 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:05 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:05 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:05 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:06 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000049
Sep 18 00:14:06 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000015
Sep 18 00:14:06 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:06 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:07 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:07 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000017
Sep 18 00:14:07 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:07 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:08 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:08 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:08 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:08 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:08 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:08 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:09 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:09 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:09 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:09 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:09 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:10 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:10 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:10 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:10 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:10 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:11 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:11 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:11 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:11 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:11 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:12 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:12 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:12 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:12 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:12 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000015
Sep 18 00:14:13 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:13 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000026
Sep 18 00:14:13 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000142
Sep 18 00:14:13 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:13 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:13 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:14 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000021
Sep 18 00:14:14 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000015
Sep 18 00:14:14 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:14 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:14 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:14 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:15 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:15 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:15 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:15 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:15 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:16 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:16 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:16 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:16 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:16 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:17 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:17 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000027
Sep 18 00:14:17 v-core-3 python[17822]: line: 857, hold by: 810, mutex_wait: 0.000083
Sep 18 00:14:17 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000015
Sep 18 00:14:17 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:17 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:17 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:18 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:18 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000027
Sep 18 00:14:18 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000098
Sep 18 00:14:18 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000016
Sep 18 00:14:18 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:18 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:18 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:19 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000015
Sep 18 00:14:19 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000029
Sep 18 00:14:19 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000144
Sep 18 00:14:19 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:19 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:19 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:20 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:20 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:20 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:20 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:20 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:21 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000028
Sep 18 00:14:21 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000445
Sep 18 00:14:21 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:21 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:21 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:22 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:22 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000021
Sep 18 00:14:22 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:22 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:23 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:23 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000030
Sep 18 00:14:23 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000139
Sep 18 00:14:23 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:23 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:23 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:24 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:24 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000016
Sep 18 00:14:24 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:24 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:24 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:25 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:25 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000062
Sep 18 00:14:25 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000143
Sep 18 00:14:25 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:25 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:25 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:26 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:26 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000017
Sep 18 00:14:26 v-core-3 python[17822]: line: 857, hold by: 810, mutex_wait: 0.000108
Sep 18 00:14:26 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:26 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:26 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:27 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:27 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000030
Sep 18 00:14:27 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000097
Sep 18 00:14:27 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:27 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:27 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:29 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:29 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:29 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:29 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:29 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:30 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:30 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000027
Sep 18 00:14:30 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000103
Sep 18 00:14:30 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:30 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:30 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:31 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:31 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000029
Sep 18 00:14:31 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000163
Sep 18 00:14:31 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:31 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:31 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:32 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:32 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:32 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:32 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:33 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000015
Sep 18 00:14:33 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:33 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:33 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:34 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:34 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:34 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:35 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:35 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:35 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:36 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:36 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:36 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:37 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:37 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:37 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:37 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:37 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:38 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000028
Sep 18 00:14:38 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000451
Sep 18 00:14:38 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:38 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000011
Sep 18 00:14:38 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:39 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:39 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000030
Sep 18 00:14:39 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000135
Sep 18 00:14:39 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:39 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:39 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:40 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000010
Sep 18 00:14:40 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000016
Sep 18 00:14:40 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000027
Sep 18 00:14:40 v-core-3 python[17822]: line: 857, hold by: 595, mutex_wait: 0.000102
Sep 18 00:14:40 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:40 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:40 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013
Sep 18 00:14:41 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000014
Sep 18 00:14:41 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000016
Sep 18 00:14:41 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000015
Sep 18 00:14:41 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:41 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000012
Sep 18 00:14:41 v-core-3 python[17822]: line: 810, hold by: 595, mutex_wait: 0.000013

From the strace standpoint, IO is also generally good:

$ strace -T -p 17906
write(12, "\6\37\5\256\36~", 6)         = 6 <0.000023>
write(12, "\v\20)\0\2\340\0\6\233\333~", 11) = 11 <0.000023>
write(12, "\6\21\0054\16~", 6)          = 6 <0.000025>
write(12, "\v\22)\0\2\340\0\6\254`~", 11) = 11 <0.000019>
write(12, "\6\23\5\7\276~", 6)          = 6 <0.000018>
write(12, "\v\24)\0\2\340\0\6\364\255~", 11) = 11 <0.000018>
write(12, "\6\25\5Sn~", 6)              = 6 <0.000020>
...
read(12, "\7\27t\0\3124~\5\27\352>~", 4096) = 12 <0.000008>
read(18, ".", 4096)                     = 1 <0.000006>
read(12, "\7\30t\0\200\363~\5\30\22\311~", 4096) = 12 <0.000008>
read(18, ".", 4096)                     = 1 <0.000006>
read(12, "\7\31t\0\332/~\5\31\3@~", 4096) = 12 <0.000013>
read(18, ".", 4096)                     = 1 <0.000010>
read(12, "\7\32t\0005K~\5\0321\333~", 4096) = 12 <0.000012>
read(18, ".", 4096)                     = 1 <0.000010>
read(12, "\7\33t\0o\227~\5\33 R~", 4096) = 12 <0.000011>
read(18, ".", 4096)                     = 1 <0.000010>
read(12, "\7\34t\0\343\222~\5\34T\355~", 4096) = 12 <0.000043>
read(18, ".", 4096)                     = 1 <0.000009>
read(12, "\7\35t\0\271N~\5\35Ed~", 4096) = 12 <0.000031>
...
ppoll([{fd=12, events=POLLIN|POLLHUP}, {fd=18, events=POLLIN|POLLHUP}], 2, {tv_sec=0, tv_nsec=0}, NULL, 0) = 0 (Timeout) <0.000011>
ppoll([{fd=12, events=POLLIN|POLLHUP}, {fd=18, events=POLLIN|POLLHUP}], 2, {tv_sec=0, tv_nsec=25000000}, NULL, 0) = 1 ([{fd=12, revents=POLLIN}], left {tv_sec=0, tv_nsec=24444388}) <0.000571>
ppoll([{fd=12, events=POLLIN|POLLHUP}, {fd=18, events=POLLIN|POLLHUP}], 2, {tv_sec=0, tv_nsec=0}, NULL, 0) = 0 (Timeout) <0.000037>
ppoll([{fd=12, events=POLLIN|POLLHUP}, {fd=18, events=POLLIN|POLLHUP}], 2, {tv_sec=1, tv_nsec=0}, NULL, 0) = 1 ([{fd=18, revents=POLLIN}], left {tv_sec=0, tv_nsec=999998370}) <0.000035>
ppoll([{fd=12, events=POLLIN|POLLHUP}, {fd=18, events=POLLIN|POLLHUP}], 2, {tv_sec=0, tv_nsec=0}, NULL, 0) = 0 (Timeout) <0.000011>
ppoll([{fd=12, events=POLLIN|POLLHUP}, {fd=18, events=POLLIN|POLLHUP}], 2, {tv_sec=0, tv_nsec=25000000}, NULL, 0) = 1 ([{fd=12, revents=POLLIN}], left {tv_sec=0, tv_nsec=24584721}) <0.000425>

ppoll, seems to be fast if there is data, if not, it simply blocks.
It is strange to see ~500us here.

Well, if I go deeper (usbmon — The Linux Kernel documentation):

# S - submission, C - callback
# Bi - bulk in, Bo - bulk out
# URB                useconds        
ffffff80c14d46c0 3486284658 C Bi:1:008:1 0 12 = 07127400 f3897e05 12bd937e
ffffff80c14d46c0 3486284662 S Bi:1:008:1 -115 128 <
ffffff80c0900c00 3486284791 S Bo:1:008:2 -115 7 = 07120b00 80857e
ffffff80c0900c00 3486284836 C Bo:1:008:2 0 7 >
ffffff80c14d4480 3486284881 C Bi:1:008:1 0 12 = 07137400 a9557e05 13ac1a7e
ffffff80c14d4480 3486284885 S Bi:1:008:1 -115 128 <
ffffff80c0900c00 3486285013 S Bo:1:008:2 -115 7 = 07130b00 da597e
ffffff80c0900c00 3486285614 C Bo:1:008:2 0 7 >
ffffff80c14d4900 3486285659 C Bi:1:008:1 0 12 = 07147400 25507e05 14d8a57e
ffffff80c14d4900 3486285662 S Bi:1:008:1 -115 128 <
ffffff80c0900c00 3486285776 S Bo:1:008:2 -115 7 = 07140b00 565c7e
ffffff80c0900c00 3486285817 C Bo:1:008:2 0 7 >
ffffff80c14d4a80 3486285863 C Bi:1:008:1 0 12 = 07157400 7f8c7e05 15c92c7e
ffffff80c14d4a80 3486285865 S Bi:1:008:1 -115 128 <
ffffff80c0900c00 3486285996 S Bo:1:008:2 -115 7 = 07150b00 0c807e
ffffff80c0900c00 3486286615 C Bo:1:008:2 0 7 >
ffffff80c14d4b40 3486286660 C Bi:1:008:1 0 12 = 07167400 90e87e05 16fbb77e
ffffff80c14d4b40 3486286663 S Bi:1:008:1 -115 128 <

We are interested in the microseconds mostly, and it seems that between C and S:
(8 + 45 + 4 + 101 + 3 + 41 + 2 + 19 + 3) / 9 ~= 25us
With min ~3us, and max ~100us.

I would expect slightly better timings.
But still, it is less than 500us.

(Wow, I hit the message size limit, 32k characters).

Continuation:

“Where is my money time lebowski kernel?”
Docs: Linux timing and scheduling granularity | Frits Hoogland Weblog

$ cd /sys/kernel/debug/tracing
$ echo __arm64_sys_ppoll | sudo tee set_ftrace_filter
# serialq thread pid
$ echo 17906 | sudo tee  set_ftrace_pid
$ echo function_graph | sudo tee current_tracer
$ echo 1 | sudo tee tracing_on
$ sudo cat ./trace | less
 0) ! 750.870 us  |  __arm64_sys_ppoll();
 0)   0.814 us    |  __arm64_sys_ppoll();
 0) + 60.130 us   |  __arm64_sys_ppoll();
 0)   0.760 us    |  __arm64_sys_ppoll();
 0) + 85.426 us   |  __arm64_sys_ppoll();
 0)   0.777 us    |  __arm64_sys_ppoll();
 0) + 54.389 us   |  __arm64_sys_ppoll();
 0)   0.740 us    |  __arm64_sys_ppoll();
 0) ! 767.296 us  |  __arm64_sys_ppoll();
 0)   0.778 us    |  __arm64_sys_ppoll();
# Hmmm
$ echo __arm64_sys_ppoll | sudo tee set_graph_function
$ echo | sudo tee set_ftrace_filter
$ echo | sudo tee trace
$ sudo cat ./trace | less
 3)               |  __arm64_sys_ppoll() {
 3)   0.259 us    |    get_timespec64();
 3)               |    ktime_get_ts64() {
 3)   0.259 us    |      arch_counter_read();
 3)   0.759 us    |    }
 3)   0.259 us    |    timespec64_add_safe();
 3)   0.259 us    |    set_user_sigmask();
 3)               |    do_sys_poll() {
 3)               |      select_estimate_accuracy() {
 3)               |        ktime_get_ts64() {
 3)   0.259 us    |          arch_counter_read();
 3)   0.741 us    |        }
 3)   0.259 us    |        set_normalized_timespec64();
 3)   2.000 us    |      }
 3)               |      fdget() {
 3)   0.241 us    |        __rcu_read_lock();
 3)   0.259 us    |        __rcu_read_unlock();
 3)   1.389 us    |      }
 3)               |      tty_poll() {
 3)               |        tty_ldisc_ref_wait() {
 3)   0.426 us    |          ldsem_down_read();
 3)   0.908 us    |        }
 3)               |        n_tty_poll() {
 3)               |          __pollwait() {
 3)               |            add_wait_queue() {
 3)   0.278 us    |              _raw_spin_lock_irqsave();
 3)   0.240 us    |              _raw_spin_unlock_irqrestore();
 3)   1.204 us    |            }
 3)   1.685 us    |          }
 3)               |          __pollwait() {
 3)               |            add_wait_queue() {
 3)   0.259 us    |              _raw_spin_lock_irqsave();
 3)   0.241 us    |              _raw_spin_unlock_irqrestore();
 3)   1.204 us    |            }
 3)   1.666 us    |          }
 3)               |          tty_buffer_flush_work() {
 3)               |            flush_work() {
 3)               |              __flush_work() {
 3)   0.259 us    |                __rcu_read_lock();
 3)   0.259 us    |                _raw_spin_lock_irq();
 3)   0.241 us    |                _raw_spin_unlock_irq();
 3)   0.241 us    |                __rcu_read_unlock();
 3)   2.167 us    |              }
 3)   2.648 us    |            }
 3)   3.130 us    |          }
 3)   0.241 us    |          tty_hung_up_p();
 3)   0.241 us    |          mutex_is_locked();
 3)               |          tty_chars_in_buffer() {
 3)               |            acm_tty_chars_in_buffer [cdc_acm]() {
 3)   0.259 us    |              _raw_spin_lock_irqsave();
 3)   0.260 us    |              _raw_spin_unlock_irqrestore();
 3)   1.222 us    |            }
 3)   1.704 us    |          }
 3) + 10.426 us   |        }
 3)               |        tty_ldisc_deref() {
 3)   0.259 us    |          ldsem_up_read();
 3)   0.815 us    |        }
 3) + 13.240 us   |      }
 3)   0.352 us    |      fput();
 3)               |      fdget() {
 3)   0.259 us    |        __rcu_read_lock();
 3)   0.241 us    |        __rcu_read_unlock();
 3)   1.444 us    |      }
 3)               |      pipe_poll() {
 3)               |        __pollwait() {
 3)               |          add_wait_queue() {
 3)   0.241 us    |            _raw_spin_lock_irqsave();
 3)   0.259 us    |            _raw_spin_unlock_irqrestore();
 3)   1.778 us    |          }
 3)   2.259 us    |        }
 3)   2.760 us    |      }

 3)   0.352 us    |      fput();
 3)               |      schedule_hrtimeout_range() {
 3)               |        schedule_hrtimeout_range_clock() {
 3)               |          hrtimer_init_sleeper() {
 3)   0.389 us    |            __hrtimer_init();
 3)   0.870 us    |          }
 3)               |          hrtimer_start_range_ns() {
 3)   0.259 us    |            _raw_spin_lock_irqsave();
 3)               |            get_nohz_timer_target() {
 3)   0.259 us    |              idle_cpu();
 3)   0.722 us    |            }
 3)   0.315 us    |            enqueue_hrtimer();
 3)   0.241 us    |            _raw_spin_unlock_irqrestore();
 3)   3.167 us    |          }
 3)               |          schedule() {
 3)               |            rcu_note_context_switch() {
 3)   0.240 us    |              rcu_qs();
 3)   0.722 us    |            }
 3)   0.241 us    |            _raw_spin_lock();
 3)   0.278 us    |            update_rq_clock();
 3)               |            dequeue_task_fair() {
 3)               |              dequeue_entities() {
 3)               |                dequeue_entity() {
 3)               |                  update_curr() {
 3)   0.260 us    |                    update_curr_se();
 3)   0.241 us    |                    update_min_vruntime();
 3)   0.260 us    |                    cpuacct_charge();
 3)               |                    __cgroup_account_cputime() {
 3)   0.278 us    |                      cgroup_base_stat_cputime_account_begin();
 3)   0.259 us    |                      cgroup_rstat_updated();
 3)   1.241 us    |                    }
 3)               |                    dl_server_update() {
 3)               |                      update_curr_dl_se() {
 3)   0.259 us    |                        dl_scaled_delta_exec();
 3)   0.741 us    |                      }
 3)   1.222 us    |                    }
 3)   4.667 us    |                  }
 3)   0.260 us    |                  vruntime_eligible();
 3)   0.278 us    |                  __update_load_avg_se();
 3)   0.278 us    |                  __update_load_avg_cfs_rq();
 3)               |                  update_entity_lag() {
 3)   0.259 us    |                    avg_vruntime();
 3)   0.723 us    |                  }
 3)   0.259 us    |                  update_cfs_group();
 3)   0.259 us    |                  update_min_vruntime();
 3)   8.648 us    |                }
 3)               |                dequeue_entity() {
 3)               |                  update_curr() {

 3)   0.259 us    |                    update_curr_se();
 3)   0.259 us    |                    __calc_delta.constprop.0();
 3)   0.259 us    |                    update_min_vruntime();
 3)   1.685 us    |                  }
 3)   0.259 us    |                  vruntime_eligible();
 3)   0.259 us    |                  __update_load_avg_se();
 3)   0.240 us    |                  __update_load_avg_cfs_rq();
 3)               |                  update_entity_lag() {
 3)   0.334 us    |                    avg_vruntime();
 3)   0.352 us    |                    __calc_delta.constprop.0();
 3)   1.370 us    |                  }
 3)   0.259 us    |                  update_cfs_group();
 3)   0.259 us    |                  update_min_vruntime();
 3)   6.759 us    |                }
 3)               |                dequeue_entity() {
 3)               |                  update_curr() {
 3)   0.240 us    |                    update_curr_se();
 3)   0.241 us    |                    __calc_delta.constprop.0();
 3)   0.259 us    |                    update_min_vruntime();
 3)   1.797 us    |                  }
 3)   0.259 us    |                  vruntime_eligible();
 3)   0.260 us    |                  __update_load_avg_se();
 3)   0.259 us    |                  __update_load_avg_cfs_rq();
 3)               |                  update_entity_lag() {
 3)   0.260 us    |                    avg_vruntime();
 3)   0.334 us    |                    __calc_delta.constprop.0();
 3)   1.407 us    |                  }
 3)   0.259 us    |                  update_cfs_group();
 3)   0.260 us    |                  update_min_vruntime();
 3)   6.926 us    |                }
 3)               |                dl_server_stop() {
 3)   0.241 us    |                  __dequeue_dl_entity();
 3)               |                  task_non_contending() {
 3)   0.259 us    |                    hrtimer_active();
 3)               |                    hrtimer_start_range_ns() {
 3)   0.259 us    |                      _raw_spin_lock_irqsave();
 3)               |                      ktime_get() {
 3)   0.260 us    |                        arch_counter_read();
 3)   0.759 us    |                      }
 3)               |                      get_nohz_timer_target() {
 3)   0.241 us    |                        idle_cpu();
 3)   0.814 us    |                      }
 3)   0.333 us    |                      enqueue_hrtimer();
 3)   0.241 us    |                      _raw_spin_unlock_irqrestore();
 3)   4.018 us    |                    }
 3)   5.148 us    |                  }
 3)               |                  hrtimer_try_to_cancel() {
 3)   0.259 us    |                    hrtimer_active();
 3)               |                    hrtimer_try_to_cancel.part.0() {
 3)   0.259 us    |                      _raw_spin_lock_irqsave();
 3)   0.278 us    |                      __remove_hrtimer();
 3)   0.240 us    |                      _raw_spin_unlock_irqrestore();
 3)   1.834 us    |                    }
 3)   2.982 us    |                  }
 3)   9.463 us    |                }
 3) + 33.000 us   |              }

 3)   0.259 us    |              hrtick_update();
 3) + 33.981 us   |            }
 3)               |            pick_next_task_fair() {
 3)   0.352 us    |              pick_task_fair();
 3)               |              sched_balance_newidle() {
 3)   0.241 us    |                __rcu_read_lock();
 3)   0.241 us    |                __msecs_to_jiffies();
 3)   0.240 us    |                __rcu_read_unlock();
 3)   1.889 us    |              }
 3)   2.944 us    |            }
 3)   0.259 us    |            pick_task_idle();
 3)               |            put_prev_task_fair() {
 3)   0.260 us    |              put_prev_entity();
 3)   0.259 us    |              put_prev_entity();
 3)   0.241 us    |              put_prev_entity();
 3)   1.685 us    |            }
 3)   0.259 us    |            set_next_task_idle();
 3)               |            psi_task_switch() {
 3)   0.241 us    |              psi_flags_change();
 3)               |              psi_group_change() {
 3)   0.259 us    |                record_times();
 3)   0.797 us    |              }
 3)               |              psi_group_change() {
 3)   0.259 us    |                record_times();
 3)   0.778 us    |              }
 3)               |              psi_group_change() {
 3)   0.259 us    |                record_times();
 3)   0.778 us    |              }
 3)   3.796 us    |            }
 3)   0.352 us    |            _raw_spin_lock_irqsave();
 3)   0.240 us    |            _raw_spin_unlock_irqrestore();
 3)               |            finish_task_switch.isra.0() {
 3)   0.259 us    |              _raw_spin_unlock();
 3)   0.778 us    |            }
 3) ! 726.240 us  |          }
 3)               |          hrtimer_try_to_cancel.part.0() {
 3)   0.352 us    |            _raw_spin_lock_irqsave();
 3)   0.352 us    |            __remove_hrtimer();
 3)   0.389 us    |            _raw_spin_unlock_irqrestore();
 3)   2.000 us    |          }
 3) ! 733.481 us  |        }
 3) ! 734.074 us  |      }
 3)               |      fdget() {
 3)   0.259 us    |        __rcu_read_lock();
 3)   0.259 us    |        __rcu_read_unlock();
 3)   1.407 us    |      }
 3)               |      tty_poll() {
 3)               |        tty_ldisc_ref_wait() {
 3)   0.296 us    |          ldsem_down_read();
 3)   0.981 us    |        }

 3)               |        n_tty_poll() {
 3)   0.241 us    |          tty_hung_up_p();
 3)   0.333 us    |          mutex_is_locked();
 3)               |          tty_chars_in_buffer() {
 3)               |            acm_tty_chars_in_buffer [cdc_acm]() {
 3)   0.296 us    |              _raw_spin_lock_irqsave();
 3)   0.334 us    |              _raw_spin_unlock_irqrestore();
 3)   1.407 us    |            }
 3)   1.981 us    |          }
 3)               |          tty_write_room() {
 3)               |            acm_tty_write_room [cdc_acm]() {
 3)   0.260 us    |              _raw_spin_lock_irqsave();
 3)   0.260 us    |              _raw_spin_unlock_irqrestore();
 3)   1.222 us    |            }
 3)   1.704 us    |          }
 3)   5.833 us    |        }
 3)               |        tty_ldisc_deref() {
 3)   0.259 us    |          ldsem_up_read();
 3)   0.722 us    |        }
 3)   8.685 us    |      }
 3)   0.278 us    |      fput();
 3)               |      fdget() {
 3)   0.259 us    |        __rcu_read_lock();
 3)   0.241 us    |        __rcu_read_unlock();
 3)   1.296 us    |      }
 3)   0.259 us    |      pipe_poll();
 3)   0.260 us    |      fput();
 3)               |      poll_freewait() {
 3)               |        remove_wait_queue() {
 3)   0.297 us    |          _raw_spin_lock_irqsave();
 3)   0.259 us    |          _raw_spin_unlock_irqrestore();
 3)   1.537 us    |        }
 3)   0.352 us    |        fput();
 3)               |        remove_wait_queue() {
 3)   0.296 us    |          _raw_spin_lock_irqsave();
 3)   0.241 us    |          _raw_spin_unlock_irqrestore();
 3)   1.500 us    |        }
 3)   0.259 us    |        fput();
 3)               |        remove_wait_queue() {
 3)   0.259 us    |          _raw_spin_lock_irqsave();
 3)   0.352 us    |          _raw_spin_unlock_irqrestore();
 3)   1.426 us    |        }
 3)   0.260 us    |        fput();
 3)   7.148 us    |      }
 3) ! 779.593 us  |    }
 3)               |    poll_select_finish() {
 3)               |      ktime_get_ts64() {
 3)   0.277 us    |        arch_counter_read();
 3)   0.889 us    |      }
 3)   0.259 us    |      set_normalized_timespec64();
 3)   0.259 us    |      put_timespec64();
 3)   2.611 us    |    }
 3) ! 785.482 us  |  }

For me, who is not very familiar with kernel tracing,
It looks like it really wastes time on block + switching task + rescheduling.

(I rebooted the machine)

Well, rt and priority tricks do not help to reduce RTT or make it react faster on average.

Hope someone finds this curious.

Thanks.


I’ve reused/updated the branch to print data from serialqueue sent/recv and reactor separately: Klipper: communication bus tests
Well, to spam the pings and get the average latency.