bpftraceを使用してコマンドラインからカーネルをデバッグする

これは、(「小さなもののためのBPF」シリーズの別の記事で012)と近代的なツールでなぞるLinux上での実用的な一連の記事の最初の。







それから、プログラムと言語について学びますbpftrace



。BPFについてまったく何も知らなくても、実用的な観点からBPFの世界に飛び込む最も簡単な方法です。このユーティリティをbpftrace



使用すると、コマンドラインから直接単純な言語を使用してハンドラプログラムを作成し、それらを膨大な数のカーネルお​​よびユーザースペースイベントに接続できます。以下のKPDVを見てください...おめでとうございます、あなたはすでにシステムコールを追跡する方法を知っていますbpftrace









シリーズの以前の記事とは異なり、これは比較的短く、その主要部分はチュートリアルの形式で書かれているため、5分間読んだ後、コマンドから直接Linuxカーネルのイベントに関するハンドラーを作成して統計を収集できますライン。記事の最後では、代替案ply



とBCCについて説明しています。「5分間読んだ後」というフレーズの分は、Mercurialを意味します。ユニークなスキルは、5分間読んだ後に現れるとは限りません。









コンテンツ





bpftraceのインストール



. sudo apt install bpftrace



( ). bpftrace



, .







. bpftrace



, , , , BTF. , bpftrace



.







master- bpftrace docker image bpftrace



. , :







$ docker pull quay.io/iovisor/bpftrace:latest
$ cd /tmp
$ docker run -v $(pwd):/o quay.io/iovisor/bpftrace:latest /bin/bash -c "cp /usr/bin/bpftrace /o"
$ sudo ./bpftrace -V
bpftrace v0.11.4
      
      





bpftrace glibc



, docker image glibc



.







, . , execve(2)



, :







$ sudo ./bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(uptr(args->filename))); }'
Attaching 1 probe...
bash -> /bin/echo
bash -> /usr/bin/ls
gnome-shell -> /bin/sh
sh -> /home/aspsk/bin/geeqie
sh -> /usr/local/sbin/geeqie
...
      
      





, , , :







$ sudo mv /tmp/bpftrace /usr/local/bin
      
      





, , , BTF, bpftrace



, docker . bpftrace



. .







. bpftrace



/ BTF, kernel headers. , bpftrace



.







BTF, , BTF kernel headers, , , . BTF , .







?



. , , , . , . : « ?!» , ,







$ sudo apt install bpftrace
The following NEW packages will be installed:
  bpftrace
Preparing to unpack .../bpftrace_0.11.0-1_amd64.deb ...
Unpacking bpftrace (0.11.0-1) ...
Setting up bpftrace (0.11.0-1) ...
Processing triggers for man-db (2.9.3-2) ...
$
      
      





, bpftrace



. ? , bpftrace



. - :









, , - , .







Bpftrace: hello world



bpftrace



awk , BEGIN



END



, bpftrace



, . :







# bpftrace -e 'BEGIN { printf("Hello world!\n"); }'
Attaching 1 probe...
Hello world!
^C
      
      





"Hello world!"



. , Ctrl-C



, bpftrace



— . bpftrace



exit



. , END



:







# bpftrace -e '
 BEGIN { printf("Hello world!\n"); exit(); }
 END { printf("So long\n"); }
 '
Attaching 2 probes...
Hello world!
So long
      
      





Kprobes —



— , , , , kprobes (Kernel Probe — ) , , . , , kprobes



, , , , Linux BPF.







, schedule



:







$ sudo bpftrace -e '
    k:schedule { @[comm] = count(); }
    i:s:2 { exit();}
    END { print(@, 10); clear(@); }
'
Attaching 3 probes...
@[Timer]: 147
@[kworker/u65:0]: 147
@[kworker/7:1]: 153
@[kworker/13:1]: 158
@[IPC I/O Child]: 170
@[IPC I/O Parent]: 172
@[kworker/12:1]: 185
@[Web Content]: 229
@[Xorg]: 269
@[SCTP timer]: 1566
      
      





@



.







-? :







$ sudo bpftrace -l 'k:*' | wc -l
61106
      
      





. , , kprobe, , , kprobes.







kretprobes



kprobe



kretprobe



. kprobe



, kretporobe



. retval



.







, vfs_write



( ):







$ sudo bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:2 { exit(); }'
Attaching 2 probes...

@:
[1]                  606 |@@@@@@@@@@@@@@@@@@@@@@@@@                           |
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)             1223 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)               0 |                                                    |
[32, 64)              25 |@                                                   |
      
      





uprobes uretprobes



, ( ), . kprobes



. bpftrace



uprobes



uretprobes



— .







, ( UID ):







$ sudo bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: [%d]: \"%s\"\n", uid, str(uptr(retval))); }'
Attaching 1 probe...
readline: [1000]: "echo "hello habr""
readline: [0]: "echo "hello from root""
^C
      
      





, 2



CONFIG_DEBUG_INFO_BTF=y



, kprobes, , bpf trampolines. BTF , . , / .







Tracepoints —



- , , , . tracepoints . :







$ sudo bpftrace -l 't:*'
      
      





, kprobes



:







$ sudo bpftrace -l 't:*' | wc -l
1782
      
      





tracepoints , API: , tracepoint, , . , bpftrace



, tracepoint, :







$ sudo bpftrace -lv tracepoint:thermal:thermal_temperature
tracepoint:thermal:thermal_temperature
    __data_loc char[] thermal_zone;
    int id;
    int temp_prev;
    int temp;
      
      





kprobe



, BTF, , , . BTF kprobes



kfuncs



.







usdt —



. ( ):







#include <sys/sdt.h>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct timeval tv;

    for (;;) {
        gettimeofday(&tv, NULL);
        DTRACE_PROBE1(test, probe, tv.tv_sec);
        sleep(1);
    }
}
      
      





test:probe



, tv.tv_sec



— . , systemtap-sdt-dev



( ). , :







$ cc /tmp/test.c -o /tmp/test
$ sudo bpftrace -l 'usdt:/tmp/test'
usdt:/tmp/test:test:probe
      
      





/tmp/test



,







$ sudo bpftrace -e 'usdt:/tmp/test:test:probe { printf("  %u\n", arg0); }'
Attaching 1 probe...
  1612903991
  1612903992
  1612903993
  1612903994
  1612903995
...
      
      





arg0



tv.tv_sec



, breakpoint.







Perf



bpftrace



, Perf. , bpftrace



:







  • software



    : -
  • hardware



    : PMCs
  • interval



    :
  • profile



    :


`software`



perf, :







# bpftrace -l 's:*'
software:alignment-faults:
software:bpf-output:
software:context-switches:
software:cpu-clock:
software:cpu-migrations:
software:dummy:
software:emulation-faults:
software:major-faults:
software:minor-faults:
software:page-faults:
software:task-clock:
      
      





, , ,







# bpftrace -e 'software:cpu-migrations:10 { @[comm] = count(); }'
Attaching 2 probes...
^C@[kworker/u65:1]: 1
@[bpftrace]: 1
@[SCTP timer]: 2
@[Xorg]: 2
      
      





CPU . perf_event_open(2)



, , cpu-migrations



, man- PERF_COUNT_SW_CPU_MIGRATIONS



.







`hardware`



hardware counters, bpftrace



BPF. , :







bpftrace -l 'h*'
hardware:backend-stalls:
hardware:branch-instructions:
hardware:branch-misses:
hardware:bus-cycles:
hardware:cache-misses:
hardware:cache-references:
hardware:cpu-cycles:
hardware:frontend-stalls:
hardware:instructions:
hardware:ref-cycles:
      
      





( , . PERF_COUNT_HW_BRANCH_MISSES



):







bpftrace -e 'hardware:branch-misses:100000 { @[tid] = count(); }'
Attaching 3 probes...
@[1055]: 4
@[3078]: 4
@[1947]: 5
@[1066]: 6
@[2551]: 6
@[0]: 29
      
      





`interval` `profile`



interval



profile



. CPU, — CPU.







, . , :







$ sudo bpftrace -e '
    kr:vfs_write { @ = hist(retval); }
    interval:s:2 { print(@); clear(@); }
'
      
      





vfs_write



, interval



, , @



.







profile



:







# bpftrace -e '
  profile:hz:99 { @[kstack] = count(); }
  i:s:10 { exit(); }
  END { print(@,1); clear(@); }
'
Attaching 3 probes...
@[
    cpuidle_enter_state+202
    cpuidle_enter+46
    call_cpuidle+35
    do_idle+487
    cpu_startup_entry+32
    start_secondary+345
    secondary_startup_64+182
]: 14455
      
      





profile



99 , exit()



, END



@



— ( , , , ).







Bpftrace: tutorial





, bpftrace



:







# bpftrace
      
      





, . , bpftrace



. , bpftrace



. , — .







-l



. . ( - , , .) :







# bpftrace -l '*kill_all*'
kprobe:rfkill_alloc
kprobe:kill_all
kprobe:btrfs_kill_all_delayed_nodes
      
      





tracepoints



:







# bpftrace -l 't:*kill*'
tracepoint:cfg80211:rdev_rfkill_poll
tracepoint:syscalls:sys_enter_kill
tracepoint:syscalls:sys_exit_kill
tracepoint:syscalls:sys_enter_tgkill
tracepoint:syscalls:sys_exit_tgkill
tracepoint:syscalls:sys_enter_tkill
tracepoint:syscalls:sys_exit_tkill
      
      





tracepoint:syscalls



, , bpftrace



. X



:







tracepoint:syscalls:sys_enter_X
tracepoint:syscalls:sys_exit_X
      
      





- , , execve(2)



. , , - tracepoint



-v



, :







# bpftrace -lv 't:s*:sys_*_execve'
tracepoint:syscalls:sys_enter_execve
    int __syscall_nr;
    const char * filename;
    const char *const * argv;
    const char *const * envp;
tracepoint:syscalls:sys_exit_execve
    int __syscall_nr;
    long ret;
      
      





( *



, syscalls



sys_enter_execve



sys_exit_execve



). , -lv



args



:







# bpftrace -e '
    t:s*:sys_enter_execve { printf("ENTER: %s\n", str(uptr(args->filename))); }
    t:s*:sys_exit_execve { printf("EXIT: %s: %d\n", comm, args->ret); }
'
Attaching 2 probes...
ENTER: /bin/ls
EXIT: ls: 0
ENTER: /bin/lss
EXIT: bash: -2
      
      





.







args->filename



. , str



, : , uptr



. bpftrace



, . , , bpftrace -lv



, , , execve ( __user



).







comm



, . args->ret



. , " " , . , - , : /bin/ls



exec



0 ( ls



/bin/ls



), /bin/lss



exec



-2



bash



.







. , . , . uptr



, .







`bpftrace`



bpftrace



awk (. 6 The AWK Programming Language



) .







<probe> <filter> { <action> }
      
      





,







# bpftrace -e 'BEGIN { printf("Hello\n") } END { printf("World\n") }'
      
      





<probe>



BEGIN



END



, <action>



printf



.

<filter>



,

,







# bpftrace -e 'p:s:1 /cpu == 0/ { printf("  CPU%d\n", cpu); }'
Attaching 1 probe...
  CPU0
  CPU0
^C
      
      





, CPU 0.







. , /cpu == 0/



?







<filter>



. , write



. — sys_enter_write



sys_exit_write



:







# cat /tmp/write-times.bt
t:syscalls:sys_enter_write {
  @[tid] = nsecs
}

t:syscalls:sys_exit_write /@[tid]/ {
  @write[comm] = sum((nsecs - @[tid]) / 1000);
  delete(@[tid]);
}

END {
  print(@write, 10);
  clear(@write);
  clear(@);
}
      
      





, . :







# bpftrace /tmp/write-times.bt
      
      





, sys_enter_write



, write



@



, tid



.







, sys_exit_write



, /@[tid]/



, . , , - write



. ( ) @write



@[tid]



.







, ^C



, END



, @write



@



, bpftrace



.







. , /@[tid]/



?







:



bpftrace



C , , :?



, ++



, --



. if {} else {}



. while



unroll



( , BPF ). , .







: . $



, :







# bpftrace -e 'BEGIN { $x = 1; printf("%d\n", ++$x); exit(); }'
# bpftrace -e 'BEGIN { if (1) { $x = 1; } printf("%d\n", ++$x); exit(); }'
      
      





— :







# bpftrace -e 'BEGIN { $x = 1; exit(); } END { printf("%d\n", $x); }'
      
      





, , @



. "" @



, write



(@[tid]



). ( bpftrace



— . , , : . , , .)







, : ? bpftrace



. kprobes



, tracepoints



— , tracepoint, Perf, , — . tracepoints



, kprobes



, Perf.







kprobes



arg0



, arg1



, .. , . :







#include <linux/skbuff.h>
#include <linux/ip.h>

k:icmp_echo {
    $skb = (struct sk_buff *) arg0;
    $iphdr = (struct iphdr *) ($skb->head + $skb->network_header);
    @pingstats[ntop($iphdr->saddr), ntop($iphdr->daddr)]++;
}
      
      





, . kprobe



icmp_echo



, ICMPv4 echo request. , arg0



, — sk_buff



, . IP @pingstats



. , , IP ! , kprobe, user space, .







. , kernel headers. sk_buff



iphdr



, . ( BTF, — , .) icmp_echo



sk_buff



$skb



. $skb



, , , , iphdr



. ntop



bpftrace



, IP .







. . uptr



kptr



. : vfs_write



, — struct file



, <linux/fs.h>



. vfs_write



. (Hint: struct file



kretprobe



?)







. bpftrace



, , , sudo



.









ping



— , CPU. count



. :







# bpftrace -e 'p:hz:5000 { @x++; @y = count(); } i:s:10 { exit(); }'
Attaching 2 probes...
@x: 760528
@y: 799002
      
      





10 5000 16 @x



@y



. ++



. count()



, CPU- : CPU , .







bpftrace



, - . — . .







# bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:10 { exit() }'
Attaching 2 probes...

@:
[1]             14966407 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)             1670 |                                                    |
[16, 32)               0 |                                                    |
[32, 64)             123 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)        7531982 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
      
      





vfs_write



, , - ! ( 20 ):







# bpftrace -e '
    kr:vfs_write /retval == 1/ { @[pid, comm] = hist(retval); }
    i:s:10 { exit() }
    END { print(@, 1); clear(@); }'
Attaching 3 probes...
@[133835, dd]:
[1]             14254511 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
      
      





, , retval



. , — . , END



, . ? :







# tr '\000' ' ' < /proc/133835/cmdline
dd if=/dev/zero of=/dev/null bs=1
      
      





. bpftrace



, .







: flame graphs



profile



. . , profile



:







profile:hz:rate
profile:s:rate
profile:ms:rate
profile:us:rate
      
      





HZ, — , , .







, , , :







bpftrace -e 'profile:hz:99 { print(kstack); exit() }'
Attaching 1 probe...

        cpuidle_enter_state+202
        cpuidle_enter+46
        call_cpuidle+35
        do_idle+487
        cpu_startup_entry+32
        rest_init+174
        arch_call_rest_init+14
        start_kernel+1724
        x86_64_start_reservations+36
        x86_64_start_kernel+139
        secondary_startup_64+182
      
      





-.







, , . flame graphs, .







, bpftrace



:







# bpftrace -e 'profile:hz:333 { @[kstack] = count(); } i:s:10 { exit() }' > /tmp/raw
# wc -l /tmp/raw
3374 /tmp/raw
      
      





333 ( kstack



@



, kstack



— ).







FlameGraph :







$ git clone https://github.com/brendangregg/FlameGraph.git
$ cd FlameGraph
$ ./stackcollapse-bpftrace.pl /tmp/raw > /tmp/ready
$ ./flamegraph.pl /tmp/ready > /tmp/kstack.svg
      
      





bpftrace



, ( , gist SVG):













, CPU dd



, /dev/zero



/dev/null



. , .







. bpftrace



. FlameGraph



.









, bpftrace



. , .







BCC:



BCC , BPF — 100 . — , , Netflix, Facebook . . , .







libbcc



libbpf



, -C , . , . .







BCC - . , bpfcc-tools



. . ,







# funccount-bpfcc 'vfs_*' -d 5
Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end.

FUNC                                    COUNT
b'vfs_statfs'                               1
b'vfs_unlink'                               1
b'vfs_lock_file'                            2
b'vfs_fallocate'                           31
b'vfs_statx_fd'                            32
b'vfs_getattr'                             80
b'vfs_getattr_nosec'                       88
b'vfs_open'                               108
b'vfs_statx'                              174
b'vfs_writev'                            2789
b'vfs_write'                             6554
b'vfs_read'                              7363
Detaching...
      
      





vfs_



. -p



, PID . , mplayer



, :







# funccount-bpfcc 'vfs_*' -d 5 -p 29380
Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end.

FUNC                                    COUNT
b'vfs_write'                              208
b'vfs_read'                               629
Detaching...
      
      





BCC



BCC. mutex_lock



mutex_unlock



. , .







#! /usr/bin/python3

from bcc import BPF
from ctypes import c_int
from time import sleep, strftime
from sys import argv

b = BPF(text="""
BPF_PERCPU_ARRAY(mutex_stats, u64, 2);

static inline void inc(int key)
{
    u64 *x = mutex_stats.lookup(&key);
    if (x)
        *x += 1;
}

void do_lock(struct pt_regs *ctx) { inc(0); }
void do_unlock(struct pt_regs *ctx) { inc(1); }
""")

b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")

print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))

while 2 * 2 == 4:

    try:
        sleep(1)
    except KeyboardInterrupt:
        exit()

    print("%-8s%10d%10d" % (
          strftime("%H:%M:%S"),
          b["mutex_stats"].sum(0).value,
          b["mutex_stats"].sum(1).value))

    b["mutex_stats"].clear()
      
      





. , — BPF



:







from bcc import BPF
      
      





BPF, . BPF



-C.







BPF_PERCPU_ARRAY(mutex_stats, u64, 2);

static inline void inc(int key)
{
    u64 *x = mutex_stats.lookup(&key);
    if (x)
        *x += 1;
}

void do_lock(struct pt_regs *ctx)   { inc(0); }
void do_unlock(struct pt_regs *ctx) { inc(1); }
      
      





C, , BPF



C.

, mutex_stats



u64



, . , PERCPU , , CPU . inc



, mutex_stats



. .

, do_lock



do_unlock



.







BPF



libllvm



, , . kprobes



:







b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")
      
      





:







print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))

while 2 * 2 == 4:

    try:
        sleep(1)
    except KeyboardInterrupt:
        exit()

    print("%-8s%10d%10d" % (
          strftime("%H:%M:%S"),
          b["mutex_stats"].sum(0).value,
          b["mutex_stats"].sum(1).value))

    b["mutex_stats"].clear()
      
      





mutex_stats



. sum



mutex_stats



, CPU:







sum(index) {
    result = 0
      CPU {
        result += cpu->mutex_stats[index]
    }
    return result
}
      
      





. :







$ sudo ./bcc-tool-example
TIME        LOCKED  UNLOCKED
18:06:33     11382     12993
18:06:34     11200     12783
18:06:35     18597     22553
18:06:36     20776     25516
18:06:37     59453     68201
18:06:38     49282     58064
18:06:39     25541     27428
18:06:40     22094     25280
18:06:41      5539      7250
18:06:42      5662      7351
18:06:43      5205      6905
18:06:44      6733      8438
      
      





- 18:06:35 youtube Firefox, youtube 18:06:40 . , , youtube .







, C, libbpf



CO-RE. libbpf



, libllvm



, , . , BCC libbpf



+ CO-RE BCC, . libbpf-tools. BCC to libbpf conversion guide ( ).







Ply: bpftrace



ply



, Tobias Waldekranz 2015 , bpftrace



. awk- BPF, ,







ply 'tracepoint:tcp/tcp_receive_reset {
    printf("saddr:%v port:%v->%v\n", data->saddr, data->sport, data->dport);
}'
      
      





ply



: libc



(). , BPF . , , ply



ply script language -> BPF



.







, ply



, , — ply



, , . ply



, , , , C libbpf



— , ., , Building BPF applications with libbpf-bootstrap.









:









bpftrace



, BCC Linux:







  • The bpftrace One-Liner Tutorial. bpftrace



    , . , , .







  • bpftrace Reference Guide. , bpftrace



    , . , bpftrace



    .







  • BCC Tutorial. bpftrace



    ( libbpf



    ), BCC, BCC Reference Guide, , .







  • Brendan Gregg, «BPF Performance Tools». BPF Linux — BCC. BPF .







  • Brendan Gregg, «Systems Performance: Enterprise and the Cloud, 2nd Edition (2020)». «Systems Performance». : BPF, Solaris, . «BPF Performance Tools» «?», «?», ( BPF ).










All Articles