これは、(「小さなもののためのBPF」シリーズの別の記事で0、1、2)と近代的なツールでなぞるLinux上での実用的な一連の記事の最初の。
それから、プログラムと言語について学びますbpftrace
。BPFについてまったく何も知らなくても、実用的な観点からBPFの世界に飛び込む最も簡単な方法です。このユーティリティをbpftrace
使用すると、コマンドラインから直接単純な言語を使用してハンドラプログラムを作成し、それらを膨大な数のカーネルおよびユーザースペースイベントに接続できます。以下のKPDVを見てください...おめでとうございます、あなたはすでにシステムコールを追跡する方法を知っていますbpftrace
!
シリーズの以前の記事とは異なり、これは比較的短く、その主要部分はチュートリアルの形式で書かれているため、5分間読んだ後、コマンドから直接Linuxカーネルのイベントに関するハンドラーを作成して統計を収集できますライン。記事の最後では、代替案ply
とBCCについて説明しています。「5分間読んだ後」というフレーズの分は、Mercurialを意味します。ユニークなスキルは、5分間読んだ後に現れるとは限りません。
コンテンツ
bpftraceのインストール
. sudo apt install bpftrace
( ). bpftrace
, .
. bpftrace
, , , , BTF. , bpftrace
.
master- bpftrace docker image bpftrace
. , :
$ docker pull quay.io/iovisor/bpftrace:latest
$ cd /tmp
$ docker run -v $(pwd):/o quay.io/iovisor/bpftrace:latest /bin/bash -c "cp /usr/bin/bpftrace /o"
$ sudo ./bpftrace -V
bpftrace v0.11.4
bpftrace glibc
, docker image glibc
.
, . , execve(2)
, :
$ sudo ./bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(uptr(args->filename))); }'
Attaching 1 probe...
bash -> /bin/echo
bash -> /usr/bin/ls
gnome-shell -> /bin/sh
sh -> /home/aspsk/bin/geeqie
sh -> /usr/local/sbin/geeqie
...
, , , :
$ sudo mv /tmp/bpftrace /usr/local/bin
, , , BTF, bpftrace
, docker . bpftrace
. .
. bpftrace
/ BTF, kernel headers. , bpftrace
.
BTF, , BTF kernel headers, , , . BTF , .
?
. , , , . , . : « ?!» , ,
$ sudo apt install bpftrace The following NEW packages will be installed: bpftrace Preparing to unpack .../bpftrace_0.11.0-1_amd64.deb ... Unpacking bpftrace (0.11.0-1) ... Setting up bpftrace (0.11.0-1) ... Processing triggers for man-db (2.9.3-2) ... $
, bpftrace
. ? , bpftrace
. - :
- :
BEGIN
,END
- , kprobes:
kprobe
,kretprobe
,uprobe
,uretprobe
- , BPF trampolines:
kfunc
,kretfunc
- , tracepoints:
tracepoint
- :
usdt
- , perf:
software
,hardware
,profile
,interval
,watchpoint
Bpftrace: hello world
bpftrace
awk , BEGIN
END
, bpftrace
, . :
# bpftrace -e 'BEGIN { printf("Hello world!\n"); }' Attaching 1 probe... Hello world! ^C
"Hello world!"
. , Ctrl-C
, bpftrace
— . bpftrace
exit
. , END
:
# bpftrace -e ' BEGIN { printf("Hello world!\n"); exit(); } END { printf("So long\n"); } ' Attaching 2 probes... Hello world! So long
Kprobes —
— , , , , kprobes (Kernel Probe — ) , , . , , kprobes
, , , , Linux BPF.
, schedule
:
$ sudo bpftrace -e '
k:schedule { @[comm] = count(); }
i:s:2 { exit();}
END { print(@, 10); clear(@); }
'
Attaching 3 probes...
@[Timer]: 147
@[kworker/u65:0]: 147
@[kworker/7:1]: 153
@[kworker/13:1]: 158
@[IPC I/O Child]: 170
@[IPC I/O Parent]: 172
@[kworker/12:1]: 185
@[Web Content]: 229
@[Xorg]: 269
@[SCTP timer]: 1566
@
.
-? :
$ sudo bpftrace -l 'k:*' | wc -l 61106
. , , kprobe, , , kprobes.
kretprobes
kprobe
kretprobe
. kprobe
, kretporobe
. retval
.
, vfs_write
( ):
$ sudo bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:2 { exit(); }' Attaching 2 probes... @: [1] 606 |@@@@@@@@@@@@@@@@@@@@@@@@@ | [2, 4) 0 | | [4, 8) 0 | | [8, 16) 1223 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [16, 32) 0 | | [32, 64) 25 |@ |
uprobes uretprobes
, ( ), . kprobes
. bpftrace
uprobes
uretprobes
— .
, ( UID ):
$ sudo bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: [%d]: \"%s\"\n", uid, str(uptr(retval))); }' Attaching 1 probe... readline: [1000]: "echo "hello habr"" readline: [0]: "echo "hello from root"" ^C
, 2
CONFIG_DEBUG_INFO_BTF=y
, kprobes, , bpf trampolines. BTF , . , / .
Tracepoints —
- , , , . tracepoints . :
$ sudo bpftrace -l 't:*'
, kprobes
:
$ sudo bpftrace -l 't:*' | wc -l 1782
tracepoints , API: , tracepoint, , . , bpftrace
, tracepoint, :
$ sudo bpftrace -lv tracepoint:thermal:thermal_temperature tracepoint:thermal:thermal_temperature __data_loc char[] thermal_zone; int id; int temp_prev; int temp;
kprobe
, BTF, , , . BTF kprobes
kfuncs
.
usdt —
#include <sys/sdt.h> #include <sys/time.h> #include <unistd.h> int main(int argc, char **argv) { struct timeval tv; for (;;) { gettimeofday(&tv, NULL); DTRACE_PROBE1(test, probe, tv.tv_sec); sleep(1); } }
test:probe
, tv.tv_sec
— . , systemtap-sdt-dev
( ). , :
$ cc /tmp/test.c -o /tmp/test $ sudo bpftrace -l 'usdt:/tmp/test' usdt:/tmp/test:test:probe
/tmp/test
,
$ sudo bpftrace -e 'usdt:/tmp/test:test:probe { printf(" %u\n", arg0); }' Attaching 1 probe... 1612903991 1612903992 1612903993 1612903994 1612903995 ...
arg0
— tv.tv_sec
, breakpoint.
Perf
bpftrace
, Perf. , bpftrace
:
software
: -hardware
: PMCsinterval
:profile
:
`software`
perf, :
# bpftrace -l 's:*' software:alignment-faults: software:bpf-output: software:context-switches: software:cpu-clock: software:cpu-migrations: software:dummy: software:emulation-faults: software:major-faults: software:minor-faults: software:page-faults: software:task-clock:
, , ,
# bpftrace -e 'software:cpu-migrations:10 { @[comm] = count(); }' Attaching 2 probes... ^C@[kworker/u65:1]: 1 @[bpftrace]: 1 @[SCTP timer]: 2 @[Xorg]: 2
CPU . perf_event_open(2)
, , cpu-migrations
, man- PERF_COUNT_SW_CPU_MIGRATIONS
.
`hardware`
hardware counters, bpftrace
BPF. , :
bpftrace -l 'h*' hardware:backend-stalls: hardware:branch-instructions: hardware:branch-misses: hardware:bus-cycles: hardware:cache-misses: hardware:cache-references: hardware:cpu-cycles: hardware:frontend-stalls: hardware:instructions: hardware:ref-cycles:
( , . PERF_COUNT_HW_BRANCH_MISSES
):
bpftrace -e 'hardware:branch-misses:100000 { @[tid] = count(); }' Attaching 3 probes... @[1055]: 4 @[3078]: 4 @[1947]: 5 @[1066]: 6 @[2551]: 6 @[0]: 29
`interval` `profile`
interval
profile
. CPU, — CPU.
, . , :
$ sudo bpftrace -e ' kr:vfs_write { @ = hist(retval); } interval:s:2 { print(@); clear(@); } '
vfs_write
, interval
, , @
.
profile
:
# bpftrace -e ' profile:hz:99 { @[kstack] = count(); } i:s:10 { exit(); } END { print(@,1); clear(@); } ' Attaching 3 probes... @[ cpuidle_enter_state+202 cpuidle_enter+46 call_cpuidle+35 do_idle+487 cpu_startup_entry+32 start_secondary+345 secondary_startup_64+182 ]: 14455
profile
99 , exit()
, END
@
— ( , , , ).
Bpftrace: tutorial
, bpftrace
:
# bpftrace
, . , bpftrace
. , bpftrace
. , — .
-l
. . ( - , , .) :
# bpftrace -l '*kill_all*' kprobe:rfkill_alloc kprobe:kill_all kprobe:btrfs_kill_all_delayed_nodes
tracepoints
:
# bpftrace -l 't:*kill*' tracepoint:cfg80211:rdev_rfkill_poll tracepoint:syscalls:sys_enter_kill tracepoint:syscalls:sys_exit_kill tracepoint:syscalls:sys_enter_tgkill tracepoint:syscalls:sys_exit_tgkill tracepoint:syscalls:sys_enter_tkill tracepoint:syscalls:sys_exit_tkill
tracepoint:syscalls
, , bpftrace
. X
:
tracepoint:syscalls:sys_enter_X tracepoint:syscalls:sys_exit_X
- , , execve(2)
. , , - tracepoint
-v
, :
# bpftrace -lv 't:s*:sys_*_execve' tracepoint:syscalls:sys_enter_execve int __syscall_nr; const char * filename; const char *const * argv; const char *const * envp; tracepoint:syscalls:sys_exit_execve int __syscall_nr; long ret;
( *
, syscalls
sys_enter_execve
sys_exit_execve
). , -lv
args
:
# bpftrace -e ' t:s*:sys_enter_execve { printf("ENTER: %s\n", str(uptr(args->filename))); } t:s*:sys_exit_execve { printf("EXIT: %s: %d\n", comm, args->ret); } ' Attaching 2 probes... ENTER: /bin/ls EXIT: ls: 0 ENTER: /bin/lss EXIT: bash: -2
.
args->filename
. , str
, : , uptr
. bpftrace
, . , , bpftrace -lv
, , , execve ( __user
).
comm
, . args->ret
. , " " , . , - , : /bin/ls
exec
0 ( ls
/bin/ls
), /bin/lss
exec
-2
bash
.
. , . , . uptr
, .
`bpftrace`
bpftrace
awk (. 6 The AWK Programming Language
) .
<probe> <filter> { <action> }
,
# bpftrace -e 'BEGIN { printf("Hello\n") } END { printf("World\n") }'
<probe>
— BEGIN
END
, <action>
— printf
.
<filter>
,
,
# bpftrace -e 'p:s:1 /cpu == 0/ { printf(" CPU%d\n", cpu); }' Attaching 1 probe... CPU0 CPU0 ^C
, CPU 0.
. , /cpu == 0/
?
<filter>
. , write
. — sys_enter_write
sys_exit_write
:
# cat /tmp/write-times.bt t:syscalls:sys_enter_write { @[tid] = nsecs } t:syscalls:sys_exit_write /@[tid]/ { @write[comm] = sum((nsecs - @[tid]) / 1000); delete(@[tid]); } END { print(@write, 10); clear(@write); clear(@); }
, . :
# bpftrace /tmp/write-times.bt
, sys_enter_write
, write
@
, tid
.
, sys_exit_write
, /@[tid]/
, . , , - write
. ( ) @write
@[tid]
.
, ^C
, END
, @write
@
, bpftrace
.
. , /@[tid]/
?
:
bpftrace
C , , :?
, ++
, --
. if {} else {}
. while
unroll
( , BPF ). , .
: . $
, :
# bpftrace -e 'BEGIN { $x = 1; printf("%d\n", ++$x); exit(); }' # bpftrace -e 'BEGIN { if (1) { $x = 1; } printf("%d\n", ++$x); exit(); }'
— :
# bpftrace -e 'BEGIN { $x = 1; exit(); } END { printf("%d\n", $x); }'
, , @
. "" @
, write
(@[tid]
). ( bpftrace
— . , , : . , , .)
, : ? bpftrace
. kprobes
, tracepoints
— , tracepoint, Perf, , — . tracepoints
, kprobes
, Perf.
kprobes
arg0
, arg1
, .. , . :
#include <linux/skbuff.h> #include <linux/ip.h> k:icmp_echo { $skb = (struct sk_buff *) arg0; $iphdr = (struct iphdr *) ($skb->head + $skb->network_header); @pingstats[ntop($iphdr->saddr), ntop($iphdr->daddr)]++; }
, . kprobe
icmp_echo
, ICMPv4 echo request. , arg0
, — sk_buff
, . IP @pingstats
. , , IP ! , kprobe, user space, .
. , kernel headers. sk_buff
iphdr
, . ( BTF, — , .) icmp_echo
sk_buff
$skb
. $skb
, , , , iphdr
. ntop
bpftrace
, IP .
. . uptr
kptr
. : vfs_write
, — struct file
, <linux/fs.h>
. vfs_write
. (Hint: struct file
kretprobe
?)
. bpftrace
, , , sudo
.
ping
— , CPU. count
. :
# bpftrace -e 'p:hz:5000 { @x++; @y = count(); } i:s:10 { exit(); }' Attaching 2 probes... @x: 760528 @y: 799002
10 5000 16 @x
@y
. ++
. count()
, CPU- : CPU , .
bpftrace
, - . — . .
# bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:10 { exit() }' Attaching 2 probes... @: [1] 14966407 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [2, 4) 0 | | [4, 8) 0 | | [8, 16) 1670 | | [16, 32) 0 | | [32, 64) 123 | | [64, 128) 0 | | [128, 256) 0 | | [256, 512) 0 | | [512, 1K) 0 | | [1K, 2K) 0 | | [2K, 4K) 0 | | [4K, 8K) 0 | | [8K, 16K) 7531982 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
vfs_write
, , - ! ( 20 ):
# bpftrace -e ' kr:vfs_write /retval == 1/ { @[pid, comm] = hist(retval); } i:s:10 { exit() } END { print(@, 1); clear(@); }' Attaching 3 probes... @[133835, dd]: [1] 14254511 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
, , retval
. , — . , END
, . ? :
# tr '\000' ' ' < /proc/133835/cmdline dd if=/dev/zero of=/dev/null bs=1
. bpftrace
, .
: flame graphs
profile
. . , profile
:
profile:hz:rate profile:s:rate profile:ms:rate profile:us:rate
HZ, — , , .
, , , :
bpftrace -e 'profile:hz:99 { print(kstack); exit() }' Attaching 1 probe... cpuidle_enter_state+202 cpuidle_enter+46 call_cpuidle+35 do_idle+487 cpu_startup_entry+32 rest_init+174 arch_call_rest_init+14 start_kernel+1724 x86_64_start_reservations+36 x86_64_start_kernel+139 secondary_startup_64+182
-.
, , . flame graphs, .
, bpftrace
:
# bpftrace -e 'profile:hz:333 { @[kstack] = count(); } i:s:10 { exit() }' > /tmp/raw # wc -l /tmp/raw 3374 /tmp/raw
333 ( kstack
@
, kstack
— ).
FlameGraph :
$ git clone https://github.com/brendangregg/FlameGraph.git $ cd FlameGraph $ ./stackcollapse-bpftrace.pl /tmp/raw > /tmp/ready $ ./flamegraph.pl /tmp/ready > /tmp/kstack.svg
bpftrace
, ( , gist SVG):
, CPU dd
, /dev/zero
/dev/null
. , .
. bpftrace
. FlameGraph
.
, bpftrace
. , .
BCC:
BCC , BPF — 100 . — , , Netflix, Facebook . . , .
libbcc
libbpf
, -C , . , . .
BCC - . , bpfcc-tools
. . ,
# funccount-bpfcc 'vfs_*' -d 5 Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end. FUNC COUNT b'vfs_statfs' 1 b'vfs_unlink' 1 b'vfs_lock_file' 2 b'vfs_fallocate' 31 b'vfs_statx_fd' 32 b'vfs_getattr' 80 b'vfs_getattr_nosec' 88 b'vfs_open' 108 b'vfs_statx' 174 b'vfs_writev' 2789 b'vfs_write' 6554 b'vfs_read' 7363 Detaching...
vfs_
. -p
, PID . , mplayer
, :
# funccount-bpfcc 'vfs_*' -d 5 -p 29380 Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end. FUNC COUNT b'vfs_write' 208 b'vfs_read' 629 Detaching...
BCC
BCC. mutex_lock
mutex_unlock
. , .
#! /usr/bin/python3
from bcc import BPF
from ctypes import c_int
from time import sleep, strftime
from sys import argv
b = BPF(text="""
BPF_PERCPU_ARRAY(mutex_stats, u64, 2);
static inline void inc(int key)
{
u64 *x = mutex_stats.lookup(&key);
if (x)
*x += 1;
}
void do_lock(struct pt_regs *ctx) { inc(0); }
void do_unlock(struct pt_regs *ctx) { inc(1); }
""")
b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")
print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))
while 2 * 2 == 4:
try:
sleep(1)
except KeyboardInterrupt:
exit()
print("%-8s%10d%10d" % (
strftime("%H:%M:%S"),
b["mutex_stats"].sum(0).value,
b["mutex_stats"].sum(1).value))
b["mutex_stats"].clear()
. , — BPF
:
from bcc import BPF
BPF, . BPF
-C.
BPF_PERCPU_ARRAY(mutex_stats, u64, 2); static inline void inc(int key) { u64 *x = mutex_stats.lookup(&key); if (x) *x += 1; } void do_lock(struct pt_regs *ctx) { inc(0); } void do_unlock(struct pt_regs *ctx) { inc(1); }
C, , BPF
C.
, mutex_stats
u64
, . , PERCPU , , CPU . inc
, mutex_stats
. .
, do_lock
do_unlock
.
— BPF
libllvm
, , . kprobes
:
b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")
:
print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))
while 2 * 2 == 4:
try:
sleep(1)
except KeyboardInterrupt:
exit()
print("%-8s%10d%10d" % (
strftime("%H:%M:%S"),
b["mutex_stats"].sum(0).value,
b["mutex_stats"].sum(1).value))
b["mutex_stats"].clear()
mutex_stats
. sum
mutex_stats
, CPU:
sum(index) { result = 0 CPU { result += cpu->mutex_stats[index] } return result }
. :
$ sudo ./bcc-tool-example TIME LOCKED UNLOCKED 18:06:33 11382 12993 18:06:34 11200 12783 18:06:35 18597 22553 18:06:36 20776 25516 18:06:37 59453 68201 18:06:38 49282 58064 18:06:39 25541 27428 18:06:40 22094 25280 18:06:41 5539 7250 18:06:42 5662 7351 18:06:43 5205 6905 18:06:44 6733 8438
- 18:06:35 youtube Firefox, youtube 18:06:40 . , , youtube .
, C, libbpf
CO-RE. libbpf
, libllvm
, , . , BCC libbpf
+ CO-RE BCC, . libbpf-tools. BCC to libbpf conversion guide ( ).
Ply: bpftrace
ply
, Tobias Waldekranz 2015 , bpftrace
. awk- BPF, ,
ply 'tracepoint:tcp/tcp_receive_reset { printf("saddr:%v port:%v->%v\n", data->saddr, data->sport, data->dport); }'
ply
: libc
(). , BPF . , , ply
ply script language -> BPF
.
, ply
, , — ply
, , . ply
, , , , C libbpf
— , ., , Building BPF applications with libbpf-bootstrap.
:
bpftrace
, BCC Linux:
The bpftrace One-Liner Tutorial.
bpftrace
, . , , .
bpftrace Reference Guide. ,
bpftrace
, . ,bpftrace
.
BCC Tutorial.
bpftrace
(libbpf
), BCC, BCC Reference Guide, , .
Brendan Gregg, «BPF Performance Tools». BPF Linux — BCC. BPF .
Brendan Gregg, «Systems Performance: Enterprise and the Cloud, 2nd Edition (2020)». «Systems Performance». : BPF, Solaris, . «BPF Performance Tools» «?», «?», ( BPF ).