Compare commits


52 Commits

Author SHA1 Message Date
Andrii Nakryiko
5acba1722d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   561c80369df0733ba0574882a1635287b20f9de2
Checkpoint bpf-next commit: 21aeabb68258ce17b91af113a768760b3a491d93
Baseline bpf commit:        561c80369df0733ba0574882a1635287b20f9de2
Checkpoint bpf commit:      27861fc720be2c39b861d8bdfb68287f54de6855

Cryolitia PukNgae (1):
  libbpf: Add documentation to version and error API functions

Mykyta Yatsenko (1):
  libbpf: Export bpf_object__prepare symbol

Yureka Lilian (1):
  libbpf: Fix reuse of DEVMAP

 src/libbpf.c | 10 ++++++++++
 src/libbpf.h | 27 ++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 1 deletion(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-08-21 13:39:29 -07:00
Andrii Nakryiko
72f3e4fd8e sync: update .mailmap
Update .mailmap based on libbpf's list of contributors and on the latest
.mailmap version in the upstream repository.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-08-21 13:39:29 -07:00
Cryolitia PukNgae
83fe1f4494 libbpf: Add documentation to version and error API functions
Add documentation for the following API functions:

- libbpf_major_version()
- libbpf_minor_version()
- libbpf_version_string()
- libbpf_strerror()

Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250820-libbpf-doc-1-v1-1-13841f25a134@uniontech.com
2025-08-21 13:39:29 -07:00
Mykyta Yatsenko
9f8984fba5 libbpf: Export bpf_object__prepare symbol
Add the missing LIBBPF_API macro for the bpf_object__prepare function to
enable its export; bpf_object__prepare was already listed in libbpf.map.
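As a standalone sketch of what this one-line fix amounts to (LIBBPF_API's real definition lives in libbpf_common.h; the dummy function body below exists only so the snippet links on its own):

```c
/* Sketch of the fix: without LIBBPF_API, the symbol stays hidden when
 * libbpf is built with -fvisibility=hidden, even though libbpf.map
 * already lists it for versioning. */
#define LIBBPF_API __attribute__((visibility("default")))

struct bpf_object;

LIBBPF_API int bpf_object__prepare(struct bpf_object *obj);

/* dummy body, only so this standalone sketch links */
int bpf_object__prepare(struct bpf_object *obj) { (void)obj; return 0; }
```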

Fixes: 1315c28ed809 ("libbpf: Split bpf object load into prepare/load")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250819215119.37795-1-mykyta.yatsenko5@gmail.com
2025-08-21 13:39:29 -07:00
Yureka Lilian
04a23358c7 libbpf: Fix reuse of DEVMAP
Previously, re-using pinned DEVMAP maps would always fail, because
get_map_info on a DEVMAP always returns flags with BPF_F_RDONLY_PROG set,
but BPF_F_RDONLY_PROG being set on a map during creation is invalid.

Thus, ignore the BPF_F_RDONLY_PROG flag in the flags returned from
get_map_info when checking for compatibility with an existing DEVMAP.
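A minimal sketch of the compatibility check described above; the flag value matches BPF_F_RDONLY_PROG in include/uapi/linux/bpf.h, but the helper name is made up for illustration:

```c
#include <stdbool.h>

#define BPF_F_RDONLY_PROG (1U << 7) /* as in include/uapi/linux/bpf.h */

static bool devmap_flags_match(unsigned int existing, unsigned int requested)
{
	/* The kernel reports BPF_F_RDONLY_PROG for every DEVMAP, but the
	 * flag is invalid at map creation time, so mask it out on both
	 * sides before comparing. */
	return (existing & ~BPF_F_RDONLY_PROG) ==
	       (requested & ~BPF_F_RDONLY_PROG);
}
```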

The same problem is handled in a third-party ebpf library:
- https://github.com/cilium/ebpf/issues/925
- https://github.com/cilium/ebpf/pull/930

Fixes: 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF")
Signed-off-by: Yureka Lilian <yuka@yuka.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250814180113.1245565-3-yuka@yuka.dev
2025-08-21 13:39:29 -07:00
Ilya Leoshkevich
fc687b8ee9 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   42be23e8f2dcb100cb9944b2b54b6bf41aff943d
Checkpoint bpf-next commit: 561c80369df0733ba0574882a1635287b20f9de2
Baseline bpf commit:        0238c45fbbf8228f52aa4642f0cdc21c570d1dfe
Checkpoint bpf commit:      561c80369df0733ba0574882a1635287b20f9de2

Achill Gilgenast (1):
  libbpf: Avoid possible use of uninitialized mod_len

Ilya Leoshkevich (1):
  libbpf: Add the ability to suppress perf event enablement

Jason Xing (1):
  net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt

Samiullah Khawaja (2):
  Add support to set NAPI threaded for individual NAPI
  net: define an enum for the napi threaded state

 include/uapi/linux/if_xdp.h |  1 +
 include/uapi/linux/netdev.h |  6 ++++++
 src/libbpf.c                | 15 +++++++++------
 src/libbpf.h                |  4 +++-
 4 files changed, 19 insertions(+), 7 deletions(-)

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2025-08-12 12:51:17 -07:00
Ilya Leoshkevich
a3e0234f49 sync: update .mailmap
Update .mailmap based on libbpf's list of contributors and on the latest
.mailmap version in the upstream repository.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2025-08-12 12:51:17 -07:00
Ilya Leoshkevich
8ce18b6b73 libbpf: Add the ability to suppress perf event enablement
Automatically enabling a perf event after attaching a BPF prog to it is
not always desirable.

Add a new "dont_enable" field to struct bpf_perf_event_opts. While
introducing "enable" instead would be nicer in that it would avoid
a double negation in the implementation, it would make
DECLARE_LIBBPF_OPTS() less efficient.
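The efficiency argument can be sketched in a few lines (field names simplified relative to the real struct bpf_perf_event_opts): the opts-declaring macro zero-fills the struct, so the zero value of the flag must preserve today's behavior. A negative "dont_enable" flag defaults to false ("do enable"), whereas a positive "enable" flag would force every caller to set it explicitly:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* simplified stand-in for struct bpf_perf_event_opts */
struct perf_event_opts_sketch {
	size_t sz;
	bool dont_enable; /* zero default == keep auto-enabling the event */
};

static bool default_enables_event(void)
{
	struct perf_event_opts_sketch opts;

	memset(&opts, 0, sizeof(opts)); /* what DECLARE_LIBBPF_OPTS() does */
	opts.sz = sizeof(opts);
	return !opts.dont_enable; /* unset flag => event gets enabled */
}
```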

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-12 12:51:17 -07:00
Achill Gilgenast
df60ff2a29 libbpf: Avoid possible use of uninitialized mod_len
Though mod_len is only read when mod_name != NULL and both are initialized
together, gcc 15 produces a warning with -Werror=maybe-uninitialized:

libbpf.c: In function 'find_kernel_btf_id.constprop':
libbpf.c:10100:33: error: 'mod_len' may be used uninitialized [-Werror=maybe-uninitialized]
10100 |                 if (mod_name && strncmp(mod->name, mod_name, mod_len) != 0)
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libbpf.c:10070:21: note: 'mod_len' was declared here
10070 |         int ret, i, mod_len;
      |                     ^~~~~~~

Silence the false positive.
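A minimal reproduction of the pattern the compiler trips over, with illustrative names: mod_len is only read when mod_name is set and the two are assigned together, but gcc cannot prove that, so initializing mod_len at declaration (the fix) costs nothing and silences the warning:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool module_matches(const char *kernel_name, const char *attach_name)
{
	const char *mod_name = NULL;
	size_t mod_len = 0; /* the fix: initialize at declaration */
	const char *sep = strchr(attach_name, ':');

	if (sep) {
		mod_name = attach_name;
		mod_len = (size_t)(sep - attach_name);
	}
	/* mod_len is only read when mod_name != NULL, as in
	 * find_kernel_btf_id(), yet gcc 15 cannot see the invariant */
	if (mod_name && strncmp(kernel_name, mod_name, mod_len) != 0)
		return false;
	return true;
}
```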

Signed-off-by: Achill Gilgenast <fossdd@pwned.life>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250729094611.2065713-1-fossdd@pwned.life
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-12 12:51:17 -07:00
Samiullah Khawaja
a547f98fbb net: define an enum for the napi threaded state
Instead of using '0' and '1' for napi threaded state use an enum with
'disabled' and 'enabled' states.

Tested:
 ./tools/testing/selftests/net/nl_netdev.py
 TAP version 13
 1..7
 ok 1 nl_netdev.empty_check
 ok 2 nl_netdev.lo_check
 ok 3 nl_netdev.page_pool_check
 ok 4 nl_netdev.napi_list_check
 ok 5 nl_netdev.dev_set_threaded
 ok 6 nl_netdev.napi_set_threaded
 ok 7 nl_netdev.nsim_rxq_reset_down
 # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250723013031.2911384-4-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-12 12:51:17 -07:00
Samiullah Khawaja
4c853bf66f Add support to set NAPI threaded for individual NAPI
A net device has a threaded sysctl that can be used to enable threaded
NAPI polling on all of the NAPI contexts under that device. Allow
enabling threaded NAPI polling at individual NAPI level using netlink.

Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a NAPI
context.

Add a test in `nl_netdev.py` that verifies various cases of threaded
NAPI being set at NAPI and at device level.

Tested:
 ./tools/testing/selftests/net/nl_netdev.py
 TAP version 13
 1..7
 ok 1 nl_netdev.empty_check
 ok 2 nl_netdev.lo_check
 ok 3 nl_netdev.page_pool_check
 ok 4 nl_netdev.napi_list_check
 ok 5 nl_netdev.dev_set_threaded
 ok 6 nl_netdev.napi_set_threaded
 ok 7 nl_netdev.nsim_rxq_reset_down
 # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-12 12:51:17 -07:00
Jason Xing
77af22f93d net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt
This patch provides a setsockopt method that applications can use to
adjust how many descs are handled at most in one send syscall. It
mitigates the situation where the default value (32) is too small and
leads to a higher frequency of send syscalls.

Considering the diversity and complexity of applications, there is no
single ideal value fitting all cases, so keep 32 as the default value
as before.

The patch does the following things:
- Add XDP_MAX_TX_SKB_BUDGET socket option.
- Set max_tx_budget to 32 by default in the initialization phase as a
  per-socket granular control.
- Set the range of max_tx_budget as [32, xs->tx->nentries].
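The clamping rule in the last bullet can be sketched as follows; names are illustrative, and this assumes clamping semantics (the kernel may instead reject out-of-range values):

```c
static unsigned int clamp_tx_budget(unsigned int requested,
				    unsigned int tx_nentries)
{
	if (requested < 32)
		return 32;		/* historical default is the floor */
	if (requested > tx_nentries)
		return tx_nentries;	/* cannot exceed the tx ring size */
	return requested;
}
```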

The idea behind this comes out of real workloads in production. We use a
user-level stack with xsk support to accelerate sending packets and
minimize triggering syscalls. When the packets are aggregated, it's not
hard to hit the upper bound (namely, 32). The moment the user-space stack
fetches the -EAGAIN error number passed from sendto(), it will loop to try
again until all the expected descs from the tx ring are sent out to the driver.
Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of
sendto() and higher throughput/PPS.

Here is what I did in production, along with some numbers:
For one application I saw lately, I suggested using 128 as max_tx_budget
because I saw two limitations without changing any default configuration:
1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by
net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, I counted how many
descs are transmitted to the driver in one sendto() call based on the
[1] patch, then calculated the probability of hitting the upper bound.
Finally, I chose 128 as a suitable value because 1) it covers most of
the cases, and 2) a higher
number would not bring evident results. After tuning the parameters,
a stable improvement of around 4% in both PPS and throughput, along
with lower resource consumption, was observed via strace -c -p xxx:
1) %time was decreased by 7.8%
2) error counter was decreased from 18367 to 572

[1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-12 12:51:17 -07:00
Eduard Zingerman
58dd1f58b5 docs: describe how to reproduce errors reported by oss-fuzz
Add a description for current oss-fuzz setup and write down the
commands needed to reproduce fuzzer reported errors:
- "Official way" in case exact oss-fuzz environment is necessary.
- "Simple way" for local tinkering.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2025-07-18 17:23:16 -07:00
Eduard Zingerman
dac1ec64a3 scripts: allow skipping elfutils rebuild in build-fuzzers.sh
This simplifies local reproduction of fuzzer reported errors.
E.g. the following sequence of commands would execute much faster on a
second run:

  $ SKIP_LIBELF_REBUILD=1 scripts/build-fuzzers.sh
  $ out/bpf-object-fuzzer <path-to-test-case>

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2025-07-18 17:23:08 -07:00
Andrii Nakryiko
9823ef295d libbpf: bump Makefile version to 1.7.0
With new development cycle comes updated Makefile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-18 17:20:48 -07:00
Andrii Nakryiko
cb15da45c2 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   e860a98c8aebd8de82c0ee901acf5a759acd4570
Checkpoint bpf-next commit: 42be23e8f2dcb100cb9944b2b54b6bf41aff943d
Baseline bpf commit:        bf4807c89d8f92c47404b1e4eeeefb42259d1b50
Checkpoint bpf commit:      0238c45fbbf8228f52aa4642f0cdc21c570d1dfe

Andrii Nakryiko (2):
  libbpf: start v1.7 dev cycle
  libbpf: Fix handling of BPF arena relocations

Eduard Zingerman (1):
  libbpf: Verify that arena map exists when adding arena relocations

Matteo Croce (1):
  libbpf: Fix warning in calloc() usage

Tao Chen (1):
  bpf: Add struct bpf_token_info

 include/uapi/linux/bpf.h |  8 ++++++++
 src/libbpf.c             | 27 +++++++++++++++++++--------
 src/libbpf.map           |  3 +++
 src/libbpf_version.h     |  2 +-
 4 files changed, 31 insertions(+), 9 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-18 17:20:48 -07:00
Eduard Zingerman
8e59f80a93 libbpf: Verify that arena map exists when adding arena relocations
A fuzzer reported a memory access error [1] in bpf_program__record_reloc()
that happens when:
- ".addr_space.1" section exists
- there is a relocation referencing this section
- there are no arena maps defined in BTF.

Sanity checks for maps existence are already present in
bpf_program__record_reloc(), hence this commit adds another one.

[1] https://github.com/libbpf/libbpf/actions/runs/16375110681/job/46272998064

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250718222059.281526-1-eddyz87@gmail.com
2025-07-18 17:20:48 -07:00
Andrii Nakryiko
fb66fb4948 libbpf: Fix handling of BPF arena relocations
Initial __arena global variable support implementation in libbpf
contains a bug: it remembers struct bpf_map pointer for arena, which is
used later on to process relocations. Recording this pointer is
problematic because map pointers are not stable during ELF relocation
collection phase, as an array of struct bpf_map's can be reallocated,
invalidating all the pointers. Libbpf is dealing with similar issues by
using a stable internal map index, though for BPF arena map specifically
this approach wasn't used due to an oversight.

The resulting behavior is a non-deterministic issue that depends on the
exact layout of the ELF object file, the number of actual maps, etc. We didn't hit
this until very recently, when this bug started triggering crash in BPF
CI when validating one of sched-ext BPF programs.

The fix is rather straightforward: we just follow an established pattern
of remembering map index (just like obj->kconfig_map_idx, for example)
instead of `struct bpf_map *`, and resolving index to a pointer at the
point where map information is necessary.
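A simplified illustration of the pattern, with made-up type names: the maps array can be reallocated while relocations are collected, so a cached `struct bpf_map *` may dangle, while a stable index resolved to a pointer only at use time stays valid:

```c
#include <stdlib.h>

struct map_sketch { int fd; };

struct obj_sketch {
	struct map_sketch *maps;
	size_t nr_maps;
	int arena_map_idx; /* stable across reallocations; -1 if absent */
};

static struct map_sketch *obj_arena_map(struct obj_sketch *obj)
{
	if (obj->arena_map_idx < 0)
		return NULL;
	return &obj->maps[obj->arena_map_idx]; /* resolve index late */
}

static int index_survives_realloc(void)
{
	struct obj_sketch obj = { 0 };
	struct map_sketch *grown;
	int ok;

	obj.maps = calloc(1, sizeof(*obj.maps));
	if (!obj.maps)
		return 0;
	obj.nr_maps = 1;
	obj.arena_map_idx = 0;
	obj.maps[0].fd = 7;

	/* growing the maps array may move it, invalidating cached pointers */
	grown = realloc(obj.maps, 64 * sizeof(*obj.maps));
	if (!grown) {
		free(obj.maps);
		return 0;
	}
	obj.maps = grown;

	ok = obj_arena_map(&obj)->fd == 7;
	free(obj.maps);
	return ok;
}
```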

While at it also add debug-level message for arena-related relocation
resolution information, which we already have for all other kinds of
maps.

Fixes: 2e7ba4f8fd1f ("libbpf: Recognize __arena global variables.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250718001009.610955-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-18 17:20:48 -07:00
Matteo Croce
a9dbcc32fd libbpf: Fix warning in calloc() usage
When compiling libbpf with some compilers, this warning is triggered:

libbpf.c: In function ‘bpf_object__gen_loader’:
libbpf.c:9209:28: error: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
 9209 |         gen = calloc(sizeof(*gen), 1);
      |                            ^
libbpf.c:9209:28: note: earlier argument should specify number of elements, later size of each element

Fix this by inverting the calloc() arguments.
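The corrected call shape, with a stand-in struct: the number of elements goes first and the element size second, since `calloc(sizeof(*gen), 1)` reads to the compiler as "sizeof(*gen) elements of 1 byte" and trips -Wcalloc-transposed-args:

```c
#include <stdlib.h>

struct gen_loader_sketch { int gen_fd; int insn_cnt; }; /* stand-in type */

static int alloc_gen_ok(void)
{
	/* fixed order: number of elements first, element size second */
	struct gen_loader_sketch *gen = calloc(1, sizeof(*gen));
	int ok;

	ok = gen && gen->gen_fd == 0 && gen->insn_cnt == 0;
	free(gen);
	return ok;
}
```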

Signed-off-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250717200337.49168-1-technoboy85@gmail.com
2025-07-18 17:20:48 -07:00
Tao Chen
a3dadc5a42 bpf: Add struct bpf_token_info
Commit 35f96de04127 ("bpf: Introduce BPF token object") added the BPF
token as a new kind of BPF kernel object. BPF_OBJ_GET_INFO_BY_FD is
already used to get BPF object info, so we can also get token info with
this cmd.
One usage scenario: when a program run fails with a token because of a
permission failure, we can report what the BPF token allows with this
API for debugging.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Link: https://lore.kernel.org/r/20250716134654.1162635-1-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-18 17:20:48 -07:00
Andrii Nakryiko
2b39ea081f libbpf: start v1.7 dev cycle
With libbpf 1.6.0 released, adjust libbpf.map and libbpf_version.h to
start the v1.7 development cycle.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250716175936.2343013-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-18 17:20:48 -07:00
Andrii Nakryiko
da08818f4f sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   07ee18a0bc946b6b407942c88faed089e20f47d1
Checkpoint bpf-next commit: e860a98c8aebd8de82c0ee901acf5a759acd4570
Baseline bpf commit:        e34a79b96ab9d49ed8b605fee11099cf3efbb428
Checkpoint bpf commit:      bf4807c89d8f92c47404b1e4eeeefb42259d1b50

Eduard Zingerman (1):
  libbpf: __arg_untrusted in bpf_helpers.h

Kumar Kartikeya Dwivedi (3):
  bpf: Introduce BPF standard streams
  libbpf: Add bpf_stream_printk() macro
  libbpf: Introduce bpf_prog_stream_read() API

 include/uapi/linux/bpf.h | 24 ++++++++++++++++++++++++
 src/bpf.c                | 20 ++++++++++++++++++++
 src/bpf.h                | 21 +++++++++++++++++++++
 src/bpf_helpers.h        | 17 +++++++++++++++++
 src/libbpf.map           |  1 +
 5 files changed, 83 insertions(+)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-16 08:57:15 -07:00
Eduard Zingerman
5f4a34c606 libbpf: __arg_untrusted in bpf_helpers.h
Make btf_decl_tag("arg:untrusted") available to libbpf users via a
macro. This makes the following usage possible:

  void foo(struct bar *p __arg_untrusted) { ... }
  void bar(struct foo *p __arg_trusted) {
    ...
    foo(p->buz->bar); // buz dereference loses __trusted
    ...
  }

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 08:57:15 -07:00
Kumar Kartikeya Dwivedi
c2850a3840 libbpf: Introduce bpf_prog_stream_read() API
Introduce a libbpf API so that users can read data from a given BPF
stream for a BPF prog fd. For now, only the low-level syscall wrapper
is provided, we can add a bpf_program__* accessor as a follow up if
needed.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 08:57:15 -07:00
Kumar Kartikeya Dwivedi
9bb5c46da4 libbpf: Add bpf_stream_printk() macro
Add a convenience macro to print data to the BPF streams. BPF_STDOUT and
BPF_STDERR stream IDs from vmlinux.h can be passed to the macro to
print to the respective streams.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-10-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 08:57:15 -07:00
Kumar Kartikeya Dwivedi
ae131a0b7c bpf: Introduce BPF standard streams
Add support for a stream API to the kernel and expose related kfuncs to
BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
can be used for printing messages that can be consumed from user space,
thus it's similar in spirit to the existing trace_pipe interface.

The kernel will use the BPF_STDERR stream to notify the program of any
errors encountered at runtime. BPF programs themselves may use both
streams for writing debug messages. BPF library-like code may use
BPF_STDERR to print warnings or errors on misuse at runtime.

The implementation of a stream is as follows. Every time a message is
emitted from the kernel (directly, or through a BPF program), a record
is allocated by bump allocating from per-cpu region backed by a page
obtained using alloc_pages_nolock(). This ensures that we can allocate
memory from any context. The eventual plan is to discard this scheme in
favor of Alexei's kmalloc_nolock() [0].

This record is then locklessly inserted into a list (llist_add()) so
that the printing side doesn't require holding any locks, and works in
any context. Each stream has a maximum capacity of 4MB of text, and each
printed message is accounted against this limit.

Messages from a program are emitted using the bpf_stream_vprintk kfunc,
which takes a stream_id argument in addition to working otherwise
similar to bpf_trace_vprintk.

The bprintf buffer helpers are extracted out to be reused for printing
the string into them before copying it into the stream, so that we can
(with the defined max limit) format a string and know its true length
before performing allocations of the stream element.

For consuming elements from a stream, we expose a bpf(2) syscall command
named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the
stream of a given prog_fd into a user space buffer. The main logic is
implemented in bpf_stream_read(). The log messages are queued in
bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and
ordered correctly in the stream backlog.

For this purpose, we hold a lock around bpf_stream_backlog_peek(), as
llist_del_first() (if we maintained a second lockless list for the
backlog) wouldn't be safe from multiple threads anyway. Then, if we
fail to find something in the backlog log, we splice out everything from
the lockless log, and place it in the backlog log, and then return the
head of the backlog. Once the full length of the element is consumed, we
will pop it and free it.

The lockless list bpf_stream::log is a LIFO stack. Elements obtained
using a llist_del_all() operation are in LIFO order, thus would break
the chronological ordering if printed directly. Hence, this batch of
messages is first reversed. Then, it is stashed into a separate list in
the stream, i.e. the backlog_log. The head of this list is the actual
message that should always be returned to the caller. All of this is
done in bpf_stream_backlog_fill().
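The reordering step above can be sketched as a self-contained snippet (node type is illustrative): llist_del_all() hands back a LIFO stack, and reversing the singly-linked batch restores chronological order before it joins the backlog:

```c
#include <stddef.h>

struct msg_node {
	struct msg_node *next;
	int seq; /* stand-in for the message payload */
};

/* classic in-place reversal of a singly-linked list */
static struct msg_node *reverse_batch(struct msg_node *head)
{
	struct msg_node *prev = NULL;

	while (head) {
		struct msg_node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}

static int batch_is_chronological(void)
{
	struct msg_node n1 = { NULL, 1 }, n2 = { NULL, 2 }, n3 = { NULL, 3 };
	struct msg_node *head;

	/* pushing messages 1, 2, 3 onto the stack yields the list 3->2->1 */
	n3.next = &n2;
	n2.next = &n1;

	head = reverse_batch(&n3);
	return head->seq == 1 && head->next->seq == 2 &&
	       head->next->next->seq == 3 && !head->next->next->next;
}
```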

From the kernel side, the writing into the stream will be a bit more
involved than a typical printk. First, the kernel may print
a collection of messages into the stream, and parallel writers into the
stream may suffer from interleaving of messages. To ensure each group of
messages is visible atomically, we can take advantage of using a
lockless list for pushing in messages.

To enable this, we add a bpf_stream_stage() macro, and require kernel
users to use bpf_stream_printk statements for the passed expression to
write into the stream. Underneath the macro, we have a message staging
API, where a bpf_stream_stage object on the stack accumulates the
messages being printed into a local llist_head, and then a commit
operation splices the whole batch into the stream's lockless log list.

This is especially pertinent for rqspinlock deadlock messages printed to
program streams. After this change, we see each deadlock invocation as a
non-interleaving contiguous message without any confusion on the
reader's part, improving their user experience in debugging the fault.

While programs cannot benefit from this staged stream writing API, they
could just as well hold an rqspinlock around their print statements to
serialize messages, hence this is kept kernel-internal for now.

Overall, this infrastructure provides NMI-safe any context printing of
messages to two dedicated streams.

Later patches will add support for printing splats in case of BPF arena
page faults, rqspinlock deadlocks, and cond_break timeouts, and
integration of this facility into bpftool for dumping messages to user
space.

  [0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 08:57:15 -07:00
Andrii Nakryiko
7a6e6b484d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   4a4b84ba9e453295c746d81cb245c0c5d80050f0
Checkpoint bpf-next commit: 07ee18a0bc946b6b407942c88faed089e20f47d1
Baseline bpf commit:        a766cfbbeb3a
Checkpoint bpf commit:      e34a79b96ab9d49ed8b605fee11099cf3efbb428

Adin Scannell (1):
  libbpf: Fix possible use-after-free for externs

Yuan Chen (1):
  libbpf: Fix null pointer dereference in btf_dump__free on allocation
    failure

 src/btf_dump.c |  3 +++
 src/libbpf.c   | 10 +++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-07 17:21:39 -07:00
Andrii Nakryiko
dc031df06a sync: update .mailmap
Update .mailmap based on libbpf's list of contributors and on the latest
.mailmap version in the upstream repository.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-01 15:38:22 -07:00
Adin Scannell
b58f5a3e77 libbpf: Fix possible use-after-free for externs
The `name` field in `obj->externs` points into the BTF data at initial
open time. However, some functions may invalidate this after opening and
before loading (e.g. `bpf_map__set_value_size`), which results in
pointers into freed memory and undefined behavior.

The simplest solution is to simply `strdup` these strings, similar to
the `essent_name`, and free them at the same time.

In order to test this path, the `global_map_resize` BPF selftest is
modified slightly to ensure the presence of an extern, which causes this
test to fail prior to the fix. Given there isn't an obvious API or error
to test against, I opted to add this to the existing test as an aspect
of the resizing feature rather than duplicate the test.

Fixes: 9d0a23313b1a ("libbpf: Add capability for resizing datasec maps")
Signed-off-by: Adin Scannell <amscanne@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250625050215.2777374-1-amscanne@meta.com
2025-07-01 15:38:22 -07:00
Yuan Chen
de1d0a25a8 libbpf: Fix null pointer dereference in btf_dump__free on allocation failure
When btf_dump__new() fails to allocate memory for the internal hashmap
(btf_dump->type_names), it returns an error code. However, the cleanup
function btf_dump__free() does not check if btf_dump->type_names is NULL
before attempting to free it. This leads to a null pointer dereference
when btf_dump__free() is called on such a partially initialized btf_dump
object.
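A simplified sketch of the defensive pattern the fix applies (types made up; in libbpf the guarded member is btf_dump->type_names): cleanup must tolerate a member that is NULL because its allocation failed partway through construction:

```c
#include <stdlib.h>

struct hashmap_sketch {
	void **buckets;
	size_t cap;
};

static void hashmap_sketch_free(struct hashmap_sketch *map)
{
	if (!map)
		return; /* the fix: NULL from a failed allocation is legal */
	free(map->buckets);
	free(map);
}

static int free_null_is_safe(void)
{
	hashmap_sketch_free(NULL); /* must not crash */
	return 1;
}
```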

Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250618011933.11423-1-chenyuan_fl@163.com
2025-07-01 15:38:22 -07:00
Andrii Nakryiko
95a9035e8b sync: adjust sync-kernel.sh script to handle UAPI header guards better
Adjust the sync script to handle UAPI header guards with singular and
double underscore between UAPI and LINUX. The kernel seems to have a
mix of both approaches.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-16 11:30:04 -07:00
Andrii Nakryiko
0e2ac81b00 sync: normalize more of Linux UAPI headers
Normalize UAPI headers that had single underscore between UAPI and LINUX.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-16 11:30:04 -07:00
Amery Hung
fdb04dd485 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   9325d53fe9adff354b6a93fda5f38c165947da0f
Checkpoint bpf-next commit: 4a4b84ba9e453295c746d81cb245c0c5d80050f0
Baseline bpf commit:        b4432656b36e5cc1d50a1f2dc15357543add530e
Checkpoint bpf commit:      d60d09eadb7cb17690c847f1623436cd4b58c19c

Alan Maguire (1):
  libbpf/btf: Fix string handling to support multi-split BTF

Amery Hung (1):
  libbpf: Support creating and destroying qdisc

Andrii Nakryiko (1):
  libbpf: Handle unsupported mmap-based /sys/kernel/btf/vmlinux
    correctly

Blake Jones (1):
  libbpf: Add support for printing BTF character arrays as strings

Ian Rogers (1):
  perf/uapi: Fix PERF_RECORD_SAMPLE comments in
    <uapi/linux/perf_event.h>

Ingo Molnar (1):
  perf/uapi: Clean up <uapi/linux/perf_event.h> a bit

Jiawei Zhao (1):
  libbpf: Correct some typos and syntax issues in usdt doc

Lorenz Bauer (1):
  libbpf: Use mmap to parse vmlinux BTF from sysfs

Paul Chaignon (2):
  bpf: Clarify handling of mark and tstamp by redirect_peer
  bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE

Saket Kumar Bhaskar (1):
  selftests/bpf: Fix bpf selftest build warning

Stanislav Fomichev (1):
  net: devmem: TCP tx netlink api

Tao Chen (2):
  bpf: Add cookie to raw_tp bpf_link_info
  bpf: Add cookie to tracing bpf_link_info

Tobias Klauser (1):
  bpf: adjust path to trace_output sample eBPF program

Yonghong Song (2):
  bpf: Implement mprog API on top of existing cgroup progs
  libbpf: Support link-based cgroup attach with options

 include/uapi/linux/bpf.h        |  18 +-
 include/uapi/linux/if_xdp.h     |   6 +-
 include/uapi/linux/netdev.h     |   1 +
 include/uapi/linux/perf_event.h | 657 ++++++++++++++++----------------
 src/bpf.c                       |  44 +++
 src/bpf.h                       |   5 +
 src/btf.c                       |  91 ++++-
 src/btf.h                       |   3 +-
 src/btf_dump.c                  |  55 ++-
 src/libbpf.c                    |  28 ++
 src/libbpf.h                    |  20 +-
 src/libbpf.map                  |   1 +
 src/netlink.c                   |  20 +-
 src/usdt.c                      |  10 +-
 14 files changed, 602 insertions(+), 357 deletions(-)

Signed-off-by: Amery Hung <ameryhung@gmail.com>
2025-06-16 08:52:44 -07:00
Amery Hung
f6284bb875 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
2025-06-16 08:52:44 -07:00
Andrii Nakryiko
7356be641d libbpf: Handle unsupported mmap-based /sys/kernel/btf/vmlinux correctly
libbpf_err_ptr() helpers are meant to return NULL and set errno, if
there is an error. But btf_parse_raw_mmap() is meant to be used
internally and is expected to return ERR_PTR() values. Because of this
mismatch, when libbpf tries to mmap /sys/kernel/btf/vmlinux, we don't
detect the error correctly with IS_ERR() check, and never fallback to
old non-mmap-based way of loading vmlinux BTF.

Fix this by using proper ERR_PTR() returns internally.
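Simplified versions of the two conventions being mixed up here make the bug easy to see: internal helpers are expected to return ERR_PTR()-encoded pointers checked with IS_ERR(), while public entry points return NULL and set errno. A NULL return from the internal helper slips past IS_ERR() as "success", which is why the fallback never triggered:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* internal convention: encode -errno into the pointer value itself */
static inline void *ERR_PTR(long err)
{
	return (void *)(intptr_t)err;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```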

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: 3c0421c93ce4 ("libbpf: Use mmap to parse vmlinux BTF from sysfs")
Cc: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250606202134.2738910-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-16 08:52:44 -07:00
Paul Chaignon
d3d933ac9b bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE
In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
things, update the L4 checksum after reverse SNATing IPv6 packets. That
use case is however not currently supported and leads to invalid
skb->csum values in some cases. This patch adds support for IPv6 address
changes in bpf_l4_csum_replace via a new flag.

When calling bpf_l4_csum_replace in Cilium, it ends up calling
inet_proto_csum_replace_by_diff:

    1:  void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
    2:                                       __wsum diff, bool pseudohdr)
    3:  {
    4:      if (skb->ip_summed != CHECKSUM_PARTIAL) {
    5:          csum_replace_by_diff(sum, diff);
    6:          if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
    7:              skb->csum = ~csum_sub(diff, skb->csum);
    8:      } else if (pseudohdr) {
    9:          *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
    10:     }
    11: }

The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
updated one of the IPv6 addresses. The helper now updates the L4 header
checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.

For an IPv6 packet, the updates of the IPv6 address and of the L4
checksum will cancel each other. The checksums are set such that
computing a checksum over the packet including its checksum will result
in a sum of 0. So the same is true here when we update the L4 checksum
on line 5. We'll update it so as to cancel the previous IPv6 address
update. Hence skb->csum should remain untouched in this case.

The same bug doesn't affect IPv4 packets because, in that case, three
fields are updated: the IPv4 address, the IP checksum, and the L4
checksum. The change to the IPv4 address and one of the checksums still
cancel each other in skb->csum, but we're left with one checksum update
and should therefore update skb->csum accordingly. That's exactly what
inet_proto_csum_replace_by_diff does.

This special case for IPv6 L4 checksums is also described atop
inet_proto_csum_replace16, the function we should be using in this case.

This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6,
to indicate that we're updating the L4 checksum of an IPv6 packet. When
the flag is set, inet_proto_csum_replace_by_diff will skip the
skb->csum update.

Fixes: 7d672345ed295 ("bpf: add generic bpf_csum_diff helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/96a6bc3a443e6f0b21ff7b7834000e17fb549e05.1748509484.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16 08:52:44 -07:00
Tobias Klauser
161752932b bpf: adjust path to trace_output sample eBPF program
The sample file was renamed from trace_output_kern.c to
trace_output.bpf.c in commit d4fffba4d04b ("samples/bpf: Change _kern
suffix to .bpf with syscall tracing program"). Adjust the path in the
documentation comment for bpf_perf_event_output.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Link: https://lore.kernel.org/r/20250610140756.16332-1-tklauser@distanz.ch
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-16 08:52:44 -07:00
Tao Chen
ae67e5966a bpf: Add cookie to tracing bpf_link_info
bpf_tramp_link includes cookie info, so we can add it to bpf_link_info.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-1-chen.dylane@linux.dev
2025-06-16 08:52:44 -07:00
Yonghong Song
553676e8f5 libbpf: Support link-based cgroup attach with options
Currently libbpf supports bpf_program__attach_cgroup() with signature:
  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);

To support mprog style attachment, additional fields such as flags,
relative_{fd,id} and expected_revision are needed.

Add a new API:
  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
                                  const struct bpf_cgroup_opts *opts);
where bpf_cgroup_opts contains all above needed fields.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163146.2429212-1-yonghong.song@linux.dev
2025-06-16 08:52:44 -07:00
Yonghong Song
15dfc869f8 bpf: Implement mprog API on top of existing cgroup progs
Currently, cgroup prog ordering appends at attachment time. This is not
ideal. In some cases, users want specific ordering at a particular cgroup
level. To address this, the existing mprog API seems an ideal solution,
supporting the BPF_F_BEFORE and BPF_F_AFTER flags.

But there are a few obstacles to directly use kernel mprog interface.
Currently cgroup bpf progs already support prog attach/detach/replace
and link-based attach/detach/replace. For example, in struct
bpf_prog_array_item, the cgroup_storage field needs to be kept together
with the bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
as its member, which makes it difficult to use the kernel mprog interface.

In another case, the current cgroup prog detach tries to use the
same flag as in attach. This is different from mprog kernel interface
which uses flags passed from user space.

So to avoid modifying existing behavior, I made the following changes to
support mprog API for cgroup progs:
 - The support is for prog list at cgroup level. Cross-level prog list
   (a.k.a. effective prog list) is not supported.
 - Previously, BPF_F_PREORDER is supported only for prog attach, now
   BPF_F_PREORDER is also supported by link-based attach.
 - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported
   similar to kernel mprog but with different implementation.
 - For detach and replace, use the existing implementation.
 - For attach, detach and replace, the revision for a particular prog
   list, associated with a particular attach type, will be updated
   by increasing count by 1.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163141.2428937-1-yonghong.song@linux.dev
2025-06-16 08:52:44 -07:00
Blake Jones
439433a909 libbpf: Add support for printing BTF character arrays as strings
The BTF dumper code currently displays arrays of characters as just that -
arrays, with each character formatted individually. Sometimes this is what
makes sense, but it's nice to be able to treat that array as a string.

This change adds a special case to the btf_dump functionality to allow
0-terminated arrays of single-byte integer values to be printed as
character strings. Characters for which isprint() returns false are
printed as hex-escaped values. This is enabled when the new ".emit_strings"
field is set to 1 in the btf_dump_type_data_opts structure.

As an example, here's what it looks like to dump the string "hello" using
a few different field values for btf_dump_type_data_opts (.compact = 1):

- .emit_strings = 0, .skip_names = 0:  (char[6])['h','e','l','l','o',]
- .emit_strings = 0, .skip_names = 1:  ['h','e','l','l','o',]
- .emit_strings = 1, .skip_names = 0:  (char[6])"hello"
- .emit_strings = 1, .skip_names = 1:  "hello"

Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1:

- .emit_strings = 0:  ['h',-1,]
- .emit_strings = 1:  "h\xff"

Signed-off-by: Blake Jones <blakejones@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250603203701.520541-1-blakejones@google.com
2025-06-16 08:52:44 -07:00
Jiawei Zhao
224ea3ec50 libbpf: Correct some typos and syntax issues in usdt doc
Fix some incorrect words, such as "and" -> "an", "it's" -> "its".  Fix
some grammar issues, such as removing redundant "will", "would
complicated" -> "would complicate".

Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250531095111.57824-1-Phoenix500526@163.com
2025-06-16 08:52:44 -07:00
Tao Chen
b73864fc10 bpf: Add cookie to raw_tp bpf_link_info
After commit 68ca5d4eebb8 ("bpf: support BPF cookie in raw tracepoint
(raw_tp, tp_btf) programs"), we can show the cookie in bpf_link_info
like kprobe etc.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250603154309.3063644-1-chen.dylane@linux.dev
2025-06-16 08:52:44 -07:00
Lorenz Bauer
01dd871a20 libbpf: Use mmap to parse vmlinux BTF from sysfs
Teach libbpf to use mmap when parsing vmlinux BTF from /sys. We don't
apply this to fall-back paths on the regular file system because there
is no way to ensure that modifications to the file underlying the
MAP_PRIVATE mapping are not visible to the process.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250520-vmlinux-mmap-v5-3-e8c941acc414@isovalent.com
2025-06-16 08:52:44 -07:00
Alan Maguire
8df8d67f63 libbpf/btf: Fix string handling to support multi-split BTF
libbpf handling of split BTF has been written largely with the
assumption that multiple splits are possible, i.e. split BTF on top of
split BTF on top of base BTF.  One area where this does not quite work
is string handling in split BTF; the start string offset should be the
base BTF string section length + the base BTF string offset.  This
worked in the past because for a single split BTF with base the start
string offset was always 0.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250519165935.261614-2-alan.maguire@oracle.com
2025-06-16 08:52:44 -07:00
Stanislav Fomichev
31bb8f7936 net: devmem: TCP tx netlink api
Add a bind-tx netlink call to attach a dmabuf for TX; a queue is not
required, only an ifindex and a dmabuf fd for attachment.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-4-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-16 08:51:34 -07:00
Amery Hung
f580871b42 libbpf: Support creating and destroying qdisc
Extend struct bpf_tc_hook with handle, qdisc name and a new attach type,
BPF_TC_QDISC, to allow users to add or remove any qdisc specified in
addition to clsact.

Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409214606.2000194-8-ameryhung@gmail.com
2025-06-16 08:51:34 -07:00
Ingo Molnar
52b9b38a22 perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
When applying a recent commit to the <uapi/linux/perf_event.h>
header I noticed that we have accumulated quite a bit of
historic noise in this header, so do a bit of spring cleaning:

 - Define bitfields in a vertically aligned fashion, like
   perf_event_mmap_page::capabilities already does. This
   makes it easier to see the distribution and sizing of
   bits within a word, at a glance. The following is much
   more readable:

			__u64	cap_bit0		: 1,
				cap_bit0_is_deprecated	: 1,
				cap_user_rdpmc		: 1,
				cap_user_time		: 1,
				cap_user_time_zero	: 1,
				cap_user_time_short	: 1,
				cap_____res		: 58;

   Than:

			__u64	cap_bit0:1,
				cap_bit0_is_deprecated:1,
				cap_user_rdpmc:1,
				cap_user_time:1,
				cap_user_time_zero:1,
				cap_user_time_short:1,
				cap_____res:58;

   So convert all bitfield definitions from the latter style to the
   former style.

 - Fix typos and grammar

 - Fix capitalization

 - Remove whitespace noise

 - Harmonize the definitions of various generations and groups of
   PERF_MEM_ ABI values.

 - Vertically align all definitions and assignments to the same
   column (48), as the first definition (enum perf_type_id),
   throughout the entire header.

 - And in general make the code and comments to be more in sync
   with each other and to be more readable overall.

No change in functionality.

Copy the changes over to tools/include/uapi/linux/perf_event.h.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250521221529.2547099-1-irogers@google.com
2025-06-16 08:51:34 -07:00
Ian Rogers
bae6a269ca perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
AUX data for PERF_SAMPLE_AUX appears last. PERF_SAMPLE_CGROUP is
missing from the comment.

This makes the <uapi/linux/perf_event.h> comment match that in the
perf_event_open man page.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20250521221529.2547099-1-irogers@google.com
2025-06-16 08:51:34 -07:00
Paul Chaignon
9b06cd15e0 bpf: Clarify handling of mark and tstamp by redirect_peer
When switching network namespaces with the bpf_redirect_peer helper, the
skb->mark and skb->tstamp fields are not zeroed out like they can be on
a typical netns switch. This patch clarifies that in the helper
description.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/ccc86af26d43c5c0b776bcba2601b7479c0d46d0.1746460653.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16 08:51:34 -07:00
Andrii Nakryiko
346532d711 sync: one-time strip out _UAPI prefix from UAPI header guards
Normalize already synced UAPI headers. For all subsequent syncs this
will be done automatically by the sync script.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-13 13:18:23 -07:00
Andrii Nakryiko
15c5317b6c sync: update sync script to strip _UAPI prefix in UAPI headers
It's expected that kernel UAPI headers have #ifndef guards starting with
the __LINUX prefix, while in the kernel source code these guards actually
start with _UAPI__LINUX. The stripping of the _UAPI prefix is
done (among other things) by kernel's scripts/headers_install.sh script.

Given libbpf vendors its own UAPI header under include/uapi subdir, and
those "internal" UAPI headers are sometimes used by libbpf users for
convenience, let's stick to the __LINUX prefix rule and do that during
the sync.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-13 13:18:23 -07:00
28 changed files with 891 additions and 419 deletions

View File

@@ -10,6 +10,7 @@ Herbert Xu <herbert@gondor.apana.org.au>
Jakub Kicinski <kuba@kernel.org> <jakub.kicinski@netronome.com>
Jesper Dangaard Brouer <hawk@kernel.org> <brouer@redhat.com>
Kees Cook <kees@kernel.org> <keescook@chromium.org>
+Kuniyuki Iwashima <kuniyu@google.com> <kuniyu@amazon.co.jp>
Leo Yan <leo.yan@linux.dev> <leo.yan@linaro.org>
Mark Starovoytov <mstarovo@pm.me> <mstarovoitov@marvell.com>
Maxim Mikityanskiy <maxtram95@gmail.com> <maximmi@mellanox.com>

View File

@@ -1 +1 @@
-b4432656b36e5cc1d50a1f2dc15357543add530e
+27861fc720be2c39b861d8bdfb68287f54de6855

View File

@@ -1 +1 @@
-9325d53fe9adff354b6a93fda5f38c165947da0f
+21aeabb68258ce17b91af113a768760b3a491d93

fuzz/readme.md Normal file
View File

@@ -0,0 +1,58 @@
# About fuzzing
Fuzzing is done by [OSS-Fuzz](https://google.github.io/oss-fuzz/).
It works by creating a project-specific binary that combines the fuzzer
itself with a project-provided entry point named
`LLVMFuzzerTestOneInput()`. When invoked, this executable either
searches for new test cases or runs an existing test case.
File `fuzz/bpf-object-fuzzer.c` defines an entry point for the robot:
- the robot supplies bytes that are supposed to be an ELF file;
- the wrapper invokes `bpf_object__open_mem()` to process these bytes.
File `scripts/build-fuzzers.sh` provides a recipe for the fuzzer
infrastructure to build the executable described above (see
[here](https://github.com/google/oss-fuzz/tree/master/projects/libbpf)).
# Reproducing fuzzing errors
## Official way
OSS-Fuzz project describes error reproduction steps in the official
[documentation](https://google.github.io/oss-fuzz/advanced-topics/reproducing/).
Suppose you received an email linking to a fuzzer-generated test
case, or got one as an artifact of the `CIFuzz` job (e.g. like
[here](https://github.com/libbpf/libbpf/actions/runs/16375110681)).
Actions to reproduce the error locally are:
```sh
git clone --depth=1 https://github.com/google/oss-fuzz.git
cd oss-fuzz
python infra/helper.py pull_images
python infra/helper.py build_image libbpf
python infra/helper.py build_fuzzers --sanitizer address libbpf <path-to-libbpf-checkout>
python infra/helper.py reproduce libbpf bpf-object-fuzzer <path-to-test-case>
```
`<path-to-test-case>` is usually a `crash-<many-hex-digits>` file without an
extension; the CI job wraps it into a zip archive and attaches it as an artifact.
To recompile after some fixes, repeat the `build_fuzzers` and
`reproduce` steps after modifying source code in
`<path-to-libbpf-checkout>`.
Note: the `build_fuzzers` step creates a binary
`build/out/libbpf/bpf-object-fuzzer`, which can be executed directly if
your environment is compatible.
## Simple way
From the project root:
```sh
SKIP_LIBELF_REBUILD=1 scripts/build-fuzzers.sh
out/bpf-object-fuzzer <path-to-test-case>
```
`out/bpf-object-fuzzer` is the fuzzer executable described earlier; it
can be run under gdb, etc.

View File

@@ -5,8 +5,8 @@
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
-#ifndef _UAPI__LINUX_BPF_H__
+#ifndef __LINUX_BPF_H__
-#define _UAPI__LINUX_BPF_H__
+#define __LINUX_BPF_H__
#include <linux/types.h>
#include <linux/bpf_common.h>
@@ -450,6 +450,7 @@ union bpf_iter_link_info {
* * **struct bpf_map_info**
* * **struct bpf_btf_info**
* * **struct bpf_link_info**
+* * **struct bpf_token_info**
*
* Return
* Returns zero on success. On error, -1 is returned and *errno*
@@ -906,6 +907,17 @@ union bpf_iter_link_info {
* A new file descriptor (a nonnegative integer), or -1 if an
* error occurred (in which case, *errno* is set appropriately).
*
+* BPF_PROG_STREAM_READ_BY_FD
+* Description
+* Read data of a program's BPF stream. The program is identified
+* by *prog_fd*, and the stream is identified by the *stream_id*.
+* The data is copied to a buffer pointed to by *stream_buf*, and
+* filled less than or equal to *stream_buf_len* bytes.
+*
+* Return
+* Number of bytes read from the stream on success, or -1 if an
+* error occurred (in which case, *errno* is set appropriately).
+*
* NOTES
* eBPF objects (maps and programs) can be shared between processes.
*
@@ -961,6 +973,7 @@ enum bpf_cmd {
BPF_LINK_DETACH,
BPF_PROG_BIND_MAP,
BPF_TOKEN_CREATE,
+BPF_PROG_STREAM_READ_BY_FD,
__MAX_BPF_CMD,
};
@@ -1463,6 +1476,11 @@ struct bpf_stack_build_id {
#define BPF_OBJ_NAME_LEN 16U
+enum {
+BPF_STREAM_STDOUT = 1,
+BPF_STREAM_STDERR = 2,
+};
union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32 map_type; /* one of enum bpf_map_type */
@@ -1794,6 +1812,13 @@ union bpf_attr {
};
__u64 expected_revision;
} netkit;
+struct {
+union {
+__u32 relative_fd;
+__u32 relative_id;
+};
+__u64 expected_revision;
+} cgroup;
};
} link_create;
@@ -1842,6 +1867,13 @@ union bpf_attr {
__u32 bpffs_fd;
} token_create;
+struct {
+__aligned_u64 stream_buf;
+__u32 stream_buf_len;
+__u32 stream_id;
+__u32 prog_fd;
+} prog_stream_read;
} __attribute__((aligned(8)));
/* The description below is an attempt at providing documentation to eBPF
@@ -2056,6 +2088,7 @@ union bpf_attr {
* for updates resulting in a null checksum the value is set to
* **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates
* that the modified header field is part of the pseudo-header.
+* Flag **BPF_F_IPV6** should be set for IPv6 packets.
*
* This helper works in combination with **bpf_csum_diff**\ (),
* which does not update the checksum in-place, but offers more
@@ -2402,7 +2435,7 @@ union bpf_attr {
* into it. An example is available in file
* *samples/bpf/trace_output_user.c* in the Linux kernel source
* tree (the eBPF program counterpart is in
-* *samples/bpf/trace_output_kern.c*).
+* *samples/bpf/trace_output.bpf.c*).
*
* **bpf_perf_event_output**\ () achieves better performance
* than **bpf_trace_printk**\ () for sharing data with user
@@ -4972,6 +5005,9 @@ union bpf_attr {
* the netns switch takes place from ingress to ingress without
* going through the CPU's backlog queue.
*
+* *skb*\ **->mark** and *skb*\ **->tstamp** are not cleared during
+* the netns switch.
+*
* The *flags* argument is reserved and must be 0. The helper is
* currently only supported for tc BPF program types at the
* ingress hook and for veth and netkit target device types. The
@@ -6069,6 +6105,7 @@ enum {
BPF_F_PSEUDO_HDR = (1ULL << 4),
BPF_F_MARK_MANGLED_0 = (1ULL << 5),
BPF_F_MARK_ENFORCE = (1ULL << 6),
+BPF_F_IPV6 = (1ULL << 7),
};
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
@@ -6648,11 +6685,15 @@ struct bpf_link_info {
struct {
__aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
__u32 tp_name_len; /* in/out: tp_name buffer len */
+__u32 :32;
+__u64 cookie;
} raw_tracepoint;
struct {
__u32 attach_type;
__u32 target_obj_id; /* prog_id for PROG_EXT, otherwise btf object id */
__u32 target_btf_id; /* BTF type id inside the object */
+__u32 :32;
+__u64 cookie;
} tracing;
struct {
__u64 cgroup_id;
@@ -6763,6 +6804,13 @@ struct bpf_link_info {
};
} __attribute__((aligned(8)));
+struct bpf_token_info {
+__u64 allowed_cmds;
+__u64 allowed_maps;
+__u64 allowed_progs;
+__u64 allowed_attachs;
+} __attribute__((aligned(8)));
/* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
* by user and intended to be used by socket (e.g. to bind to, depends on
* attach type).
@@ -7575,4 +7623,4 @@ enum bpf_kfunc_flags {
BPF_F_PAD_ZEROS = (1ULL << 0),
};
-#endif /* _UAPI__LINUX_BPF_H__ */
+#endif /* __LINUX_BPF_H__ */

View File

@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _UAPI__LINUX_BPF_COMMON_H__
+#ifndef __LINUX_BPF_COMMON_H__
-#define _UAPI__LINUX_BPF_COMMON_H__
+#define __LINUX_BPF_COMMON_H__
/* Instruction classes */
#define BPF_CLASS(code) ((code) & 0x07)
@@ -54,4 +54,4 @@
#define BPF_MAXINSNS 4096
#endif
-#endif /* _UAPI__LINUX_BPF_COMMON_H__ */
+#endif /* __LINUX_BPF_COMMON_H__ */

View File

@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/* Copyright (c) 2018 Facebook */
-#ifndef _UAPI__LINUX_BTF_H__
+#ifndef __LINUX_BTF_H__
-#define _UAPI__LINUX_BTF_H__
+#define __LINUX_BTF_H__
#include <linux/types.h>
@@ -198,4 +198,4 @@ struct btf_enum64 {
__u32 val_hi32;
};
-#endif /* _UAPI__LINUX_BTF_H__ */
+#endif /* __LINUX_BTF_H__ */

View File

@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _UAPI_LINUX_IF_LINK_H
+#ifndef _LINUX_IF_LINK_H
-#define _UAPI_LINUX_IF_LINK_H
+#define _LINUX_IF_LINK_H
#include <linux/types.h>
#include <linux/netlink.h>
@@ -1977,4 +1977,4 @@ enum {
#define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1)
-#endif /* _UAPI_LINUX_IF_LINK_H */
+#endif /* _LINUX_IF_LINK_H */

View File

@@ -79,6 +79,7 @@ struct xdp_mmap_offsets {
#define XDP_UMEM_COMPLETION_RING 6
#define XDP_STATISTICS 7
#define XDP_OPTIONS 8
+#define XDP_MAX_TX_SKB_BUDGET 9
struct xdp_umem_reg {
__u64 addr; /* Start of packet data area */

View File

@@ -3,8 +3,8 @@
/* Documentation/netlink/specs/netdev.yaml */
/* YNL-GEN uapi header */
-#ifndef _UAPI_LINUX_NETDEV_H
+#ifndef _LINUX_NETDEV_H
-#define _UAPI_LINUX_NETDEV_H
+#define _LINUX_NETDEV_H
#define NETDEV_FAMILY_NAME "netdev"
#define NETDEV_FAMILY_VERSION 1
@@ -77,6 +77,11 @@ enum netdev_qstats_scope {
NETDEV_QSTATS_SCOPE_QUEUE = 1,
};
+enum netdev_napi_threaded {
+NETDEV_NAPI_THREADED_DISABLED,
+NETDEV_NAPI_THREADED_ENABLED,
+};
enum {
NETDEV_A_DEV_IFINDEX = 1,
NETDEV_A_DEV_PAD,
@@ -134,6 +139,7 @@ enum {
NETDEV_A_NAPI_DEFER_HARD_IRQS,
NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
+NETDEV_A_NAPI_THREADED,
__NETDEV_A_NAPI_MAX,
NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
@@ -219,6 +225,7 @@ enum {
NETDEV_CMD_QSTATS_GET,
NETDEV_CMD_BIND_RX,
NETDEV_CMD_NAPI_SET,
+NETDEV_CMD_BIND_TX,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
@@ -227,4 +234,4 @@ enum {
#define NETDEV_MCGRP_MGMT "mgmt"
#define NETDEV_MCGRP_PAGE_POOL "page-pool"
-#endif /* _UAPI_LINUX_NETDEV_H */
+#endif /* _LINUX_NETDEV_H */

View File

@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _UAPI__LINUX_NETLINK_H
+#ifndef __LINUX_NETLINK_H
-#define _UAPI__LINUX_NETLINK_H
+#define __LINUX_NETLINK_H
#include <linux/kernel.h>
#include <linux/socket.h> /* for __kernel_sa_family_t */
@@ -249,4 +249,4 @@ struct nla_bitfield32 {
__u32 selector;
};
-#endif /* _UAPI__LINUX_NETLINK_H */
+#endif /* __LINUX_NETLINK_H */

View File

@@ -12,8 +12,8 @@
*
* For licencing details see kernel-base/COPYING
*/
-#ifndef _UAPI_LINUX_PERF_EVENT_H
+#ifndef _LINUX_PERF_EVENT_H
-#define _UAPI_LINUX_PERF_EVENT_H
+#define _LINUX_PERF_EVENT_H
#include <linux/types.h>
#include <linux/ioctl.h>
@@ -39,15 +39,18 @@ enum perf_type_id {
/*
* attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
+*
* PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
* AA: hardware event ID
* EEEEEEEE: PMU type ID
+*
* PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
* BB: hardware cache ID
* CC: hardware cache op ID
* DD: hardware cache op result ID
* EEEEEEEE: PMU type ID
-* If the PMU type ID is 0, the PERF_TYPE_RAW will be applied.
+*
+* If the PMU type ID is 0, PERF_TYPE_RAW will be applied.
*/
#define PERF_PMU_TYPE_SHIFT 32
#define PERF_HW_EVENT_MASK 0xffffffff
@@ -112,7 +115,7 @@ enum perf_hw_cache_op_result_id {
/*
* Special "software" events provided by the kernel, even if the hardware
* does not support performance events. These events measure various
-* physical and sw events of the kernel (and allow the profiling of them as
+* physical and SW events of the kernel (and allow the profiling of them as
* well):
*/
enum perf_sw_ids {
@@ -167,8 +170,9 @@ enum perf_event_sample_format {
};
#define PERF_SAMPLE_WEIGHT_TYPE (PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)
/*
-* values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set
+* Values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set.
*
* If the user does not pass priv level information via branch_sample_type,
* the kernel uses the event's priv level. Branch and event priv levels do
@@ -191,7 +195,7 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_TX_SHIFT = 9, /* not in transaction */
PERF_SAMPLE_BRANCH_COND_SHIFT = 10, /* conditional branches */
-PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = 11, /* call/ret stack */
+PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = 11, /* CALL/RET stack */
PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT = 12, /* indirect jumps */
PERF_SAMPLE_BRANCH_CALL_SHIFT = 13, /* direct call */
@@ -230,8 +234,7 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
-PERF_SAMPLE_BRANCH_TYPE_SAVE =
-1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+PERF_SAMPLE_BRANCH_TYPE_SAVE = 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
PERF_SAMPLE_BRANCH_HW_INDEX = 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT,
@@ -243,30 +246,30 @@ enum perf_branch_sample_type {
};
/*
-* Common flow change classification
+* Common control flow change classifications:
*/
enum {
-PERF_BR_UNKNOWN = 0, /* unknown */
-PERF_BR_COND = 1, /* conditional */
-PERF_BR_UNCOND = 2, /* unconditional */
-PERF_BR_IND = 3, /* indirect */
-PERF_BR_CALL = 4, /* function call */
-PERF_BR_IND_CALL = 5, /* indirect function call */
-PERF_BR_RET = 6, /* function return */
-PERF_BR_SYSCALL = 7, /* syscall */
-PERF_BR_SYSRET = 8, /* syscall return */
-PERF_BR_COND_CALL = 9, /* conditional function call */
-PERF_BR_COND_RET = 10, /* conditional function return */
-PERF_BR_ERET = 11, /* exception return */
-PERF_BR_IRQ = 12, /* irq */
-PERF_BR_SERROR = 13, /* system error */
-PERF_BR_NO_TX = 14, /* not in transaction */
-PERF_BR_EXTEND_ABI = 15, /* extend ABI */
+PERF_BR_UNKNOWN = 0, /* Unknown */
+PERF_BR_COND = 1, /* Conditional */
+PERF_BR_UNCOND = 2, /* Unconditional */
+PERF_BR_IND = 3, /* Indirect */
+PERF_BR_CALL = 4, /* Function call */
+PERF_BR_IND_CALL = 5, /* Indirect function call */
+PERF_BR_RET = 6, /* Function return */
+PERF_BR_SYSCALL = 7, /* Syscall */
+PERF_BR_SYSRET = 8, /* Syscall return */
+PERF_BR_COND_CALL = 9, /* Conditional function call */
+PERF_BR_COND_RET = 10, /* Conditional function return */
+PERF_BR_ERET = 11, /* Exception return */
+PERF_BR_IRQ = 12, /* IRQ */
+PERF_BR_SERROR = 13, /* System error */
+PERF_BR_NO_TX = 14, /* Not in transaction */
+PERF_BR_EXTEND_ABI = 15, /* Extend ABI */
PERF_BR_MAX,
};
/*
-* Common branch speculation outcome classification
+* Common branch speculation outcome classifications:
*/
enum {
PERF_BR_SPEC_NA = 0, /* Not available */
@@ -323,7 +326,7 @@ enum {
PERF_TXN_ELISION = (1 << 0), /* From elision */
PERF_TXN_TRANSACTION = (1 << 1), /* From transaction */
PERF_TXN_SYNC = (1 << 2), /* Instruction is related */
-PERF_TXN_ASYNC = (1 << 3), /* Instruction not related */
+PERF_TXN_ASYNC = (1 << 3), /* Instruction is not related */
PERF_TXN_RETRY = (1 << 4), /* Retry possible */
PERF_TXN_CONFLICT = (1 << 5), /* Conflict abort */
PERF_TXN_CAPACITY_WRITE = (1 << 6), /* Capacity write abort */
@@ -331,7 +334,7 @@ enum {
PERF_TXN_MAX = (1 << 8), /* non-ABI */
-/* bits 32..63 are reserved for the abort code */
+/* Bits 32..63 are reserved for the abort code */
PERF_TXN_ABORT_MASK = (0xffffffffULL << 32),
PERF_TXN_ABORT_SHIFT = 32,
@@ -369,24 +372,22 @@ enum perf_event_read_format {
PERF_FORMAT_MAX = 1U << 5, /* non-ABI */
};
-#define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */
-#define PERF_ATTR_SIZE_VER1 72 /* add: config2 */
-#define PERF_ATTR_SIZE_VER2 80 /* add: branch_sample_type */
-#define PERF_ATTR_SIZE_VER3 96 /* add: sample_regs_user */
-/* add: sample_stack_user */
-#define PERF_ATTR_SIZE_VER4 104 /* add: sample_regs_intr */
-#define PERF_ATTR_SIZE_VER5 112 /* add: aux_watermark */
-#define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */
-#define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */
-#define PERF_ATTR_SIZE_VER8 136 /* add: config3 */
+#define PERF_ATTR_SIZE_VER0 64 /* Size of first published 'struct perf_event_attr' */
+#define PERF_ATTR_SIZE_VER1 72 /* Add: config2 */
+#define PERF_ATTR_SIZE_VER2 80 /* Add: branch_sample_type */
+#define PERF_ATTR_SIZE_VER3 96 /* Add: sample_regs_user */
+/* Add: sample_stack_user */
+#define PERF_ATTR_SIZE_VER4 104 /* Add: sample_regs_intr */
+#define PERF_ATTR_SIZE_VER5 112 /* Add: aux_watermark */
+#define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */
+#define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */
+#define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */
/*
-* Hardware event_id to monitor via a performance monitoring event:
-*
-* @sample_max_stack: Max number of frame pointers in a callchain,
-* should be < /proc/sys/kernel/perf_event_max_stack
-* Max number of entries of branch stack
-* should be < hardware limit
+* 'struct perf_event_attr' contains various attributes that define
+* a performance event - most of them hardware related configuration
+* details, but also a lot of behavioral switches and values implemented
+* by the kernel.
*/
struct perf_event_attr {
@@ -396,7 +397,7 @@ struct perf_event_attr {
__u32 type;
/*
-* Size of the attr structure, for fwd/bwd compat.
+* Size of the attr structure, for forward/backwards compatibility.
*/
__u32 size;
@@ -451,21 +452,21 @@ struct perf_event_attr {
comm_exec : 1, /* flag comm events that are due to an exec */
use_clockid : 1, /* use @clockid for time fields */
context_switch : 1, /* context switch data */
-write_backward : 1, /* Write ring buffer from end to beginning */
+write_backward : 1, /* write ring buffer from end to beginning */
namespaces : 1, /* include namespaces data */
ksymbol : 1, /* include ksymbol events */
-bpf_event : 1, /* include bpf events */
+bpf_event : 1, /* include BPF events */
aux_output : 1, /* generate AUX records instead of events */
cgroup : 1, /* include cgroup events */
text_poke : 1, /* include text poke events */
-build_id : 1, /* use build id in mmap2 events */
+build_id : 1, /* use build ID in mmap2 events */
inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
remove_on_exec : 1, /* event is removed from task on exec */
sigtrap : 1, /* send synchronous SIGTRAP on event */
__reserved_1 : 26;
union {
-__u32 wakeup_events; /* wakeup every n events */
+__u32 wakeup_events; /* wake up every n events */
__u32 wakeup_watermark; /* bytes before wakeup */
};
@@ -510,7 +511,16 @@ struct perf_event_attr {
* Wakeup watermark for AUX area
*/
__u32 aux_watermark;
+/*
+* Max number of frame pointers in a callchain, should be
+* lower than /proc/sys/kernel/perf_event_max_stack.
+*
+* Max number of entries of branch stack should be lower
+* than the hardware limit.
+*/
__u16 sample_max_stack;
__u16 __reserved_2;
__u32 aux_sample_size;
@@ -537,7 +547,7 @@ struct perf_event_attr {
/*
* Structure used by below PERF_EVENT_IOC_QUERY_BPF command
-* to query bpf programs attached to the same perf tracepoint
+* to query BPF programs attached to the same perf tracepoint
* as the given perf event.
*/
struct perf_event_query_bpf {
@@ -563,14 +573,14 @@ struct perf_event_query_bpf {
#define PERF_EVENT_IOC_DISABLE _IO ('$', 1)
#define PERF_EVENT_IOC_REFRESH _IO ('$', 2)
#define PERF_EVENT_IOC_RESET _IO ('$', 3)
-#define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
+#define PERF_EVENT_IOC_PERIOD _IOW ('$', 4, __u64)
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
-#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
-#define PERF_EVENT_IOC_ID _IOR('$', 7, __u64 *)
-#define PERF_EVENT_IOC_SET_BPF _IOW('$', 8, __u32)
-#define PERF_EVENT_IOC_PAUSE_OUTPUT _IOW('$', 9, __u32)
+#define PERF_EVENT_IOC_SET_FILTER _IOW ('$', 6, char *)
+#define PERF_EVENT_IOC_ID _IOR ('$', 7, __u64 *)
+#define PERF_EVENT_IOC_SET_BPF _IOW ('$', 8, __u32)
+#define PERF_EVENT_IOC_PAUSE_OUTPUT _IOW ('$', 9, __u32)
#define PERF_EVENT_IOC_QUERY_BPF _IOWR('$', 10, struct perf_event_query_bpf *)
-#define PERF_EVENT_IOC_MODIFY_ATTRIBUTES _IOW('$', 11, struct perf_event_attr *)
+#define PERF_EVENT_IOC_MODIFY_ATTRIBUTES _IOW ('$', 11, struct perf_event_attr *)
enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
@@ -584,7 +594,7 @@ struct perf_event_mmap_page {
__u32 compat_version; /* lowest version this is compat with */
/*
-* Bits needed to read the hw events in user-space.
+* Bits needed to read the HW events in user-space.
*
* u32 seq, time_mult, time_shift, index, width;
* u64 count, enabled, running;
@@ -622,7 +632,7 @@ struct perf_event_mmap_page {
__u32 index; /* hardware event identifier */
__s64 offset; /* add to hardware event value */
__u64 time_enabled; /* time event active */
-__u64 time_running; /* time event on cpu */
+__u64 time_running; /* time event on CPU */
union {
__u64 capabilities;
struct {
@@ -650,7 +660,7 @@ struct perf_event_mmap_page {
/*
* If cap_usr_time the below fields can be used to compute the time
-* delta since time_enabled (in ns) using rdtsc or similar.
+* delta since time_enabled (in ns) using RDTSC or similar.
*
* u64 quot, rem;
* u64 delta;
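The conversion the comment above alludes to can be sketched as follows, assuming the standard quot/rem formula from the kernel header applied to the mmap page's `time_shift`/`time_mult`/`time_offset` fields:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the delta computation hinted at above: convert a cycle count
 * into nanoseconds using the mmap page's scaling fields. The split into
 * quot/rem keeps the 64-bit multiply from overflowing for large deltas. */
static uint64_t cyc_to_ns_delta(uint64_t cyc, uint16_t time_shift,
				uint32_t time_mult, uint64_t time_offset)
{
	uint64_t quot = cyc >> time_shift;
	uint64_t rem  = cyc & (((uint64_t)1 << time_shift) - 1);

	return time_offset + quot * time_mult +
	       ((rem * time_mult) >> time_shift);
}
```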
@@ -723,7 +733,7 @@ struct perf_event_mmap_page {
* after reading this value.
*
* When the mapping is PROT_WRITE the @data_tail value should be
-* written by userspace to reflect the last read data, after issueing
+* written by user-space to reflect the last read data, after issuing
* an smp_mb() to separate the data read from the ->data_tail store.
* In this case the kernel will not over-write unread data.
*
@@ -739,7 +749,7 @@ struct perf_event_mmap_page {
/*
* AUX area is defined by aux_{offset,size} fields that should be set
-* by the userspace, so that
+* by the user-space, so that
*
* aux_offset >= data_offset + data_size
*
@@ -813,7 +823,7 @@ struct perf_event_mmap_page {
* Indicates that thread was preempted in TASK_RUNNING state.
*
* PERF_RECORD_MISC_MMAP_BUILD_ID:
-* Indicates that mmap2 event carries build id data.
+* Indicates that mmap2 event carries build ID data.
*/
#define PERF_RECORD_MISC_EXACT_IP (1 << 14)
#define PERF_RECORD_MISC_SWITCH_OUT_PREEMPT (1 << 14)
@@ -874,7 +884,7 @@ enum perf_event_type {
/*
* The MMAP events record the PROT_EXEC mappings so that we can
-* correlate userspace IPs to code. They have the following structure:
+* correlate user-space IPs to code. They have the following structure:
*
* struct {
* struct perf_event_header header;
@@ -1035,10 +1045,11 @@ enum perf_event_type {
* { u64 abi; # enum perf_sample_regs_abi
* u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
-* { u64 size;
-* char data[size]; } && PERF_SAMPLE_AUX
+* { u64 cgroup;} && PERF_SAMPLE_CGROUP
* { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
* { u64 code_page_size;} && PERF_SAMPLE_CODE_PAGE_SIZE
+* { u64 size;
+* char data[size]; } && PERF_SAMPLE_AUX
* };
*/
PERF_RECORD_SAMPLE = 9,
@@ -1167,7 +1178,7 @@ enum perf_event_type {
PERF_RECORD_KSYMBOL = 17,
/*
-* Record bpf events:
+* Record BPF events:
* enum perf_bpf_event_type {
* PERF_BPF_EVENT_UNKNOWN = 0,
* PERF_BPF_EVENT_PROG_LOAD = 1,
@@ -1269,10 +1280,10 @@ enum perf_callchain_context {
/**
* PERF_RECORD_AUX::flags bits
*/
-#define PERF_AUX_FLAG_TRUNCATED 0x01 /* record was truncated to fit */
-#define PERF_AUX_FLAG_OVERWRITE 0x02 /* snapshot from overwrite mode */
-#define PERF_AUX_FLAG_PARTIAL 0x04 /* record contains gaps */
-#define PERF_AUX_FLAG_COLLISION 0x08 /* sample collided with another */
+#define PERF_AUX_FLAG_TRUNCATED 0x0001 /* Record was truncated to fit */
+#define PERF_AUX_FLAG_OVERWRITE 0x0002 /* Snapshot from overwrite mode */
+#define PERF_AUX_FLAG_PARTIAL 0x0004 /* Record contains gaps */
+#define PERF_AUX_FLAG_COLLISION 0x0008 /* Sample collided with another */
#define PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK 0xff00 /* PMU specific trace format type */
/* CoreSight PMU AUX buffer formats */
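The mask above reserves the upper byte of the AUX flags word for a PMU-specific trace format type; a minimal sketch of extracting it (`aux_pmu_format` is a hypothetical helper, not part of the header):

```c
#include <assert.h>
#include <stdint.h>

#define PERF_AUX_FLAG_TRUNCATED            0x0001
#define PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK 0xff00

/* Hypothetical helper: isolate the PMU-specific trace format type
 * carried in the upper byte of PERF_RECORD_AUX::flags. */
static uint64_t aux_pmu_format(uint64_t flags)
{
	return flags & PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK;
}
```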
@@ -1281,137 +1292,137 @@ enum perf_callchain_context {
#define PERF_FLAG_FD_NO_GROUP (1UL << 0)
#define PERF_FLAG_FD_OUTPUT (1UL << 1)
-#define PERF_FLAG_PID_CGROUP (1UL << 2) /* pid=cgroup id, per-cpu mode only */
+#define PERF_FLAG_PID_CGROUP (1UL << 2) /* pid=cgroup ID, per-CPU mode only */
#define PERF_FLAG_FD_CLOEXEC (1UL << 3) /* O_CLOEXEC */
#if defined(__LITTLE_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
-__u64 mem_op:5, /* type of opcode */
-mem_lvl:14, /* memory hierarchy level */
-mem_snoop:5, /* snoop mode */
-mem_lock:2, /* lock instr */
-mem_dtlb:7, /* tlb access */
-mem_lvl_num:4, /* memory hierarchy level number */
-mem_remote:1, /* remote */
-mem_snoopx:2, /* snoop mode, ext */
-mem_blk:3, /* access blocked */
-mem_hops:3, /* hop level */
-mem_rsvd:18;
+__u64 mem_op : 5, /* Type of opcode */
+mem_lvl : 14, /* Memory hierarchy level */
+mem_snoop : 5, /* Snoop mode */
+mem_lock : 2, /* Lock instr */
+mem_dtlb : 7, /* TLB access */
+mem_lvl_num : 4, /* Memory hierarchy level number */
+mem_remote : 1, /* Remote */
+mem_snoopx : 2, /* Snoop mode, ext */
+mem_blk : 3, /* Access blocked */
+mem_hops : 3, /* Hop level */
+mem_rsvd : 18;
};
};
#elif defined(__BIG_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
-__u64 mem_rsvd:18,
-mem_hops:3, /* hop level */
-mem_blk:3, /* access blocked */
-mem_snoopx:2, /* snoop mode, ext */
-mem_remote:1, /* remote */
-mem_lvl_num:4, /* memory hierarchy level number */
-mem_dtlb:7, /* tlb access */
-mem_lock:2, /* lock instr */
-mem_snoop:5, /* snoop mode */
-mem_lvl:14, /* memory hierarchy level */
-mem_op:5; /* type of opcode */
+__u64 mem_rsvd : 18,
+mem_hops : 3, /* Hop level */
+mem_blk : 3, /* Access blocked */
+mem_snoopx : 2, /* Snoop mode, ext */
+mem_remote : 1, /* Remote */
+mem_lvl_num : 4, /* Memory hierarchy level number */
+mem_dtlb : 7, /* TLB access */
+mem_lock : 2, /* Lock instr */
+mem_snoop : 5, /* Snoop mode */
+mem_lvl : 14, /* Memory hierarchy level */
+mem_op : 5; /* Type of opcode */
};
};
#else
-#error "Unknown endianness"
+# error "Unknown endianness"
#endif
-/* type of opcode (load/store/prefetch,code) */
-#define PERF_MEM_OP_NA 0x01 /* not available */
-#define PERF_MEM_OP_LOAD 0x02 /* load instruction */
-#define PERF_MEM_OP_STORE 0x04 /* store instruction */
-#define PERF_MEM_OP_PFETCH 0x08 /* prefetch */
-#define PERF_MEM_OP_EXEC 0x10 /* code (execution) */
+/* Type of memory opcode: */
+#define PERF_MEM_OP_NA 0x0001 /* Not available */
+#define PERF_MEM_OP_LOAD 0x0002 /* Load instruction */
+#define PERF_MEM_OP_STORE 0x0004 /* Store instruction */
+#define PERF_MEM_OP_PFETCH 0x0008 /* Prefetch */
+#define PERF_MEM_OP_EXEC 0x0010 /* Code (execution) */
#define PERF_MEM_OP_SHIFT 0
/*
-* PERF_MEM_LVL_* namespace being depricated to some extent in the
+* The PERF_MEM_LVL_* namespace is being deprecated to some extent in
* favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
-* Supporting this namespace inorder to not break defined ABIs.
+* We support this namespace in order to not break defined ABIs.
*
-* memory hierarchy (memory level, hit or miss)
+* Memory hierarchy (memory level, hit or miss)
*/
-#define PERF_MEM_LVL_NA 0x01 /* not available */
-#define PERF_MEM_LVL_HIT 0x02 /* hit level */
-#define PERF_MEM_LVL_MISS 0x04 /* miss level */
-#define PERF_MEM_LVL_L1 0x08 /* L1 */
-#define PERF_MEM_LVL_LFB 0x10 /* Line Fill Buffer */
-#define PERF_MEM_LVL_L2 0x20 /* L2 */
-#define PERF_MEM_LVL_L3 0x40 /* L3 */
-#define PERF_MEM_LVL_LOC_RAM 0x80 /* Local DRAM */
-#define PERF_MEM_LVL_REM_RAM1 0x100 /* Remote DRAM (1 hop) */
-#define PERF_MEM_LVL_REM_RAM2 0x200 /* Remote DRAM (2 hops) */
-#define PERF_MEM_LVL_REM_CCE1 0x400 /* Remote Cache (1 hop) */
-#define PERF_MEM_LVL_REM_CCE2 0x800 /* Remote Cache (2 hops) */
+#define PERF_MEM_LVL_NA 0x0001 /* Not available */
+#define PERF_MEM_LVL_HIT 0x0002 /* Hit level */
+#define PERF_MEM_LVL_MISS 0x0004 /* Miss level */
+#define PERF_MEM_LVL_L1 0x0008 /* L1 */
+#define PERF_MEM_LVL_LFB 0x0010 /* Line Fill Buffer */
+#define PERF_MEM_LVL_L2 0x0020 /* L2 */
+#define PERF_MEM_LVL_L3 0x0040 /* L3 */
+#define PERF_MEM_LVL_LOC_RAM 0x0080 /* Local DRAM */
+#define PERF_MEM_LVL_REM_RAM1 0x0100 /* Remote DRAM (1 hop) */
+#define PERF_MEM_LVL_REM_RAM2 0x0200 /* Remote DRAM (2 hops) */
+#define PERF_MEM_LVL_REM_CCE1 0x0400 /* Remote Cache (1 hop) */
+#define PERF_MEM_LVL_REM_CCE2 0x0800 /* Remote Cache (2 hops) */
#define PERF_MEM_LVL_IO 0x1000 /* I/O memory */
#define PERF_MEM_LVL_UNC 0x2000 /* Uncached memory */
#define PERF_MEM_LVL_SHIFT 5
-#define PERF_MEM_REMOTE_REMOTE 0x01 /* Remote */
+#define PERF_MEM_REMOTE_REMOTE 0x0001 /* Remote */
#define PERF_MEM_REMOTE_SHIFT 37
-#define PERF_MEM_LVLNUM_L1 0x01 /* L1 */
-#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
-#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
-#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
-#define PERF_MEM_LVLNUM_L2_MHB 0x05 /* L2 Miss Handling Buffer */
-#define PERF_MEM_LVLNUM_MSC 0x06 /* Memory-side Cache */
-/* 0x7 available */
-#define PERF_MEM_LVLNUM_UNC 0x08 /* Uncached */
-#define PERF_MEM_LVLNUM_CXL 0x09 /* CXL */
-#define PERF_MEM_LVLNUM_IO 0x0a /* I/O */
-#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
-#define PERF_MEM_LVLNUM_LFB 0x0c /* LFB / L1 Miss Handling Buffer */
-#define PERF_MEM_LVLNUM_RAM 0x0d /* RAM */
-#define PERF_MEM_LVLNUM_PMEM 0x0e /* PMEM */
-#define PERF_MEM_LVLNUM_NA 0x0f /* N/A */
+#define PERF_MEM_LVLNUM_L1 0x0001 /* L1 */
+#define PERF_MEM_LVLNUM_L2 0x0002 /* L2 */
+#define PERF_MEM_LVLNUM_L3 0x0003 /* L3 */
+#define PERF_MEM_LVLNUM_L4 0x0004 /* L4 */
+#define PERF_MEM_LVLNUM_L2_MHB 0x0005 /* L2 Miss Handling Buffer */
+#define PERF_MEM_LVLNUM_MSC 0x0006 /* Memory-side Cache */
+/* 0x007 available */
+#define PERF_MEM_LVLNUM_UNC 0x0008 /* Uncached */
+#define PERF_MEM_LVLNUM_CXL 0x0009 /* CXL */
+#define PERF_MEM_LVLNUM_IO 0x000a /* I/O */
+#define PERF_MEM_LVLNUM_ANY_CACHE 0x000b /* Any cache */
+#define PERF_MEM_LVLNUM_LFB 0x000c /* LFB / L1 Miss Handling Buffer */
+#define PERF_MEM_LVLNUM_RAM 0x000d /* RAM */
+#define PERF_MEM_LVLNUM_PMEM 0x000e /* PMEM */
+#define PERF_MEM_LVLNUM_NA 0x000f /* N/A */
#define PERF_MEM_LVLNUM_SHIFT 33
-/* snoop mode */
-#define PERF_MEM_SNOOP_NA 0x01 /* not available */
-#define PERF_MEM_SNOOP_NONE 0x02 /* no snoop */
-#define PERF_MEM_SNOOP_HIT 0x04 /* snoop hit */
-#define PERF_MEM_SNOOP_MISS 0x08 /* snoop miss */
-#define PERF_MEM_SNOOP_HITM 0x10 /* snoop hit modified */
+/* Snoop mode */
+#define PERF_MEM_SNOOP_NA 0x0001 /* Not available */
+#define PERF_MEM_SNOOP_NONE 0x0002 /* No snoop */
+#define PERF_MEM_SNOOP_HIT 0x0004 /* Snoop hit */
+#define PERF_MEM_SNOOP_MISS 0x0008 /* Snoop miss */
+#define PERF_MEM_SNOOP_HITM 0x0010 /* Snoop hit modified */
#define PERF_MEM_SNOOP_SHIFT 19
-#define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
-#define PERF_MEM_SNOOPX_PEER 0x02 /* xfer from peer */
+#define PERF_MEM_SNOOPX_FWD 0x0001 /* Forward */
+#define PERF_MEM_SNOOPX_PEER 0x0002 /* Transfer from peer */
#define PERF_MEM_SNOOPX_SHIFT 38
-/* locked instruction */
-#define PERF_MEM_LOCK_NA 0x01 /* not available */
-#define PERF_MEM_LOCK_LOCKED 0x02 /* locked transaction */
+/* Locked instruction */
+#define PERF_MEM_LOCK_NA 0x0001 /* Not available */
+#define PERF_MEM_LOCK_LOCKED 0x0002 /* Locked transaction */
#define PERF_MEM_LOCK_SHIFT 24
/* TLB access */
-#define PERF_MEM_TLB_NA 0x01 /* not available */
-#define PERF_MEM_TLB_HIT 0x02 /* hit level */
-#define PERF_MEM_TLB_MISS 0x04 /* miss level */
-#define PERF_MEM_TLB_L1 0x08 /* L1 */
-#define PERF_MEM_TLB_L2 0x10 /* L2 */
-#define PERF_MEM_TLB_WK 0x20 /* Hardware Walker*/
-#define PERF_MEM_TLB_OS 0x40 /* OS fault handler */
+#define PERF_MEM_TLB_NA 0x0001 /* Not available */
+#define PERF_MEM_TLB_HIT 0x0002 /* Hit level */
+#define PERF_MEM_TLB_MISS 0x0004 /* Miss level */
+#define PERF_MEM_TLB_L1 0x0008 /* L1 */
+#define PERF_MEM_TLB_L2 0x0010 /* L2 */
+#define PERF_MEM_TLB_WK 0x0020 /* Hardware Walker*/
+#define PERF_MEM_TLB_OS 0x0040 /* OS fault handler */
#define PERF_MEM_TLB_SHIFT 26
/* Access blocked */
-#define PERF_MEM_BLK_NA 0x01 /* not available */
-#define PERF_MEM_BLK_DATA 0x02 /* data could not be forwarded */
-#define PERF_MEM_BLK_ADDR 0x04 /* address conflict */
+#define PERF_MEM_BLK_NA 0x0001 /* Not available */
+#define PERF_MEM_BLK_DATA 0x0002 /* Data could not be forwarded */
+#define PERF_MEM_BLK_ADDR 0x0004 /* Address conflict */
#define PERF_MEM_BLK_SHIFT 40
-/* hop level */
-#define PERF_MEM_HOPS_0 0x01 /* remote core, same node */
-#define PERF_MEM_HOPS_1 0x02 /* remote node, same socket */
-#define PERF_MEM_HOPS_2 0x03 /* remote socket, same board */
-#define PERF_MEM_HOPS_3 0x04 /* remote board */
+/* Hop level */
+#define PERF_MEM_HOPS_0 0x0001 /* Remote core, same node */
+#define PERF_MEM_HOPS_1 0x0002 /* Remote node, same socket */
+#define PERF_MEM_HOPS_2 0x0003 /* Remote socket, same board */
+#define PERF_MEM_HOPS_3 0x0004 /* Remote board */
/* 5-7 available */
#define PERF_MEM_HOPS_SHIFT 43
@@ -1419,7 +1430,7 @@ union perf_mem_data_src {
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
/*
-* single taken branch record layout:
+* Layout of single taken branch records:
*
* from: source instruction (may not always be a branch insn)
* to: branch target
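The PERF_MEM_S() macro above composes the per-field values into the 64-bit perf_mem_data_src word; a self-contained sketch with a few of the field values reproduced (using uint64_t in place of __u64):

```c
#include <assert.h>
#include <stdint.h>

/* A few field values and shifts, copied from the defines above: */
#define PERF_MEM_OP_LOAD   0x0002
#define PERF_MEM_OP_SHIFT  0
#define PERF_MEM_LVL_L1    0x0008
#define PERF_MEM_LVL_SHIFT 5

/* The header's composition macro: */
#define PERF_MEM_S(a, s) \
	(((uint64_t)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
```

`PERF_MEM_S(OP, LOAD) | PERF_MEM_S(LVL, L1)` then describes an L1 load in the data_src encoding.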
@@ -1438,16 +1449,16 @@ union perf_mem_data_src {
struct perf_branch_entry {
__u64 from;
__u64 to;
-__u64 mispred:1, /* target mispredicted */
-predicted:1,/* target predicted */
-in_tx:1, /* in transaction */
-abort:1, /* transaction abort */
-cycles:16, /* cycle count to last branch */
-type:4, /* branch type */
-spec:2, /* branch speculation info */
-new_type:4, /* additional branch type */
-priv:3, /* privilege level */
-reserved:31;
+__u64 mispred : 1, /* target mispredicted */
+predicted : 1, /* target predicted */
+in_tx : 1, /* in transaction */
+abort : 1, /* transaction abort */
+cycles : 16, /* cycle count to last branch */
+type : 4, /* branch type */
+spec : 2, /* branch speculation info */
+new_type : 4, /* additional branch type */
+priv : 3, /* privilege level */
+reserved : 31;
};
/* Size of used info bits in struct perf_branch_entry */
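A reduced copy of the struct shows how a consumer reads the packed info bits; this sketch assumes a little-endian GCC/Clang bitfield layout and covers only the decoding, not actual sample parsing:

```c
#include <assert.h>
#include <stdint.h>

/* Reduced sketch of the branch-entry layout above (uint64_t in place of
 * __u64); real consumers read these from PERF_SAMPLE_BRANCH_STACK data. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t mispred   : 1,  /* target mispredicted */
		 predicted : 1,  /* target predicted */
		 in_tx     : 1,  /* in transaction */
		 abort     : 1,  /* transaction abort */
		 cycles    : 16, /* cycle count to last branch */
		 type      : 4,  /* branch type */
		 spec      : 2,  /* branch speculation info */
		 new_type  : 4,  /* additional branch type */
		 priv      : 3,  /* privilege level */
		 reserved  : 31;
};
```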
@@ -1468,8 +1479,8 @@ union perf_sample_weight {
__u32 var1_dw;
};
#else
-#error "Unknown endianness"
+# error "Unknown endianness"
#endif
};
-#endif /* _UAPI_LINUX_PERF_EVENT_H */
+#endif /* _LINUX_PERF_EVENT_H */

View File

@@ -35,43 +35,47 @@ if [[ "$SANITIZER" == undefined ]]; then
CXXFLAGS+=" $UBSAN_FLAGS"
fi
+export SKIP_LIBELF_REBUILD=${SKIP_LIBELF_REBUILD:=''}
# Ideally libbelf should be built using release tarballs available
# at https://sourceware.org/elfutils/ftp/. Unfortunately sometimes they
# fail to compile (for example, elfutils-0.185 fails to compile with LDFLAGS enabled
# due to https://bugs.gentoo.org/794601) so let's just point the script to
# commits referring to versions of libelf that actually can be built
+if [[ ! -e elfutils || "$SKIP_LIBELF_REBUILD" == "" ]]; then
rm -rf elfutils
git clone https://sourceware.org/git/elfutils.git
(
cd elfutils
git checkout 67a187d4c1790058fc7fd218317851cb68bb087c
git log --oneline -1
# ASan isn't compatible with -Wl,--no-undefined: https://github.com/google/sanitizers/issues/380
sed -i 's/^\(NO_UNDEFINED=\).*/\1/' configure.ac
# ASan isn't compatible with -Wl,-z,defs either:
# https://clang.llvm.org/docs/AddressSanitizer.html#usage
sed -i 's/^\(ZDEFS_LDFLAGS=\).*/\1/' configure.ac
if [[ "$SANITIZER" == undefined ]]; then
# That's basicaly what --enable-sanitize-undefined does to turn off unaligned access
# elfutils heavily relies on on i386/x86_64 but without changing compiler flags along the way
sed -i 's/\(check_undefined_val\)=[0-9]/\1=1/' configure.ac
fi
autoreconf -i -f
if ! ./configure --enable-maintainer-mode --disable-debuginfod --disable-libdebuginfod \
--disable-demangler --without-bzlib --without-lzma --without-zstd \
CC="$CC" CFLAGS="-Wno-error $CFLAGS" CXX="$CXX" CXXFLAGS="-Wno-error $CXXFLAGS" LDFLAGS="$CFLAGS"; then
cat config.log
exit 1
fi
make -C config -j$(nproc) V=1
make -C lib -j$(nproc) V=1
make -C libelf -j$(nproc) V=1
)
+fi
make -C src BUILD_STATIC_ONLY=y V=1 clean
make -C src -j$(nproc) CFLAGS="-I$(pwd)/elfutils/libelf $CFLAGS" BUILD_STATIC_ONLY=y V=1
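The new guard only rebuilds elfutils when the checkout is missing or SKIP_LIBELF_REBUILD is empty; a minimal sketch of that condition (`should_rebuild` is a hypothetical helper, not part of the CI script):

```shell
#!/usr/bin/env bash

# Sketch of the rebuild guard's decision: rebuild when the checkout is
# absent or the skip flag is unset/empty, otherwise reuse the old build.
should_rebuild() {
	local dir=$1 skip=$2
	if [[ ! -e "$dir" || "$skip" == "" ]]; then
		echo rebuild
	else
		echo skip
	fi
}

should_rebuild /nonexistent ""   # prints "rebuild": no checkout yet
should_rebuild . 1               # prints "skip": checkout exists, flag set
```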

View File

@@ -63,6 +63,7 @@ LIBBPF_TREE_FILTER="mkdir -p __libbpf/include/uapi/linux __libbpf/include/tools
for p in "${!PATH_MAP[@]}"; do
LIBBPF_TREE_FILTER+="git mv -kf ${p} __libbpf/${PATH_MAP[${p}]} && "$'\\\n'
done
+LIBBPF_TREE_FILTER+="find __libbpf/include/uapi/linux -type f -exec sed -i 's/_UAPI\(__\?LINUX\)/\1/' {} + && "$'\\\n'
LIBBPF_TREE_FILTER+="git rm --ignore-unmatch -f __libbpf/src/{Makefile,Build,test_libbpf.c,.gitignore} >/dev/null"
cd_to()
@@ -347,7 +348,7 @@ diff -u ${TMP_DIR}/linux-view.ls ${TMP_DIR}/github-view.ls
echo "Comparing file contents..."
CONSISTENT=1
for F in $(cat ${TMP_DIR}/linux-view.ls); do
-if ! diff -u "${LINUX_ABS_DIR}/${F}" "${GITHUB_ABS_DIR}/${F}"; then
+if ! diff -u <(sed 's/_UAPI\(__\?LINUX\)/\1/' "${LINUX_ABS_DIR}/${F}") "${GITHUB_ABS_DIR}/${F}"; then
echo "${LINUX_ABS_DIR}/${F} and ${GITHUB_ABS_DIR}/${F} are different!"
CONSISTENT=0
fi

View File

@@ -9,7 +9,7 @@ else
endif
LIBBPF_MAJOR_VERSION := 1
-LIBBPF_MINOR_VERSION := 6
+LIBBPF_MINOR_VERSION := 7
LIBBPF_PATCH_VERSION := 0
LIBBPF_VERSION := $(LIBBPF_MAJOR_VERSION).$(LIBBPF_MINOR_VERSION).$(LIBBPF_PATCH_VERSION)
LIBBPF_MAJMIN_VERSION := $(LIBBPF_MAJOR_VERSION).$(LIBBPF_MINOR_VERSION).0

View File

@@ -837,6 +837,50 @@ int bpf_link_create(int prog_fd, int target_fd,
 		if (!OPTS_ZEROED(opts, netkit))
 			return libbpf_err(-EINVAL);
 		break;
+	case BPF_CGROUP_INET_INGRESS:
+	case BPF_CGROUP_INET_EGRESS:
+	case BPF_CGROUP_INET_SOCK_CREATE:
+	case BPF_CGROUP_INET_SOCK_RELEASE:
+	case BPF_CGROUP_INET4_BIND:
+	case BPF_CGROUP_INET6_BIND:
+	case BPF_CGROUP_INET4_POST_BIND:
+	case BPF_CGROUP_INET6_POST_BIND:
+	case BPF_CGROUP_INET4_CONNECT:
+	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UNIX_CONNECT:
+	case BPF_CGROUP_INET4_GETPEERNAME:
+	case BPF_CGROUP_INET6_GETPEERNAME:
+	case BPF_CGROUP_UNIX_GETPEERNAME:
+	case BPF_CGROUP_INET4_GETSOCKNAME:
+	case BPF_CGROUP_INET6_GETSOCKNAME:
+	case BPF_CGROUP_UNIX_GETSOCKNAME:
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
+	case BPF_CGROUP_UNIX_SENDMSG:
+	case BPF_CGROUP_UDP4_RECVMSG:
+	case BPF_CGROUP_UDP6_RECVMSG:
+	case BPF_CGROUP_UNIX_RECVMSG:
+	case BPF_CGROUP_SOCK_OPS:
+	case BPF_CGROUP_DEVICE:
+	case BPF_CGROUP_SYSCTL:
+	case BPF_CGROUP_GETSOCKOPT:
+	case BPF_CGROUP_SETSOCKOPT:
+	case BPF_LSM_CGROUP:
+		relative_fd = OPTS_GET(opts, cgroup.relative_fd, 0);
+		relative_id = OPTS_GET(opts, cgroup.relative_id, 0);
+		if (relative_fd && relative_id)
+			return libbpf_err(-EINVAL);
+		if (relative_id) {
+			attr.link_create.cgroup.relative_id = relative_id;
+			attr.link_create.flags |= BPF_F_ID;
+		} else {
+			attr.link_create.cgroup.relative_fd = relative_fd;
+		}
+		attr.link_create.cgroup.expected_revision =
+			OPTS_GET(opts, cgroup.expected_revision, 0);
+		if (!OPTS_ZEROED(opts, cgroup))
+			return libbpf_err(-EINVAL);
+		break;
 	default:
 		if (!OPTS_ZEROED(opts, flags))
 			return libbpf_err(-EINVAL);
@@ -1331,3 +1375,23 @@ int bpf_token_create(int bpffs_fd, struct bpf_token_create_opts *opts)
 	fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
 }
+
+int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *buf, __u32 buf_len,
+			 struct bpf_prog_stream_read_opts *opts)
+{
+	const size_t attr_sz = offsetofend(union bpf_attr, prog_stream_read);
+	union bpf_attr attr;
+	int err;
+
+	if (!OPTS_VALID(opts, bpf_prog_stream_read_opts))
+		return libbpf_err(-EINVAL);
+
+	memset(&attr, 0, attr_sz);
+	attr.prog_stream_read.stream_buf = ptr_to_u64(buf);
+	attr.prog_stream_read.stream_buf_len = buf_len;
+	attr.prog_stream_read.stream_id = stream_id;
+	attr.prog_stream_read.prog_fd = prog_fd;
+
+	err = sys_bpf(BPF_PROG_STREAM_READ_BY_FD, &attr, attr_sz);
+	return libbpf_err_errno(err);
+}

View File

@@ -438,6 +438,11 @@ struct bpf_link_create_opts {
 			__u32 relative_id;
 			__u64 expected_revision;
 		} netkit;
+		struct {
+			__u32 relative_fd;
+			__u32 relative_id;
+			__u64 expected_revision;
+		} cgroup;
 	};
 	size_t :0;
 };
@@ -704,6 +709,27 @@ struct bpf_token_create_opts {
 LIBBPF_API int bpf_token_create(int bpffs_fd,
 				struct bpf_token_create_opts *opts);

+struct bpf_prog_stream_read_opts {
+	size_t sz;
+	size_t :0;
+};
+#define bpf_prog_stream_read_opts__last_field sz
+/**
+ * @brief **bpf_prog_stream_read** reads data from the BPF stream of a given BPF
+ * program.
+ *
+ * @param prog_fd FD for the BPF program whose BPF stream is to be read.
+ * @param stream_id ID of the BPF stream to be read.
+ * @param buf Buffer to read data into from the BPF stream.
+ * @param buf_len Maximum number of bytes to read from the BPF stream.
+ * @param opts optional options, can be NULL
+ *
+ * @return The number of bytes read, on success; negative error code, otherwise
+ * (errno is also set to the error code)
+ */
+LIBBPF_API int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *buf, __u32 buf_len,
+				    struct bpf_prog_stream_read_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif

View File

@@ -286,6 +286,7 @@ static long (* const bpf_l3_csum_replace)(struct __sk_buff *skb, __u32 offset, _
 * 		for updates resulting in a null checksum the value is set to
 * 		**CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates
 * 		that the modified header field is part of the pseudo-header.
+ * 		Flag **BPF_F_IPV6** should be set for IPv6 packets.
 *
 * 		This helper works in combination with **bpf_csum_diff**\ (),
 * 		which does not update the checksum in-place, but offers more
@@ -688,7 +689,7 @@ static __u32 (* const bpf_get_route_realm)(struct __sk_buff *skb) = (void *) 24;
 * 		into it. An example is available in file
 * 		*samples/bpf/trace_output_user.c* in the Linux kernel source
 * 		tree (the eBPF program counterpart is in
- * 		*samples/bpf/trace_output_kern.c*).
+ * 		*samples/bpf/trace_output.bpf.c*).
 *
 * 		**bpf_perf_event_output**\ () achieves better performance
 * 		than **bpf_trace_printk**\ () for sharing data with user
@@ -3706,6 +3707,9 @@ static void *(* const bpf_this_cpu_ptr)(const void *percpu_ptr) = (void *) 154;
 * 		the netns switch takes place from ingress to ingress without
 * 		going through the CPU's backlog queue.
 *
+ * 		*skb*\ **->mark** and *skb*\ **->tstamp** are not cleared during
+ * 		the netns switch.
+ *
 * 		The *flags* argument is reserved and must be 0. The helper is
 * 		currently only supported for tc BPF program types at the
 * 		ingress hook and for veth and netkit target device types. The

View File

@@ -215,6 +215,7 @@ enum libbpf_tristate {
 #define __arg_nonnull __attribute((btf_decl_tag("arg:nonnull")))
 #define __arg_nullable __attribute((btf_decl_tag("arg:nullable")))
 #define __arg_trusted __attribute((btf_decl_tag("arg:trusted")))
+#define __arg_untrusted __attribute((btf_decl_tag("arg:untrusted")))
 #define __arg_arena __attribute((btf_decl_tag("arg:arena")))

 #ifndef ___bpf_concat
@@ -314,6 +315,22 @@ enum libbpf_tristate {
 			     ___param, sizeof(___param)); \
 })

+extern int bpf_stream_vprintk(int stream_id, const char *fmt__str, const void *args,
+			      __u32 len__sz, void *aux__prog) __weak __ksym;
+
+#define bpf_stream_printk(stream_id, fmt, args...) \
+({ \
+	static const char ___fmt[] = fmt; \
+	unsigned long long ___param[___bpf_narg(args)]; \
+ \
+	_Pragma("GCC diagnostic push") \
+	_Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
+	___bpf_fill(___param, args); \
+	_Pragma("GCC diagnostic pop") \
+ \
+	bpf_stream_vprintk(stream_id, ___fmt, ___param, sizeof(___param), NULL);\
+})
+
 /* Use __bpf_printk when bpf_printk call has 3 or fewer fmt args
 * Otherwise use __bpf_vprintk
 */

View File

@@ -12,6 +12,7 @@
 #include <sys/utsname.h>
 #include <sys/param.h>
 #include <sys/stat.h>
+#include <sys/mman.h>
 #include <linux/kernel.h>
 #include <linux/err.h>
 #include <linux/btf.h>
@@ -120,6 +121,9 @@ struct btf {
 	/* whether base_btf should be freed in btf_free for this instance */
 	bool owns_base;

+	/* whether raw_data is a (read-only) mmap */
+	bool raw_data_is_mmap;
+
 	/* BTF object FD, if loaded into kernel */
 	int fd;
@@ -951,6 +955,17 @@ static bool btf_is_modifiable(const struct btf *btf)
 	return (void *)btf->hdr != btf->raw_data;
 }

+static void btf_free_raw_data(struct btf *btf)
+{
+	if (btf->raw_data_is_mmap) {
+		munmap(btf->raw_data, btf->raw_size);
+		btf->raw_data_is_mmap = false;
+	} else {
+		free(btf->raw_data);
+	}
+	btf->raw_data = NULL;
+}
+
 void btf__free(struct btf *btf)
 {
 	if (IS_ERR_OR_NULL(btf))
@@ -970,7 +985,7 @@ void btf__free(struct btf *btf)
 		free(btf->types_data);
 		strset__free(btf->strs_set);
 	}
-	free(btf->raw_data);
+	btf_free_raw_data(btf);
 	free(btf->raw_data_swapped);
 	free(btf->type_offs);
 	if (btf->owns_base)
@@ -996,7 +1011,7 @@ static struct btf *btf_new_empty(struct btf *base_btf)
 	if (base_btf) {
 		btf->base_btf = base_btf;
 		btf->start_id = btf__type_cnt(base_btf);
-		btf->start_str_off = base_btf->hdr->str_len;
+		btf->start_str_off = base_btf->hdr->str_len + base_btf->start_str_off;
 		btf->swapped_endian = base_btf->swapped_endian;
 	}
@@ -1030,7 +1045,7 @@ struct btf *btf__new_empty_split(struct btf *base_btf)
 	return libbpf_ptr(btf_new_empty(base_btf));
 }

-static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
+static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf, bool is_mmap)
 {
 	struct btf *btf;
 	int err;
@@ -1050,12 +1065,18 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
 		btf->start_str_off = base_btf->hdr->str_len;
 	}

+	if (is_mmap) {
+		btf->raw_data = (void *)data;
+		btf->raw_data_is_mmap = true;
+	} else {
 		btf->raw_data = malloc(size);
 		if (!btf->raw_data) {
 			err = -ENOMEM;
 			goto done;
 		}
 		memcpy(btf->raw_data, data, size);
+	}
+
 	btf->raw_size = size;
 	btf->hdr = btf->raw_data;
@@ -1083,12 +1104,12 @@ done:
 struct btf *btf__new(const void *data, __u32 size)
 {
-	return libbpf_ptr(btf_new(data, size, NULL));
+	return libbpf_ptr(btf_new(data, size, NULL, false));
 }

 struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf)
 {
-	return libbpf_ptr(btf_new(data, size, base_btf));
+	return libbpf_ptr(btf_new(data, size, base_btf, false));
 }

 struct btf_elf_secs {
@@ -1209,7 +1230,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
 	if (secs.btf_base_data) {
 		dist_base_btf = btf_new(secs.btf_base_data->d_buf, secs.btf_base_data->d_size,
-					NULL);
+					NULL, false);
 		if (IS_ERR(dist_base_btf)) {
 			err = PTR_ERR(dist_base_btf);
 			dist_base_btf = NULL;
@@ -1218,7 +1239,7 @@
 	}

 	btf = btf_new(secs.btf_data->d_buf, secs.btf_data->d_size,
-		      dist_base_btf ?: base_btf);
+		      dist_base_btf ?: base_btf, false);
 	if (IS_ERR(btf)) {
 		err = PTR_ERR(btf);
 		goto done;
@@ -1335,7 +1356,7 @@ static struct btf *btf_parse_raw(const char *path, struct btf *base_btf)
 	}

 	/* finally parse BTF data */
-	btf = btf_new(data, sz, base_btf);
+	btf = btf_new(data, sz, base_btf, false);

 err_out:
 	free(data);
@@ -1354,6 +1375,37 @@ struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf)
 	return libbpf_ptr(btf_parse_raw(path, base_btf));
 }

+static struct btf *btf_parse_raw_mmap(const char *path, struct btf *base_btf)
+{
+	struct stat st;
+	void *data;
+	struct btf *btf;
+	int fd, err;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return ERR_PTR(-errno);
+
+	if (fstat(fd, &st) < 0) {
+		err = -errno;
+		close(fd);
+		return ERR_PTR(err);
+	}
+
+	data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	err = -errno;
+	close(fd);
+
+	if (data == MAP_FAILED)
+		return ERR_PTR(err);
+
+	btf = btf_new(data, st.st_size, base_btf, true);
+	if (IS_ERR(btf))
+		munmap(data, st.st_size);
+
+	return btf;
+}
+
 static struct btf *btf_parse(const char *path, struct btf *base_btf, struct btf_ext **btf_ext)
 {
 	struct btf *btf;
@@ -1618,7 +1670,7 @@ struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf)
 		goto exit_free;
 	}

-	btf = btf_new(ptr, btf_info.btf_size, base_btf);
+	btf = btf_new(ptr, btf_info.btf_size, base_btf, false);

 exit_free:
 	free(ptr);
@@ -1658,10 +1710,8 @@ struct btf *btf__load_from_kernel_by_id(__u32 id)
 static void btf_invalidate_raw_data(struct btf *btf)
 {
-	if (btf->raw_data) {
-		free(btf->raw_data);
-		btf->raw_data = NULL;
-	}
+	if (btf->raw_data)
+		btf_free_raw_data(btf);
 	if (btf->raw_data_swapped) {
 		free(btf->raw_data_swapped);
 		btf->raw_data_swapped = NULL;
@@ -5331,7 +5381,10 @@ struct btf *btf__load_vmlinux_btf(void)
 		pr_warn("kernel BTF is missing at '%s', was CONFIG_DEBUG_INFO_BTF enabled?\n",
 			sysfs_btf_path);
 	} else {
+		btf = btf_parse_raw_mmap(sysfs_btf_path, NULL);
+		if (IS_ERR(btf))
 			btf = btf__parse(sysfs_btf_path, NULL);
+
 		if (!btf) {
 			err = -errno;
 			pr_warn("failed to read kernel BTF from '%s': %s\n",

View File

@@ -326,9 +326,10 @@ struct btf_dump_type_data_opts {
 	bool compact;		/* no newlines/indentation */
 	bool skip_names;	/* skip member/type names */
 	bool emit_zeroes;	/* show 0-valued fields */
+	bool emit_strings;	/* print char arrays as strings */
 	size_t :0;
 };
-#define btf_dump_type_data_opts__last_field emit_zeroes
+#define btf_dump_type_data_opts__last_field emit_strings

 LIBBPF_API int
 btf_dump__dump_type_data(struct btf_dump *d, __u32 id,

View File

@@ -68,6 +68,7 @@ struct btf_dump_data {
 	bool compact;
 	bool skip_names;
 	bool emit_zeroes;
+	bool emit_strings;
 	__u8 indent_lvl;	/* base indent level */
 	char indent_str[BTF_DATA_INDENT_STR_LEN];
 	/* below are used during iteration */
@@ -226,6 +227,9 @@ static void btf_dump_free_names(struct hashmap *map)
 	size_t bkt;
 	struct hashmap_entry *cur;

+	if (!map)
+		return;
+
 	hashmap__for_each_entry(map, cur, bkt)
 		free((void *)cur->pkey);
@@ -2028,6 +2032,52 @@ static int btf_dump_var_data(struct btf_dump *d,
 	return btf_dump_dump_type_data(d, NULL, t, type_id, data, 0, 0);
 }

+static int btf_dump_string_data(struct btf_dump *d,
+				const struct btf_type *t,
+				__u32 id,
+				const void *data)
+{
+	const struct btf_array *array = btf_array(t);
+	const char *chars = data;
+	__u32 i;
+
+	/* Make sure it is a NUL-terminated string. */
+	for (i = 0; i < array->nelems; i++) {
+		if ((void *)(chars + i) >= d->typed_dump->data_end)
+			return -E2BIG;
+		if (chars[i] == '\0')
+			break;
+	}
+	if (i == array->nelems) {
+		/* The caller will print this as a regular array. */
+		return -EINVAL;
+	}
+
+	btf_dump_data_pfx(d);
+	btf_dump_printf(d, "\"");
+
+	for (i = 0; i < array->nelems; i++) {
+		char c = chars[i];
+
+		if (c == '\0') {
+			/*
+			 * When printing character arrays as strings, NUL bytes
+			 * are always treated as string terminators; they are
+			 * never printed.
+			 */
+			break;
+		}
+		if (isprint(c))
+			btf_dump_printf(d, "%c", c);
+		else
+			btf_dump_printf(d, "\\x%02x", (__u8)c);
+	}
+
+	btf_dump_printf(d, "\"");
+
+	return 0;
+}
+
 static int btf_dump_array_data(struct btf_dump *d,
 			       const struct btf_type *t,
 			       __u32 id,
@@ -2055,9 +2105,14 @@ static int btf_dump_array_data(struct btf_dump *d,
 		 * char arrays, so if size is 1 and element is
 		 * printable as a char, we'll do that.
 		 */
-		if (elem_size == 1)
+		if (elem_size == 1) {
+			if (d->typed_dump->emit_strings &&
+			    btf_dump_string_data(d, t, id, data) == 0) {
+				return 0;
+			}
 			d->typed_dump->is_array_char = true;
+		}
 	}

 	/* note that we increment depth before calling btf_dump_print() below;
 	 * this is intentional. btf_dump_data_newline() will not print a
@@ -2544,6 +2599,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
 	d->typed_dump->compact = OPTS_GET(opts, compact, false);
 	d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false);
 	d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false);
+	d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false);

 	ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0);

View File

@@ -597,7 +597,7 @@ struct extern_desc {
 	int sym_idx;
 	int btf_id;
 	int sec_btf_id;
-	const char *name;
+	char *name;
 	char *essent_name;
 	bool is_set;
 	bool is_weak;
@@ -735,7 +735,7 @@ struct bpf_object {

 	struct usdt_manager *usdt_man;

-	struct bpf_map *arena_map;
+	int arena_map_idx;
 	void *arena_data;
 	size_t arena_data_sz;
@@ -1517,6 +1517,7 @@ static struct bpf_object *bpf_object__new(const char *path,
 	obj->efile.obj_buf_sz = obj_buf_sz;
 	obj->efile.btf_maps_shndx = -1;
 	obj->kconfig_map_idx = -1;
+	obj->arena_map_idx = -1;
 	obj->kern_version = get_kernel_version();
 	obj->state = OBJ_OPEN;
@@ -2964,7 +2965,7 @@ static int init_arena_map_data(struct bpf_object *obj, struct bpf_map *map,
 	const long page_sz = sysconf(_SC_PAGE_SIZE);
 	size_t mmap_sz;

-	mmap_sz = bpf_map_mmap_sz(obj->arena_map);
+	mmap_sz = bpf_map_mmap_sz(map);
 	if (roundup(data_sz, page_sz) > mmap_sz) {
 		pr_warn("elf: sec '%s': declared ARENA map size (%zu) is too small to hold global __arena variables of size %zu\n",
 			sec_name, mmap_sz, data_sz);
@@ -3038,12 +3039,12 @@ static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict,
 		if (map->def.type != BPF_MAP_TYPE_ARENA)
 			continue;

-		if (obj->arena_map) {
+		if (obj->arena_map_idx >= 0) {
 			pr_warn("map '%s': only single ARENA map is supported (map '%s' is also ARENA)\n",
-				map->name, obj->arena_map->name);
+				map->name, obj->maps[obj->arena_map_idx].name);
 			return -EINVAL;
 		}
-		obj->arena_map = map;
+		obj->arena_map_idx = i;

 		if (obj->efile.arena_data) {
 			err = init_arena_map_data(obj, map, ARENA_SEC, obj->efile.arena_data_shndx,
@@ -3053,7 +3054,7 @@
 				return err;
 		}
 	}
-	if (obj->efile.arena_data && !obj->arena_map) {
+	if (obj->efile.arena_data && obj->arena_map_idx < 0) {
 		pr_warn("elf: sec '%s': to use global __arena variables the ARENA map should be explicitly declared in SEC(\".maps\")\n",
 			ARENA_SEC);
 		return -ENOENT;
@@ -4259,7 +4260,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 			return ext->btf_id;
 		}
 		t = btf__type_by_id(obj->btf, ext->btf_id);
-		ext->name = btf__name_by_offset(obj->btf, t->name_off);
+		ext->name = strdup(btf__name_by_offset(obj->btf, t->name_off));
+		if (!ext->name)
+			return -ENOMEM;
 		ext->sym_idx = i;
 		ext->is_weak = ELF64_ST_BIND(sym->st_info) == STB_WEAK;
@@ -4579,10 +4582,20 @@ static int bpf_program__record_reloc(struct bpf_program *prog,

 	/* arena data relocation */
 	if (shdr_idx == obj->efile.arena_data_shndx) {
+		if (obj->arena_map_idx < 0) {
+			pr_warn("prog '%s': bad arena data relocation at insn %u, no arena maps defined\n",
+				prog->name, insn_idx);
+			return -LIBBPF_ERRNO__RELOC;
+		}
 		reloc_desc->type = RELO_DATA;
 		reloc_desc->insn_idx = insn_idx;
-		reloc_desc->map_idx = obj->arena_map - obj->maps;
+		reloc_desc->map_idx = obj->arena_map_idx;
 		reloc_desc->sym_off = sym->st_value;
+
+		map = &obj->maps[obj->arena_map_idx];
+		pr_debug("prog '%s': found arena map %d (%s, sec %d, off %zu) for insn %u\n",
+			 prog->name, obj->arena_map_idx, map->name, map->sec_idx,
+			 map->sec_offset, insn_idx);
 		return 0;
 	}
@@ -5080,6 +5093,16 @@ static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
 		return false;
 	}

+	/*
+	 * bpf_get_map_info_by_fd() for DEVMAP will always return flags with
+	 * BPF_F_RDONLY_PROG set, but it generally is not set at map creation time.
+	 * Thus, ignore the BPF_F_RDONLY_PROG flag in the flags returned from
+	 * bpf_get_map_info_by_fd() when checking for compatibility with an
+	 * existing DEVMAP.
+	 */
+	if (map->def.type == BPF_MAP_TYPE_DEVMAP || map->def.type == BPF_MAP_TYPE_DEVMAP_HASH)
+		map_info.map_flags &= ~BPF_F_RDONLY_PROG;
+
 	return (map_info.type == map->def.type &&
 		map_info.key_size == map->def.key_size &&
 		map_info.value_size == map->def.value_size &&
@@ -9138,8 +9161,10 @@ void bpf_object__close(struct bpf_object *obj)
 	zfree(&obj->btf_custom_path);
 	zfree(&obj->kconfig);

-	for (i = 0; i < obj->nr_extern; i++)
+	for (i = 0; i < obj->nr_extern; i++) {
+		zfree(&obj->externs[i].name);
 		zfree(&obj->externs[i].essent_name);
+	}

 	zfree(&obj->externs);
 	obj->nr_extern = 0;
@@ -9206,7 +9231,7 @@ int bpf_object__gen_loader(struct bpf_object *obj, struct gen_loader_opts *opts)
 		return libbpf_err(-EFAULT);
 	if (!OPTS_VALID(opts, gen_loader_opts))
 		return libbpf_err(-EINVAL);
-	gen = calloc(sizeof(*gen), 1);
+	gen = calloc(1, sizeof(*gen));
 	if (!gen)
 		return libbpf_err(-ENOMEM);
 	gen->opts = opts;
@@ -10081,7 +10106,7 @@ static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
 			      enum bpf_attach_type attach_type,
 			      int *btf_obj_fd, int *btf_type_id)
 {
-	int ret, i, mod_len;
+	int ret, i, mod_len = 0;
 	const char *fn_name, *mod_name = NULL;

 	fn_name = strchr(attach_name, ':');
@@ -10950,12 +10975,15 @@ struct bpf_link *bpf_program__attach_perf_event_opts(const struct bpf_program *p
 		}
 		link->link.fd = pfd;
 	}
+	if (!OPTS_GET(opts, dont_enable, false)) {
 		if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
 			err = -errno;
 			pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
 				prog->name, pfd, errstr(err));
 			goto err_out;
 		}
+	}

 	return &link->link;
 err_out:
@@ -12837,6 +12865,34 @@ struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifi
 	return bpf_program_attach_fd(prog, ifindex, "xdp", NULL);
 }

+struct bpf_link *
+bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
+				const struct bpf_cgroup_opts *opts)
+{
+	LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
+	__u32 relative_id;
+	int relative_fd;
+
+	if (!OPTS_VALID(opts, bpf_cgroup_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	relative_id = OPTS_GET(opts, relative_id, 0);
+	relative_fd = OPTS_GET(opts, relative_fd, 0);
+
+	if (relative_fd && relative_id) {
+		pr_warn("prog '%s': relative_fd and relative_id cannot be set at the same time\n",
+			prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+
+	link_create_opts.cgroup.expected_revision = OPTS_GET(opts, expected_revision, 0);
+	link_create_opts.cgroup.relative_fd = relative_fd;
+	link_create_opts.cgroup.relative_id = relative_id;
+	link_create_opts.flags = OPTS_GET(opts, flags, 0);
+
+	return bpf_program_attach_fd(prog, cgroup_fd, "cgroup", &link_create_opts);
+}
+
 struct bpf_link *
 bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
			const struct bpf_tcx_opts *opts)

View File

@@ -24,8 +24,25 @@
 extern "C" {
 #endif

+/**
+ * @brief **libbpf_major_version()** provides the major version of libbpf.
+ * @return An integer, the major version number
+ */
 LIBBPF_API __u32 libbpf_major_version(void);
+
+/**
+ * @brief **libbpf_minor_version()** provides the minor version of libbpf.
+ * @return An integer, the minor version number
+ */
 LIBBPF_API __u32 libbpf_minor_version(void);
+
+/**
+ * @brief **libbpf_version_string()** provides the version of libbpf in a
+ * human-readable form, e.g., "v1.7".
+ * @return Pointer to a static string containing the version
+ *
+ * The format is *not* a part of a stable API and may change in the future.
+ */
 LIBBPF_API const char *libbpf_version_string(void);

 enum libbpf_errno {
@@ -49,6 +66,14 @@ enum libbpf_errno {
 	__LIBBPF_ERRNO__END,
 };

+/**
+ * @brief **libbpf_strerror()** converts the provided error code into a
+ * human-readable string.
+ * @param err The error code to convert
+ * @param buf Pointer to a buffer where the error message will be stored
+ * @param size The number of bytes in the buffer
+ * @return 0, on success; negative error code, otherwise
+ */
 LIBBPF_API int libbpf_strerror(int err, char *buf, size_t size);

 /**
@@ -252,7 +277,7 @@ bpf_object__open_mem(const void *obj_buf, size_t obj_buf_sz,
 * @return 0, on success; negative error code, otherwise, error code is
 * stored in errno
 */
-int bpf_object__prepare(struct bpf_object *obj);
+LIBBPF_API int bpf_object__prepare(struct bpf_object *obj);

 /**
 * @brief **bpf_object__load()** loads BPF object into kernel.
@@ -499,9 +524,11 @@ struct bpf_perf_event_opts {
 	__u64 bpf_cookie;
 	/* don't use BPF link when attach BPF program */
 	bool force_ioctl_attach;
+	/* don't automatically enable the event */
+	bool dont_enable;
 	size_t :0;
 };
-#define bpf_perf_event_opts__last_field force_ioctl_attach
+#define bpf_perf_event_opts__last_field dont_enable

 LIBBPF_API struct bpf_link *
 bpf_program__attach_perf_event(const struct bpf_program *prog, int pfd);
@@ -877,6 +904,21 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_netkit(const struct bpf_program *prog, int ifindex,
			   const struct bpf_netkit_opts *opts);

+struct bpf_cgroup_opts {
+	/* size of this struct, for forward/backward compatibility */
+	size_t sz;
+	__u32 flags;
+	__u32 relative_fd;
+	__u32 relative_id;
+	__u64 expected_revision;
+	size_t :0;
+};
+#define bpf_cgroup_opts__last_field expected_revision
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
+				const struct bpf_cgroup_opts *opts);
+
 struct bpf_map;

 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
@@ -1289,6 +1331,7 @@ enum bpf_tc_attach_point {
 	BPF_TC_INGRESS = 1 << 0,
 	BPF_TC_EGRESS  = 1 << 1,
 	BPF_TC_CUSTOM  = 1 << 2,
+	BPF_TC_QDISC   = 1 << 3,
 };

 #define BPF_TC_PARENT(a, b) \
@@ -1303,9 +1346,11 @@ struct bpf_tc_hook {
 	int ifindex;
 	enum bpf_tc_attach_point attach_point;
 	__u32 parent;
+	__u32 handle;
+	const char *qdisc;
 	size_t :0;
 };
-#define bpf_tc_hook__last_field parent
+#define bpf_tc_hook__last_field qdisc

 struct bpf_tc_opts {
 	size_t sz;

View File

@@ -437,6 +437,8 @@ LIBBPF_1.6.0 {
 		bpf_linker__add_fd;
 		bpf_linker__new_fd;
 		bpf_object__prepare;
+		bpf_prog_stream_read;
+		bpf_program__attach_cgroup_opts;
 		bpf_program__func_info;
 		bpf_program__func_info_cnt;
 		bpf_program__line_info;
@@ -444,3 +446,6 @@ LIBBPF_1.6.0 {
 		btf__add_decl_attr;
 		btf__add_type_attr;
 } LIBBPF_1.5.0;
+
+LIBBPF_1.7.0 {
+} LIBBPF_1.6.0;

View File

@@ -4,6 +4,6 @@
 #define __LIBBPF_VERSION_H

 #define LIBBPF_MAJOR_VERSION 1
-#define LIBBPF_MINOR_VERSION 6
+#define LIBBPF_MINOR_VERSION 7

 #endif /* __LIBBPF_VERSION_H */

View File: src/netlink.c

@@ -529,9 +529,9 @@ int bpf_xdp_query_id(int ifindex, int flags, __u32 *prog_id)
 }
 
-typedef int (*qdisc_config_t)(struct libbpf_nla_req *req);
+typedef int (*qdisc_config_t)(struct libbpf_nla_req *req, const struct bpf_tc_hook *hook);
 
-static int clsact_config(struct libbpf_nla_req *req)
+static int clsact_config(struct libbpf_nla_req *req, const struct bpf_tc_hook *hook)
 {
 	req->tc.tcm_parent = TC_H_CLSACT;
 	req->tc.tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
@@ -539,6 +539,16 @@ static int clsact_config(struct libbpf_nla_req *req)
 	return nlattr_add(req, TCA_KIND, "clsact", sizeof("clsact"));
 }
 
+static int qdisc_config(struct libbpf_nla_req *req, const struct bpf_tc_hook *hook)
+{
+	const char *qdisc = OPTS_GET(hook, qdisc, NULL);
+
+	req->tc.tcm_parent = OPTS_GET(hook, parent, TC_H_ROOT);
+	req->tc.tcm_handle = OPTS_GET(hook, handle, 0);
+
+	return nlattr_add(req, TCA_KIND, qdisc, strlen(qdisc) + 1);
+}
+
 static int attach_point_to_config(struct bpf_tc_hook *hook,
 				  qdisc_config_t *config)
 {
@@ -552,6 +562,9 @@ static int attach_point_to_config(struct bpf_tc_hook *hook,
 		return 0;
 	case BPF_TC_CUSTOM:
 		return -EOPNOTSUPP;
+	case BPF_TC_QDISC:
+		*config = &qdisc_config;
+		return 0;
 	default:
 		return -EINVAL;
 	}
@@ -596,7 +609,7 @@ static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags)
 	req.tc.tcm_family  = AF_UNSPEC;
 	req.tc.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
 
-	ret = config(&req);
+	ret = config(&req, hook);
 	if (ret < 0)
 		return ret;
@@ -639,6 +652,7 @@ int bpf_tc_hook_destroy(struct bpf_tc_hook *hook)
 	case BPF_TC_INGRESS:
 	case BPF_TC_EGRESS:
 		return libbpf_err(__bpf_tc_detach(hook, NULL, true));
+	case BPF_TC_QDISC:
 	case BPF_TC_INGRESS | BPF_TC_EGRESS:
 		return libbpf_err(tc_qdisc_delete(hook));
 	case BPF_TC_CUSTOM:

View File: src/usdt.c

@@ -59,7 +59,7 @@
  *
  *   STAP_PROBE3(my_usdt_provider, my_usdt_probe_name, 123, x, &y);
  *
- * USDT is identified by it's <provider-name>:<probe-name> pair of names. Each
+ * USDT is identified by its <provider-name>:<probe-name> pair of names. Each
  * individual USDT has a fixed number of arguments (3 in the above example)
  * and specifies values of each argument as if it was a function call.
  *
@@ -81,7 +81,7 @@
  * NOP instruction that kernel can replace with an interrupt instruction to
  * trigger instrumentation code (BPF program for all that we care about).
  *
- * Semaphore above is and optional feature. It records an address of a 2-byte
+ * Semaphore above is an optional feature. It records an address of a 2-byte
  * refcount variable (normally in '.probes' ELF section) used for signaling if
  * there is anything that is attached to USDT. This is useful for user
  * applications if, for example, they need to prepare some arguments that are
@@ -121,7 +121,7 @@
  * a uprobe BPF program (which for kernel, at least currently, is just a kprobe
  * program, so BPF_PROG_TYPE_KPROBE program type). With the only difference
  * that uprobe is usually attached at the function entry, while USDT will
- * normally will be somewhere inside the function. But it should always be
+ * normally be somewhere inside the function. But it should always be
  * pointing to NOP instruction, which makes such uprobes the fastest uprobe
  * kind.
  *
@@ -151,7 +151,7 @@
  * libbpf sets to spec ID during attach time, or, if kernel is too old to
  * support BPF cookie, through IP-to-spec-ID map that libbpf maintains in such
  * case. The latter means that some modes of operation can't be supported
- * without BPF cookie. Such mode is attaching to shared library "generically",
+ * without BPF cookie. Such a mode is attaching to shared library "generically",
  * without specifying target process. In such case, it's impossible to
  * calculate absolute IP addresses for IP-to-spec-ID map, and thus such mode
  * is not supported without BPF cookie support.
@@ -185,7 +185,7 @@
  * as even if USDT spec string is the same, USDT cookie value can be
  * different. It was deemed excessive to try to deduplicate across independent
  * USDT attachments by taking into account USDT spec string *and* USDT cookie
- * value, which would complicated spec ID accounting significantly for little
+ * value, which would complicate spec ID accounting significantly for little
  * gain.
  */