libbpf

mirror of https://github.com/netdata/libbpf.git synced 2026-04-28 19:49:11 +08:00

Author	SHA1	Message	Date
Andrii Nakryiko	85d9be97eb	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: 6f0b824a61f212e9707ff68abcabfdfa4724b811 Checkpoint bpf-next commit: 08a7491843224f8b96518fbe70d9e48163046054 Baseline bpf commit: 1d528e794f3db5d32279123a89957c44c4406a09 Checkpoint bpf commit: 22cc16c04b7893d8fc22810599f49a305d600b9e Donglin Peng (4): libbpf: Add BTF permutation support for type reordering libbpf: Optimize type lookup with binary search for sorted BTF libbpf: Verify BTF sorting btf: Refactor the code by calling str_is_empty Emil Tsalapatis (2): libbpf: Turn relo_core->sym_off unsigned libbpf: Move arena globals to the end of the arena Ihor Solodrai (1): bpf: Migrate bpf_stream_vprintk() to KF_IMPLICIT_ARGS Leon Hwang (2): bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps Matt Bobrowski (1): bpf: add new BPF_CGROUP_ITER_CHILDREN control option Menglong Dong (2): bpf: add fsession support libbpf: add fsession support Thomas Gleixner (1): treewide: Update email address Thomas Weißschuh (1): vfs: use UAPI types for new struct delegation definition Varun R Mallya (1): libbpf: Fix OOB read in btf_dump_get_bitfield_value include/uapi/linux/bpf.h \| 11 ++ include/uapi/linux/fcntl.h \| 10 +- include/uapi/linux/perf_event.h \| 2 +- src/bpf.c \| 1 + src/bpf.h \| 8 + src/bpf_helpers.h \| 6 +- src/btf.c \| 276 +++++++++++++++++++++++++++----- src/btf.h \| 42 +++++ src/btf_dump.c \| 9 ++ src/libbpf.c \| 64 +++++--- src/libbpf.h \| 21 +-- src/libbpf.map \| 1 + 12 files changed, 369 insertions(+), 82 deletions(-) Signed-off-by: Andrii Nakryiko <andrii@kernel.org> v1.6.3.1p_netdata	2026-01-29 14:10:19 -08:00
Andrii Nakryiko	fddf93d20b	sync: update .mailmap Update .mailmap based on libbpf's list of contributors and on the latest .mailmap version in the upstream repository. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2026-01-29 14:10:19 -08:00
Matt Bobrowski	ed6bb65cf1	bpf: add new BPF_CGROUP_ITER_CHILDREN control option Currently, the BPF cgroup iterator supports walking descendants in either pre-order (BPF_CGROUP_ITER_DESCENDANTS_PRE) or post-order (BPF_CGROUP_ITER_DESCENDANTS_POST). These modes perform an exhaustive depth-first search (DFS) of the hierarchy. In scenarios where a BPF program may need to inspect only the direct children of a given parent cgroup, a full DFS is unnecessarily expensive. This patch introduces a new BPF cgroup iterator control option, BPF_CGROUP_ITER_CHILDREN. This control option restricts the traversal to the immediate children of a specified parent cgroup, allowing for more targeted and efficient iteration, particularly when exhaustive depth-first search (DFS) traversal is not required. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20260127085112.3608687-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-29 14:10:19 -08:00
Menglong Dong	5ee8863eaf	libbpf: add fsession support Add BPF_TRACE_FSESSION to libbpf. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-9-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-29 14:10:19 -08:00
Menglong Dong	adde4f55b7	bpf: add fsession support The fsession is something that similar to kprobe session. It allow to attach a single BPF program to both the entry and the exit of the target functions. Introduce the struct bpf_fsession_link, which allows to add the link to both the fentry and fexit progs_hlist of the trampoline. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Co-developed-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260124062008.8657-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-29 14:10:19 -08:00
Ihor Solodrai	977b1f820c	bpf: Migrate bpf_stream_vprintk() to KF_IMPLICIT_ARGS Implement bpf_stream_vprintk with an implicit bpf_prog_aux argument, and remote bpf_stream_vprintk_impl from the kernel. Update the selftests to use the new API with implicit argument. bpf_stream_vprintk macro is changed to use the new bpf_stream_vprintk kfunc, and the extern definition of bpf_stream_vprintk_impl is replaced accordingly. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Link: https://lore.kernel.org/r/20260120222638.3976562-11-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-29 14:10:19 -08:00
Thomas Gleixner	5d02120e10	treewide: Update email address In a vain attempt to consolidate the email zoo switch everything to the kernel.org account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-29 14:10:19 -08:00
Donglin Peng	8a090ef1e5	btf: Refactor the code by calling str_is_empty Calling the str_is_empty function to clarify the code and no functional changes are introduced. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-12-dolinux.peng@gmail.com	2026-01-29 14:10:19 -08:00
Donglin Peng	ad9c763445	libbpf: Verify BTF sorting This patch checks whether the BTF is sorted by name in ascending order. If sorted, binary search will be used when looking up types. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-6-dolinux.peng@gmail.com	2026-01-29 14:10:19 -08:00
Donglin Peng	1c96b72cb0	libbpf: Optimize type lookup with binary search for sorted BTF This patch introduces binary search optimization for BTF type lookups when the BTF instance contains sorted types. The optimization significantly improves performance when searching for types in large BTF instances with sorted types. For unsorted BTF, the implementation falls back to the original linear search. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260109130003.3313716-5-dolinux.peng@gmail.com	2026-01-29 14:10:19 -08:00
Donglin Peng	b7c6c02b5f	libbpf: Add BTF permutation support for type reordering Introduce btf__permute() API to allow in-place rearrangement of BTF types. This function reorganizes BTF type order according to a provided array of type IDs, updating all type references to maintain consistency. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-2-dolinux.peng@gmail.com	2026-01-29 14:10:19 -08:00
Varun R Mallya	2c5038dcf4	libbpf: Fix OOB read in btf_dump_get_bitfield_value When dumping bitfield data, btf_dump_get_bitfield_value() reads data based on the underlying type's size (t->size). However, it does not verify that the provided data buffer (data_sz) is large enough to contain these bytes. If btf_dump__dump_type_data() is called with a buffer smaller than the type's size, this leads to an out-of-bounds read. This was confirmed by AddressSanitizer in the linked issue. Fix this by ensuring we do not read past the provided data_sz limit. Fixes: a1d3cc3c5eca ("libbpf: Avoid use of __int128 in typed dump display") Reported-by: Harrison Green <harrisonmichaelgreen@gmail.com> Suggested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Varun R Mallya <varunrmallya@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260106233527.163487-1-varunrmallya@gmail.com Closes: https://github.com/libbpf/libbpf/issues/928	2026-01-29 14:10:19 -08:00
Leon Hwang	dc8673b28b	libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps Add libbpf support for the BPF_F_CPU flag for percpu maps by embedding the cpu info into the high 32 bits of: 1. flags: bpf_map_lookup_elem_flags(), bpf_map__lookup_elem(), bpf_map_update_elem() and bpf_map__update_elem() 2. opts->elem_flags: bpf_map_lookup_batch() and bpf_map_update_batch() And the flag can be BPF_F_ALL_CPUS, but cannot be 'BPF_F_CPU \| BPF_F_ALL_CPUS'. Behavior: * If the flag is BPF_F_ALL_CPUS, the update is applied across all CPUs. * If the flag is BPF_F_CPU, it updates value only to the specified CPU. * If the flag is BPF_F_CPU, lookup value only from the specified CPU. * lookup does not support BPF_F_ALL_CPUS. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260107022022.12843-7-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-29 14:10:19 -08:00
Leon Hwang	a6d7ceaaeb	bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags and check them for following APIs: * 'map_lookup_elem()' * 'map_update_elem()' * 'generic_map_lookup_batch()' * 'generic_map_update_batch()' And, get the correct value size for these APIs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260107022022.12843-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-29 14:10:19 -08:00
Thomas Weißschuh	e64e125ef6	vfs: use UAPI types for new struct delegation definition Using libc types and headers from the UAPI headers is problematic as it introduces a dependency on a full C toolchain. Use the fixed-width integer types provided by the UAPI headers instead. Fixes: 1602bad16d7d ("vfs: expose delegation support to userland") Fixes: 4be9e04ebf75 ("vfs: add needed headers for new struct delegation definition") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20251203-uapi-fcntl-v1-1-490c67bf3425@linutronix.de Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-29 14:10:19 -08:00
Emil Tsalapatis	9dd6fda504	libbpf: Move arena globals to the end of the arena Arena globals are currently placed at the beginning of the arena by libbpf. This is convenient, but prevents users from reserving guard pages in the beginning of the arena to identify NULL pointer dereferences. Adjust the load logic to place the globals at the end of the arena instead. Also modify bpftool to set the arena pointer in the program's BPF skeleton to point to the globals. Users now call bpf_map__initial_value() to find the beginning of the arena mapping and use the arena pointer in the skeleton to determine which part of the mapping holds the arena globals and which part is free. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20251216173325.98465-5-emil@etsalapatis.com	2026-01-29 14:10:19 -08:00
Emil Tsalapatis	2c7fe6ec5d	libbpf: Turn relo_core->sym_off unsigned The symbols' relocation offsets in BPF are stored in an int field, but cannot actually be negative. When in the next patch libbpf relocates globals to the end of the arena, it is also possible to have valid offsets > 2GiB that are used to calculate the final relo offsets. Avoid accidentally interpreting large offsets as negative by turning the sym_off field unsigned. Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20251216173325.98465-4-emil@etsalapatis.com	2026-01-29 14:10:19 -08:00
Andrii Nakryiko	160423d498	ci: denylist flaky 'bpf_cookie/perf_event' selftest It keeps failing. It relies on perf_events so not super reliable. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2026-01-26 13:12:58 -08:00
Andrii Nakryiko	afb8b17bc5	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: f8c67d8550ee69ce684c7015b2c8c63cda24bbfb Checkpoint bpf-next commit: 6f0b824a61f212e9707ff68abcabfdfa4724b811 Baseline bpf commit: e427054ae7bc8b1268cf1989381a43885795616f Checkpoint bpf commit: 1d528e794f3db5d32279123a89957c44c4406a09 Alan Maguire (1): libbpf: Add debug messaging in dedup equivalence/identity matching Amery Hung (2): bpf: Support associating BPF program with struct_ops libbpf: Add support for associating BPF program with struct_ops Asbjørn Sloth Tønnesen (1): tools: ynl-gen: add regeneration comment Heiko Carstens (1): tools: Remove s390 compat support James Clark (1): perf: Add perf_event_attr::config4 Jeff Layton (2): vfs: expose delegation support to userland vfs: add needed headers for new struct delegation definition Jianyun Gao (1): libbpf: Fix some incorrect @param descriptions in the comment of libbpf.h Kuniyuki Iwashima (1): bpf: Introduce SK_BPF_BYPASS_PROT_MEM. Mikhail Gavrilov (1): libbpf: Fix -Wdiscarded-qualifiers under C23 Paul Houssel (1): libbpf: Fix BTF dedup to support recursive typedef definitions Peter Zijlstra (1): perf: Support deferred user unwind Samiullah Khawaja (1): net: Extend NAPI threaded polling to allow kthread based busy polling include/uapi/linux/bpf.h \| 19 ++++++ include/uapi/linux/fcntl.h \| 16 +++++ include/uapi/linux/netdev.h \| 2 + include/uapi/linux/perf_event.h \| 23 +++++++- src/bpf.c \| 19 ++++++ src/bpf.h \| 21 +++++++ src/btf.c \| 100 +++++++++++++++++++++++++------- src/libbpf.c \| 42 +++++++++++--- src/libbpf.h \| 43 ++++++++++---- src/libbpf.map \| 2 + src/usdt.c \| 2 - 11 files changed, 247 insertions(+), 42 deletions(-) Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-12-16 09:52:07 -08:00
Mikhail Gavrilov	fda2bfcb7a	libbpf: Fix -Wdiscarded-qualifiers under C23 glibc ≥ 2.42 (GCC 15) defaults to -std=gnu23, which promotes -Wdiscarded-qualifiers to an error. In C23, strstr() and strchr() return "const char ". Change variable types to const char where the pointers are never modified (res, sym_sfx, next_path). Suggested-by: Florian Weimer <fweimer@redhat.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Link: https://lore.kernel.org/r/20251206092825.1471385-1-mikhail.v.gavrilov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-16 09:52:07 -08:00
Amery Hung	5635185147	libbpf: Add support for associating BPF program with struct_ops Add low-level wrapper and libbpf API for BPF_PROG_ASSOC_STRUCT_OPS command in the bpf() syscall. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251203233748.668365-4-ameryhung@gmail.com	2025-12-16 09:52:07 -08:00
Amery Hung	1a41b12b4f	bpf: Support associating BPF program with struct_ops Add a new BPF command BPF_PROG_ASSOC_STRUCT_OPS to allow associating a BPF program with a struct_ops map. This command takes a file descriptor of a struct_ops map and a BPF program and set prog->aux->st_ops_assoc to the kdata of the struct_ops map. The command does not accept a struct_ops program nor a non-struct_ops map. Programs of a struct_ops map is automatically associated with the map during map update. If a program is shared between two struct_ops maps, prog->aux->st_ops_assoc will be poisoned to indicate that the associated struct_ops is ambiguous. The pointer, once poisoned, cannot be reset since we have lost track of associated struct_ops. For other program types, the associated struct_ops map, once set, cannot be changed later. This restriction may be lifted in the future if there is a use case. A kernel helper bpf_prog_get_assoc_struct_ops() can be used to retrieve the associated struct_ops pointer. The returned pointer, if not NULL, is guaranteed to be valid and point to a fully updated struct_ops struct. For struct_ops program reused in multiple struct_ops map, the return will be NULL. prog->aux->st_ops_assoc is protected by bumping the refcount for non-struct_ops programs and RCU for struct_ops programs. Since it would be inefficient to track programs associated with a struct_ops map, every non-struct_ops program will bump the refcount of the map to make sure st_ops_assoc stays valid. For a struct_ops program, it is protected by RCU as map_free will wait for an RCU grace period before disassociating the program with the map. The helper must be called in BPF program context or RCU read-side critical section. struct_ops implementers should note that the struct_ops returned may not be initialized nor attached yet. The struct_ops implementer will be responsible for tracking and checking the state of the associated struct_ops map if the use case expects an initialized or attached struct_ops. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20251203233748.668365-3-ameryhung@gmail.com	2025-12-16 09:52:07 -08:00
Alan Maguire	8ffe064aed	libbpf: Add debug messaging in dedup equivalence/identity matching We have seen a number of issues like [1]; failures to deduplicate key kernel data structures like task_struct. These are often hard to debug from pahole even with verbose output, especially when identity/equivalence checks fail deep in a nested struct comparison. Here we add debug messages of the form libbpf: STRUCT 'task_struct' size=2560 vlen=194 cand_id[54222] canon_id[102820] shallow-equal but not equiv for field#23 'sched_class': 0 These will be emitted during dedup from pahole when --verbose/-V is specified. This greatly helps identify exactly where dedup failures are experienced. [1] https://lore.kernel.org/bpf/b8e8b560-bce5-414b-846d-0da6d22a9983@oracle.com/ Changes since v1: - updated debug messages to refer to shallow-equal, added ids (Andrii) Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251203191507.55565-1-alan.maguire@oracle.com	2025-12-16 09:52:07 -08:00
Asbjørn Sloth Tønnesen	7ac4e3a670	tools: ynl-gen: add regeneration comment Add a comment on regeneration to the generated files. The comment is placed after the YNL-GEN line[1], as to not interfere with ynl-regen.sh's detection logic. [1] and after the optional YNL-ARG line. Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-16 09:52:07 -08:00
Samiullah Khawaja	cd173d0ea3	net: Extend NAPI threaded polling to allow kthread based busy polling Add a new state NAPI_STATE_THREADED_BUSY_POLL to the NAPI state enum to enable and disable threaded busy polling. When threaded busy polling is enabled for a NAPI, enable NAPI_STATE_THREADED also. When the threaded NAPI is scheduled, set NAPI_STATE_IN_BUSY_POLL to signal napi_complete_done not to rearm interrupts. Whenever NAPI_STATE_THREADED_BUSY_POLL is unset, the NAPI_STATE_IN_BUSY_POLL will be unset, napi_complete_done unsets the NAPI_STATE_SCHED_THREADED bit also, which in turn will make the kthread go to sleep. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Martin Karsten <mkarsten@uwaterloo.ca> Tested-by: Martin Karsten <mkarsten@uwaterloo.ca> Link: https://patch.msgid.link/20251028203007.575686-2-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-16 09:52:07 -08:00
Kuniyuki Iwashima	3fe0a72123	bpf: Introduce SK_BPF_BYPASS_PROT_MEM. If a socket has sk->sk_bypass_prot_mem flagged, the socket opts out of the global protocol memory accounting. This is easily controlled by net.core.bypass_prot_mem sysctl, but it lacks flexibility. Let's support flagging (and clearing) sk->sk_bypass_prot_mem via bpf_setsockopt() at the BPF_CGROUP_INET_SOCK_CREATE hook. int val = 1; bpf_setsockopt(ctx, SOL_SOCKET, SK_BPF_BYPASS_PROT_MEM, &val, sizeof(val)); As with net.core.bypass_prot_mem, this is inherited to child sockets, and BPF always takes precedence over sysctl at socket(2) and accept(2). SK_BPF_BYPASS_PROT_MEM is only supported at BPF_CGROUP_INET_SOCK_CREATE and not supported on other hooks for some reasons: 1. UDP charges memory under sk->sk_receive_queue.lock instead of lock_sock() 2. Modifying the flag after skb is charged to sk requires such adjustment during bpf_setsockopt() and complicates the logic unnecessarily We can support other hooks later if a real use case justifies that. Most changes are inline and hard to trace, but a microbenchmark on __sk_mem_raise_allocated() during neper/tcp_stream showed that more samples completed faster with sk->sk_bypass_prot_mem == 1. This will be more visible under tcp_mem pressure (but it's not a fair comparison). # bpftrace -e 'kprobe:__sk_mem_raise_allocated { @start[tid] = nsecs; } kretprobe:__sk_mem_raise_allocated /@start[tid]/ { @end[tid] = nsecs - @start[tid]; @times = hist(@end[tid]); delete(@start[tid]); }' # tcp_stream -6 -F 1000 -N -T 256 Without bpf prog: [128, 256) 3846 \| \| [256, 512) 1505326 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [512, 1K) 1371006 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [1K, 2K) 198207 \|@@@@@@ \| [2K, 4K) 31199 \|@ \| With bpf prog in the next patch: (must be attached before tcp_stream) # bpftool prog load sk_bypass_prot_mem.bpf.o /sys/fs/bpf/test type cgroup/sock_create # bpftool cgroup attach /sys/fs/cgroup/test cgroup_inet_sock_create pinned /sys/fs/bpf/test [128, 256) 6413 \| \| [256, 512) 1868425 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [512, 1K) 1101697 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [1K, 2K) 117031 \|@@@@ \| [2K, 4K) 11773 \| \| Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://patch.msgid.link/20251014235604.3057003-6-kuniyu@google.com	2025-12-16 09:52:07 -08:00
Jianyun Gao	f561c42074	libbpf: Fix some incorrect @param descriptions in the comment of libbpf.h Fix up some of missing or incorrect @param descriptions for libbpf public APIs in libbpf.h. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251118033025.11804-1-jianyungao89@gmail.com	2025-12-16 09:52:07 -08:00
Paul Houssel	370271441c	libbpf: Fix BTF dedup to support recursive typedef definitions Handle recursive typedefs in BTF deduplication Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and Podman) due to recursive type definitions that create reference loops not representable in C. These recursive typedefs trigger a failure in the BTF deduplication algorithm. This patch extends btf_dedup_ref_type() to properly handle potential recursion for BTF_KIND_TYPEDEF, similar to how recursion is already handled for BTF_KIND_STRUCT. This allows pahole to successfully generate BTF for Go binaries using recursive types without impacting existing C-based workflows. Suggested-by: Tristan d'Audibert <tristan.daudibert@gmail.com> Co-developed-by: Martin Horth <martin.horth@telecom-sudparis.eu> Co-developed-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr> Co-developed-by: Guilhem Jazeron <guilhem.jazeron@inria.fr> Co-developed-by: Ludovic Paillat <ludovic.paillat@inria.fr> Co-developed-by: Robin Theveniaut <robin.theveniaut@irit.fr> Signed-off-by: Martin Horth <martin.horth@telecom-sudparis.eu> Signed-off-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr> Signed-off-by: Guilhem Jazeron <guilhem.jazeron@inria.fr> Signed-off-by: Ludovic Paillat <ludovic.paillat@inria.fr> Signed-off-by: Robin Theveniaut <robin.theveniaut@irit.fr> Signed-off-by: Paul Houssel <paul.houssel@orange.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/bf00857b1e06f282aac12f6834de7396a7547ba6.1763037045.git.paul.houssel@orange.com	2025-12-16 09:52:07 -08:00
James Clark	8cc0f2c095	perf: Add perf_event_attr::config4 Arm FEAT_SPE_FDS adds the ability to filter on the data source of a packet using another 64-bits of event filtering control. As the existing perf_event_attr::configN fields are all used up for SPE PMU, an additional field is needed. Add a new 'config4' field. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-12-16 09:52:07 -08:00
Heiko Carstens	9905b35d8a	tools: Remove s390 compat support Remove s390 compat support from everything within tools, since s390 compat support will be removed from the kernel. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Weißschuh <linux@weissschuh.net> # tools/nolibc selftests/nolibc Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> # selftests/vDSO Acked-by: Alexei Starovoitov <ast@kernel.org> # bpf bits Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-12-16 09:52:07 -08:00
Peter Zijlstra	8d178bd7b6	perf: Support deferred user unwind Add support for deferred userspace unwind to perf. Where perf currently relies on in-place stack unwinding; from NMI context and all that. This moves the userspace part of the unwind to right before the return-to-userspace. This has two distinct benefits, the biggest is that it moves the unwind to a faultable context. It becomes possible to fault in debug info (.eh_frame, SFrame etc.) that might not otherwise be readily available. And secondly, it de-duplicates the user callchain where multiple samples happen during the same kernel entry. To facilitate this the perf interface is extended with a new record type: PERF_RECORD_CALLCHAIN_DEFERRED and two new attribute flags: perf_event_attr::defer_callchain - to request the user unwind be deferred perf_event_attr::defer_output - to request PERF_RECORD_CALLCHAIN_DEFERRED records The existing PERF_RECORD_SAMPLE callchain section gets a new context type: PERF_CONTEXT_USER_DEFERRED After which will come a single entry, denoting the 'cookie' of the deferred callchain that should be attached here, matching the 'cookie' field of the above mentioned PERF_RECORD_CALLCHAIN_DEFERRED. The 'defer_callchain' flag is expected on all events with PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expect on the event responsible for collecting side-band events (like mmap, comm etc.). Setting 'defer_output' on multiple events will get you duplicated PERF_RECORD_CALLCHAIN_DEFERRED records. Based on earlier patches by Josh and Steven. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net	2025-12-16 09:52:07 -08:00
Jeff Layton	530f40421a	vfs: add needed headers for new struct delegation definition The definition of struct delegation uses stdint.h integer types. Add the necessary headers to ensure that always works. Fixes: 1602bad16d7d ("vfs: expose delegation support to userland") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-16 09:52:07 -08:00
Jeff Layton	4f10610ae5	vfs: expose delegation support to userland Now that support for recallable directory delegations is available, expose this functionality to userland with new F_SETDELEG and F_GETDELEG commands for fcntl(). Note that this also allows userland to request a FL_DELEG type lease on files too. Userland applications that do will get signalled when there are metadata changes in addition to just data changes (which is a limitation of FL_LEASE leases). These commands accept a new "struct delegation" argument that contains a flags field for future expansion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-17-52f3feebb2f2@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-16 09:52:07 -08:00
Andrii Nakryiko	d65dbb412d	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: de7342228b7343774d6a9981c2ddbfb5e201044b Checkpoint bpf-next commit: f8c67d8550ee69ce684c7015b2c8c63cda24bbfb Baseline bpf commit: 4d920ed684392ae064af62957d6f5a90312dfaf6 Checkpoint bpf commit: e427054ae7bc8b1268cf1989381a43885795616f Alan Maguire (1): libbpf: Fix parsing of multi-split BTF Andrii Nakryiko (1): libbpf: Fix powerpc's stack register definition in bpf_tracing.h Anton Protopopov (4): libbpf: fix formatting of bpf_object__append_subprog_code bpf, x86: add new map type: instructions array libbpf: Recognize insn_array map type libbpf: support llvm-generated indirect jumps Donald Hunter (1): docs/bpf: Add missing BPF k/uprobe program types to docs Jianyun Gao (4): libbpf: Optimize the redundant code in the bpf_object__init_user_btf_maps() function. libbpf: Fix the incorrect reference to the memlock_rlim variable in the comment. libbpf: Complete the missing @param and @return tags in btf.h libbpf: Update the comment to remove the reference to the deprecated interface bpf_program__load(). Mykyta Yatsenko (2): bpf: widen dynptr size/offset to 64 bit bpf: add _impl suffix for bpf_stream_vprintk() kfunc Xu Kuohai (1): bpf: Add overwrite mode for BPF ring buffer docs/program_types.rst \| 18 +++ include/uapi/linux/bpf.h \| 33 ++++- src/bpf.c \| 2 +- src/bpf_helpers.h \| 28 ++-- src/bpf_tracing.h \| 2 +- src/btf.c \| 4 +- src/btf.h \| 8 ++ src/libbpf.c \| 296 +++++++++++++++++++++++++++++++++++---- src/libbpf_internal.h \| 2 + src/libbpf_probes.c \| 4 + src/linker.c \| 3 + 11 files changed, 353 insertions(+), 47 deletions(-) Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-11-07 14:00:07 -08:00
Andrii Nakryiko	befbf010d7	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-11-07 14:00:07 -08:00
Mykyta Yatsenko	a00b10df8c	bpf: add _impl suffix for bpf_stream_vprintk() kfunc Rename bpf_stream_vprintk() to bpf_stream_vprintk_impl(). This makes bpf_stream_vprintk() follow the already established "_impl" suffix-based naming convention for kfuncs with the bpf_prog_aux argument provided by the verifier implicitly. This convention will be taken advantage of with the upcoming KF_IMPLICIT_ARGS feature to preserve backwards compatibility to BPF programs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20251104-implv2-v3-2-4772b9ae0e06@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-11-07 14:00:07 -08:00
Anton Protopopov	24a89cb35d	libbpf: support llvm-generated indirect jumps For v4 instruction set LLVM is allowed to generate indirect jumps for switch statements and for 'goto *rX' assembly. Every such a jump will be accompanied by necessary metadata, e.g. (`llvm-objdump -Sr ...`): 0: r2 = 0x0 ll 0000000000000030: R_BPF_64_64 BPF.JT.0.0 Here BPF.JT.1.0 is a symbol residing in the .jumptables section: Symbol table: 4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0 The -bpf-min-jump-table-entries llvm option may be used to control the minimal size of a switch which will be converted to an indirect jumps. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251105090410.1250500-11-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Anton Protopopov	349b78117b	libbpf: Recognize insn_array map type Teach libbpf about the existence of the new instruction array map. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251105090410.1250500-4-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Anton Protopopov	9d159773c5	bpf, x86: add new map type: instructions array On bpf(BPF_PROG_LOAD) syscall user-supplied BPF programs are translated by the verifier into "xlated" BPF programs. During this process the original instructions offsets might be adjusted and/or individual instructions might be replaced by new sets of instructions, or deleted. Add a new BPF map type which is aimed to keep track of how, for a given program, the original instructions were relocated during the verification. Also, besides keeping track of the original -> xlated mapping, make x86 JIT to build the xlated -> jitted mapping for every instruction listed in an instruction array. This is required for every future application of instruction arrays: static keys, indirect jumps and indirect calls. A map of the BPF_MAP_TYPE_INSN_ARRAY type must be created with a u32 keys and value of size 8. The values have different semantics for userspace and for BPF space. For userspace a value consists of two u32 values – xlated and jitted offsets. For BPF side the value is a real pointer to a jitted instruction. On map creation/initialization, before loading the program, each element of the map should be initialized to point to an instruction offset within the program. Before the program load such maps should be made frozen. After the program verification xlated and jitted offsets can be read via the bpf(2) syscall. If a tracked instruction is removed by the verifier, then the xlated offset is set to (u32)-1 which is considered to be too big for a valid BPF program offset. One such a map can, obviously, be used to track one and only one BPF program. If the verification process was unsuccessful, then the same map can be re-used to verify the program with a different log level. However, if the program was loaded fine, then such a map, being frozen in any case, can't be reused by other programs even after the program release. Example. Consider the following original and xlated programs: Original prog: Xlated prog: 0: r1 = 0x0 0: r1 = 0 1: (u32 )(r10 - 0x4) = r1 1: (u32 )(r10 -4) = r1 2: r2 = r10 2: r2 = r10 3: r2 += -0x4 3: r2 += -4 4: r1 = 0x0 ll 4: r1 = map[id:88] 6: call 0x1 6: r1 += 272 7: r0 = (u32 )(r2 +0) 8: if r0 >= 0x1 goto pc+3 9: r0 <<= 3 10: r0 += r1 11: goto pc+1 12: r0 = 0 7: r6 = r0 13: r6 = r0 8: if r6 == 0x0 goto +0x2 14: if r6 == 0x0 goto pc+4 9: call 0x76 15: r0 = 0xffffffff8d2079c0 17: r0 = (u64 )(r0 +0) 10: (u64 )(r6 + 0x0) = r0 18: (u64 )(r6 +0) = r0 11: r0 = 0x0 19: r0 = 0x0 12: exit 20: exit An instruction array map, containing, e.g., instructions [0,4,7,12] will be translated by the verifier to [0,4,13,20]. A map with index 5 (the middle of 16-byte instruction) or indexes greater than 12 (outside the program boundaries) would be rejected. The functionality provided by this patch will be extended in consequent patches to implement BPF Static Keys, indirect jumps, and indirect calls. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251105090410.1250500-2-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Alan Maguire	0e14a12a1d	libbpf: Fix parsing of multi-split BTF When creating multi-split BTF we correctly set the start string offset to be the size of the base string section plus the base BTF start string offset; the latter is needed for multi-split BTF since the offset is non-zero there. Unfortunately the BTF parsing case needed that logic and it was missed. Fixes: 4e29128a9ace ("libbpf/btf: Fix string handling to support multi-split BTF") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251104203309.318429-2-alan.maguire@oracle.com	2025-11-07 14:00:07 -08:00
Donald Hunter	813fbe13ab	docs/bpf: Add missing BPF k/uprobe program types to docs Update the table of program types in the libbpf docs with the missing k/uprobe multi and session program types. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251029180932.98038-1-donald.hunter@gmail.com	2025-11-07 14:00:07 -08:00
Jianyun Gao	fd00fd999f	libbpf: Update the comment to remove the reference to the deprecated interface bpf_program__load(). Commit be2f2d1680df ("libbpf: Deprecate bpf_program__load() API") marked bpf_program__load() as deprecated starting with libbpf v0.6. And later in commit 146bf811f5ac ("libbpf: remove most other deprecated high-level APIs") actually removed the bpf_program__load() implementation and related old high-level APIs. This patch update the comment in bpf_program__set_attach_target() to remove the reference to the deprecated interface bpf_program__load(). Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251103120727.145965-1-jianyungao89@gmail.com	2025-11-07 14:00:07 -08:00
Jianyun Gao	f4b32db745	libbpf: Complete the missing @param and @return tags in btf.h Complete the missing @param and @return tags in the Doxygen comments of the btf.h file. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251103115836.144339-1-jianyungao89@gmail.com	2025-11-07 14:00:07 -08:00
Andrii Nakryiko	99bf90957a	libbpf: Fix powerpc's stack register definition in bpf_tracing.h retsnoop's build on powerpc (ppc64le) architecture ([0]) failed due to wrong definition of PT_REGS_SP() macro. Looking at powerpc's implementation of stack unwinding in perf_callchain_user_64() clearly shows that stack pointer register is gpr[1]. Fix libbpf's definition of __PT_SP_REG for powerpc to fix all this. [0] https://kojipkgs.fedoraproject.org/work/tasks/1544/137921544/build.log Fixes: 138d6153a139 ("samples/bpf: Enable powerpc support") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20251020203643.989467-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Jianyun Gao	98b6e51fc6	libbpf: Fix the incorrect reference to the memlock_rlim variable in the comment. The variable "memlock_rlim_max" referenced in the comment does not exist. I think that the author probably meant the variable "memlock_rlim". So, correct it. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251027032008.738944-1-jianyungao89@gmail.com	2025-11-07 14:00:07 -08:00
Jianyun Gao	bc3b400e06	libbpf: Optimize the redundant code in the bpf_object__init_user_btf_maps() function. In the elf_sec_data() function, the input parameter 'scn' will be evaluated. If it is NULL, then it will directly return NULL. Therefore, the return value of the elf_sec_data() function already takes into account the case where the input parameter scn is NULL. Therefore, subsequently, the code only needs to check whether the return value of the elf_sec_data() function is NULL. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20251024080802.642189-1-jianyungao89@gmail.com	2025-11-07 14:00:07 -08:00
Xu Kuohai	665ad8c7f7	bpf: Add overwrite mode for BPF ring buffer When the BPF ring buffer is full, a new event cannot be recorded until one or more old events are consumed to make enough space for it. In cases such as fault diagnostics, where recent events are more useful than older ones, this mechanism may lead to critical events being lost. So add overwrite mode for BPF ring buffer to address it. In this mode, the new event overwrites the oldest event when the buffer is full. The basic idea is as follows: 1. producer_pos tracks the next position to record new event. When there is enough free space, producer_pos is simply advanced by producer to make space for the new event. 2. To avoid waiting for consumer when the buffer is full, a new variable, overwrite_pos, is introduced for producer. It points to the oldest event committed in the buffer. It is advanced by producer to discard one or more oldest events to make space for the new event when the buffer is full. 3. pending_pos tracks the oldest event to be committed. pending_pos is never passed by producer_pos, so multiple producers never write to the same position at the same time. The following example diagrams show how it works in a 4096-byte ring buffer. 1. At first, {producer,overwrite,pending,consumer}_pos are all set to 0. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| +-----------------------------------------------------------------------+ ^ \| \| producer_pos = 0 overwrite_pos = 0 pending_pos = 0 consumer_pos = 0 2. Now reserve a 512-byte event A. There is enough free space, so A is allocated at offset 0. And producer_pos is advanced to 512, the end of A. Since A is not submitted, the BUSY bit is set. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| A \| \| \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ \| \| \| \| \| producer_pos = 512 \| overwrite_pos = 0 pending_pos = 0 consumer_pos = 0 3. Reserve event B, size 1024. B is allocated at offset 512 with BUSY bit set, and producer_pos is advanced to the end of B. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| A \| B \| \| \| [BUSY] \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ \| \| \| \| \| producer_pos = 1536 \| overwrite_pos = 0 pending_pos = 0 consumer_pos = 0 4. Reserve event C, size 2048. C is allocated at offset 1536, and producer_pos is advanced to 3584. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| A \| B \| C \| \| \| [BUSY] \| [BUSY] \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ \| \| \| \| \| producer_pos = 3584 \| overwrite_pos = 0 pending_pos = 0 consumer_pos = 0 5. Submit event A. The BUSY bit of A is cleared. B becomes the oldest event to be committed, so pending_pos is advanced to 512, the start of B. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| A \| B \| C \| \| \| \| [BUSY] \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ ^ \| \| \| \| \| \| \| pending_pos = 512 producer_pos = 3584 \| overwrite_pos = 0 consumer_pos = 0 6. Submit event B. The BUSY bit of B is cleared, and pending_pos is advanced to the start of C, which is now the oldest event to be committed. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| A \| B \| C \| \| \| \| \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ ^ \| \| \| \| \| \| \| pending_pos = 1536 producer_pos = 3584 \| overwrite_pos = 0 consumer_pos = 0 7. Reserve event D, size 1536 (3 * 512). There are 2048 bytes not being written between producer_pos (currently 3584) and pending_pos, so D is allocated at offset 3584, and producer_pos is advanced by 1536 (from 3584 to 5120). Since event D will overwrite all bytes of event A and the first 512 bytes of event B, overwrite_pos is advanced to the start of event C, the oldest event that is not overwritten. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| D End \| \| C \| D Begin\| \| [BUSY] \| \| [BUSY] \| [BUSY] \| +-----------------------------------------------------------------------+ ^ ^ ^ \| \| \| \| \| pending_pos = 1536 \| \| overwrite_pos = 1536 \| \| \| producer_pos=5120 \| consumer_pos = 0 8. Reserve event E, size 1024. Although there are 512 bytes not being written between producer_pos and pending_pos, E cannot be reserved, as it would overwrite the first 512 bytes of event C, which is still being written. 9. Submit event C and D. pending_pos is advanced to the end of D. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| D End \| \| C \| D Begin\| \| \| \| \| \| +-----------------------------------------------------------------------+ ^ ^ ^ \| \| \| \| \| overwrite_pos = 1536 \| \| \| producer_pos=5120 \| pending_pos=5120 \| consumer_pos = 0 The performance data for overwrite mode will be provided in a follow-up patch that adds overwrite-mode benchmarks. A sample of performance data for non-overwrite mode, collected on an x86_64 CPU and an arm64 CPU, before and after this patch, is shown below. As we can see, no obvious performance regression occurs. - x86_64 (AMD EPYC 9654) Before: Ringbuf, multi-producer contention ================================== rb-libbpf nr_prod 1 11.623 ± 0.027M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 2 15.812 ± 0.014M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 3 7.871 ± 0.003M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 4 6.703 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 8 2.896 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 12 2.054 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 16 1.864 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 20 1.580 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 24 1.484 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 28 1.369 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 32 1.316 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 36 1.272 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 40 1.239 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 44 1.226 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 48 1.213 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 52 1.193 ± 0.001M/s (drops 0.000 ± 0.000M/s) After: Ringbuf, multi-producer contention ================================== rb-libbpf nr_prod 1 11.845 ± 0.036M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 2 15.889 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 3 8.155 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 4 6.708 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 8 2.918 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 12 2.065 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 16 1.870 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 20 1.582 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 24 1.482 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 28 1.372 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 32 1.323 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 36 1.264 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 40 1.236 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 44 1.209 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 48 1.189 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 52 1.165 ± 0.002M/s (drops 0.000 ± 0.000M/s) - arm64 (HiSilicon Kunpeng 920) Before: Ringbuf, multi-producer contention ================================== rb-libbpf nr_prod 1 11.310 ± 0.623M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 2 9.947 ± 0.004M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 3 6.634 ± 0.011M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 4 4.502 ± 0.003M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 8 3.888 ± 0.003M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 12 3.372 ± 0.005M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 16 3.189 ± 0.010M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 20 2.998 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 24 3.086 ± 0.018M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 28 2.845 ± 0.004M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 32 2.815 ± 0.008M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 36 2.771 ± 0.009M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 40 2.814 ± 0.011M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 44 2.752 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 48 2.695 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 52 2.710 ± 0.006M/s (drops 0.000 ± 0.000M/s) After: Ringbuf, multi-producer contention ================================== rb-libbpf nr_prod 1 11.283 ± 0.550M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 2 9.993 ± 0.003M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 3 6.898 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 4 5.257 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 8 3.830 ± 0.005M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 12 3.528 ± 0.013M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 16 3.265 ± 0.018M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 20 2.990 ± 0.007M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 24 2.929 ± 0.014M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 28 2.898 ± 0.010M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 32 2.818 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 36 2.789 ± 0.012M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 40 2.770 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 44 2.651 ± 0.007M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 48 2.669 ± 0.005M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 52 2.695 ± 0.009M/s (drops 0.000 ± 0.000M/s) Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251018035738.4039621-2-xukuohai@huaweicloud.com	2025-11-07 14:00:07 -08:00
Mykyta Yatsenko	42995c95b9	bpf: widen dynptr size/offset to 64 bit Dynptr currently caps size and offset at 24 bits, which isn’t sufficient for file-backed use cases; even 32 bits can be limiting. Refactor dynptr helpers/kfuncs to use 64-bit size and offset, ensuring consistency across the APIs. This change does not affect internals of xdp, skb or other dynptrs, which continue to behave as before. Also it does not break binary compatibility. The widening enables large-file access support via dynptr, implemented in the next patches. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251026203853.135105-3-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Anton Protopopov	49c5e0eef4	libbpf: fix formatting of bpf_object__append_subprog_code The commit 6c918709bd30 ("libbpf: Refactor bpf_object__reloc_code") added the bpf_object__append_subprog_code() with incorrect indentations. Use tabs instead. (This also makes a consequent commit better readable.) Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20251019202145.3944697-14-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Alain Knaff	30599e72bf	include: add __poll_t typedef in include/linux/types.h for Android/Termux On Android/Termux, linux/types.h is included (indirectly) by sys/epoll.h which depends on this definition to be present. Signed-off-by: Alain Knaff <github@misc.lka.org.lu>	2025-11-07 12:35:07 -08:00

1 2 3 4 5 ...

2767 Commits