libbpf

mirror of https://github.com/netdata/libbpf.git synced 2026-05-02 05:29:12 +08:00

Author	SHA1	Message	Date
Asbjørn Sloth Tønnesen	7ac4e3a670	tools: ynl-gen: add regeneration comment Add a comment on regeneration to the generated files. The comment is placed after the YNL-GEN line[1], so as not to interfere with ynl-regen.sh's detection logic. [1] and after the optional YNL-ARG line. Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-16 09:52:07 -08:00
Samiullah Khawaja	cd173d0ea3	net: Extend NAPI threaded polling to allow kthread based busy polling Add a new state NAPI_STATE_THREADED_BUSY_POLL to the NAPI state enum to enable and disable threaded busy polling. When threaded busy polling is enabled for a NAPI, enable NAPI_STATE_THREADED also. When the threaded NAPI is scheduled, set NAPI_STATE_IN_BUSY_POLL to signal napi_complete_done not to rearm interrupts. Whenever NAPI_STATE_THREADED_BUSY_POLL is unset, NAPI_STATE_IN_BUSY_POLL is unset as well, and napi_complete_done also unsets the NAPI_STATE_SCHED_THREADED bit, which in turn makes the kthread go to sleep. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Martin Karsten <mkarsten@uwaterloo.ca> Tested-by: Martin Karsten <mkarsten@uwaterloo.ca> Link: https://patch.msgid.link/20251028203007.575686-2-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-16 09:52:07 -08:00
Kuniyuki Iwashima	3fe0a72123	bpf: Introduce SK_BPF_BYPASS_PROT_MEM. If a socket has sk->sk_bypass_prot_mem flagged, the socket opts out of the global protocol memory accounting. This is easily controlled by net.core.bypass_prot_mem sysctl, but it lacks flexibility. Let's support flagging (and clearing) sk->sk_bypass_prot_mem via bpf_setsockopt() at the BPF_CGROUP_INET_SOCK_CREATE hook.
    int val = 1;
    bpf_setsockopt(ctx, SOL_SOCKET, SK_BPF_BYPASS_PROT_MEM, &val, sizeof(val));
As with net.core.bypass_prot_mem, this is inherited to child sockets, and BPF always takes precedence over sysctl at socket(2) and accept(2). SK_BPF_BYPASS_PROT_MEM is only supported at BPF_CGROUP_INET_SOCK_CREATE and not supported on other hooks for some reasons:
  1. UDP charges memory under sk->sk_receive_queue.lock instead of lock_sock()
  2. Modifying the flag after skb is charged to sk requires such adjustment during bpf_setsockopt() and complicates the logic unnecessarily
We can support other hooks later if a real use case justifies that. Most changes are inline and hard to trace, but a microbenchmark on __sk_mem_raise_allocated() during neper/tcp_stream showed that more samples completed faster with sk->sk_bypass_prot_mem == 1. This will be more visible under tcp_mem pressure (but it's not a fair comparison).
    # bpftrace -e 'kprobe:__sk_mem_raise_allocated { @start[tid] = nsecs; } kretprobe:__sk_mem_raise_allocated /@start[tid]/ { @end[tid] = nsecs - @start[tid]; @times = hist(@end[tid]); delete(@start[tid]); }'
    # tcp_stream -6 -F 1000 -N -T 256
Without bpf prog (hist() buckets in ns, sample counts):
    [128, 256)       3846
    [256, 512)    1505326
    [512, 1K)     1371006
    [1K, 2K)       198207
    [2K, 4K)        31199
With bpf prog in the next patch (must be attached before tcp_stream):
    # bpftool prog load sk_bypass_prot_mem.bpf.o /sys/fs/bpf/test type cgroup/sock_create
    # bpftool cgroup attach /sys/fs/cgroup/test cgroup_inet_sock_create pinned /sys/fs/bpf/test
    [128, 256)       6413
    [256, 512)    1868425
    [512, 1K)     1101697
    [1K, 2K)       117031
    [2K, 4K)        11773
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://patch.msgid.link/20251014235604.3057003-6-kuniyu@google.com	2025-12-16 09:52:07 -08:00
Jianyun Gao	f561c42074	libbpf: Fix some incorrect @param descriptions in the comment of libbpf.h Fix up some of the missing or incorrect @param descriptions for libbpf public APIs in libbpf.h. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251118033025.11804-1-jianyungao89@gmail.com	2025-12-16 09:52:07 -08:00
Paul Houssel	370271441c	libbpf: Fix BTF dedup to support recursive typedef definitions Handle recursive typedefs in BTF deduplication Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and Podman) due to recursive type definitions that create reference loops not representable in C. These recursive typedefs trigger a failure in the BTF deduplication algorithm. This patch extends btf_dedup_ref_type() to properly handle potential recursion for BTF_KIND_TYPEDEF, similar to how recursion is already handled for BTF_KIND_STRUCT. This allows pahole to successfully generate BTF for Go binaries using recursive types without impacting existing C-based workflows. Suggested-by: Tristan d'Audibert <tristan.daudibert@gmail.com> Co-developed-by: Martin Horth <martin.horth@telecom-sudparis.eu> Co-developed-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr> Co-developed-by: Guilhem Jazeron <guilhem.jazeron@inria.fr> Co-developed-by: Ludovic Paillat <ludovic.paillat@inria.fr> Co-developed-by: Robin Theveniaut <robin.theveniaut@irit.fr> Signed-off-by: Martin Horth <martin.horth@telecom-sudparis.eu> Signed-off-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr> Signed-off-by: Guilhem Jazeron <guilhem.jazeron@inria.fr> Signed-off-by: Ludovic Paillat <ludovic.paillat@inria.fr> Signed-off-by: Robin Theveniaut <robin.theveniaut@irit.fr> Signed-off-by: Paul Houssel <paul.houssel@orange.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/bf00857b1e06f282aac12f6834de7396a7547ba6.1763037045.git.paul.houssel@orange.com	2025-12-16 09:52:07 -08:00
James Clark	8cc0f2c095	perf: Add perf_event_attr::config4 Arm FEAT_SPE_FDS adds the ability to filter on the data source of a packet using another 64-bits of event filtering control. As the existing perf_event_attr::configN fields are all used up for SPE PMU, an additional field is needed. Add a new 'config4' field. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-12-16 09:52:07 -08:00
Heiko Carstens	9905b35d8a	tools: Remove s390 compat support Remove s390 compat support from everything within tools, since s390 compat support will be removed from the kernel. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Weißschuh <linux@weissschuh.net> # tools/nolibc selftests/nolibc Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> # selftests/vDSO Acked-by: Alexei Starovoitov <ast@kernel.org> # bpf bits Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-12-16 09:52:07 -08:00
Peter Zijlstra	8d178bd7b6	perf: Support deferred user unwind Add support for deferred userspace unwind to perf. Perf currently relies on in-place stack unwinding, from NMI context and all that. This moves the userspace part of the unwind to right before the return-to-userspace. This has two distinct benefits. The biggest is that it moves the unwind to a faultable context, so it becomes possible to fault in debug info (.eh_frame, SFrame etc.) that might not otherwise be readily available. Secondly, it de-duplicates the user callchain where multiple samples happen during the same kernel entry. To facilitate this the perf interface is extended with a new record type, PERF_RECORD_CALLCHAIN_DEFERRED, and two new attribute flags: perf_event_attr::defer_callchain, to request the user unwind be deferred, and perf_event_attr::defer_output, to request PERF_RECORD_CALLCHAIN_DEFERRED records. The existing PERF_RECORD_SAMPLE callchain section gets a new context type, PERF_CONTEXT_USER_DEFERRED, after which will come a single entry denoting the 'cookie' of the deferred callchain that should be attached here, matching the 'cookie' field of the above mentioned PERF_RECORD_CALLCHAIN_DEFERRED. The 'defer_callchain' flag is expected on all events with PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expected on the event responsible for collecting side-band events (like mmap, comm etc.). Setting 'defer_output' on multiple events will get you duplicated PERF_RECORD_CALLCHAIN_DEFERRED records. Based on earlier patches by Josh and Steven. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net	2025-12-16 09:52:07 -08:00
Jeff Layton	530f40421a	vfs: add needed headers for new struct delegation definition The definition of struct delegation uses stdint.h integer types. Add the necessary headers to ensure that always works. Fixes: 1602bad16d7d ("vfs: expose delegation support to userland") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-16 09:52:07 -08:00
Jeff Layton	4f10610ae5	vfs: expose delegation support to userland Now that support for recallable directory delegations is available, expose this functionality to userland with new F_SETDELEG and F_GETDELEG commands for fcntl(). Note that this also allows userland to request a FL_DELEG type lease on files too. Userland applications that do will get signalled when there are metadata changes in addition to just data changes (which is a limitation of FL_LEASE leases). These commands accept a new "struct delegation" argument that contains a flags field for future expansion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-17-52f3feebb2f2@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-16 09:52:07 -08:00
Andrii Nakryiko	d65dbb412d	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   de7342228b7343774d6a9981c2ddbfb5e201044b
Checkpoint bpf-next commit: f8c67d8550ee69ce684c7015b2c8c63cda24bbfb
Baseline bpf commit:        4d920ed684392ae064af62957d6f5a90312dfaf6
Checkpoint bpf commit:      e427054ae7bc8b1268cf1989381a43885795616f
Alan Maguire (1):
  libbpf: Fix parsing of multi-split BTF
Andrii Nakryiko (1):
  libbpf: Fix powerpc's stack register definition in bpf_tracing.h
Anton Protopopov (4):
  libbpf: fix formatting of bpf_object__append_subprog_code
  bpf, x86: add new map type: instructions array
  libbpf: Recognize insn_array map type
  libbpf: support llvm-generated indirect jumps
Donald Hunter (1):
  docs/bpf: Add missing BPF k/uprobe program types to docs
Jianyun Gao (4):
  libbpf: Optimize the redundant code in the bpf_object__init_user_btf_maps() function.
  libbpf: Fix the incorrect reference to the memlock_rlim variable in the comment.
  libbpf: Complete the missing @param and @return tags in btf.h
  libbpf: Update the comment to remove the reference to the deprecated interface bpf_program__load().
Mykyta Yatsenko (2):
  bpf: widen dynptr size/offset to 64 bit
  bpf: add _impl suffix for bpf_stream_vprintk() kfunc
Xu Kuohai (1):
  bpf: Add overwrite mode for BPF ring buffer
 docs/program_types.rst   |  18 +++
 include/uapi/linux/bpf.h |  33 ++++-
 src/bpf.c                |   2 +-
 src/bpf_helpers.h        |  28 ++--
 src/bpf_tracing.h        |   2 +-
 src/btf.c                |   4 +-
 src/btf.h                |   8 ++
 src/libbpf.c             | 296 +++++++++++++++++++++++++++++++++++---
 src/libbpf_internal.h    |   2 +
 src/libbpf_probes.c      |   4 +
 src/linker.c             |   3 +
 11 files changed, 353 insertions(+), 47 deletions(-)
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-11-07 14:00:07 -08:00
Andrii Nakryiko	befbf010d7	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-11-07 14:00:07 -08:00
Mykyta Yatsenko	a00b10df8c	bpf: add _impl suffix for bpf_stream_vprintk() kfunc Rename bpf_stream_vprintk() to bpf_stream_vprintk_impl(). This makes bpf_stream_vprintk() follow the already established "_impl" suffix-based naming convention for kfuncs with the bpf_prog_aux argument provided by the verifier implicitly. This convention will be taken advantage of with the upcoming KF_IMPLICIT_ARGS feature to preserve backwards compatibility to BPF programs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20251104-implv2-v3-2-4772b9ae0e06@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-11-07 14:00:07 -08:00
Anton Protopopov	24a89cb35d	libbpf: support llvm-generated indirect jumps For v4 instruction set LLVM is allowed to generate indirect jumps for switch statements and for 'goto *rX' assembly. Every such jump will be accompanied by necessary metadata, e.g. (`llvm-objdump -Sr ...`):
    0: r2 = 0x0 ll
       0000000000000030: R_BPF_64_64 BPF.JT.0.0
Here BPF.JT.0.0 is a symbol residing in the .jumptables section:
    Symbol table:
    4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
The -bpf-min-jump-table-entries llvm option may be used to control the minimal size of a switch which will be converted to an indirect jump. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251105090410.1250500-11-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Anton Protopopov	349b78117b	libbpf: Recognize insn_array map type Teach libbpf about the existence of the new instruction array map. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251105090410.1250500-4-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Anton Protopopov	9d159773c5	bpf, x86: add new map type: instructions array On bpf(BPF_PROG_LOAD) syscall user-supplied BPF programs are translated by the verifier into "xlated" BPF programs. During this process the original instruction offsets might be adjusted and/or individual instructions might be replaced by new sets of instructions, or deleted. Add a new BPF map type which aims to keep track of how, for a given program, the original instructions were relocated during the verification. Also, besides keeping track of the original -> xlated mapping, make the x86 JIT build the xlated -> jitted mapping for every instruction listed in an instruction array. This is required for every future application of instruction arrays: static keys, indirect jumps and indirect calls. A map of the BPF_MAP_TYPE_INSN_ARRAY type must be created with u32 keys and a value of size 8. The values have different semantics for userspace and for BPF space. For userspace a value consists of two u32 values: xlated and jitted offsets. For the BPF side the value is a real pointer to a jitted instruction. On map creation/initialization, before loading the program, each element of the map should be initialized to point to an instruction offset within the program. Before the program load such maps should be made frozen. After the program verification xlated and jitted offsets can be read via the bpf(2) syscall. If a tracked instruction is removed by the verifier, then the xlated offset is set to (u32)-1 which is considered to be too big for a valid BPF program offset. One such map can, obviously, be used to track one and only one BPF program. If the verification process was unsuccessful, then the same map can be re-used to verify the program with a different log level. However, if the program was loaded fine, then such a map, being frozen in any case, can't be reused by other programs even after the program release. Example. Consider the following original and xlated programs:
    Original prog:                     Xlated prog:
     0: r1 = 0x0                        0: r1 = 0
     1: *(u32 *)(r10 - 0x4) = r1        1: *(u32 *)(r10 -4) = r1
     2: r2 = r10                        2: r2 = r10
     3: r2 += -0x4                      3: r2 += -4
     4: r1 = 0x0 ll                     4: r1 = map[id:88]
     6: call 0x1                        6: r1 += 272
                                        7: r0 = *(u32 *)(r2 +0)
                                        8: if r0 >= 0x1 goto pc+3
                                        9: r0 <<= 3
                                       10: r0 += r1
                                       11: goto pc+1
                                       12: r0 = 0
     7: r6 = r0                        13: r6 = r0
     8: if r6 == 0x0 goto +0x2         14: if r6 == 0x0 goto pc+4
     9: call 0x76                      15: r0 = 0xffffffff8d2079c0
                                       17: r0 = *(u64 *)(r0 +0)
    10: *(u64 *)(r6 + 0x0) = r0        18: *(u64 *)(r6 +0) = r0
    11: r0 = 0x0                       19: r0 = 0x0
    12: exit                           20: exit
An instruction array map, containing, e.g., instructions [0,4,7,12] will be translated by the verifier to [0,4,13,20]. A map with index 5 (the middle of a 16-byte instruction) or indexes greater than 12 (outside the program boundaries) would be rejected. The functionality provided by this patch will be extended in subsequent patches to implement BPF Static Keys, indirect jumps, and indirect calls. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251105090410.1250500-2-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Alan Maguire	0e14a12a1d	libbpf: Fix parsing of multi-split BTF When creating multi-split BTF we correctly set the start string offset to be the size of the base string section plus the base BTF start string offset; the latter is needed for multi-split BTF since the offset is non-zero there. Unfortunately the BTF parsing case needed that logic and it was missed. Fixes: 4e29128a9ace ("libbpf/btf: Fix string handling to support multi-split BTF") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251104203309.318429-2-alan.maguire@oracle.com	2025-11-07 14:00:07 -08:00
Donald Hunter	813fbe13ab	docs/bpf: Add missing BPF k/uprobe program types to docs Update the table of program types in the libbpf docs with the missing k/uprobe multi and session program types. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251029180932.98038-1-donald.hunter@gmail.com	2025-11-07 14:00:07 -08:00
Jianyun Gao	fd00fd999f	libbpf: Update the comment to remove the reference to the deprecated interface bpf_program__load(). Commit be2f2d1680df ("libbpf: Deprecate bpf_program__load() API") marked bpf_program__load() as deprecated starting with libbpf v0.6. Later, commit 146bf811f5ac ("libbpf: remove most other deprecated high-level APIs") actually removed the bpf_program__load() implementation and related old high-level APIs. This patch updates the comment in bpf_program__set_attach_target() to remove the reference to the deprecated interface bpf_program__load(). Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251103120727.145965-1-jianyungao89@gmail.com	2025-11-07 14:00:07 -08:00
Jianyun Gao	f4b32db745	libbpf: Complete the missing @param and @return tags in btf.h Complete the missing @param and @return tags in the Doxygen comments of the btf.h file. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251103115836.144339-1-jianyungao89@gmail.com	2025-11-07 14:00:07 -08:00
Andrii Nakryiko	99bf90957a	libbpf: Fix powerpc's stack register definition in bpf_tracing.h retsnoop's build on powerpc (ppc64le) architecture ([0]) failed due to wrong definition of PT_REGS_SP() macro. Looking at powerpc's implementation of stack unwinding in perf_callchain_user_64() clearly shows that stack pointer register is gpr[1]. Fix libbpf's definition of __PT_SP_REG for powerpc to fix all this. [0] https://kojipkgs.fedoraproject.org/work/tasks/1544/137921544/build.log Fixes: 138d6153a139 ("samples/bpf: Enable powerpc support") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20251020203643.989467-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Jianyun Gao	98b6e51fc6	libbpf: Fix the incorrect reference to the memlock_rlim variable in the comment. The variable "memlock_rlim_max" referenced in the comment does not exist. I think that the author probably meant the variable "memlock_rlim". So, correct it. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251027032008.738944-1-jianyungao89@gmail.com	2025-11-07 14:00:07 -08:00
Jianyun Gao	bc3b400e06	libbpf: Optimize the redundant code in the bpf_object__init_user_btf_maps() function. In the elf_sec_data() function, the input parameter 'scn' will be evaluated. If it is NULL, then it will directly return NULL. Therefore, the return value of the elf_sec_data() function already takes into account the case where the input parameter scn is NULL. Therefore, subsequently, the code only needs to check whether the return value of the elf_sec_data() function is NULL. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20251024080802.642189-1-jianyungao89@gmail.com	2025-11-07 14:00:07 -08:00
Xu Kuohai	665ad8c7f7	bpf: Add overwrite mode for BPF ring buffer When the BPF ring buffer is full, a new event cannot be recorded until one or more old events are consumed to make enough space for it. In cases such as fault diagnostics, where recent events are more useful than older ones, this mechanism may lead to critical events being lost. So add overwrite mode for BPF ring buffer to address it. In this mode, the new event overwrites the oldest event when the buffer is full. The basic idea is as follows:
  1. producer_pos tracks the next position to record new event. When there is enough free space, producer_pos is simply advanced by producer to make space for the new event.
  2. To avoid waiting for consumer when the buffer is full, a new variable, overwrite_pos, is introduced for producer. It points to the oldest event committed in the buffer. It is advanced by producer to discard one or more oldest events to make space for the new event when the buffer is full.
  3. pending_pos tracks the oldest event to be committed. pending_pos is never passed by producer_pos, so multiple producers never write to the same position at the same time.
The following example shows how it works in a 4096-byte ring buffer (buffer layout given as byte ranges):
  1. At first, {producer,overwrite,pending,consumer}_pos are all set to 0.
       buffer: [0, 4096) empty
       producer_pos = 0, overwrite_pos = 0, pending_pos = 0, consumer_pos = 0
  2. Now reserve a 512-byte event A. There is enough free space, so A is allocated at offset 0. And producer_pos is advanced to 512, the end of A. Since A is not submitted, the BUSY bit is set.
       buffer: [0, 512) A [BUSY]; [512, 4096) empty
       producer_pos = 512, overwrite_pos = 0, pending_pos = 0, consumer_pos = 0
  3. Reserve event B, size 1024. B is allocated at offset 512 with BUSY bit set, and producer_pos is advanced to the end of B.
       buffer: [0, 512) A [BUSY]; [512, 1536) B [BUSY]; [1536, 4096) empty
       producer_pos = 1536, overwrite_pos = 0, pending_pos = 0, consumer_pos = 0
  4. Reserve event C, size 2048. C is allocated at offset 1536, and producer_pos is advanced to 3584.
       buffer: [0, 512) A [BUSY]; [512, 1536) B [BUSY]; [1536, 3584) C [BUSY]; [3584, 4096) empty
       producer_pos = 3584, overwrite_pos = 0, pending_pos = 0, consumer_pos = 0
  5. Submit event A. The BUSY bit of A is cleared. B becomes the oldest event to be committed, so pending_pos is advanced to 512, the start of B.
       buffer: [0, 512) A; [512, 1536) B [BUSY]; [1536, 3584) C [BUSY]; [3584, 4096) empty
       producer_pos = 3584, overwrite_pos = 0, pending_pos = 512, consumer_pos = 0
  6. Submit event B. The BUSY bit of B is cleared, and pending_pos is advanced to the start of C, which is now the oldest event to be committed.
       buffer: [0, 512) A; [512, 1536) B; [1536, 3584) C [BUSY]; [3584, 4096) empty
       producer_pos = 3584, overwrite_pos = 0, pending_pos = 1536, consumer_pos = 0
  7. Reserve event D, size 1536 (3 * 512). There are 2048 bytes not being written between producer_pos (currently 3584) and pending_pos, so D is allocated at offset 3584, and producer_pos is advanced by 1536 (from 3584 to 5120). Since event D will overwrite all bytes of event A and the first 512 bytes of event B, overwrite_pos is advanced to the start of event C, the oldest event that is not overwritten.
       buffer: [0, 1024) D End [BUSY]; [1024, 1536) empty; [1536, 3584) C [BUSY]; [3584, 4096) D Begin [BUSY]
       producer_pos = 5120, overwrite_pos = 1536, pending_pos = 1536, consumer_pos = 0
  8. Reserve event E, size 1024. Although there are 512 bytes not being written between producer_pos and pending_pos, E cannot be reserved, as it would overwrite the first 512 bytes of event C, which is still being written.
  9. Submit event C and D. pending_pos is advanced to the end of D.
       buffer: [0, 1024) D End; [1024, 1536) empty; [1536, 3584) C; [3584, 4096) D Begin
       producer_pos = 5120, overwrite_pos = 1536, pending_pos = 5120, consumer_pos = 0
The performance data for overwrite mode will be provided in a follow-up patch that adds overwrite-mode benchmarks.
A sample of performance data for non-overwrite mode, collected on an x86_64 CPU and an arm64 CPU, before and after this patch, is shown below. As we can see, no obvious performance regression occurs.
- x86_64 (AMD EPYC 9654)
  Before:
    Ringbuf, multi-producer contention
    ==================================
    rb-libbpf nr_prod 1   11.623 ± 0.027M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 2   15.812 ± 0.014M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 3   7.871 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 4   6.703 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 8   2.896 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 12  2.054 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 16  1.864 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 20  1.580 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 24  1.484 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 28  1.369 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 32  1.316 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 36  1.272 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 40  1.239 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 44  1.226 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 48  1.213 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 52  1.193 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  After:
    Ringbuf, multi-producer contention
    ==================================
    rb-libbpf nr_prod 1   11.845 ± 0.036M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 2   15.889 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 3   8.155 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 4   6.708 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 8   2.918 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 12  2.065 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 16  1.870 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 20  1.582 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 24  1.482 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 28  1.372 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 32  1.323 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 36  1.264 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 40  1.236 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 44  1.209 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 48  1.189 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 52  1.165 ± 0.002M/s (drops 0.000 ± 0.000M/s)
- arm64 (HiSilicon Kunpeng 920)
  Before:
    Ringbuf, multi-producer contention
    ==================================
    rb-libbpf nr_prod 1   11.310 ± 0.623M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 2   9.947 ± 0.004M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 3   6.634 ± 0.011M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 4   4.502 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 8   3.888 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 12  3.372 ± 0.005M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 16  3.189 ± 0.010M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 20  2.998 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 24  3.086 ± 0.018M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 28  2.845 ± 0.004M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 32  2.815 ± 0.008M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 36  2.771 ± 0.009M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 40  2.814 ± 0.011M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 44  2.752 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 48  2.695 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 52  2.710 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  After:
    Ringbuf, multi-producer contention
    ==================================
    rb-libbpf nr_prod 1   11.283 ± 0.550M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 2   9.993 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 3   6.898 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 4   5.257 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 8   3.830 ± 0.005M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 12  3.528 ± 0.013M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 16  3.265 ± 0.018M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 20  2.990 ± 0.007M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 24  2.929 ± 0.014M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 28  2.898 ± 0.010M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 32  2.818 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 36  2.789 ± 0.012M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 40  2.770 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 44  2.651 ± 0.007M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 48  2.669 ± 0.005M/s (drops 0.000 ± 0.000M/s)
    rb-libbpf nr_prod 52  2.695 ± 0.009M/s (drops 0.000 ± 0.000M/s)
Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251018035738.4039621-2-xukuohai@huaweicloud.com	2025-11-07 14:00:07 -08:00
Mykyta Yatsenko	42995c95b9	bpf: widen dynptr size/offset to 64 bit Dynptr currently caps size and offset at 24 bits, which isn’t sufficient for file-backed use cases; even 32 bits can be limiting. Refactor dynptr helpers/kfuncs to use 64-bit size and offset, ensuring consistency across the APIs. This change does not affect internals of xdp, skb or other dynptrs, which continue to behave as before. Also it does not break binary compatibility. The widening enables large-file access support via dynptr, implemented in the next patches. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251026203853.135105-3-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Anton Protopopov	49c5e0eef4	libbpf: fix formatting of bpf_object__append_subprog_code The commit 6c918709bd30 ("libbpf: Refactor bpf_object__reloc_code") added bpf_object__append_subprog_code() with incorrect indentation. Use tabs instead. (This also makes a subsequent commit more readable.) Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20251019202145.3944697-14-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-07 14:00:07 -08:00
Alain Knaff	30599e72bf	include: add __poll_t typedef in include/linux/types.h for Android/Termux On Android/Termux, linux/types.h is included (indirectly) by sys/epoll.h which depends on this definition to be present. Signed-off-by: Alain Knaff <github@misc.lka.org.lu>	2025-11-07 12:35:07 -08:00
Andrii Nakryiko	3d451d916f	ci: drop tmp.master testing of pahole It hasn't been updated for a long time, seems abandoned. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-10-06 15:59:27 -07:00
Andrii Nakryiko	02b3ec9ffc	ci: denylist verif_scale_pyperf600 as it now fails with newer Clang We get "The sequence of 8193 jumps is too complex." with newer Clang. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-10-06 15:59:27 -07:00
Andrii Nakryiko	c7f77de09d	ci: update clang to v21 for test workflow Update Clang version used. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-10-06 15:59:27 -07:00
Andrii Nakryiko	2719a398b0	libbpf: fix Github's Makefile for libbpf_utils.c Drop removed str_error.o from the list of object to build. Rename libbpf_errno.o into libbpf_utils.o. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-10-06 15:59:27 -07:00
Andrii Nakryiko	7e9d669550	sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   21aeabb68258ce17b91af113a768760b3a491d93
Checkpoint bpf-next commit: de7342228b7343774d6a9981c2ddbfb5e201044b
Baseline bpf commit:        27861fc720be2c39b861d8bdfb68287f54de6855
Checkpoint bpf commit:      4d920ed684392ae064af62957d6f5a90312dfaf6
Alasdair McWilliam (1):
  rtnetlink: add needed_{head,tail}room attributes
Andrii Nakryiko (5):
  libbpf: make libbpf_errno.c into more generic libbpf_utils.c
  libbpf: remove unused libbpf_strerror_r and STRERR_BUFSIZE
  libbpf: move libbpf_errstr() into libbpf_utils.c
  libbpf: move libbpf_sha256() implementation into libbpf_utils.c
  libbpf: remove linux/unaligned.h dependency for libbpf_sha256()
Christian Brauner (1):
  nsfs: support exhaustive file handles
D. Wythe (1):
  libbpf: Fix error when st-prefix_ops and ops from differ btf
Eric Biggers (2):
  libbpf: Replace AF_ALG with open coded SHA-256
  libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
Hangbin Liu (1):
  bonding: add support for per-port LACP actor priority
Jakub Kicinski (1):
  uapi: wrap compiler_types.h in an ifdef instead of the implicit strip
Jiawei Zhao (2):
  libbpf: Fix USDT SIB argument handling causing unrecognized register error
  libbpf: Remove unused args in parse_usdt_note
KP Singh (7):
  bpf: Implement exclusive map creation
  libbpf: Implement SHA256 internal helper
  libbpf: Support exclusive map creation
  bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FD
  bpf: Implement signature verification for BPF programs
  libbpf: Update light skeleton for signing
  libbpf: Embed and verify the metadata hash in the loader
Mykyta Yatsenko (1):
  bpf: bpf task work plumbing
Rong Tao (1):
  bpf: Finish constification of 1st parameter of bpf_d_path()
Tony Ambardar (1):
  libbpf: Fix missing #pragma in libbpf_utils.c
 include/uapi/linux/bpf.h     |  24 +++-
 include/uapi/linux/fcntl.h   |   1 +
 include/uapi/linux/if_link.h |   3 +
 include/uapi/linux/stddef.h  |   2 +
 src/bpf.c                    |   6 +-
 src/bpf.h                    |   5 +-
 src/bpf_gen_internal.h       |   2 +
 src/btf.c                    |   1 -
 src/btf_dump.c               |   1 -
 src/elf.c                    |   1 -
 src/features.c               |   1 -
 src/gen_loader.c             |  50 ++++++-
 src/libbpf.c                 | 108 ++++++++++++---
 src/libbpf.h                 |  25 +++-
 src/libbpf.map               |   3 +
 src/libbpf_errno.c           |  75 ----------
 src/libbpf_internal.h        |  19 +++
 src/libbpf_utils.c           | 256 +++++++++++++++++++++++++++++++++++
 src/linker.c                 |   1 -
 src/relo_core.c              |   1 -
 src/ringbuf.c                |   1 -
 src/skel_internal.h          |  76 ++++++++++-
 src/str_error.c              | 104 --------------
 src/str_error.h              |  19 ---
 src/usdt.bpf.h               |  44 +++++-
 src/usdt.c                   |  73 ++++++++--
 26 files changed, 650 insertions(+), 252 deletions(-)
 delete mode 100644 src/libbpf_errno.c
 create mode 100644 src/libbpf_utils.c
 delete mode 100644 src/str_error.c
 delete mode 100644 src/str_error.h
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-10-06 15:59:27 -07:00
Andrii Nakryiko	e4dc2acd35	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-10-06 15:59:27 -07:00
Eric Biggers	2b940bcde1	libbpf: Fix undefined behavior in {get,put}_unaligned_be32() These violate aliasing rules and may be miscompiled unless -fno-strict-aliasing is used. Replace them with the standard memcpy() solution. Note that compilers know how to optimize this properly. Fixes: 4a1c9e544b8d ("libbpf: remove linux/unaligned.h dependency for libbpf_sha256()") Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20251006012037.159295-1-ebiggers@kernel.org	2025-10-06 15:59:27 -07:00
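The memcpy()-based approach that commit 2b940bcde1 describes can be sketched roughly as follows. This is a minimal illustration, not the code from libbpf_utils.c: the names `load_be32` and `store_be32` are hypothetical, and the byte-wise assembly is only one of several ways to get a big-endian value without type punning.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; a sketch of the technique only. */

/* Read a big-endian 32-bit value from a possibly misaligned address. */
static inline uint32_t load_be32(const void *p)
{
	uint8_t b[4];

	/* memcpy() makes no alignment or aliasing assumptions about p. */
	memcpy(b, p, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Write a 32-bit value in big-endian order to a possibly misaligned address. */
static inline void store_be32(void *p, uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v,
	};

	memcpy(p, b, sizeof(b));
}
```

Because the accesses go through memcpy() on byte arrays, the compiler is free to emit a single load or store where the target allows it, which matches the commit's remark that compilers optimize this pattern properly.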
Rong Tao	379ac32f2c	bpf: Finish constification of 1st parameter of bpf_d_path() The commit 1b8abbb12128 ("bpf...d_path(): constify path argument") constified the first parameter of bpf_d_path(), but failed to update it in all places. Finish the constification. Otherwise the selftests fail to build: .../selftests/bpf/bpf_experimental.h:222:12: error: conflicting types for 'bpf_path_d_path' 222 | extern int bpf_path_d_path(const struct path *path, char *buf, size_t buf__sz) __ksym; | ^ .../selftests/bpf/tools/include/vmlinux.h:153922:12: note: previous declaration is here 153922 | extern int bpf_path_d_path(struct path *path, char *buf, size_t buf__sz) __weak __ksym; Fixes: 1b8abbb12128 ("bpf...d_path(): constify path argument") Signed-off-by: Rong Tao <rongtao@cestc.cn> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-06 15:59:27 -07:00
Tony Ambardar	eca524d5a6	libbpf: Fix missing #pragma in libbpf_utils.c The recent sha256 patch uses a GCC pragma to suppress compile errors for a packed struct, but omits a needed pragma (see related link) and thus still raises errors: (e.g. on GCC 12.3 armhf) libbpf_utils.c:153:29: error: packed attribute causes inefficient alignment for ‘__val’ [-Werror=attributes] 153 | struct __packed_u32 { __u32 __val; } __attribute__((packed)); | ^~~~~ Resolve by adding the GCC diagnostic pragma to ignore "-Wattributes". Link: https://lore.kernel.org/bpf/CAP-5=fXURWoZu2j6Y8xQy23i7=DfgThq3WC1RkGFBx-4moQKYQ@mail.gmail.com/ Fixes: 4a1c9e544b8d ("libbpf: remove linux/unaligned.h dependency for libbpf_sha256()") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-06 15:59:27 -07:00
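The kind of scoped diagnostic suppression that commit eca524d5a6 talks about might look like the sketch below. It is an assumption-laden illustration: the struct name `packed_u32` and the helper `packed_u32_size` are hypothetical, and the push/pop bracketing is a common convention rather than something the commit message states.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Silence -Wattributes only around the packed declaration, so that builds
 * using -Werror=attributes do not fail on "packed attribute causes
 * inefficient alignment". The previous diagnostic state is restored after.
 */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wattributes"
struct packed_u32 { uint32_t val; } __attribute__((packed));
#pragma GCC diagnostic pop

/* Hypothetical helper: the packed wrapper adds no padding. */
static inline size_t packed_u32_size(void)
{
	return sizeof(struct packed_u32);
}
```

Clang accepts the same `#pragma GCC diagnostic` spelling, so the fragment is not GCC-only.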
Andrii Nakryiko	50d1b8e6b4	libbpf: remove linux/unaligned.h dependency for libbpf_sha256() linux/unaligned.h include dependency is causing issues for libbpf's Github mirror due to {get,put}_unaligned_be32() usage. So get rid of it by implementing custom variants of those macros that will work both in kernel and Github mirror repos. Also switch round_up() to roundup(), as the former is not available in Github mirror (and is just a subtly more specific variant of roundup() anyways). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20251001171326.3883055-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>	2025-10-06 15:59:27 -07:00
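To make the round_up() versus roundup() remark in commit 50d1b8e6b4 concrete, here is a small sketch of the two rounding styles. The function names `round_up_pow2` and `roundup_any` are hypothetical stand-ins written for illustration; they are not the kernel macros themselves.

```c
#include <stdint.h>

/*
 * Mask-based rounding: only correct when align is a power of two.
 * This is the "more specific" style.
 */
static inline uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Division-based rounding: works for any non-zero multiple,
 * power of two or not.
 */
static inline uint64_t roundup_any(uint64_t x, uint64_t mult)
{
	return ((x + mult - 1) / mult) * mult;
}
```

For power-of-two alignments such as a 64-byte block size, both give the same result, which is why swapping one for the other is safe in that setting.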
Andrii Nakryiko	84aad03545	libbpf: move libbpf_sha256() implementation into libbpf_utils.c Move sha256 implementation out of already large and unwieldy libbpf.c into libbpf_utils.c where we'll keep reusable helpers. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20251001171326.3883055-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>	2025-10-06 15:59:27 -07:00
Andrii Nakryiko	6fcb2c1963	libbpf: move libbpf_errstr() into libbpf_utils.c Get rid of str_err.{c,h} by moving implementation of libbpf_errstr() into libbpf_utils.c and declarations into libbpf_internal.h. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20251001171326.3883055-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>	2025-10-06 15:59:27 -07:00
Andrii Nakryiko	ce015f0184	libbpf: remove unused libbpf_strerror_r and STRERR_BUFSIZE libbpf_strerror_r() is not exposed as public API, nor is it used inside libbpf itself. Remove it altogether. Same for STRERR_BUFSIZE, it's just an orphaned leftover constant that we failed to clean up earlier. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20251001171326.3883055-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>	2025-10-06 15:59:27 -07:00
Andrii Nakryiko	33021bb9dd	libbpf: make libbpf_errno.c into more generic libbpf_utils.c Libbpf is missing one convenient place to put common "utils"-like code that is generic and usable from multiple places. Use libbpf_errno.c as the base for more generic libbpf_utils.c. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20251001171326.3883055-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>	2025-10-06 15:59:27 -07:00
Alasdair McWilliam	97fbf1d106	rtnetlink: add needed_{head,tail}room attributes Various network interface types make use of needed_{head,tail}room values to efficiently reserve buffer space for additional encapsulation headers, such as VXLAN, Geneve, IPSec, etc. However, it is not currently possible to query these values in a generic way. Introduce ability to query the needed_{head,tail}room values of a network device via rtnetlink, such that applications that may wish to use these values can do so. For example, Cilium agent iterates over present devices based on user config (direct routing, vxlan, geneve, wireguard etc.) and in future will configure netkit in order to expose the needed_{head,tail}room into K8s pods. See b9ed315d3c4c ("netkit: Allow for configuring needed_{head,tail}room"). Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alasdair McWilliam <alasdair@mcwilliam.dev> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20250917095543.14039-1-alasdair@mcwilliam.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-06 15:59:27 -07:00
Hangbin Liu	017c96d6e1	bonding: add support for per-port LACP actor priority Introduce a new netlink attribute 'actor_port_prio' to allow setting the LACP actor port priority on a per-slave basis. This extends the existing bonding infrastructure to support more granular control over LACP negotiations. The priority value is embedded in LACPDU packets and will be used by subsequent patches to influence aggregator selection policies. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250902064501.360822-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-10-06 15:59:27 -07:00
Jakub Kicinski	1f098bc568	uapi: wrap compiler_types.h in an ifdef instead of the implicit strip The uAPI stddef header includes compiler_types.h, a kernel-only header, to make sure that kernel definitions of annotations like __counted_by() take precedence. There is a hack in scripts/headers_install.sh which strips includes of compiler.h and compiler_types.h when installing uAPI headers. While explicit handling makes sense for compiler.h, which is included all over the uAPI, compiler_types.h is only included by stddef.h (within the uAPI, obviously it's included in kernel code a lot). Remove the stripping from scripts/headers_install.sh and wrap the include of compiler_types.h in #ifdef __KERNEL__ instead. This should be equivalent functionally, but is easier to understand to a casual reader of the code. It also makes it easier to work with kernel headers directly from under tools/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250825201828.2370083-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-10-06 15:59:27 -07:00
Eric Biggers	367798a9cf	libbpf: Replace AF_ALG with open coded SHA-256 Reimplement libbpf_sha256() using some basic SHA-256 C code. This eliminates the newly-added dependency on AF_ALG, which is a problematic UAPI that is not supported by all kernels. Make libbpf_sha256() return void, since it can no longer fail. This simplifies some callers. Also drop the unnecessary 'sha_out_sz' parameter. Finally, also fix the typo in "compute_sha_udpate_offsets". Fixes: c297fe3e9f99 ("libbpf: Implement SHA256 internal helper") Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/r/20250928003833.138407-1-ebiggers@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-06 15:59:27 -07:00
D. Wythe	f9369ca839	libbpf: Fix error when st-prefix_ops and ops from differ btf When a module registers a struct_ops, the struct_ops type and its corresponding map_value type ("bpf_struct_ops_") may reside in different btf objects. There are four possible cases:
+--------+---------------+-------------+---------------------------------+
|        |bpf_struct_ops_|  xxx_ops    |                                 |
+--------+---------------+-------------+---------------------------------+
| case 0 | btf_vmlinux   | btf_vmlinux | be used and reg only in vmlinux |
+--------+---------------+-------------+---------------------------------+
| case 1 | btf_vmlinux   | mod_btf     | INVALID                         |
+--------+---------------+-------------+---------------------------------+
| case 2 | mod_btf       | btf_vmlinux | reg in mod but be used both in  |
|        |               |             | vmlinux and mod.                |
+--------+---------------+-------------+---------------------------------+
| case 3 | mod_btf       | mod_btf     | be used and reg only in mod     |
+--------+---------------+-------------+---------------------------------+
Currently we figure out the mod_btf by searching with the struct_ops type, which makes it impossible to figure out the mod_btf when the struct_ops type is in btf_vmlinux while its corresponding map_value type is in mod_btf (case 2). The fix is to use the corresponding map_value type ("bpf_struct_ops_") as the lookup anchor instead of the struct_ops type to figure out the `btf` and `mod_btf` via find_ksym_btf_id(), and then we can locate the kern_type_id via btf__find_by_name_kind() with the `btf` we just obtained from find_ksym_btf_id(). With this change the lookup obtains the correct btf and mod_btf for case 2, preserves correct behavior for other valid cases, and still fails as expected for the invalid scenario (case 1). Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20250926071751.108293-1-alibuda@linux.alibaba.com	2025-10-06 15:59:27 -07:00
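The four cases listed in commit f9369ca839 can be modeled with a tiny classifier. This is only an illustrative sketch: the enum `btf_home` and both functions are hypothetical names invented here, and the code does not call or reproduce any libbpf lookup routine.

```c
#include <stdbool.h>

/* Hypothetical: where a given type's BTF lives. */
enum btf_home { HOME_VMLINUX, HOME_MODULE };

/*
 * Map the location of the map_value type ("bpf_struct_ops_" prefixed) and
 * the location of the struct_ops type to the case number used in the
 * commit message (0 to 3).
 */
static int struct_ops_case(enum btf_home map_value_btf, enum btf_home ops_btf)
{
	return (map_value_btf == HOME_MODULE ? 2 : 0) +
	       (ops_btf == HOME_MODULE ? 1 : 0);
}

/* Case 1 (map_value in vmlinux, ops in a module) is the invalid one. */
static bool struct_ops_case_valid(int c)
{
	return c != 1;
}
```

Case 2 is the one the fix targets: the map_value type sits in the module BTF while the struct_ops type sits in vmlinux, so a search keyed on the struct_ops type alone never reaches the module.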
Mykyta Yatsenko	70619ad135	bpf: bpf task work plumbing This patch adds necessary plumbing in verifier, syscall and maps to support handling new kfunc bpf_task_work_schedule and kernel structure bpf_task_work. The idea is similar to how we already handle bpf_wq and bpf_timer. verifier changes validate calls to bpf_task_work_schedule to make sure it is safe and expected invariants hold. btf part is required to detect bpf_task_work structure inside map value and store its offset, which will be used in the next patch to calculate key and value addresses. arraymap and hashtab changes are needed to handle freeing of the bpf_task_work: run code needed to deinitialize it, for example cancel task_work callback if possible. The use of bpf_task_work and proper implementation for kfuncs are introduced in the next patch. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250923112404.668720-6-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-06 15:59:27 -07:00
KP Singh	56cc32b5e3	libbpf: Embed and verify the metadata hash in the loader To fulfill the BPF signing contract, represented as Sig(I_loader || H_meta), the generated trusted loader program must verify the integrity of the metadata. This signature cryptographically binds the loader's instructions (I_loader) to a hash of the metadata (H_meta). The verification process is embedded directly into the loader program. Upon execution, the loader loads the runtime hash from struct bpf_map i.e. BPF_PSEUDO_MAP_IDX and compares this runtime hash against an expected hash value that has been hardcoded directly by bpf_obj__gen_loader. The load from bpf_map can be improved by calling BPF_OBJ_GET_INFO_BY_FD from the kernel context after BPF_OBJ_GET_INFO_BY_FD has been updated for being called from the kernel context. The following instructions are generated:
  ld_imm64 r1, const_ptr_to_map // insn[0].src_reg == BPF_PSEUDO_MAP_IDX
  r2 = *(u64 *)(r1 + 0);
  ld_imm64 r3, sha256_of_map_part1 // constant precomputed by bpftool (part of H_meta)
  if r2 != r3 goto out;
  r2 = *(u64 *)(r1 + 8);
  ld_imm64 r3, sha256_of_map_part2 // (part of H_meta)
  if r2 != r3 goto out;
  r2 = *(u64 *)(r1 + 16);
  ld_imm64 r3, sha256_of_map_part3 // (part of H_meta)
  if r2 != r3 goto out;
  r2 = *(u64 *)(r1 + 24);
  ld_imm64 r3, sha256_of_map_part4 // (part of H_meta)
  if r2 != r3 goto out;
  ...
Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250921160120.9711-4-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-06 15:59:27 -07:00
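In plain C, the comparison that the generated loader instructions in commit 56cc32b5e3 perform could be pictured roughly as below: a 32-byte SHA-256 digest is checked as four 64-bit words against precomputed constants. The function name `metadata_hash_matches` and the `SHA256_WORDS` constant are hypothetical, and this is a userspace analogy rather than the BPF code libbpf emits.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SHA256_WORDS 4	/* 32-byte digest viewed as four 64-bit words */

/*
 * Compare the runtime digest against the expected words, one 8-byte
 * chunk at a time (offsets 0, 8, 16, 24), bailing out on first mismatch.
 */
static bool metadata_hash_matches(const uint8_t runtime[32],
				  const uint64_t expected[SHA256_WORDS])
{
	for (int i = 0; i < SHA256_WORDS; i++) {
		uint64_t word;

		memcpy(&word, runtime + 8 * i, sizeof(word));
		if (word != expected[i])
			return false;	/* analogous to "if r2 != r3 goto out" */
	}
	return true;
}
```

Comparing word by word mirrors the instruction sequence, where each `ld_imm64` carries one 64-bit slice of the expected hash.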
KP Singh	986d033976	libbpf: Update light skeleton for signing
* The metadata map is created as an exclusive map (with an excl_prog_hash). This restricts map access exclusively to the signed loader program, preventing tampering by other processes.
* The map is then frozen, making it read-only from userspace.
* BPF_OBJ_GET_INFO_BY_ID instructs the kernel to compute the hash of the metadata map (H') and store it in bpf_map->sha.
* The loader is then loaded with the signature, which is then verified by the kernel.
Loading signed programs prebuilt into the kernel is not currently supported. This can be supported by enabling BPF_OBJ_GET_INFO_BY_ID to be called from the kernel. Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250921160120.9711-3-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-06 15:59:27 -07:00
KP Singh	decfae3a5d	bpf: Implement signature verification for BPF programs This patch extends the BPF_PROG_LOAD command by adding three new fields to `union bpf_attr` in the user-space API:
- signature: A pointer to the signature blob.
- signature_size: The size of the signature blob.
- keyring_id: The serial number of a loaded kernel keyring (e.g., the user or session keyring) containing the trusted public keys.
When a BPF program is loaded with a signature, the kernel:
1. Retrieves the trusted keyring using the provided `keyring_id`.
2. Verifies the supplied signature against the BPF program's instruction buffer.
3. If the signature is valid and was generated by a key in the trusted keyring, the program load proceeds.
4. If no signature is provided, the load proceeds as before, allowing for backward compatibility. LSMs can choose to restrict unsigned programs and implement a security policy.
5. If signature verification fails for any reason, the program is not loaded.
Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250921160120.9711-2-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-06 15:59:27 -07:00
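The decision flow that commit decfae3a5d lists for signed loads can be summarized in a short sketch. Everything here is hypothetical scaffolding for illustration: `struct load_request`, its boolean fields, and `check_signature` are invented names that stand in for kernel state, and the struct is deliberately not `union bpf_attr`.

```c
#include <stdbool.h>
#include <stddef.h>

enum load_verdict { LOAD_PROCEED, LOAD_REJECT };

/* Hypothetical stand-in for the inputs of a signed BPF_PROG_LOAD. */
struct load_request {
	const void *signature;		/* signature blob, NULL if unsigned */
	size_t signature_size;		/* size of the blob in bytes */
	bool keyring_found;		/* keyring lookup by keyring_id succeeded */
	bool signature_valid;		/* signature matches the instruction buffer
					 * under a key from that keyring */
};

static enum load_verdict check_signature(const struct load_request *req)
{
	/* No signature: load proceeds as before (an LSM may still refuse). */
	if (!req->signature || req->signature_size == 0)
		return LOAD_PROCEED;

	/* Any failure along the verification path rejects the load. */
	if (!req->keyring_found)
		return LOAD_REJECT;

	return req->signature_valid ? LOAD_PROCEED : LOAD_REJECT;
}
```

The sketch keeps the unsigned path permissive because the commit message describes it as the backward-compatible default, with policy enforcement left to LSMs.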

2744 Commits