libbpf

mirror of https://github.com/netdata/libbpf.git synced 2026-05-08 00:19:11 +08:00

Author	SHA1	Message	Date
Jason Xing	79e19bb62b	bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback Support the ACK case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_ACK_CB. This callback will occur at the same timestamping point as the user space's SCM_TSTAMP_ACK. The BPF program can use it to get the same SCM_TSTAMP_ACK timestamp without modifying the user-space application. This patch extends txstamp_ack to two bits: 1 stands for SO_TIMESTAMPING mode, 2 bpf extension. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-10-kerneljasonxing@gmail.com	2025-04-02 14:24:25 -07:00
Jason Xing	253b5ce758	bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback Support hw SCM_TSTAMP_SND case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This callback will occur at the same timestamping point as the user space's hardware SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application. To avoid increasing the code complexity, replace SKBTX_HW_TSTAMP with SKBTX_HW_TSTAMP_NOBPF instead of changing numerous callers from driver side using SKBTX_HW_TSTAMP. The new definition of SKBTX_HW_TSTAMP means the combination tests of socket timestamping and bpf timestamping. After this patch, drivers can work under the bpf timestamping. Considering some drivers don't assign the skb with hardware timestamp, this patch does the assignment and then BPF program can acquire the hwstamp from skb directly. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-9-kerneljasonxing@gmail.com	2025-04-02 14:24:25 -07:00
Jason Xing	d855493df1	bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback Support sw SCM_TSTAMP_SND case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_SW_CB. This callback will occur at the same timestamping point as the user space's software SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application. Based on this patch, BPF program will get the software timestamp when the driver is ready to send the skb. In the subsequent patch, the hardware timestamp will be supported. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-8-kerneljasonxing@gmail.com	2025-04-02 14:24:25 -07:00
Jason Xing	7ea10cfba8	bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback Support SCM_TSTAMP_SCHED case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SCHED_CB. This callback will occur at the same timestamping point as the user space's SCM_TSTAMP_SCHED. The BPF program can use it to get the same SCM_TSTAMP_SCHED timestamp without modifying the user-space application. A new SKBTX_BPF flag is added to mark skb_shinfo(skb)->tx_flags, ensuring that the new BPF timestamping and the current user space's SO_TIMESTAMPING do not interfere with each other. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-7-kerneljasonxing@gmail.com	2025-04-02 14:24:25 -07:00
Jason Xing	43b6c2cd70	bpf: Add networking timestamping support to bpf_get/setsockopt() The new SK_BPF_CB_FLAGS and new SK_BPF_CB_TX_TIMESTAMPING are added to bpf_get/setsockopt. The later patches will implement the BPF networking timestamping. The BPF program will use bpf_setsockopt(SK_BPF_CB_FLAGS, SK_BPF_CB_TX_TIMESTAMPING) to enable the BPF networking timestamping on a socket. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-2-kerneljasonxing@gmail.com	2025-04-02 14:24:25 -07:00
Joe Damato	b1cb441916	netdev-genl: Add an XSK attribute to queues Expose a new per-queue nest attribute, xsk, which will be present for queues that are being used for AF_XDP. If the queue is not being used for AF_XDP, the nest will not be present. In the future, this attribute can be extended to include more data about XSK as it is needed. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-02 14:24:25 -07:00
David Wei	01500813ad	netdev: add io_uring memory provider info Add a nested attribute for io_uring memory provider info. For now it is empty and its presence indicates that a particular page pool or queue has an io_uring memory provider attached. $ ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get [{'id': 80, 'ifindex': 2, 'inflight': 64, 'inflight-mem': 262144, 'napi-id': 525}, {'id': 79, 'ifindex': 2, 'inflight': 320, 'inflight-mem': 1310720, 'io_uring': {}, 'napi-id': 525}, ... $ ./cli.py --spec netlink/specs/netdev.yaml --dump queue-get [{'id': 0, 'ifindex': 1, 'type': 'rx'}, {'id': 0, 'ifindex': 1, 'type': 'tx'}, {'id': 0, 'ifindex': 2, 'napi-id': 513, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 514, 'type': 'rx'}, ... {'id': 12, 'ifindex': 2, 'io_uring': {}, 'napi-id': 525, 'type': 'rx'}, ... Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-6-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-02 14:24:25 -07:00
Kan Liang	59171f49e9	perf: Extend per event callchain limit to branch stack The commit 97c79a38cd45 ("perf core: Per event callchain limit") introduced a per-event term to allow finer tuning of the depth of callchains to save space. It should be applied to the branch stack as well. For example, autoFDO collections require maximum LBR entries. In the meantime, other system-wide LBR users may only be interested in the latest a few number of LBRs. A per-event LBR depth would save the perf output buffer. The patch simply drops the uninterested branches, but HW still collects the maximum branches. There may be a model-specific optimization that can reduce the HW depth for some cases to reduce the overhead further. But it isn't included in the patch set. Because it's not useful for all cases. For example, ARCH LBR can utilize the PEBS and XSAVE to collect LBRs. The depth should have less impact on the collecting overhead. The model-specific optimization may be implemented later separately. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250310181536.3645382-1-kan.liang@linux.intel.com	2025-04-02 14:24:25 -07:00
Ihor Solodrai	1b8768339f	ci: add temporary patches for selftests * https://lore.kernel.org/all/20250327185528.1740787-1-song@kernel.org/ * https://lore.kernel.org/bpf/20250328193124.808784-1-song@kernel.org/ * https://lore.kernel.org/bpf/20250331033828.365077-1-yonghong.song@linux.dev/ Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-04-02 10:23:23 -07:00
Ihor Solodrai	374036c9f1	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: 239860828f8660e2be487e2fbdae2640cce3fd67 Checkpoint bpf-next commit: 79d93c8ff35855d3283ee7d82dfe0c54f90b9986 Baseline bpf commit: 319fc77f8f45a1b3dba15b0cc1a869778fd222f7 Checkpoint bpf commit: 6ccf6adb05d0fe3dbb1a77ab90bf054da8a2198d Ihor Solodrai (1): libbpf: Implement bpf_usdt_arg_size BPF function Mykyta Yatsenko (3): libbpf: Use map_is_created helper in map setters libbpf: Introduce more granular state for bpf_object libbpf: Split bpf object load into prepare/load Nandakumar Edamana (1): libbpf: Fix out-of-bound read Peilin Ye (1): bpf: Introduce load-acquire and store-release instructions Yonghong Song (1): bpf: Allow pre-ordering for bpf cgroup progs include/uapi/linux/bpf.h | 4 + src/libbpf.c | 201 ++++++++++++++++++++++++++------------- src/libbpf.h | 13 +++ src/libbpf.map | 1 + src/usdt.bpf.h | 32 +++++++ 5 files changed, 183 insertions(+), 68 deletions(-) Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-03-10 15:35:17 -07:00
Peilin Ye	bf62e0dcfd	bpf: Introduce load-acquire and store-release instructions Introduce BPF instructions with load-acquire and store-release semantics, as discussed in [1]. Define 2 new flags: #define BPF_LOAD_ACQ 0x100 #define BPF_STORE_REL 0x110 A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_LOAD_ACQ (0x100). Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_STORE_REL (0x110). Unlike existing atomic read-modify-write operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). As an exception, however, 64-bit load-acquires/store-releases are not supported on 32-bit architectures (to fix a build error reported by the kernel test robot). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDARH and friends. Similar to existing atomic read-modify-write operations, misaligned load-acquires/store-releases are not allowed (even if BPF_F_ANY_ALIGNMENT is set). As an example, consider the following 64-bit load-acquire BPF instruction (assuming little-endian): db 10 00 00 00 01 00 00 r0 = load_acquire((u64 *)(r1 + 0x0)) opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX imm (0x00000100): BPF_LOAD_ACQ Similarly, a 16-bit BPF store-release: cb 21 00 00 10 01 00 00 store_release((u16 *)(r1 + 0x0), w2) opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX imm (0x00000110): BPF_STORE_REL In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new instructions, until the corresponding JIT compiler supports them in arena.
[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/ Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/a217f46f0e445fbd573a1a024be5c6bf1d5fe716.1741049567.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-03-10 15:35:17 -07:00
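The two byte sequences quoted in the load-acquire/store-release commit can be reproduced from the opcode fields it names. The following is a minimal Python sketch, not kernel or libbpf code: the helper names (`encode_insn`, `load_acquire`, `store_release`) are hypothetical, and the layout assumed is the standard 8-byte little-endian BPF instruction (opcode, register byte with dst in the low nibble and src in the high nibble, 16-bit offset, 32-bit immediate).

```python
import struct

# Opcode fields of the BPF instruction set (class, size, mode).
BPF_STX = 0x03
BPF_W, BPF_H, BPF_B, BPF_DW = 0x00, 0x08, 0x10, 0x18
BPF_ATOMIC = 0xC0

# 'imm' values introduced by the commit.
BPF_LOAD_ACQ = 0x100
BPF_STORE_REL = 0x110


def encode_insn(opcode, dst_reg, src_reg, off, imm):
    """Pack one 8-byte BPF instruction, little-endian (hypothetical helper)."""
    regs = (src_reg << 4) | dst_reg  # dst in the low nibble, src in the high nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


def load_acquire(size, dst_reg, src_reg, off=0):
    """dst_reg = load_acquire((size *)(src_reg + off))"""
    return encode_insn(BPF_ATOMIC | size | BPF_STX, dst_reg, src_reg, off, BPF_LOAD_ACQ)


def store_release(size, dst_reg, src_reg, off=0):
    """store_release((size *)(dst_reg + off), src_reg)"""
    return encode_insn(BPF_ATOMIC | size | BPF_STX, dst_reg, src_reg, off, BPF_STORE_REL)


# r0 = load_acquire((u64 *)(r1 + 0x0))
print(load_acquire(BPF_DW, dst_reg=0, src_reg=1).hex(" "))
# store_release((u16 *)(r1 + 0x0), w2)
print(store_release(BPF_H, dst_reg=1, src_reg=2).hex(" "))
```

Both instructions share the `BPF_STX | BPF_ATOMIC` class and mode; only the size modifier and the `imm` value distinguish them.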
Mykyta Yatsenko	855a5d7904	libbpf: Split bpf object load into prepare/load Introduce bpf_object__prepare API: additional intermediate preparation step that performs ELF processing, relocations, prepares final state of BPF program instructions (accessible with bpf_program__insns()), creates and (potentially) pins maps, and stops short of loading BPF programs. We anticipate few use cases for this API, such as: * Use prepare to initialize bpf_token, without loading freplace programs, unlocking possibility to lookup BTF of other programs. * Execute prepare to obtain finalized BPF program instructions without loading programs, enabling tools like veristat to process one program at a time, without incurring cost of ELF parsing and processing. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250303135752.158343-4-mykyta.yatsenko5@gmail.com Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-03-10 15:35:17 -07:00
Mykyta Yatsenko	16c58c33c8	libbpf: Introduce more granular state for bpf_object We are going to split bpf_object loading into 2 stages: preparation and loading. This will increase flexibility when working with bpf_object and unlock some optimizations and use cases. This patch substitutes a boolean flag (loaded) by more finely-grained state for bpf_object. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250303135752.158343-3-mykyta.yatsenko5@gmail.com Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-03-10 15:35:17 -07:00
Mykyta Yatsenko	e14bb3629f	libbpf: Use map_is_created helper in map setters Refactoring: use map_is_created helper in map setters that need to check the state of the map. This helps to reduce the number of the places that depend explicitly on the loaded flag, simplifying refactoring in the next patch of this set. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250303135752.158343-2-mykyta.yatsenko5@gmail.com Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-03-10 15:35:17 -07:00
Yonghong Song	fbda5d7d2f	bpf: Allow pre-ordering for bpf cgroup progs Currently for bpf progs in a cgroup hierarchy, the effective prog array is computed from bottom cgroup to upper cgroups (post-ordering). For example, the following cgroup hierarchy root cgroup: p1, p2 subcgroup: p3, p4 have BPF_F_ALLOW_MULTI for both cgroup levels. The effective cgroup array ordering looks like p3 p4 p1 p2 and at run time, progs will execute based on that order. But in some cases, it is desirable to have root prog executes earlier than children progs (pre-ordering). For example, - prog p1 intends to collect original pkt dest addresses. - prog p3 will modify original pkt dest addresses to a proxy address for security reason. The end result is that prog p1 gets proxy address which is not what it wants. Putting p1 to every child cgroup is not desirable either as it will duplicate itself in many child cgroups. And this is exactly a use case we are encountering in Meta. To fix this issue, let us introduce a flag BPF_F_PREORDER. If the flag is specified at attachment time, the prog has higher priority and the ordering with that flag will be from top to bottom (pre-ordering). For example, in the above example, root cgroup: p1, p2 subcgroup: p3, p4 Let us say p2 and p4 are marked with BPF_F_PREORDER. The final effective array ordering will be p2 p4 p3 p1 Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-03-10 15:35:17 -07:00
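The ordering rule in the BPF_F_PREORDER commit can be illustrated with a small model. This Python sketch is not the kernel implementation; `effective_order` is a hypothetical helper, and it assumes that programs within one cgroup level keep their attach order, which the commit message does not spell out.

```python
def effective_order(hierarchy):
    """Model the effective prog array for a cgroup hierarchy.

    hierarchy: list of cgroup levels from root to leaf; each level is a
    list of (prog_name, preorder) tuples in attach order.
    """
    # Pre-ordered progs run first, walking from the root down to the leaf.
    pre = [name for level in hierarchy for name, preorder in level if preorder]
    # Remaining progs keep the existing post-ordering: leaf first, root last.
    post = [name for level in reversed(hierarchy)
            for name, preorder in level if not preorder]
    return pre + post


# No flags set: the existing post-ordering from the commit message.
plain = [[("p1", False), ("p2", False)], [("p3", False), ("p4", False)]]
print(effective_order(plain))

# p2 and p4 attached with BPF_F_PREORDER.
flagged = [[("p1", False), ("p2", True)], [("p3", False), ("p4", True)]]
print(effective_order(flagged))
```

With the commit's example hierarchy the model yields `p3 p4 p1 p2` without the flag and `p2 p4 p3 p1` when p2 and p4 are pre-ordered, matching the orderings stated in the message.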
Ihor Solodrai	be18fdb16a	libbpf: Implement bpf_usdt_arg_size BPF function Information about USDT argument size is implicitly stored in __bpf_usdt_arg_spec, but currently it's not accessible to BPF programs that use USDT. Implement bpf_usdt_arg_size() that returns the size of a USDT argument in bytes. v1->v2: * do not add __bpf_usdt_arg_spec() helper v1: https://lore.kernel.org/bpf/20250220215904.3362709-1-ihor.solodrai@linux.dev/ Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250224235756.2612606-1-ihor.solodrai@linux.dev	2025-03-10 15:35:17 -07:00
Nandakumar Edamana	82f60c9b5e	libbpf: Fix out-of-bound read In `set_kcfg_value_str`, an untrusted string is accessed with the assumption that it will be at least two characters long due to the presence of checks for opening and closing quotes. But the check for the closing quote (value[len - 1] != '"') misses the fact that it could be checking the opening quote itself in case of an invalid input that consists of just the opening quote. This commit adds an explicit check to make sure the string is at least two characters long. Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250221210110.3182084-1-nandakumar@nandakumar.co.in Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-03-10 15:35:17 -07:00
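The flaw described in the out-of-bound read fix is easy to see in a model of the quote check. The sketch below is Python, not the C code in `set_kcfg_value_str`; both function names are hypothetical, and the caller is assumed to have already verified that the value starts with a double quote. Python cannot read out of bounds here, so the sketch only shows that the old check accepts the one-character input.

```python
def parse_quoted_buggy(value):
    """Old logic: only the last character is checked for a closing quote."""
    # For the input '"' the last character IS the opening quote, so this passes.
    if value[-1] != '"':
        raise ValueError("no closing quote")
    return value[1:-1]


def parse_quoted_fixed(value):
    """Fixed logic: require at least two characters before looking at the end."""
    if len(value) < 2 or value[-1] != '"':
        raise ValueError("no closing quote")
    return value[1:-1]


print(repr(parse_quoted_buggy('"')))        # wrongly accepted
print(repr(parse_quoted_fixed('"abc"')))    # well-formed value
try:
    parse_quoted_fixed('"')
except ValueError as err:
    print("rejected:", err)
```

In C the accepted one-character string leads to a length computed from `len - 2`, which is where the out-of-bound access comes from; the explicit length check closes that path.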
Andrii Nakryiko	4c893341f5	Makefile: detect pkg-config availability Detect whether build system has pkg-config tool, and if not, fallback to manually specifying -lelf -lz as dependency. Closes: https://github.com/libbpf/libbpf/issues/885 Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-03-03 18:59:47 -08:00
Ihor Solodrai	42a6ef6316	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: 01f3ce5328c405179b2c69ea047c423dad2bfa6d Checkpoint bpf-next commit: 239860828f8660e2be487e2fbdae2640cce3fd67 Baseline bpf commit: c45323b7560ec87c37c729b703c86ee65f136d75 Checkpoint bpf commit: 319fc77f8f45a1b3dba15b0cc1a869778fd222f7 Andrii Nakryiko (2): libbpf: fix LDX/STX/ST CO-RE relocation size adjustment logic libbpf: Fix hypothetical STT_SECTION extern NULL deref case Daniel Borkmann (1): netkit: Allow for configuring needed_{head,tail}room Ihor Solodrai (3): libbpf: Introduce kflag for type_tags and decl_tags in BTF docs/bpf: Document the semantics of BTF tags with kind_flag libbpf: Check the kflag of type tags in btf_dump Tao Chen (1): libbpf: Wrap libbpf API direct err with libbpf_err Tony Ambardar (1): libbpf: Fix accessing BTF.ext core_relo header Yonghong Song (1): bpf: Sync uapi bpf.h header for the tooling infra include/uapi/linux/bpf.h | 5 +- include/uapi/linux/btf.h | 3 +- include/uapi/linux/if_link.h | 2 + src/btf.c | 90 ++++++++++++++++++++++++++---------- src/btf.h | 3 ++ src/btf_dump.c | 5 +- src/libbpf.c | 26 +++++------ src/libbpf.map | 2 + src/linker.c | 2 +- src/relo_core.c | 24 ++++++++-- 10 files changed, 116 insertions(+), 46 deletions(-) Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-02-24 15:10:59 -08:00
Andrii Nakryiko	041d5948f3	libbpf: Fix hypothetical STT_SECTION extern NULL deref case Fix theoretical NULL dereference in linker when resolving extern STT_SECTION symbol against not-yet-existing ELF section. Not sure if it's possible in practice for valid ELF object files (this would require embedded assembly manipulations, at which point BTF will be missing), but fix the s/dst_sym/dst_sec/ typo guarding this condition anyways. Fixes: faf6ed321cf6 ("libbpf: Add BPF static linker APIs") Fixes: a46349227cd8 ("libbpf: Add linker extern resolution support for functions and global variables") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20250220002821.834400-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-02-24 15:10:59 -08:00
Tao Chen	39a589c74e	libbpf: Wrap libbpf API direct err with libbpf_err Just wrap the direct err with libbpf_err, keep consistency with other APIs. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250219153711.29651-1-chen.dylane@linux.dev Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-02-24 15:10:59 -08:00
Andrii Nakryiko	d7a4ab1548	libbpf: fix LDX/STX/ST CO-RE relocation size adjustment logic Libbpf has a somewhat obscure feature of automatically adjusting the "size" of LDX/STX/ST instruction (memory store and load instructions), based on originally recorded access size (u8, u16, u32, or u64) and the actual size of the field on target kernel. This is meant to facilitate using BPF CO-RE on 32-bit architectures (pointers are always 64-bit in BPF, but host kernel's BTF will have it as 32-bit type), as well as generally supporting safe type changes (unsigned integer type changes can be transparently "relocated"). One issue that surfaced only now, 5 years after this logic was implemented, is how this all works when dealing with fields that are arrays. This isn't all that easy and straightforward to hit (see selftests that reproduce this condition), but one of sched_ext BPF programs did hit it with innocent looking loop. Long story short, libbpf used to calculate entire array size, instead of making sure to only calculate array's element size. But it's the element that is loaded by LDX/STX/ST instructions (1, 2, 4, or 8 bytes), so that's what libbpf should check. This patch adjusts the logic for arrays and fixed the issue. Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250207014809.1573841-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-02-24 15:10:59 -08:00
Yonghong Song	4eed43c229	bpf: Sync uapi bpf.h header for the tooling infra Commit 0abff462d802 ("bpf: Add comment about helper freeze") missed the tooling header sync. Fix it. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250213050427.2788837-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-02-24 15:10:59 -08:00
Ihor Solodrai	cc278ff7c0	libbpf: Check the kflag of type tags in btf_dump If the kflag is set for a BTF type tag, then the tag represents an arbitrary __attribute__. Change btf_dump accordingly. Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250130201239.1429648-4-ihor.solodrai@linux.dev	2025-02-24 15:10:59 -08:00
Ihor Solodrai	2b8b896bca	docs/bpf: Document the semantics of BTF tags with kind_flag Explain the meaning of kind_flag in BTF type_tags and decl_tags. Update uapi btf.h kind_flag comment to reflect the changes. Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250130201239.1429648-3-ihor.solodrai@linux.dev	2025-02-24 15:10:59 -08:00
Ihor Solodrai	32bda80136	libbpf: Introduce kflag for type_tags and decl_tags in BTF Add the following functions to libbpf API: * btf__add_type_attr() * btf__add_decl_attr() These functions allow to add to BTF the type tags and decl tags with info->kflag set to 1. The kflag indicates that the tag directly encodes an __attribute__ and not a normal tag. See Documentation/bpf/btf.rst changes in the subsequent patch for details on the semantics. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250130201239.1429648-2-ihor.solodrai@linux.dev	2025-02-24 15:10:59 -08:00
Tony Ambardar	71208c3362	libbpf: Fix accessing BTF.ext core_relo header Update btf_ext_parse_info() to ensure the core_relo header is present before reading its fields. This avoids a potential buffer read overflow reported by the OSS Fuzz project. Fixes: cf579164e9ea ("libbpf: Support BTF.ext loading and output in either endianness") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://issues.oss-fuzz.com/issues/388905046 Link: https://lore.kernel.org/bpf/20250125065236.2603346-1-itugrok@yahoo.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-02-24 15:10:59 -08:00
Daniel Borkmann	9544a909f1	netkit: Allow for configuring needed_{head,tail}room Allow the user to configure needed_{head,tail}room for both netkit devices. The idea is similar to 163e529200af ("veth: implement ndo_set_rx_headroom") with the difference that the two parameters can be specified upon device creation. By default the current behavior stays as is which is needed_{head,tail}room is 0. In case of Cilium, for example, the netkit devices are not enslaved into a bridge or openvswitch device (rather, BPF-based redirection is used out of tcx), and as such these parameters are not propagated into the Pod's netns via peer device. Given Cilium can run in vxlan/geneve tunneling mode (needed_headroom) and/or be used in combination with WireGuard (needed_{head,tail}room), allow the Cilium CNI plugin to specify these two upon netkit device creation. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/bpf/20241220234658.490686-1-daniel@iogearbox.net Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-02-24 15:10:59 -08:00
Ihor Solodrai	d4a841a32b	ci: remove dependency on run-on-arch-action run-on-arch-action is simply a wrapper around docker. There is no value in using it in libbpf, as it is not complicated to run non-native arch docker images directly on github-hosted runners. Docker relies on qemu-user-static installed on the system to emulate different architectures. Recently there were various reports about multi-arch docker builds failing with seemingly random issues, and it appears to boil down to qemu [1]. I stumbled on this problem while updating s390x runners [2] for BPF CI, and setting up more recent version of qemu helped. This change addresses recent build failures on s390x and ppc64le. [1] https://github.com/docker/setup-qemu-action/issues/188 [2] https://github.com/kernel-patches/runner/pull/69 [3] https://docs.docker.com/build/buildkit/#getting-started Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>	2025-01-31 22:30:52 -08:00
Ihor Solodrai	324f3c3846	ci: run coverity scan on push to master Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>	2025-01-17 13:53:19 -08:00
Ihor Solodrai	63528b7a4d	ci: remove sourcing helpers.sh from coverity workflow Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>	2025-01-17 13:53:19 -08:00
Andrii Nakryiko	7abfe520df	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: f44275e7155dc310d36516fc25be503da099781c Checkpoint bpf-next commit: 01f3ce5328c405179b2c69ea047c423dad2bfa6d Baseline bpf commit: 9d89551994a430b50c4fffcb1e617a057fa76e20 Checkpoint bpf commit: c45323b7560ec87c37c729b703c86ee65f136d75 Andrii Nakryiko (1): libbpf: Work around kernel inconsistently stripping '.llvm.' suffix Pu Lehui (2): libbpf: Fix return zero when elf_begin failed libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED Vishal Chourasia (1): tools: Sync if_xdp.h uapi tooling header Yonghong Song (1): libbpf: Add unique_match option for multi kprobe include/uapi/linux/if_xdp.h | 4 ++-- src/btf.c | 1 + src/btf_relocate.c | 2 +- src/libbpf.c | 39 +++++++++++++++++++++++++++++++++++-- src/libbpf.h | 4 +++- 5 files changed, 44 insertions(+), 6 deletions(-) Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-01-17 12:31:44 -08:00
Vishal Chourasia	d76c770473	tools: Sync if_xdp.h uapi tooling header Sync if_xdp.h uapi header to remove following warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h' Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250115032248.125742-1-yoong.siang.song@intel.com	2025-01-17 12:31:44 -08:00
Andrii Nakryiko	444f3c0e7a	libbpf: Work around kernel inconsistently stripping '.llvm.' suffix Some versions of kernel were stripping out '.llvm.<hash>' suffix from kernel symbols (produced by Clang LTO compilation) from function names reported in available_filter_functions, while kallsyms reported full original name. This confuses libbpf's multi-kprobe logic of finding all matching kernel functions for specified user glob pattern by joining available_filter_functions and kallsyms contents, because joining by full symbol name won't work for symbols containing '.llvm.<hash>' suffix. This was eventually fixed by [0] in the kernel, but we'd like to not regress multi-kprobe experience and add a work around for this bug on libbpf side, stripping kallsym's name if it matches user pattern and contains '.llvm.' suffix. [0] fb6a421fb615 ("kallsyms: Match symbols exactly with CONFIG_LTO_CLANG") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20250117003957.179331-1-andrii@kernel.org	2025-01-17 12:31:44 -08:00
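The join problem described in the '.llvm.' workaround commit can be modelled in a few lines. This is a Python sketch of the idea only, not libbpf's C implementation: the function names and the symbol lists are made up for illustration, and Python's `fnmatch` stands in for libbpf's own glob matcher (it accepts a richer pattern syntax).

```python
import fnmatch


def strip_llvm_suffix(sym):
    """Drop a '.llvm.<hash>' suffix if present (hypothetical helper)."""
    idx = sym.find(".llvm.")
    return sym[:idx] if idx != -1 else sym


def match_kprobe_targets(pattern, kallsyms, available):
    """Join kallsyms names against available_filter_functions names.

    A kallsyms name that matches the user pattern is looked up both as-is
    and with its '.llvm.' suffix stripped, so kernels that report the
    stripped name in available_filter_functions still produce a match.
    """
    avail = set(available)
    matched = []
    for sym in kallsyms:
        if not fnmatch.fnmatchcase(sym, pattern):
            continue
        if sym in avail or strip_llvm_suffix(sym) in avail:
            matched.append(sym)
    return matched


# Illustrative data: kallsyms keeps the suffix, available_filter_functions does not.
kallsyms = ["try_to_wake_up.llvm.1234", "schedule"]
available = ["try_to_wake_up", "schedule"]
print(match_kprobe_targets("try_to_wake_up*", kallsyms, available))
```

Without the stripping step the join on the full name finds nothing for the suffixed symbol, which is the regression the commit avoids.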
Pu Lehui	719aeb7a6e	libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED When redirecting the split BTF to the vmlinux base BTF, we need to mark the distilled base struct/union members of split BTF structs/unions in id_map with BTF_IS_EMBEDDED. This indicates that these types must match both name and size later. Therefore, we need to traverse the entire split BTF, which involves traversing type IDs from nr_dist_base_types to nr_types. However, the current implementation uses an incorrect traversal end type ID, so let's correct it. Fixes: 19e00c897d50 ("libbpf: Split BTF relocation") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-3-pulehui@huaweicloud.com	2025-01-17 12:31:44 -08:00
Pu Lehui	a7edf4aec8	libbpf: Fix return zero when elf_begin failed The error number of elf_begin is omitted when encapsulating the btf_find_elf_sections function. Fixes: c86f180ffc99 ("libbpf: Make btf_parse_elf process .BTF.base transparently") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-2-pulehui@huaweicloud.com	2025-01-17 12:31:44 -08:00
Yonghong Song	32792ec66c	libbpf: Add unique_match option for multi kprobe Jordan reported an issue in Meta production environment where func try_to_wake_up() is renamed to try_to_wake_up.llvm.<hash>() by clang compiler at lto mode. The original 'kprobe/try_to_wake_up' does not work any more since try_to_wake_up() does not match the actual func name in /proc/kallsyms. There are a couple of ways to resolve this issue. For example, in attach_kprobe(), we could do lookup in /proc/kallsyms so try_to_wake_up() can be replaced by try_to_wake_up.llvm.<hash>(). Or we can force users to use bpf_program__attach_kprobe() where they need to lookup /proc/kallsyms to find out try_to_wake_up.llvm.<hash>(). But these two approaches require extra work by either libbpf or user. Luckily, suggested by Andrii, multi kprobe already supports wildcard ('*') for symbol matching. In the above example, 'try_to_wake_up*' can match to try_to_wake_up() or try_to_wake_up.llvm.<hash>() and this allows the bpf prog to work for different kernels as some kernels may have try_to_wake_up() and some others may have try_to_wake_up.llvm.<hash>(). The original intention is to kprobe try_to_wake_up() only, so an optional field unique_match is added to struct bpf_kprobe_multi_opts. If the field is set to true, the number of matched functions must be one. Otherwise, the attachment will fail. In the above case, multi kprobe with 'try_to_wake_up*' and unique_match preserves user functionality. Reported-by: Jordan Rome <linux@jordanrome.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250109174023.3368432-1-yonghong.song@linux.dev	2025-01-17 12:31:44 -08:00
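The unique_match rule from the commit above amounts to "the glob must resolve to exactly one function, otherwise fail". The following Python sketch models that rule only; it is not libbpf code, `resolve_multi_kprobe` and the symbol lists are hypothetical, and `fnmatch` is used in place of libbpf's glob matcher.

```python
import fnmatch


def resolve_multi_kprobe(pattern, symbols, unique_match=False):
    """Return the symbols matching pattern, enforcing unique_match if set."""
    matches = [s for s in symbols if fnmatch.fnmatchcase(s, pattern)]
    if not matches:
        raise LookupError(f"no function matches {pattern!r}")
    if unique_match and len(matches) != 1:
        raise ValueError(
            f"{pattern!r} matched {len(matches)} functions, expected exactly one"
        )
    return matches


# The same pattern works on an LTO kernel and a non-LTO kernel.
lto_kernel = ["try_to_wake_up.llvm.1234", "schedule"]
plain_kernel = ["try_to_wake_up", "schedule"]
print(resolve_multi_kprobe("try_to_wake_up*", lto_kernel, unique_match=True))
print(resolve_multi_kprobe("try_to_wake_up*", plain_kernel, unique_match=True))

# A kernel with a second matching symbol makes the attachment fail.
ambiguous = ["try_to_wake_up", "try_to_wake_up_local"]
try:
    resolve_multi_kprobe("try_to_wake_up*", ambiguous, unique_match=True)
except ValueError as err:
    print("attach failed:", err)
```

The failure on an ambiguous match is what keeps the wildcard from silently attaching to more functions than the single one the user intended.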
Ihor Solodrai	c924f8d3dd	ci: sync with libbpf/ci@v3 * vmtest.yml * use v3 of libbpf/ci actions * remove unnecessary selftests preparation steps * ci/vmtest * remove unnecessary scripts and configs * add libbpf-specific run-vmtest.env [1] [1] https://github.com/libbpf/ci/pull/166 Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>	2025-01-15 19:30:48 -08:00
Daniel Müller	0ff2f8e0ee	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: a1087da9d11e5bcacc706002bc0f84b790881f69 Checkpoint bpf-next commit: f44275e7155dc310d36516fc25be503da099781c Baseline bpf commit: fb86c42a2a5d44e849ddfbc98b8d2f4f40d36ee3 Checkpoint bpf commit: 9d89551994a430b50c4fffcb1e617a057fa76e20 Adrian Hunter (1): perf/core: Add aux_pause, aux_resume, aux_start_paused Alastair Robertson (2): libbpf: Pull file-opening logic up to top-level functions libbpf: Extend linker API to support in-memory ELF files Andrii Nakryiko (1): libbpf: don't adjust USDT semaphore address if .stapsdt.base addr is missing Anton Protopopov (2): bpf: Add fd_array_cnt attribute for prog_load libbpf: prog load: Allow to use fd_array_cnt Ben Olson (1): libbpf: Improve debug message when the base BTF cannot be found Daniel Borkmann (1): tools: Sync if_link.h uapi tooling header Daniel Xu (1): libbpf: Set MFD_NOEXEC_SEAL when creating memfd Eric Dumazet (1): net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute Jiri Olsa (1): libbpf: Fix memory leak in bpf_program__attach_uprobe_multi Joe Damato (3): netdev-genl: Dump napi_defer_hard_irqs netdev-genl: Dump gro_flush_timeout netdev-genl: Support setting per-NAPI config values Martin Karsten (1): net: Add napi_struct parameter irq_suspend_timeout Quentin Monnet (1): libbpf: Fix segfault due to libelf functions not setting errno Sidong Yang (1): libbpf: Change hash_combine parameters from long to unsigned long include/uapi/linux/bpf.h \| 10 + include/uapi/linux/if_link.h \| 554 +++++++++++++++++++++++++++++++- include/uapi/linux/netdev.h \| 4 + include/uapi/linux/perf_event.h \| 11 +- src/bpf.c \| 3 +- src/bpf.h \| 5 +- src/btf.c \| 4 +- src/libbpf.c \| 25 +- src/libbpf.h \| 5 + src/libbpf.map \| 4 + src/linker.c \| 248 ++++++++++---- src/usdt.c \| 2 +- 12 files changed, 794 insertions(+), 81 deletions(-) Signed-off-by: Daniel Müller <deso@posteo.net>	2025-01-08 14:58:04 -08:00
Daniel Xu	f468e83c85	libbpf: Set MFD_NOEXEC_SEAL when creating memfd Starting from 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") and until 1717449b4417 ("memfd: drop warning for missing exec-related flags"), the kernel would print a warning if neither MFD_NOEXEC_SEAL nor MFD_EXEC is set in memfd_create(). If libbpf runs on a kernel between these two commits (e.g. on an improperly backported system), it'll trigger this warning. To avoid this warning (and also be more secure), explicitly set MFD_NOEXEC_SEAL. But since libbpf can be run on potentially very old kernels, leave a fallback for kernels without MFD_NOEXEC_SEAL support. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/6e62c2421ad7eb1da49cbf16da95aaaa7f94d394.1735594195.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-01-08 14:58:04 -08:00
Anton Protopopov	48c771c4ce	libbpf: prog load: Allow to use fd_array_cnt Add new fd_array_cnt field to bpf_prog_load_opts and pass it in bpf_attr, if set. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213130934.1087929-6-aspsk@isovalent.com	2025-01-08 14:58:04 -08:00
Anton Protopopov	266da73237	bpf: Add fd_array_cnt attribute for prog_load The fd_array attribute of the BPF_PROG_LOAD syscall may contain a set of file descriptors: maps or btfs. This field was introduced as a sparse array. Introduce a new attribute, fd_array_cnt, which, if present, indicates that the fd_array is a continuous array of the corresponding length. If fd_array_cnt is non-zero, then every map in the fd_array will be bound to the program, as if it was used by the program. This functionality is similar to the BPF_PROG_BIND_MAP syscall, but such maps can be used by the verifier during the program load. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213130934.1087929-5-aspsk@isovalent.com	2025-01-08 14:58:04 -08:00
Alastair Robertson	d2f1f4490b	libbpf: Extend linker API to support in-memory ELF files The new_fd and add_fd functions correspond to the original new and add_file functions, but accept an FD instead of a file name. This gives API consumers the option of using anonymous files/memfds to avoid writing ELFs to disk. This new API will be useful for performing linking as part of bpftrace's JIT compilation. The add_buf function is a convenience wrapper that does the work of creating a memfd for the caller. Signed-off-by: Alastair Robertson <ajor@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241211164030.573042-3-ajor@meta.com	2025-01-08 14:58:04 -08:00
Alastair Robertson	f00fad0951	libbpf: Pull file-opening logic up to top-level functions Move the filename arguments and file-descriptor handling from init_output_elf() and linker_load_obj_file() and instead handle them at the top-level in bpf_linker__new() and bpf_linker__add_file(). This will allow the inner functions to be shared with a new, non-filename-based, API in the next commit. Signed-off-by: Alastair Robertson <ajor@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241211164030.573042-2-ajor@meta.com	2025-01-08 14:58:04 -08:00
Quentin Monnet	984dcc97ae	libbpf: Fix segfault due to libelf functions not setting errno Libelf functions do not set errno on failure. Instead, they rely on an internal _elf_errno value that can be retrieved via elf_errno() (or the corresponding message via elf_errmsg()). From "man libelf": If a libelf function encounters an error it will set an internal error code that can be retrieved with elf_errno. Each thread maintains its own separate error code. The meaning of each error code can be determined with elf_errmsg, which returns a string describing the error. As a consequence, libbpf should not return -errno when a function from libelf fails, because an empty value will not be interpreted as an error and won't cause the program to stop. This is visible in bpf_linker__add_file(), for example, where we call a succession of functions that rely on libelf: err = err ?: linker_load_obj_file(linker, filename, opts, &obj); err = err ?: linker_append_sec_data(linker, &obj); err = err ?: linker_append_elf_syms(linker, &obj); err = err ?: linker_append_elf_relos(linker, &obj); err = err ?: linker_append_btf(linker, &obj); err = err ?: linker_append_btf_ext(linker, &obj); If the object file that we try to process is not, in fact, a correct object file, linker_load_obj_file() may fail with errno not being set, and return 0. In this case we attempt to run linker_append_elf_syms() and may segfault. This can happen (and was discovered) with bpftool: $ bpftool gen object output.o sample_ret0.bpf.c libbpf: failed to get ELF header for sample_ret0.bpf.c: invalid `Elf' handle zsh: segmentation fault (core dumped) bpftool gen object output.o sample_ret0.bpf.c Fix the issue by returning a non-null error code (-EINVAL) when libelf functions fail. Fixes: faf6ed321cf6 ("libbpf: Add BPF static linker APIs") Signed-off-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241205135942.65262-1-qmo@kernel.org	2025-01-08 14:58:04 -08:00
Ben Olson	3ed57f68e5	libbpf: Improve debug message when the base BTF cannot be found When running `bpftool` on a kernel module installed in `/lib/modules...`, this error is encountered if the user does not specify `--base-btf` to point to a valid base BTF (e.g. usually in `/sys/kernel/btf/vmlinux`). However, looking at the debug output to determine the cause of the error simply says `Invalid BTF string section`, which does not point to the actual source of the error. This just improves that debug message to tell users what happened. Signed-off-by: Ben Olson <matthew.olson@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/Z0YqzQ5lNz7obQG7@bolson-desk Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-01-08 14:58:04 -08:00
Andrii Nakryiko	69d85c5fb3	libbpf: don't adjust USDT semaphore address if .stapsdt.base addr is missing USDT ELF note optionally can record an offset of .stapsdt.base, which is used to make adjustments to USDT target attach address. Currently, libbpf will do this address adjustment unconditionally if it finds .stapsdt.base ELF section in target binary. But there is a corner case where .stapsdt.base ELF section is present, but specific USDT note doesn't reference it. In such case, libbpf will basically just add base address and end up with absolutely incorrect USDT target address. This adjustment has to be done only if both .stapsdt.base section is present and USDT note is recording a reference to it. Fixes: 74cc6311cec9 ("libbpf: Add USDT notes parsing and resolution logic") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20241121224558.796110-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-01-08 14:58:04 -08:00
Martin Karsten	0d822312fa	net: Add napi_struct parameter irq_suspend_timeout Add a per-NAPI IRQ suspension parameter, which can be get/set with netdev-genl. This patch doesn't change any behavior but prepares the code for other changes in the following commits which use irq_suspend_timeout as a timeout for IRQ suspension. Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Joe Damato <jdamato@fastly.com> Tested-by: Martin Karsten <mkarsten@uwaterloo.ca> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://patch.msgid.link/20241109050245.191288-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-08 14:58:04 -08:00
Daniel Borkmann	ba2ba19f6d	tools: Sync if_link.h uapi tooling header Sync if_link uapi header to the latest version as we need the refresher in tooling for netkit device. Given it's been a while since the last sync and the diff is fairly big, it has been done as its own commit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20241004101335.117711-4-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-01-08 14:58:04 -08:00
Joe Damato	c9a728c329	netdev-genl: Support setting per-NAPI config values Add support to set per-NAPI defer_hard_irqs and gro_flush_timeout. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241011184527.16393-7-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-08 14:58:04 -08:00
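
The fallback described in the "libbpf: Set MFD_NOEXEC_SEAL when creating memfd" entry above can be sketched roughly as follows. This is a minimal illustration of the idea, not the code from the commit: the helper name `create_sealed_memfd` is hypothetical, and treating `EINVAL` as "the kernel does not know `MFD_NOEXEC_SEAL`" is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag values as defined in the kernel UAPI header <linux/memfd.h>;
 * provided here in case the installed headers predate MFD_NOEXEC_SEAL. */
#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif

/* Hypothetical helper (not a libbpf API): create a memfd, preferring
 * MFD_NOEXEC_SEAL. Returns a file descriptor >= 0 on success or a
 * negative errno value on failure. */
static int create_sealed_memfd(const char *name)
{
	unsigned int flags = MFD_CLOEXEC | MFD_NOEXEC_SEAL;
	int fd;

	assert(name != NULL);

	/* First attempt: request the non-executable seal explicitly. */
	fd = (int)syscall(SYS_memfd_create, name, flags);
	if (fd >= 0)
		return fd;
	if (errno != EINVAL)
		return -errno;

	/* Assumption: EINVAL may mean the kernel rejects the unknown
	 * MFD_NOEXEC_SEAL flag, so retry once without it. */
	fd = (int)syscall(SYS_memfd_create, name, flags & ~MFD_NOEXEC_SEAL);
	return fd >= 0 ? fd : -errno;
}
```

A caller would use the returned descriptor like any other file descriptor and `close()` it when done. Retrying only on `EINVAL` keeps unrelated failures (for example `EMFILE`) visible to the caller instead of masking them behind a second attempt.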


2601 Commits