Commit Graph

2658 Commits

Author SHA1 Message Date
Eduard Zingerman
5f4a34c606 libbpf: __arg_untrusted in bpf_helpers.h
Make btf_decl_tag("arg:untrusted") available for libbpf users via
macro. Makes the following usage possible:

  void foo(struct bar *p __arg_untrusted) { ... }
  void bar(struct foo *p __arg_trusted) {
    ...
    foo(p->buz->bar); // buz derefrence looses __trusted
    ...
  }

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 08:57:15 -07:00
Kumar Kartikeya Dwivedi
c2850a3840 libbpf: Introduce bpf_prog_stream_read() API
Introduce a libbpf API so that users can read data from a given BPF
stream for a BPF prog fd. For now, only the low-level syscall wrapper
is provided, we can add a bpf_program__* accessor as a follow up if
needed.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 08:57:15 -07:00
Kumar Kartikeya Dwivedi
9bb5c46da4 libbpf: Add bpf_stream_printk() macro
Add a convenience macro to print data to the BPF streams. BPF_STDOUT and
BPF_STDERR stream IDs in the vmlinux.h can be passed to the macro to
print to the respective streams.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-10-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 08:57:15 -07:00
Kumar Kartikeya Dwivedi
ae131a0b7c bpf: Introduce BPF standard streams
Add support for a stream API to the kernel and expose related kfuncs to
BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
can be used for printing messages that can be consumed from user space,
thus it's similar in spirit to existing trace_pipe interface.

The kernel will use the BPF_STDERR stream to notify the program of any
errors encountered at runtime. BPF programs themselves may use both
streams for writing debug messages. BPF library-like code may use
BPF_STDERR to print warnings or errors on misuse at runtime.

The implementation of a stream is as follows. Everytime a message is
emitted from the kernel (directly, or through a BPF program), a record
is allocated by bump allocating from per-cpu region backed by a page
obtained using alloc_pages_nolock(). This ensures that we can allocate
memory from any context. The eventual plan is to discard this scheme in
favor of Alexei's kmalloc_nolock() [0].

This record is then locklessly inserted into a list (llist_add()) so
that the printing side doesn't require holding any locks, and works in
any context. Each stream has a maximum capacity of 4MB of text, and each
printed message is accounted against this limit.

Messages from a program are emitted using the bpf_stream_vprintk kfunc,
which takes a stream_id argument in addition to working otherwise
similar to bpf_trace_vprintk.

The bprintf buffer helpers are extracted out to be reused for printing
the string into them before copying it into the stream, so that we can
(with the defined max limit) format a string and know its true length
before performing allocations of the stream element.

For consuming elements from a stream, we expose a bpf(2) syscall command
named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the
stream of a given prog_fd into a user space buffer. The main logic is
implemented in bpf_stream_read(). The log messages are queued in
bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and
ordered correctly in the stream backlog.

For this purpose, we hold a lock around bpf_stream_backlog_peek(), as
llist_del_first() (if we maintained a second lockless list for the
backlog) wouldn't be safe from multiple threads anyway. Then, if we
fail to find something in the backlog log, we splice out everything from
the lockless log, and place it in the backlog log, and then return the
head of the backlog. Once the full length of the element is consumed, we
will pop it and free it.

The lockless list bpf_stream::log is a LIFO stack. Elements obtained
using a llist_del_all() operation are in LIFO order, thus would break
the chronological ordering if printed directly. Hence, this batch of
messages is first reversed. Then, it is stashed into a separate list in
the stream, i.e. the backlog_log. The head of this list is the actual
message that should always be returned to the caller. All of this is
done in bpf_stream_backlog_fill().

From the kernel side, the writing into the stream will be a bit more
involved than the typical printk. First, the kernel typically may print
a collection of messages into the stream, and parallel writers into the
stream may suffer from interleaving of messages. To ensure each group of
messages is visible atomically, we can lift the advantage of using a
lockless list for pushing in messages.

To enable this, we add a bpf_stream_stage() macro, and require kernel
users to use bpf_stream_printk statements for the passed expression to
write into the stream. Underneath the macro, we have a message staging
API, where a bpf_stream_stage object on the stack accumulates the
messages being printed into a local llist_head, and then a commit
operation splices the whole batch into the stream's lockless log list.

This is especially pertinent for rqspinlock deadlock messages printed to
program streams. After this change, we see each deadlock invocation as a
non-interleaving contiguous message without any confusion on the
reader's part, improving their user experience in debugging the fault.

While programs cannot benefit from this staged stream writing API, they
could just as well hold an rqspinlock around their print statements to
serialize messages, hence this is kept kernel-internal for now.

Overall, this infrastructure provides NMI-safe any context printing of
messages to two dedicated streams.

Later patches will add support for printing splats in case of BPF arena
page faults, rqspinlock deadlocks, and cond_break timeouts, and
integration of this facility into bpftool for dumping messages to user
space.

  [0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 08:57:15 -07:00
Andrii Nakryiko
7a6e6b484d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   4a4b84ba9e453295c746d81cb245c0c5d80050f0
Checkpoint bpf-next commit: 07ee18a0bc946b6b407942c88faed089e20f47d1
Baseline bpf commit:        a766cfbbeb3a
Checkpoint bpf commit:      e34a79b96ab9d49ed8b605fee11099cf3efbb428

Adin Scannell (1):
  libbpf: Fix possible use-after-free for externs

Yuan Chen (1):
  libbpf: Fix null pointer dereference in btf_dump__free on allocation
    failure

 src/btf_dump.c |  3 +++
 src/libbpf.c   | 10 +++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-07 17:21:39 -07:00
Andrii Nakryiko
dc031df06a sync: update .mailmap
Update .mailmap based on libbpf's list of contributors and on the latest
.mailmap version in the upstream repository.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-01 15:38:22 -07:00
Adin Scannell
b58f5a3e77 libbpf: Fix possible use-after-free for externs
The `name` field in `obj->externs` points into the BTF data at initial
open time. However, some functions may invalidate this after opening and
before loading (e.g. `bpf_map__set_value_size`), which results in
pointers into freed memory and undefined behavior.

The simplest solution is to simply `strdup` these strings, similar to
the `essent_name`, and free them at the same time.

In order to test this path, the `global_map_resize` BPF selftest is
modified slightly to ensure the presence of an extern, which causes this
test to fail prior to the fix. Given there isn't an obvious API or error
to test against, I opted to add this to the existing test as an aspect
of the resizing feature rather than duplicate the test.

Fixes: 9d0a23313b1a ("libbpf: Add capability for resizing datasec maps")
Signed-off-by: Adin Scannell <amscanne@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250625050215.2777374-1-amscanne@meta.com
2025-07-01 15:38:22 -07:00
Yuan Chen
de1d0a25a8 libbpf: Fix null pointer dereference in btf_dump__free on allocation failure
When btf_dump__new() fails to allocate memory for the internal hashmap
(btf_dump->type_names), it returns an error code. However, the cleanup
function btf_dump__free() does not check if btf_dump->type_names is NULL
before attempting to free it. This leads to a null pointer dereference
when btf_dump__free() is called on a btf_dump object.

Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250618011933.11423-1-chenyuan_fl@163.com
2025-07-01 15:38:22 -07:00
Andrii Nakryiko
95a9035e8b sync: adjust sync-kernel.sh script to handle UAPI header guards better
Adjust the sync script to handle UAPI header guards with singular and
double underscore between UAPI and LINUX. Kernel seems to have a mix of
both approaches.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-16 11:30:04 -07:00
Andrii Nakryiko
0e2ac81b00 sync: normalize more of Linux UAPI headers
Normalize UAPI headers that had single underscore between UAPI and LINUX.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-16 11:30:04 -07:00
Amery Hung
fdb04dd485 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   9325d53fe9adff354b6a93fda5f38c165947da0f
Checkpoint bpf-next commit: 4a4b84ba9e453295c746d81cb245c0c5d80050f0
Baseline bpf commit:        b4432656b36e5cc1d50a1f2dc15357543add530e
Checkpoint bpf commit:      d60d09eadb7cb17690c847f1623436cd4b58c19c

Alan Maguire (1):
  libbpf/btf: Fix string handling to support multi-split BTF

Amery Hung (1):
  libbpf: Support creating and destroying qdisc

Andrii Nakryiko (1):
  libbpf: Handle unsupported mmap-based /sys/kernel/btf/vmlinux
    correctly

Blake Jones (1):
  libbpf: Add support for printing BTF character arrays as strings

Ian Rogers (1):
  perf/uapi: Fix PERF_RECORD_SAMPLE comments in
    <uapi/linux/perf_event.h>

Ingo Molnar (1):
  perf/uapi: Clean up <uapi/linux/perf_event.h> a bit

Jiawei Zhao (1):
  libbpf: Correct some typos and syntax issues in usdt doc

Lorenz Bauer (1):
  libbpf: Use mmap to parse vmlinux BTF from sysfs

Paul Chaignon (2):
  bpf: Clarify handling of mark and tstamp by redirect_peer
  bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE

Saket Kumar Bhaskar (1):
  selftests/bpf: Fix bpf selftest build warning

Stanislav Fomichev (1):
  net: devmem: TCP tx netlink api

Tao Chen (2):
  bpf: Add cookie to raw_tp bpf_link_info
  bpf: Add cookie to tracing bpf_link_info

Tobias Klauser (1):
  bpf: adjust path to trace_output sample eBPF program

Yonghong Song (2):
  bpf: Implement mprog API on top of existing cgroup progs
  libbpf: Support link-based cgroup attach with options

 include/uapi/linux/bpf.h        |  18 +-
 include/uapi/linux/if_xdp.h     |   6 +-
 include/uapi/linux/netdev.h     |   1 +
 include/uapi/linux/perf_event.h | 657 ++++++++++++++++----------------
 src/bpf.c                       |  44 +++
 src/bpf.h                       |   5 +
 src/btf.c                       |  91 ++++-
 src/btf.h                       |   3 +-
 src/btf_dump.c                  |  55 ++-
 src/libbpf.c                    |  28 ++
 src/libbpf.h                    |  20 +-
 src/libbpf.map                  |   1 +
 src/netlink.c                   |  20 +-
 src/usdt.c                      |  10 +-
 14 files changed, 602 insertions(+), 357 deletions(-)

Signed-off-by: Amery Hung <ameryhung@gmail.com>
2025-06-16 08:52:44 -07:00
Amery Hung
f6284bb875 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
2025-06-16 08:52:44 -07:00
Andrii Nakryiko
7356be641d libbpf: Handle unsupported mmap-based /sys/kernel/btf/vmlinux correctly
libbpf_err_ptr() helpers are meant to return NULL and set errno, if
there is an error. But btf_parse_raw_mmap() is meant to be used
internally and is expected to return ERR_PTR() values. Because of this
mismatch, when libbpf tries to mmap /sys/kernel/btf/vmlinux, we don't
detect the error correctly with IS_ERR() check, and never fallback to
old non-mmap-based way of loading vmlinux BTF.

Fix this by using proper ERR_PTR() returns internally.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: 3c0421c93ce4 ("libbpf: Use mmap to parse vmlinux BTF from sysfs")
Cc: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250606202134.2738910-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-16 08:52:44 -07:00
Paul Chaignon
d3d933ac9b bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE
In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
things, update the L4 checksum after reverse SNATing IPv6 packets. That
use case is however not currently supported and leads to invalid
skb->csum values in some cases. This patch adds support for IPv6 address
changes in bpf_l4_csum_update via a new flag.

When calling bpf_l4_csum_replace in Cilium, it ends up calling
inet_proto_csum_replace_by_diff:

    1:  void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
    2:                                       __wsum diff, bool pseudohdr)
    3:  {
    4:      if (skb->ip_summed != CHECKSUM_PARTIAL) {
    5:          csum_replace_by_diff(sum, diff);
    6:          if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
    7:              skb->csum = ~csum_sub(diff, skb->csum);
    8:      } else if (pseudohdr) {
    9:          *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
    10:     }
    11: }

The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
updated one of the IPv6 addresses. The helper now updates the L4 header
checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.

For an IPv6 packet, the updates of the IPv6 address and of the L4
checksum will cancel each other. The checksums are set such that
computing a checksum over the packet including its checksum will result
in a sum of 0. So the same is true here when we update the L4 checksum
on line 5. We'll update it as to cancel the previous IPv6 address
update. Hence skb->csum should remain untouched in this case.

The same bug doesn't affect IPv4 packets because, in that case, three
fields are updated: the IPv4 address, the IP checksum, and the L4
checksum. The change to the IPv4 address and one of the checksums still
cancel each other in skb->csum, but we're left with one checksum update
and should therefore update skb->csum accordingly. That's exactly what
inet_proto_csum_replace_by_diff does.

This special case for IPv6 L4 checksums is also described atop
inet_proto_csum_replace16, the function we should be using in this case.

This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6,
to indicate that we're updating the L4 checksum of an IPv6 packet. When
the flag is set, inet_proto_csum_replace_by_diff will skip the
skb->csum update.

Fixes: 7d672345ed295 ("bpf: add generic bpf_csum_diff helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/96a6bc3a443e6f0b21ff7b7834000e17fb549e05.1748509484.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16 08:52:44 -07:00
Tobias Klauser
161752932b bpf: adjust path to trace_output sample eBPF program
The sample file was renamed from trace_output_kern.c to
trace_output.bpf.c in commit d4fffba4d04b ("samples/bpf: Change _kern
suffix to .bpf with syscall tracing program"). Adjust the path in the
documentation comment for bpf_perf_event_output.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Link: https://lore.kernel.org/r/20250610140756.16332-1-tklauser@distanz.ch
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-16 08:52:44 -07:00
Tao Chen
ae67e5966a bpf: Add cookie to tracing bpf_link_info
bpf_tramp_link includes cookie info, we can add it in bpf_link_info.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-1-chen.dylane@linux.dev
2025-06-16 08:52:44 -07:00
Yonghong Song
553676e8f5 libbpf: Support link-based cgroup attach with options
Currently libbpf supports bpf_program__attach_cgroup() with signature:
  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);

To support mprog style attachment, additionsl fields like flags,
relative_{fd,id} and expected_revision are needed.

Add a new API:
  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
                                  const struct bpf_cgroup_opts *opts);
where bpf_cgroup_opts contains all above needed fields.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163146.2429212-1-yonghong.song@linux.dev
2025-06-16 08:52:44 -07:00
Yonghong Song
15dfc869f8 bpf: Implement mprog API on top of existing cgroup progs
Current cgroup prog ordering is appending at attachment time. This is not
ideal. In some cases, users want specific ordering at a particular cgroup
level. To address this, the existing mprog API seems an ideal solution with
supporting BPF_F_BEFORE and BPF_F_AFTER flags.

But there are a few obstacles to directly use kernel mprog interface.
Currently cgroup bpf progs already support prog attach/detach/replace
and link-based attach/detach/replace. For example, in struct
bpf_prog_array_item, the cgroup_storage field needs to be together
with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
as the member, which makes it difficult to use kernel mprog interface.

In another case, the current cgroup prog detach tries to use the
same flag as in attach. This is different from mprog kernel interface
which uses flags passed from user space.

So to avoid modifying existing behavior, I made the following changes to
support mprog API for cgroup progs:
 - The support is for prog list at cgroup level. Cross-level prog list
   (a.k.a. effective prog list) is not supported.
 - Previously, BPF_F_PREORDER is supported only for prog attach, now
   BPF_F_PREORDER is also supported by link-based attach.
 - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported
   similar to kernel mprog but with different implementation.
 - For detach and replace, use the existing implementation.
 - For attach, detach and replace, the revision for a particular prog
   list, associated with a particular attach type, will be updated
   by increasing count by 1.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163141.2428937-1-yonghong.song@linux.dev
2025-06-16 08:52:44 -07:00
Blake Jones
439433a909 libbpf: Add support for printing BTF character arrays as strings
The BTF dumper code currently displays arrays of characters as just that -
arrays, with each character formatted individually. Sometimes this is what
makes sense, but it's nice to be able to treat that array as a string.

This change adds a special case to the btf_dump functionality to allow
0-terminated arrays of single-byte integer values to be printed as
character strings. Characters for which isprint() returns false are
printed as hex-escaped values. This is enabled when the new ".emit_strings"
is set to 1 in the btf_dump_type_data_opts structure.

As an example, here's what it looks like to dump the string "hello" using
a few different field values for btf_dump_type_data_opts (.compact = 1):

- .emit_strings = 0, .skip_names = 0:  (char[6])['h','e','l','l','o',]
- .emit_strings = 0, .skip_names = 1:  ['h','e','l','l','o',]
- .emit_strings = 1, .skip_names = 0:  (char[6])"hello"
- .emit_strings = 1, .skip_names = 1:  "hello"

Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1:

- .emit_strings = 0:  ['h',-1,]
- .emit_strings = 1:  "h\xff"

Signed-off-by: Blake Jones <blakejones@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250603203701.520541-1-blakejones@google.com
2025-06-16 08:52:44 -07:00
Jiawei Zhao
224ea3ec50 libbpf: Correct some typos and syntax issues in usdt doc
Fix some incorrect words, such as "and" -> "an", "it's" -> "its".  Fix
some grammar issues, such as removing redundant "will", "would
complicated" -> "would complicate".

Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250531095111.57824-1-Phoenix500526@163.com
2025-06-16 08:52:44 -07:00
Tao Chen
b73864fc10 bpf: Add cookie to raw_tp bpf_link_info
After commit 68ca5d4eebb8 ("bpf: support BPF cookie in raw tracepoint
(raw_tp, tp_btf) programs"), we can show the cookie in bpf_link_info
like kprobe etc.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250603154309.3063644-1-chen.dylane@linux.dev
2025-06-16 08:52:44 -07:00
Lorenz Bauer
01dd871a20 libbpf: Use mmap to parse vmlinux BTF from sysfs
Teach libbpf to use mmap when parsing vmlinux BTF from /sys. We don't
apply this to fall-back paths on the regular file system because there
is no way to ensure that modifications underlying the MAP_PRIVATE
mapping are not visible to the process.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250520-vmlinux-mmap-v5-3-e8c941acc414@isovalent.com
2025-06-16 08:52:44 -07:00
Alan Maguire
8df8d67f63 libbpf/btf: Fix string handling to support multi-split BTF
libbpf handling of split BTF has been written largely with the
assumption that multiple splits are possible, i.e. split BTF on top of
split BTF on top of base BTF.  One area where this does not quite work
is string handling in split BTF; the start string offset should be the
base BTF string section length + the base BTF string offset.  This
worked in the past because for a single split BTF with base the start
string offset was always 0.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250519165935.261614-2-alan.maguire@oracle.com
2025-06-16 08:52:44 -07:00
Stanislav Fomichev
31bb8f7936 net: devmem: TCP tx netlink api
Add bind-tx netlink call to attach dmabuf for TX; queue is not
required, only ifindex and dmabuf fd for attachment.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-4-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-16 08:51:34 -07:00
Amery Hung
f580871b42 libbpf: Support creating and destroying qdisc
Extend struct bpf_tc_hook with handle, qdisc name and a new attach type,
BPF_TC_QDISC, to allow users to add or remove any qdisc specified in
addition to clsact.

Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409214606.2000194-8-ameryhung@gmail.com
2025-06-16 08:51:34 -07:00
Ingo Molnar
52b9b38a22 perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
When applying a recent commit to the <uapi/linux/perf_event.h>
header I noticed that we have accumulated quite a bit of
historic noise in this header, so do a bit of spring cleaning:

 - Define bitfields in a vertically aligned fashion, like
   perf_event_mmap_page::capabilities already does. This
   makes it easier to see the distribution and sizing of
   bits within a word, at a glance. The following is much
   more readable:

			__u64	cap_bit0		: 1,
				cap_bit0_is_deprecated	: 1,
				cap_user_rdpmc		: 1,
				cap_user_time		: 1,
				cap_user_time_zero	: 1,
				cap_user_time_short	: 1,
				cap_____res		: 58;

   Than:

			__u64	cap_bit0:1,
				cap_bit0_is_deprecated:1,
				cap_user_rdpmc:1,
				cap_user_time:1,
				cap_user_time_zero:1,
				cap_user_time_short:1,
				cap_____res:58;

   So convert all bitfield definitions from the latter style to the
   former style.

 - Fix typos and grammar

 - Fix capitalization

 - Remove whitespace noise

 - Harmonize the definitions of various generations and groups of
   PERF_MEM_ ABI values.

 - Vertically align all definitions and assignments to the same
   column (48), as the first definition (enum perf_type_id),
   throughout the entire header.

 - And in general make the code and comments to be more in sync
   with each other and to be more readable overall.

No change in functionality.

Copy the changes over to tools/include/uapi/linux/perf_event.h.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250521221529.2547099-1-irogers@google.com
2025-06-16 08:51:34 -07:00
Ian Rogers
bae6a269ca perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
AAUX data for PERF_SAMPLE_AUX appears last. PERF_SAMPLE_CGROUP is
missing from the comment.

This makes the <uapi/linux/perf_event.h> comment match that in the
perf_event_open man page.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20250521221529.2547099-1-irogers@google.com
2025-06-16 08:51:34 -07:00
Paul Chaignon
9b06cd15e0 bpf: Clarify handling of mark and tstamp by redirect_peer
When switching network namespaces with the bpf_redirect_peer helper, the
skb->mark and skb->tstamp fields are not zeroed out like they can be on
a typical netns switch. This patch clarifies that in the helper
description.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/ccc86af26d43c5c0b776bcba2601b7479c0d46d0.1746460653.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16 08:51:34 -07:00
Andrii Nakryiko
346532d711 sync: one-time strip out _UAPI prefix from UAPI header guards
Normalize already synced UAPI headers. For all subsequent syncs this
will be done automatically by the sync script.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-13 13:18:23 -07:00
Andrii Nakryiko
15c5317b6c sync: update sync script to strip _UAPI prefix in UAPI headers
It's expected that kernel UAPI headers have #ifndef guards starting with
__LINUX prefix, while in the kernel source code these guards are
actually starting with _UAPI__LINUX. The stripping of _UAPI prefix is
done (among other things) by kernel's scripts/headers_install.sh script.

Given libbpf vendors its own UAPI header under include/uapi subdir, and
those "internal" UAPI headers are sometimes used by libbpf users for
convenience, let's stick to the __LINUX prefix rule and do that during
the sync.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-13 13:18:23 -07:00
Mykyta Yatsenko
02bdeb7a2c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   8e64c387c942229c551d0f23de4d9993d3a2acb6
Checkpoint bpf-next commit: 9325d53fe9adff354b6a93fda5f38c165947da0f
Baseline bpf commit:        b4432656b36e5cc1d50a1f2dc15357543add530e
Checkpoint bpf commit:      b4432656b36e5cc1d50a1f2dc15357543add530e

Andrii Nakryiko (1):
  libbpf: Improve BTF dedup handling of "identical" BTF types

Anton Protopopov (3):
  libbpf: Use proper errno value in linker
  bpf: Fix uninitialized values in BPF_{CORE,PROBE}_READ
  libbpf: Use proper errno value in nlattr

Jiri Olsa (1):
  bpf: Add support to retrieve ref_ctr_offset for uprobe perf link

Mykyta Yatsenko (1):
  libbpf: Check bpf_map_skeleton link for NULL

 include/uapi/linux/bpf.h |   1 +
 src/bpf_core_read.h      |   6 ++
 src/btf.c                | 137 +++++++++++++++++++++++++--------------
 src/libbpf.c             |   6 ++
 src/linker.c             |   4 +-
 src/nlattr.c             |  15 ++---
 6 files changed, 111 insertions(+), 58 deletions(-)

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
v1.6.0p_netdata v1.6.3p_netdata
2025-05-19 10:07:42 -07:00
Mykyta Yatsenko
453601a65a libbpf: Check bpf_map_skeleton link for NULL
Avoid dereferencing bpf_map_skeleton's link field if it's NULL.
If BPF map skeleton is created with the size, that indicates containing
link field, but the field was not actually initialized with valid
bpf_link pointer, libbpf crashes. This may happen when using libbpf-rs
skeleton.
Skeleton loading may still progress, but user needs to attach struct_ops
map separately.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250514113220.219095-1-mykyta.yatsenko5@gmail.com
2025-05-19 10:07:42 -07:00
Anton Protopopov
8e34ca4e8f libbpf: Use proper errno value in nlattr
Return value of the validate_nla() function can be propagated all the
way up to users of libbpf API. In case of error this libbpf version
of validate_nla returns -1 which will be seen as -EPERM from user's
point of view. Instead, return a more reasonable -EINVAL.

Fixes: bbf48c18ee0c ("libbpf: add error reporting in XDP")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250510182011.2246631-1-a.s.protopopov@gmail.com
2025-05-19 10:07:42 -07:00
Jiri Olsa
eda0e4ca46 bpf: Add support to retrieve ref_ctr_offset for uprobe perf link
Adding support to retrieve ref_ctr_offset for uprobe perf link,
which got somehow omitted from the initial uprobe link info changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/bpf/20250509153539.779599-2-jolsa@kernel.org
2025-05-19 10:07:42 -07:00
Andrii Nakryiko
5ee9fbf7d7 libbpf: Improve BTF dedup handling of "identical" BTF types
BTF dedup has a strong assumption that compiler with deduplicate identical
types within any given compilation unit (i.e., .c file). This property
is used when establishing equilvalence of two subgraphs of types.

Unfortunately, this property doesn't always holds in practice. We've
seen cases of having truly identical structs, unions, array definitions,
and, most recently, even pointers to the same type being duplicated
within CU.

Previously, we mitigated this on a case-by-case basis, adding a few
simple heuristics for validating that two BTF types (having two
different type IDs) are structurally the same. But this approach scales
poorly, and we can have more weird cases come up in the future.

So let's take a half-step back, and implement a bit more generic
structural equivalence check, recursively. We still limit it to
reasonable depth to avoid long reference loops. Depth-wise limiting of
potentially cyclical graph isn't great, but as I mentioned below doesn't
seem to be detrimental performance-wise. We can always improve this in
the future with per-type visited markers, if necessary.

Performance-wise this doesn't seem too affect vmlinux BTF dedup, which
makes sense because this logic kicks in not so frequently and only if we
already established a canonical candidate type match, but suddenly find
a different (but probably identical) type.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20250501235231.1339822-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-19 10:07:42 -07:00
Anton Protopopov
d88ca95133 bpf: Fix uninitialized values in BPF_{CORE,PROBE}_READ
With the latest LLVM bpf selftests build will fail with
the following error message:

    progs/profiler.inc.h:710:31: error: default initialization of an object of type 'typeof ((parent_task)->real_cred->uid.val)' (aka 'const unsigned int') leaves the object uninitialized and is incompatible with C++ [-Werror,-Wdefault-const-init-unsafe]
      710 |         proc_exec_data->parent_uid = BPF_CORE_READ(parent_task, real_cred, uid.val);
          |                                      ^
    tools/testing/selftests/bpf/tools/include/bpf/bpf_core_read.h:520:35: note: expanded from macro 'BPF_CORE_READ'
      520 |         ___type((src), a, ##__VA_ARGS__) __r;                               \
          |                                          ^

This happens because BPF_CORE_READ (and other macro) declare the
variable __r using the ___type macro which can inherit const modifier
from intermediate types.

Fix this by using __typeof_unqual__, when supported. (And when it
is not supported, the problem shouldn't appear, as older compilers
haven't complained.)

Fixes: 792001f4f7aa ("libbpf: Add user-space variants of BPF_CORE_READ() family of macros")
Fixes: a4b09a9ef945 ("libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family")
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250502193031.3522715-1-a.s.protopopov@gmail.com
2025-05-19 10:07:42 -07:00
Anton Protopopov
28deee2663 libbpf: Use proper errno value in linker
Return values of the linker_append_sec_data() and the
linker_append_elf_relos() functions are propagated all the
way up to users of libbpf API. In some error cases these
functions return -1 which will be seen as -EPERM from user's
point of view. Instead, return a more reasonable -EINVAL.

Fixes: faf6ed321cf6 ("libbpf: Add BPF static linker APIs")
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250430120820.2262053-1-a.s.protopopov@gmail.com
2025-05-19 10:07:42 -07:00
Andrii Nakryiko
374f7807e1 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   25601e85441dd91cf7973b002f27af4c5b8691ea
Checkpoint bpf-next commit: 8e64c387c942229c551d0f23de4d9993d3a2acb6
Baseline bpf commit:        3f8ad18f8184
Checkpoint bpf commit:      b4432656b36e5cc1d50a1f2dc15357543add530e

Alan Maguire (1):
  libbpf: Add identical pointer detection to btf_dedup_is_equiv()

Anton Protopopov (2):
  bpf: Fix a comment describing bpf_attr
  libbpf: Add likely/unlikely macros and use them in selftests

Carlos Llamas (1):
  libbpf: Fix implicit memfd_create() for bionic

Feng Yang (1):
  libbpf: Fix event name too long error

Ihor Solodrai (1):
  libbpf: Verify section type in btf_find_elf_sections

Jonathan Wiepert (1):
  Use thread-safe function pointer in libbpf_print

Mykyta Yatsenko (1):
  libbpf: Add getters for BTF.ext func and line info

Paul Chaignon (2):
  bpf: Clarify role of BPF_F_RECOMPUTE_CSUM
  bpf: Clarify the meaning of BPF_F_PSEUDO_HDR

Tao Chen (1):
  libbpf: Remove sample_period init in perf_buffer

Viktor Malik (1):
  libbpf: Fix buffer overflow in bpf_object__init_prog

 include/uapi/linux/bpf.h | 18 +++++----
 src/bpf_helpers.h        |  8 ++++
 src/btf.c                | 22 +++++++++++
 src/libbpf.c             | 81 +++++++++++++++++++++-------------------
 src/libbpf.h             |  6 +++
 src/libbpf.map           |  4 ++
 src/libbpf_internal.h    |  9 +++++
 src/linker.c             |  2 +-
 8 files changed, 103 insertions(+), 47 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-04-29 11:33:37 -07:00
Andrii Nakryiko
27dc274f68 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-04-29 11:33:37 -07:00
Alan Maguire
c8a28812fb libbpf: Add identical pointer detection to btf_dedup_is_equiv()
Recently as a side-effect of

commit ac053946f5c4 ("compiler.h: introduce TYPEOF_UNQUAL() macro")

issues were observed in deduplication between modules and kernel BTF
such that a large number of kernel types were not deduplicated so
were found in module BTF (task_struct, bpf_prog etc).  The root cause
appeared to be a failure to dedup struct types, specifically those
with members that were pointers with __percpu annotations.

The issue in dedup is at the point that we are deduplicating structures,
we have not yet deduplicated reference types like pointers.  If multiple
copies of a pointer point at the same (deduplicated) integer as in this
case, we do not see them as identical.  Special handling already exists
to deal with structures and arrays, so add pointer handling here too.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250429161042.2069678-1-alan.maguire@oracle.com
2025-04-29 11:33:37 -07:00
Jonathan Wiepert
88ae865423 Use thread-safe function pointer in libbpf_print
This patch fixes a thread safety bug where libbpf_print uses the
global variable storing the print function pointer rather than the local
variable that had the print function set via __atomic_load_n.

Fixes: f1cb927cdb62 ("libbpf: Ensure print callback usage is thread-safe")
Signed-off-by: Jonathan Wiepert <jonathan.wiepert@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Link: https://lore.kernel.org/bpf/20250424221457.793068-1-jonathan.wiepert@gmail.com
2025-04-29 11:33:37 -07:00
Tao Chen
a2dc135196 libbpf: Remove sample_period init in perf_buffer
It seems that sample_period is not used in perf buffer. Actually, only
wakeup_events are meaningful to enable events aggregation for wakeup notification.
Remove sample_period setting code to avoid confusion.

Fixes: fb84b8224655 ("libbpf: add perf buffer API")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/bpf/20250423163901.2983689-1-chen.dylane@linux.dev
2025-04-29 11:33:37 -07:00
Feng Yang
715808d3e2 libbpf: Fix event name too long error
When the binary path is excessively long, the generated probe_name in libbpf
exceeds the kernel's MAX_EVENT_NAME_LEN limit (64 bytes).
This causes legacy uprobe event attachment to fail with error code -22.

The fix reorders the fields to place the unique ID before the name.
This ensures that even if truncation occurs via snprintf, the unique ID
remains intact, preserving event name uniqueness. Additionally, explicit
checks with MAX_EVENT_NAME_LEN are added to enforce length constraints.

Before Fix:
	./test_progs -t attach_probe/kprobe-long_name
	......
	libbpf: failed to add legacy kprobe event for 'bpf_testmod_looooooooooooooooooooooooooooooong_name+0x0': -EINVAL
	libbpf: prog 'handle_kprobe': failed to create kprobe 'bpf_testmod_looooooooooooooooooooooooooooooong_name+0x0' perf event: -EINVAL
	test_attach_kprobe_long_event_name:FAIL:attach_kprobe_long_event_name unexpected error: -22
	test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
	#13/11   attach_probe/kprobe-long_name:FAIL
	#13      attach_probe:FAIL

	./test_progs -t attach_probe/uprobe-long_name
	......
	libbpf: failed to add legacy uprobe event for /root/linux-bpf/bpf-next/tools/testing/selftests/bpf/test_progs:0x13efd9: -EINVAL
	libbpf: prog 'handle_uprobe': failed to create uprobe '/root/linux-bpf/bpf-next/tools/testing/selftests/bpf/test_progs:0x13efd9' perf event: -EINVAL
	test_attach_uprobe_long_event_name:FAIL:attach_uprobe_long_event_name unexpected error: -22
	#13/10   attach_probe/uprobe-long_name:FAIL
	#13      attach_probe:FAIL
After Fix:
	./test_progs -t attach_probe/uprobe-long_name
	#13/10   attach_probe/uprobe-long_name:OK
	#13      attach_probe:OK
	Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

	./test_progs -t attach_probe/kprobe-long_name
	#13/11   attach_probe/kprobe-long_name:OK
	#13      attach_probe:OK
	Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Fixes: 46ed5fc33db9 ("libbpf: Refactor and simplify legacy kprobe code")
Fixes: cc10623c6810 ("libbpf: Add legacy uprobe attaching support")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250417014848.59321-2-yangfeng59949@163.com
2025-04-29 11:33:37 -07:00
Ihor Solodrai
02bc656f90 libbpf: Verify section type in btf_find_elf_sections
A valid ELF file may contain a SHT_NOBITS .BTF section. This case is
not handled correctly in btf_parse_elf, which leads to a segfault.

Before attempting to load BTF section data, check that the section
type is SHT_PROGBITS, which is the expected type for BTF data.  Fail
with an error if the type is different.

Bug report: https://github.com/libbpf/libbpf/issues/894
v1: https://lore.kernel.org/bpf/20250408184104.3962949-1-ihor.solodrai@linux.dev/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250410182823.1591681-1-ihor.solodrai@linux.dev
2025-04-29 11:33:37 -07:00
Viktor Malik
806b4e0a9f libbpf: Fix buffer overflow in bpf_object__init_prog
As shown in [1], it is possible to corrupt a BPF ELF file such that
arbitrary BPF instructions are loaded by libbpf. This can be done by
setting a symbol (BPF program) section offset to a large (unsigned)
number such that <section start + symbol offset> overflows and points
before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start        sec_start        prog_end        sec_end
        |                |                 |              |
        v                v                 v              v
    .....................|################################|............

The report in [1] also provides a corrupted BPF ELF which can be used as
a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will
point before the actual memory where section 2 is allocated.

This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        #6 0x000000400c16 in main /poc/poc.c:8
        #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        #9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program
end is within the section bounds. There used to be a check
`while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was
removed by commit 6245947c1b3c ("libbpf: Allow gaps in BPF program
sections to support overriden weak functions").

Add a check for detecting the overflow of `sec_off + prog_sz` to
bpf_object__init_prog to fix this issue.

[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Fixes: 6245947c1b3c ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Reported-by: lmarch2 <2524158037@qq.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Link: https://lore.kernel.org/bpf/20250415155014.397603-1-vmalik@redhat.com
2025-04-29 11:33:37 -07:00
Paul Chaignon
0c85f5a154 bpf: Clarify the meaning of BPF_F_PSEUDO_HDR
In the bpf_l4_csum_replace helper, the BPF_F_PSEUDO_HDR flag should only
be set if the modified header field is part of the pseudo-header.

If you modify for example the UDP ports and pass BPF_F_PSEUDO_HDR,
inet_proto_csum_replace4 will update skb->csum even though it shouldn't
(the port and the UDP checksum updates null each other).

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/5126ef84ba75425b689482cbc98bffe75e5d8ab0.1744102490.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-29 11:33:37 -07:00
Paul Chaignon
7de6a44a0f bpf: Clarify role of BPF_F_RECOMPUTE_CSUM
BPF_F_RECOMPUTE_CSUM doesn't update the actual L3 and L4 checksums in
the packet, but simply updates skb->csum (according to skb->ip_summed).
This patch clarifies that to avoid confusions.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ff6895d42936f03dbb82334d8bcfd50e00c79086.1744102490.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-29 11:33:37 -07:00
Mykyta Yatsenko
abdb15bedd libbpf: Add getters for BTF.ext func and line info
Introducing new libbpf API getters for BTF.ext func and line info,
namely:
  bpf_program__func_info
  bpf_program__func_info_cnt
  bpf_program__line_info
  bpf_program__line_info_cnt

This change enables scenarios, when user needs to load bpf_program
directly using `bpf_prog_load`, instead of higher-level
`bpf_object__load`. Line and func info are required for checking BTF
info in verifier; verification may fail without these fields if, for
example, program calls `bpf_obj_new`.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250408234417.452565-2-mykyta.yatsenko5@gmail.com
2025-04-29 11:33:37 -07:00
Anton Protopopov
7a1388d55f libbpf: Add likely/unlikely macros and use them in selftests
A few selftests and, more importantly, consequent changes to the
bpf_helpers.h file, use likely/unlikely macros, so define them here
and remove duplicate definitions from existing selftests.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250331203618.1973691-3-a.s.protopopov@gmail.com
2025-04-29 11:33:37 -07:00
Anton Protopopov
4687560af9 bpf: Fix a comment describing bpf_attr
The map_fd field of the bpf_attr union is used in the BPF_MAP_FREEZE
syscall.  Explicitly mention this in the comments.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250331203618.1973691-2-a.s.protopopov@gmail.com
2025-04-29 11:33:37 -07:00