Compare commits


602 Commits

Author SHA1 Message Date
thiagoftsm
b981a3a138 Merge branch 'libbpf:master' into master 2024-04-21 14:42:55 +00:00
Geyslan Gregório
46eafba62e ci: bump run-on-arch action to v2.7.1
More info: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/

Signed-off-by: Geyslan Gregório <geyslan@gmail.com>
2024-04-02 21:51:25 -07:00
Geyslan Gregório
6d3595d215 ci: bump checkout action to v4
Due to the transition from Node 16 to Node 20, the checkout action
needs to be updated to v4.

More info: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/

Signed-off-by: Geyslan Gregório <geyslan@gmail.com>
2024-04-02 10:25:11 -07:00
Andrii Nakryiko
20ea95b450 ci: sync DENYLISTs with BPF CI
Keep all the denylisted tests in sync.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
902af6913a ci: clean up temporary patch
It's already applied upstream.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
25a9cc27d7 ci: regenerate latest vmlinux.h
Update vmlinux.h to make BPF selftests compile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
8db4a2feeb sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   e63985ecd22681c7f5975f2e8637187a326b6791
Checkpoint bpf-next commit: 14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5
Baseline bpf commit:        2487007aa3b9fafbd2cb14068f49791ce1d7ede5
Checkpoint bpf commit:      443574b033876c85a35de4c65c14f7fe092222b2

Alexei Starovoitov (6):
  libbpf: Allow specifying 64-bit integers in map BTF.
  bpf: Introduce bpf_arena.
  bpf: Disasm support for addr_space_cast instruction.
  libbpf: Add __arg_arena to bpf_helpers.h
  libbpf: Add support for bpf_arena.
  libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM

Andrii Nakryiko (4):
  libbpf: Recognize __arena global variables.
  bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
  libbpf: add support for BPF cookie for raw_tp/tp_btf programs
  libbpf: fix u64-to-pointer cast on 32-bit arches

Arnaldo Carvalho de Melo (1):
  libbpf: Define MFD_CLOEXEC if not available

Jakub Kicinski (2):
  netdev: add per-queue statistics
  netdev: add queue stat for alloc failures

Kui-Feng Lee (1):
  libbpf: Skip zeroed or null fields if not found in the kernel type.

Mykyta Yatsenko (1):
  libbpf: Check bpf_map/bpf_program fd validity

Quentin Monnet (1):
  libbpf: Prevent null-pointer dereference when prog to load has no BTF

Yonghong Song (2):
  libbpf: Add new sec_def "sk_skb/verdict"
  bpf: Sync uapi bpf.h to tools directory

 include/uapi/linux/bpf.h    |  20 ++-
 include/uapi/linux/netdev.h |  20 +++
 src/bpf.c                   |  16 +-
 src/bpf.h                   |   9 +
 src/bpf_helpers.h           |   2 +
 src/libbpf.c                | 322 +++++++++++++++++++++++++++++++-----
 src/libbpf.h                |  13 +-
 src/libbpf.map              |   2 +
 src/libbpf_probes.c         |   7 +
 9 files changed, 366 insertions(+), 45 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Arnaldo Carvalho de Melo
7fee466676 libbpf: Define MFD_CLOEXEC if not available
Since the code goes directly to the syscall, to cope with systems where
memfd_create() is not available, do the same for its MFD_CLOEXEC flag,
defining it if not available.

This fixes the build in those systems, noticed while building perf on a
set of build containers.
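
For illustration, the fallback boils down to a guarded define (value as in
the uapi <linux/memfd.h> header):

    #ifndef MFD_CLOEXEC
    #define MFD_CLOEXEC 0x0001U /* close-on-exec for the new memfd */
    #endif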

Fixes: 9fa5e1a180aa639f ("libbpf: Call memfd_create() syscall directly")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/ZfxZ9nCyKvwmpKkE@x1
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
ddf722fb5c libbpf: fix u64-to-pointer cast on 32-bit arches
It's been reported that (void *)map->map_extra is causing compilation
warnings on 32-bit architectures. It's easy enough to fix this by
casting to long first.
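
A sketch of the pattern (surrounding libbpf code assumed):

    /* (void *)map->map_extra warns on 32-bit; go through long first */
    void *addr = (void *)(long)map->map_extra;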

Fixes: 79ff13e99169 ("libbpf: Add support for bpf_arena.")
Reported-by: Ryan Eatmon <reatmon@ti.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319215143.1279312-1-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
137193b655 libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM
The selftests use __attribute__((address_space(N))) annotations to tell
LLVM about special pointers. For LLVM there is nothing "arena"
about them. They are simply pointers in a different address space.
Hence LLVM diff https://github.com/llvm/llvm-project/pull/85161 renamed:
. macro __BPF_FEATURE_ARENA_CAST -> __BPF_FEATURE_ADDR_SPACE_CAST
. global variables in __attribute__((address_space(N))) are now
  placed in section named ".addr_space.N" instead of ".arena.N".

Adjust libbpf, bpftool, and selftests to match LLVM.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240315021834.62988-3-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Yonghong Song
d2676a58de bpf: Sync uapi bpf.h to tools directory
There is a difference between the kernel uapi bpf.h and the tools
uapi bpf.h. There is no functionality difference, but let us sync them
properly to make later bpf.h updates easy.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240325033842.1693553-1-yonghong.song@linux.dev
2024-03-25 21:58:26 -07:00
Yonghong Song
4d95d8b7f0 libbpf: Add new sec_def "sk_skb/verdict"
The new sec_def specifies sk_skb program type with
BPF_SK_SKB_VERDICT attachment type. This way, libbpf
will set expected_attach_type properly for the program.
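
A minimal sketch of a program using the new section name (program body
hypothetical):

    SEC("sk_skb/verdict")
    int skb_verdict(struct __sk_buff *skb)
    {
            /* libbpf now sets expected_attach_type = BPF_SK_SKB_VERDICT */
            return SK_PASS;
    }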

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240319175412.2941149-1-yonghong.song@linux.dev
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
f5828cc352 libbpf: add support for BPF cookie for raw_tp/tp_btf programs
Wire up BPF cookie passing for raw_tp and tp_btf programs, both in
low-level and high-level APIs.
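
A hedged sketch of the high-level API usage (tracepoint name and cookie
value illustrative):

    LIBBPF_OPTS(bpf_raw_tracepoint_opts, opts, .cookie = 0x1234);
    struct bpf_link *link;

    link = bpf_program__attach_raw_tracepoint_opts(prog, "sched_switch", &opts);
    /* the BPF side reads the value back with bpf_get_attach_cookie(ctx) */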

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319233852.1977493-5-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
cbd6e3596c bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
aware variants). This brings them up to par w.r.t. BPF cookie usage
with classic tracepoint and fentry/fexit programs.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319233852.1977493-4-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Mykyta Yatsenko
7cfc365995 libbpf: Check bpf_map/bpf_program fd validity
libbpf creates bpf_program/bpf_map structs for each program/map that the
user defines, but it allows disabling the creation/loading of those objects
in the kernel, in which case they won't have an associated file descriptor
(fd < 0). Such functionality is used for backward compatibility
with some older kernels.

Nothing prevents users from passing these maps or programs with no
kernel counterpart to libbpf APIs. This change introduces explicit
checks for the kernel objects' existence, aiming to improve visibility of
those edge cases and provide meaningful warnings to users.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240318131808.95959-1-yatsenko@meta.com
2024-03-25 21:58:26 -07:00
Kui-Feng Lee
a5459eac49 libbpf: Skip zeroed or null fields if not found in the kernel type.
Accept additional fields of a struct_ops type with all zero values even if
these fields are not in the corresponding type in the kernel. This provides
a way to be backward compatible. User space programs can use the same map
on a machine running an old kernel by clearing fields that do not exist in
the kernel.
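
A hedged illustration of the pattern this enables (type and field names
hypothetical):

    struct my_ops {
            int (*foo)(void);
            int (*added_recently)(void); /* absent from older kernels */
    };

    SEC(".struct_ops.link")
    struct my_ops ops = {
            .foo = (void *)foo,
            /* .added_recently left zeroed: skipped if the kernel type
             * has no such field, instead of failing the load
             */
    };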

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240313214139.685112-2-thinker.li@gmail.com
2024-03-25 21:58:26 -07:00
Quentin Monnet
f84ee80801 libbpf: Prevent null-pointer dereference when prog to load has no BTF
In bpf_object_load_prog(), there's no guarantee that obj->btf is non-NULL
when passing it to btf__fd(), and this function does not perform any
check before dereferencing its argument (as bpf_object__btf_fd() used to
do). As a consequence, we get segmentation fault errors in bpftool (for
example) when trying to load programs that come without BTF information.

v2: Keep btf__fd() in the fix instead of reverting to bpf_object__btf_fd().
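
The guard amounts to something like this (a sketch, not the verbatim patch):

    btf_fd = obj->btf ? btf__fd(obj->btf) : -1;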

Fixes: df7c3f7d3a3d ("libbpf: make uniform use of btf__fd() accessor inside libbpf")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240314150438.232462-1-qmo@kernel.org
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
2d042d22a7 libbpf: Recognize __arena global variables.
LLVM automatically places __arena variables into ".arena.1" ELF section.
In order to use such global variables bpf program must include definition
of arena map in ".maps" section, like:
struct {
       __uint(type, BPF_MAP_TYPE_ARENA);
       __uint(map_flags, BPF_F_MMAPABLE);
       __uint(max_entries, 1000);         /* number of pages */
       __ulong(map_extra, 2ull << 44);    /* start of mmap() region */
} arena SEC(".maps");

libbpf recognizes both uses of arena and creates a single `struct bpf_map *`
instance in libbpf APIs.
".arena.1" ELF section data is used as initial data image, which is exposed
through skeleton and bpf_map__initial_value() to the user, if they need to tune
it before the load phase. During load phase, this initial image is copied over
into mmap()'ed region corresponding to arena, and discarded.

A few small checks here and there had to be added to make sure this
approach works with bpf_map__initial_value(), mostly due to hard-coded
assumption that map->mmaped is set up with mmap() syscall and should be
munmap()'ed. For arena, .arena.1 can be (much) smaller than maximum
arena size, so this smaller data size has to be tracked separately.
Given it is enforced that there is only one arena for the entire bpf_object
instance, we just keep it in a separate field. This can be generalized
if necessary later.

All global variables from ".arena.1" section are accessible from user space
via skel->arena->name_of_var.
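
For example (variable name hypothetical; skeleton assumed generated by
bpftool gen skeleton):

    /* BPF side */
    int __arena counter;      /* placed into ".arena.1" */

    /* user space, after skeleton open/load */
    skel->arena->counter += 1;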

For bss/data/rodata the skeleton/libbpf perform the following sequence:
1. addr = mmap(MAP_ANONYMOUS)
2. user space optionally modifies global vars
3. map_fd = bpf_create_map()
4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
5. mmap(addr, MAP_FIXED, map_fd)
after step 5, user space sees the values it wrote at step 2 at the same addresses

arena doesn't support update_map_elem. Hence skeleton/libbpf do:
1. addr = malloc(sizeof SEC ".arena.1")
2. user space optionally modifies global vars
3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
5. memcpy(real_addr, addr) // this will fault-in and allocate pages

In the end, the look and feel of global data vs __arena global data is the
same from the bpf prog's pov.

Another complication is:
struct {
  __uint(type, BPF_MAP_TYPE_ARENA);
} arena SEC(".maps");

int __arena foo;
int bar;

  ptr1 = &foo;   // relocation against ".arena.1" section
  ptr2 = &arena; // relocation against ".maps" section
  ptr3 = &bar;   // relocation against ".bss" section

For the kernel, ptr1 and ptr2 point to the same arena's map_fd,
while ptr3 points to a different global array's map_fd.
For the verifier:
ptr1->type == unknown_scalar
ptr2->type == const_ptr_to_map
ptr3->type == ptr_to_map_value

After verification, from JIT pov all 3 ptr-s are normal ld_imm64 insns.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-11-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
4524a45a2a libbpf: Add support for bpf_arena.
mmap() bpf_arena right after creation, since the kernel needs to
remember the address returned from mmap. This is user_vm_start.
LLVM will generate bpf_arena_cast_user() instructions where
necessary and JIT will add upper 32-bit of user_vm_start
to such pointers.

Fix up bpf_map_mmap_sz() to compute mmap size as
map->value_size * map->max_entries for arrays and
PAGE_SIZE * map->max_entries for arena.

Don't set BTF at arena creation time, since arena maps don't support it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-9-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
086825355f libbpf: Add __arg_arena to bpf_helpers.h
Add __arg_arena to bpf_helpers.h
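
A hedged sketch of the tag in use on a global subprog (names hypothetical):

    __weak int peek_first_byte(char *buf __arg_arena)
    {
            /* verifier treats buf as an arena pointer */
            return buf[0];
    }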

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-8-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
6de941bc1e bpf: Disasm support for addr_space_cast instruction.
LLVM generates rX = addr_space_cast(rY, dst_addr_space, src_addr_space)
instruction when pointers in non-zero address space are used by the bpf
program. Recognize this insn in uapi and in bpf disassembler.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-3-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
1675c13fae bpf: Introduce bpf_arena.
Introduce bpf_arena, which is a sparse shared memory region between the bpf
program and user space.

Use cases:
1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
   anonymous region, like memcached or any key/value storage. The bpf
   program implements an in-kernel accelerator. XDP prog can search for
   a key in bpf_arena and return a value without going to user space.
2. The bpf program builds arbitrary data structures in bpf_arena (hash
   tables, rb-trees, sparse arrays), while user space consumes it.
3. bpf_arena is a "heap" of memory from the bpf program's point of view.
   The user space may mmap it, but bpf program will not convert pointers
   to user base at run-time to improve bpf program speed.

Initially, the kernel vm_area and user vma are not populated. User space
can fault in pages within the range. While servicing a page fault,
bpf_arena logic will insert a new page into the kernel and user vmas. The
bpf program can allocate pages from that region via
bpf_arena_alloc_pages(). This kernel function will insert pages into the
kernel vm_area. The subsequent fault-in from user space will populate that
page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
can be used to prevent fault-in from user space. In such a case, if a page
is not allocated by the bpf program and not present in the kernel vm_area,
the user process will segfault. This is useful for use cases 2 and 3 above.

bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
either at a specific address within the arena or allocates a range with the
maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
pages and removes the range from the kernel vm_area and from user process
vmas.
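
A hedged sketch from the bpf program side (arena map defined as in the
examples elsewhere in this log; NUMA_NO_NODE == -1):

    void __arena *page;

    /* allocate one page anywhere in the arena, on any NUMA node */
    page = bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0);
    if (page)
            bpf_arena_free_pages(&arena, page, 1);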

bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of
bpf program is more important than ease of sharing with user space. This is
use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
instruction as a 32-bit move wX = wY, which will improve bpf prog
performance. Otherwise, bpf_arena_cast_user is translated by JIT to
conditionally add the upper 32 bits of user vm_start (if the pointer is not
NULL) to arena pointers before they are stored into memory. This way, user
space sees them as valid 64-bit pointers.

Diff https://github.com/llvm/llvm-project/pull/84410 enables the LLVM BPF
backend to generate the bpf_addr_space_cast() instruction to cast pointers
between address_space(1) which is reserved for bpf_arena pointers and
default address space zero. All arena pointers in a bpf program written in
C language are tagged as __attribute__((address_space(1))). Hence, clang
provides helpful diagnostics when pointers cross address space. Libbpf and
the kernel support only address_space == 1. All other address space
identifiers are reserved.

rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
verifier that rX->type = PTR_TO_ARENA. Any further operations on
PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
to copy_from_kernel_nofault() except that no address checks are necessary.
The address is guaranteed to be in the 4GB range. If the page is not
present, the destination register is zeroed on read, and the operation is
ignored on write.

rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
verifier converts such cast instructions to mov32. Otherwise, JIT will emit
native code equivalent to:
rX = (u32)rY;
if (rY)
  rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

After such conversion, the pointer becomes a valid user pointer within
bpf_arena range. The user process can access data structures created in
bpf_arena without any additional computations. For example, a linked list
built by a bpf program can be walked natively by user space.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
385d344839 libbpf: Allow specifying 64-bit integers in map BTF.
The __uint() macro is used to specify map attributes, like:
  __uint(type, BPF_MAP_TYPE_ARRAY);
  __uint(map_flags, BPF_F_MMAPABLE);
It is limited to 32-bit values, since BTF_KIND_ARRAY has a u32
"number of elements" field in "struct btf_array".

Introduce __ulong() macro that allows specifying values bigger than 32-bit.
In map definition "map_extra" is the only u64 field, so far.
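
Usage mirrors __uint(); a sketch based on the arena example elsewhere in
this log:

    struct {
            __uint(type, BPF_MAP_TYPE_ARENA);
            __uint(map_flags, BPF_F_MMAPABLE);
            __uint(max_entries, 100);
            __ulong(map_extra, 1ull << 44); /* 64-bit value __uint() can't hold */
    } arena SEC(".maps");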

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20240307031228.42896-5-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-25 21:58:26 -07:00
Jakub Kicinski
d71c0ed2ef netdev: add queue stat for alloc failures
Rx alloc failures are commonly counted by drivers.
Support reporting those via netdev-genl queue stats.

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 21:58:26 -07:00
Jakub Kicinski
5e80833e50 netdev: add per-queue statistics
The ethtool-nl family does a good job exposing various protocol-related
and IEEE/IETF statistics which used to get dumped under ethtool -S,
with creative names. Queue stats don't have a netlink API yet, and
remain the lion's share of ethtool -S output for new drivers. Not only
is that bad because the names differ driver to driver, it's also
bug-prone. Intuitively, drivers try to report only the stats for active
queues, but querying ethtool stats involves multiple system calls, and
the number of stats is read separately from the stats themselves. Worse
still, when user space asks for the values of the stats, it doesn't
inform the kernel how big its buffer is. If the number of stats
increases in the meantime, the kernel will overflow the user buffer.

Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there.
Support per-queue and sum-for-device dumps. The latter will be useful
when subsequent patches add more interesting common stats than
just bytes and packets.

The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
2778cbce60 ci: add xdp_bonding fixes from bpf/master
bpf tree has fixes for xdp_bonding selftests which are not yet in
bpf-next, so add them as temporary CI-only patches.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-06 15:22:13 -08:00
Andrii Nakryiko
4f875865b7 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2ab256e93249f5ac1da665861aa0f03fb4208d9c
Checkpoint bpf-next commit: 7d763bc4a44a51e48dde406d6c6c8a26a60ec647
Baseline bpf commit:        dced881ead78e4d6add3735d02a9186ba2415630
Checkpoint bpf commit:      2487007aa3b9fafbd2cb14068f49791ce1d7ede5

Aahil Awatramani (1):
  bonding: Add independent control state machine

Alexei Starovoitov (1):
  bpf: Introduce may_goto instruction

Chen Shen (1):
  libbpf: Correct debug message in btf__load_vmlinux_btf

Eduard Zingerman (7):
  libbpf: Allow version suffixes (___smth) for struct_ops types
  libbpf: Tie struct_ops programs to kernel BTF ids, not to local ids
  libbpf: Honor autocreate flag for struct_ops maps
  libbpf: Sync progs autoload with maps autocreate for struct_ops maps
  libbpf: Replace elf_state->st_ops_* fields with SEC_ST_OPS sec_type
  libbpf: Struct_ops in SEC("?.struct_ops") / SEC("?.struct_ops.link")
  libbpf: Rewrite btf datasec names starting from '?'

Kees Cook (1):
  bpf: Replace bpf_lpm_trie_key 0-length array with flexible array

Kui-Feng Lee (2):
  libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
  libbpf: Convert st_ops->data to shadow type.

 include/uapi/linux/bpf.h     |  24 +++-
 include/uapi/linux/if_link.h |   1 +
 src/btf.c                    |   2 +-
 src/features.c               |  22 +++
 src/libbpf.c                 | 252 +++++++++++++++++++++++++++--------
 src/libbpf_internal.h        |   2 +
 6 files changed, 242 insertions(+), 61 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-06 15:22:13 -08:00
Eduard Zingerman
438adf417d libbpf: Rewrite btf datasec names starting from '?'
Optional struct_ops maps are defined using a question mark at the start
of the section name, e.g.:

    SEC("?.struct_ops")
    struct test_ops optional_map = { ... };

This commit teaches libbpf to detect whether the kernel allows the '?'
prefix in datasec names, and if it doesn't, to rewrite such names
by replacing '?' with '_', e.g.:

    DATASEC ?.struct_ops -> DATASEC _.struct_ops

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-13-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
fc8b86bda2 libbpf: Struct_ops in SEC("?.struct_ops") / SEC("?.struct_ops.link")
Allow using two new section names for struct_ops maps:
- SEC("?.struct_ops")
- SEC("?.struct_ops.link")

To specify maps that have bpf_map->autocreate == false after open.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-12-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
d5d0b6e920 libbpf: Replace elf_state->st_ops_* fields with SEC_ST_OPS sec_type
The next patch would add two new section names for struct_ops maps.
To make working with multiple struct_ops sections more convenient:
- remove fields like elf_state->st_ops_{shndx,link_shndx};
- mark section descriptions hosting struct_ops as
  elf_sec_desc->sec_type == SEC_ST_OPS;

After these changes struct_ops sections could be processed uniformly
by iterating bpf_object->efile.secs entries.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-11-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
060c604db8 libbpf: Sync progs autoload with maps autocreate for struct_ops maps
Automatically select which struct_ops programs to load depending on
which struct_ops maps are selected for automatic creation.
E.g. for the BPF code below:

    SEC("struct_ops/test_1") int BPF_PROG(foo) { ... }
    SEC("struct_ops/test_2") int BPF_PROG(bar) { ... }

    SEC(".struct_ops.link")
    struct test_ops___v1 A = {
        .foo = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 B = {
        .foo = (void *)foo,
        .bar = (void *)bar,
    };

And the following libbpf API calls:

    bpf_map__set_autocreate(skel->maps.A, true);
    bpf_map__set_autocreate(skel->maps.B, false);

The autoload would be enabled for program 'foo' and disabled for
program 'bar'.

During load, for each struct_ops program P, referenced from some
struct_ops map M:
- set P.autoload = true if M.autocreate is true for some M;
- set P.autoload = false if M.autocreate is false for all M;
- don't change P.autoload, if P is not referenced from any map.

Do this after bpf_object__init_kern_struct_ops_maps()
to make sure that shadow vars assignment is done.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-9-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
cb426140d0 libbpf: Honor autocreate flag for struct_ops maps
Skip load steps for struct_ops maps not marked for automatic creation.
This should allow loading a bpf object in situations like the one below:

    SEC("struct_ops/foo") int BPF_PROG(foo) { ... }
    SEC("struct_ops/bar") int BPF_PROG(bar) { ... }

    struct test_ops___v1 {
    	int (*foo)(void);
    };

    struct test_ops___v2 {
    	int (*foo)(void);
    	int (*does_not_exist)(void);
    };

    SEC(".struct_ops.link")
    struct test_ops___v1 map_for_old = {
    	.foo = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 map_for_new = {
    	.foo = (void *)foo,
    	.does_not_exist = (void *)bar
    };

Suppose the program is loaded on an old kernel that does not have a
definition for the 'does_not_exist' struct_ops member. After this commit it
would be possible to load such an object file after the following tweaks:

    bpf_program__set_autoload(skel->progs.bar, false);
    bpf_map__set_autocreate(skel->maps.map_for_new, false);

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240306104529.6453-4-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
aded62b120 libbpf: Tie struct_ops programs to kernel BTF ids, not to local ids
Enforce the following existing limitation on struct_ops programs based
on kernel BTF id instead of program-local BTF id:

    struct_ops BPF prog can be re-used between multiple .struct_ops &
    .struct_ops.link as long as it's the same struct_ops struct
    definition and the same function pointer field

This allows reusing same BPF program for versioned struct_ops map
definitions, e.g.:

    SEC("struct_ops/test")
    int BPF_PROG(foo) { ... }

    struct some_ops___v1 { int (*test)(void); };
    struct some_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link") struct some_ops___v1 a = { .test = foo }
    SEC(".struct_ops.link") struct some_ops___v2 b = { .test = foo }

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-3-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
f7fd5dbc07 libbpf: Allow version suffixes (___smth) for struct_ops types
E.g. allow the following struct_ops definitions:

    struct bpf_testmod_ops___v1 { int (*test)(void); };
    struct bpf_testmod_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v1 a = { .test = ... }
    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v2 b = { .test = ... }

Where both bpf_testmod_ops___v1 and bpf_testmod_ops___v2 would be
resolved as 'struct bpf_testmod_ops' from kernel BTF.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-2-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Alexei Starovoitov
00b08dceea bpf: Introduce may_goto instruction
Introduce may_goto instruction that from the verifier pov is similar to
open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it
doesn't iterate any objects.
In assembly 'may_goto' is a nop most of the time until bpf runtime has to
terminate the program for whatever reason. In the current implementation
may_goto has a hidden counter, but other mechanisms can be used.
For programs written in C, a later patch introduces the 'cond_break' macro
that combines 'may_goto' with 'break' statement and has similar semantics:
cond_break is a nop until bpf runtime has to break out of this loop.
It can be used in any normal "for" or "while" loop, like

  for (i = zero; i < cnt; cond_break, i++) {

The verifier recognizes that may_goto is used in the program, reserves
additional 8 bytes of stack, initializes them in subprog prologue, and
replaces may_goto instruction with:
aux_reg = *(u64 *)(fp - 40)
if aux_reg == 0 goto pc+off
aux_reg -= 1
*(u64 *)(fp - 40) = aux_reg

may_goto instruction can be used by LLVM to implement __builtin_memcpy,
__builtin_strcmp.

may_goto is not a full substitute for bpf_for() macro.
bpf_for() doesn't have an induction variable that the verifier sees,
so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.

But when the code is written as:
for (i = 0; i < 100; cond_break, i++)
the verifier sees 'i' as a precise constant zero,
hence cond_break (aka may_goto) doesn't help the loop to converge.
A static or global variable can be used as a workaround:
static int zero = 0;
for (i = zero; i < 100; cond_break, i++) // works!

may_goto works well with arena pointers that don't need to be bounds
checked on access. Load/store from arena returns imprecise unbounded
scalar and loops with may_goto pass the verifier.

Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn.
JCOND stands for conditional pseudo jump.
Since goto_or_nop insn was proposed, it may use the same opcode.
may_goto vs goto_or_nop can be distinguished by src_reg:
code = BPF_JMP | BPF_JCOND
src_reg = 0 - may_goto
src_reg = 1 - goto_or_nop

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com
2024-03-06 13:58:27 -08:00
Chen Shen
bf52494e2b libbpf: Correct debug message in btf__load_vmlinux_btf
In the function btf__load_vmlinux_btf, the debug message incorrectly
refers to 'path' instead of 'sysfs_btf_path'.

Signed-off-by: Chen Shen <peterchenshen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240302062218.3587-1-peterchenshen@gmail.com
2024-03-06 13:58:27 -08:00
Kui-Feng Lee
acfaeffeaa libbpf: Convert st_ops->data to shadow type.
Convert st_ops->data to the shadow type of the struct_ops map. The shadow
type of a struct_ops type is a variant of the original struct type
providing a way to access/change the values in the maps of the struct_ops
type.

bpf_map__initial_value() will return st_ops->data for struct_ops types. The
skeleton is going to use it as the pointer to the shadow type of the
original struct type.

One of the main differences between the original struct type and the shadow
type is that all function pointers of the shadow type are converted to
pointers of struct bpf_program. Users can replace these bpf_program
pointers with other BPF programs. The st_ops->progs[] will be updated
before updating the value of a map to reflect the changes made by users.
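
A hedged sketch of what the shadow type enables from user space (map,
program, and field names hypothetical):

    /* function pointers in the shadow type are struct bpf_program *;
     * swap one and tweak a scalar field before load:
     */
    skel->struct_ops.my_ops->handler = skel->progs.alternative_handler;
    skel->struct_ops.my_ops->some_setting = 42;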

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-3-thinker.li@gmail.com
2024-03-06 13:58:27 -08:00
Kui-Feng Lee
0758d8b0f2 libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
For a struct_ops map, btf_value_type_id is the type ID of its struct
type. This value is required by bpftool to generate a skeleton including
pointers to shadow types. The code generator gets the type ID from
bpf_map__btf_value_type_id() in order to get the type information of the
struct type of a map.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-2-thinker.li@gmail.com
2024-03-06 13:58:27 -08:00
Kees Cook
fa4d00254d bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Replace deprecated 0-length array in struct bpf_lpm_trie_key with
flexible array. Found with GCC 13:

../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
  207 |                                        *(__be16 *)&key->data[i]);
      |                                                   ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
  102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
      |                                                      ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
   97 | #define be16_to_cpu __be16_to_cpu
      |                     ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
  206 |                 u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]);
      |                            ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
   82 |         __u8    data[0];        /* Arbitrary size */
      |                 ^~~~

And found at run-time under CONFIG_FORTIFY_SOURCE:

  UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
  index 0 is out of range for type '__u8 [*]'

Changing struct bpf_lpm_trie_key is difficult since it has been used by
userspace. For example, in Cilium:

	struct egress_gw_policy_key {
	        struct bpf_lpm_trie_key lpm_key;
	        __u32 saddr;
	        __u32 daddr;
	};

While direct references to the "data" member haven't been found, there
are static initializers that include the final member. For example,
the "{}" here:

        struct egress_gw_policy_key in_key = {
                .lpm_key = { 32 + 24, {} },
                .saddr   = CLIENT_IP,
                .daddr   = EXTERNAL_SVC_IP & 0Xffffff,
        };

To avoid the build time and run time warnings seen with a 0-sized
trailing array for struct bpf_lpm_trie_key, introduce a new struct
that correctly uses a flexible array for the trailing bytes,
struct bpf_lpm_trie_key_u8. As part of this, include the "header"
portion (which is just the "prefixlen" member), so it can be used
by anything building a bpf_lpm_trie_key that has trailing members that
aren't a u8 flexible array (like the self-test[1]), which is named
struct bpf_lpm_trie_key_hdr.

Unfortunately, C++ refuses to parse the __struct_group() helper, so it
is not possible to use struct bpf_lpm_trie_key_hdr directly in
struct bpf_lpm_trie_key_u8; the union must be open-coded instead.

Adjust the kernel code to use struct bpf_lpm_trie_key_u8 throughout,
and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment
to the UAPI header directing folks to the two new options.
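
Per the description above, the resulting UAPI addition looks roughly like:

    struct bpf_lpm_trie_key_hdr {
            __u32   prefixlen;
    };

    struct bpf_lpm_trie_key_u8 {
            union {
                    struct bpf_lpm_trie_key_hdr hdr;
                    __u32                       prefixlen;
            };
            __u8    data[]; /* flexible array, no longer 0-length */
    };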

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
2024-03-06 13:58:27 -08:00
Aahil Awatramani
f749be80b7 bonding: Add independent control state machine
Add support for the independent control state machine per IEEE
802.1AX-2008 5.4.15 in addition to the existing implementation of the
coupled control state machine.

Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in
the LACP MUX state machine for separated handling of an initial
Collecting state before the Collecting and Distributing state. This
enables a port to be in a state where it can receive incoming packets
while not still distributing. This is useful for reducing packet loss when
a port begins distributing before its partner is able to collect.

Added new functions such as bond_set_slave_tx_disabled_flags and
bond_set_slave_rx_enabled_flags to precisely manage the port's collecting
and distributing states. Previously, there was no dedicated method to
disable TX while keeping RX enabled, which this patch addresses.

Note that the regular flow process in the kernel's bonding driver remains
unaffected by this patch. The extension requires explicit opt-in by the
user (in order to ensure no disruptions for existing setups) via netlink
support using the new bonding parameter coupled_control. The default value
for coupled_control is set to 1 so as to preserve existing behaviour.

Signed-off-by: Aahil Awatramani <aahila@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-06 13:58:27 -08:00
Andrii Nakryiko
fb98d4bd25 include: fix BPF_CALL_REL definition
Fix our Github-specific definition of BPF_CALL_REL macro. It was missing
the code part.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-01 15:39:45 -08:00
Kui-Feng Lee
f4e9b606f4 ci: clean up bpf_test_no_cfi.ko for v5.5.0 and v4.9.0.
bpf_test_no_cfi.ko is not available for v5.5.0 and v4.9.0.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
2024-02-27 10:14:31 -08:00
Kui-Feng Lee
ff95bd6238 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   92a871ab9fa59a74d013bc04f321026a057618e7
Checkpoint bpf-next commit: 2ab256e93249f5ac1da665861aa0f03fb4208d9c
Baseline bpf commit:        577e4432f3ac810049cb7e6b71f4d96ec7c6e894
Checkpoint bpf commit:      dced881ead78e4d6add3735d02a9186ba2415630

Arnaldo Carvalho de Melo (1):
  tools headers UAPI: Sync linux/fcntl.h with the kernel sources

Cupertino Miranda (1):
  libbpf: Add support to GCC in CORE macro definitions

Martin Kelly (1):
  bpf: Clarify batch lookup/lookup_and_delete semantics

Matt Bobrowski (1):
  libbpf: Make remark about zero-initializing bpf_*_info structs

 include/uapi/linux/bpf.h   |  6 ++++-
 include/uapi/linux/fcntl.h |  3 +++
 src/bpf.h                  | 39 ++++++++++++++++++++++++---------
 src/bpf_core_read.h        | 45 ++++++++++++++++++++++++++++++++------
 4 files changed, 75 insertions(+), 18 deletions(-)

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
2024-02-27 10:14:31 -08:00
Arnaldo Carvalho de Melo
a894b0cb9b tools headers UAPI: Sync linux/fcntl.h with the kernel sources
To get the changes in:

  8a924db2d7b5eb69 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")

That don't add anything that is handled by existing hard coded tables or
table generation scripts.

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stefan Berger <stefanb@linux.ibm.com>
Link: https://lore.kernel.org/lkml/ZbJv9fGF_k2xXEdr@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-02-27 10:14:31 -08:00
Martin Kelly
afa81fb1cb bpf: Clarify batch lookup/lookup_and_delete semantics
The batch lookup and lookup_and_delete APIs have two parameters,
in_batch and out_batch, to facilitate iterative
lookup/lookup_and_deletion operations for supported maps. Except for the
NULL in_batch at the start of these two batch operations, both parameters
need to point to memory equal to or larger than the respective map key
size, except for various hashmaps (hash, percpu_hash, lru_hash,
lru_percpu_hash) where the in_batch/out_batch memory size should be
at least 4 bytes.

Document these semantics to clarify the API.
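
A hedged sketch of the documented iteration pattern for a hashmap (buffer
names and sizes illustrative):

    __u32 out_batch, count; /* >= 4 bytes for hashmaps, per the text */
    void *in = NULL;        /* NULL in_batch starts the iteration */
    int err;

    do {
            count = 128; /* capacity of keys[]/vals[] */
            err = bpf_map_lookup_batch(map_fd, in, &out_batch,
                                       keys, vals, &count, NULL);
            /* process 'count' entries; -ENOENT marks the end of the map */
            in = &out_batch;
    } while (!err);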

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240221211838.1241578-1-martin.kelly@crowdstrike.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-27 10:14:31 -08:00
Matt Bobrowski
16e68ab13c libbpf: Make remark about zero-initializing bpf_*_info structs
In some situations, if you fail to zero-initialize the
bpf_{prog,map,btf,link}_info structs supplied to the set of LIBBPF
helpers bpf_{prog,map,btf,link}_get_info_by_fd(), you can expect the
helper to return an error. This can possibly leave people in a
situation where they're scratching their heads for an unnecessary
amount of time. Make an explicit remark about the requirement of
zero-initializing the supplied bpf_{prog,map,btf,link}_info structs
for the respective LIBBPF helpers.

Internally, LIBBPF helpers bpf_{prog,map,btf,link}_get_info_by_fd()
call into bpf_obj_get_info_by_fd() where the bpf(2)
BPF_OBJ_GET_INFO_BY_FD command is used. This specific command is
effectively backed by restrictions enforced by the
bpf_check_uarg_tail_zero() helper. This function ensures that if the
size of the supplied bpf_{prog,map,btf,link}_info struct is larger
than what the kernel can handle, trailing bits are zeroed. This can be
a problem when compiling against UAPI headers that don't necessarily
match the sizes of the same underlying types known to the kernel.
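
A sketch of the now-documented calling convention (prog_fd assumed valid):

    struct bpf_prog_info info = {}; /* must be zero-initialized */
    __u32 info_len = sizeof(info);
    int err;

    err = bpf_prog_get_info_by_fd(prog_fd, &info, &info_len);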

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/ZcyEb8x4VbhieWsL@google.com
2024-02-27 10:14:31 -08:00
Cupertino Miranda
b19fdbf1be libbpf: Add support to GCC in CORE macro definitions
Due to internal differences between LLVM and GCC, the current
implementation of the CO-RE macros does not fit the GCC parser, as it will
optimize those expressions away even before they would be accessible to
the BPF backend.

As examples, the following would be optimized out with the original
definitions:
  - As enums are converted to their integer representation during
  parsing, the IR would not know how to distinguish an integer
  constant from an actual enum value.
  - Types need to be kept as temporary variables, as the existing type
  casts of the 0 address (as expanded for LLVM), are optimized away by
  the GCC C parser, never really reaching GCCs IR.

Although the macros appear to add extra complexity, the expanded code
is removed from the compilation flow very early in the compilation
process and does not really affect the quality of the generated assembly.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240213173543.1397708-1-cupertino.miranda@oracle.com
2024-02-27 10:14:31 -08:00
Manu Bretelle
445486dcbf ci: Pass arch parameter to setup-build-env
Since 1bc40aecb3, the
arch parameter needs to be passed to `setup-build-env`.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2024-02-15 10:44:45 -08:00
Andrii Nakryiko
820bca2cb6 ci: verifier_global_subprogs can't be run on 5.5
We get:

  libbpf: struct_ops init_kern: struct bpf_dummy_ops is not found in kernel BTF

So even though it's irrelevant to the subtests we do want to test, the
entire test has to be skipped, unfortunately.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-06 11:52:00 -08:00
Andrii Nakryiko
8a8feae5f4 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   943b043aeecce9accb6d367af47791c633e95e4d
Checkpoint bpf-next commit: 92a871ab9fa59a74d013bc04f321026a057618e7
Baseline bpf commit:        577e4432f3ac810049cb7e6b71f4d96ec7c6e894
Checkpoint bpf commit:      577e4432f3ac810049cb7e6b71f4d96ec7c6e894

Andrii Nakryiko (1):
  libbpf: fix return value for PERF_EVENT __arg_ctx type fix up check

Toke Høiland-Jørgensen (1):
  libbpf: Use OPTS_SET() macro in bpf_xdp_query()

 src/libbpf.c  | 6 +++---
 src/netlink.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-06 11:52:00 -08:00
Toke Høiland-Jørgensen
a20b60f971 libbpf: Use OPTS_SET() macro in bpf_xdp_query()
When the feature_flags and xdp_zc_max_segs fields were added to the libbpf
bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro.
This causes libbpf to write to those fields unconditionally, which means
that programs compiled against an older version of libbpf (with a smaller
size of the bpf_xdp_query_opts struct) will have their stack corrupted by
libbpf writing out of bounds.

The patch adding the feature_flags field has an early bail out if the
feature_flags field is not part of the opts struct (via the OPTS_HAS
macro), but the patch adding xdp_zc_max_segs does not. For consistency, this
fix just changes the assignments to both fields to use the OPTS_SET()
macro.
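
A hedged sketch of the difference inside bpf_xdp_query():

    /* before: unconditional write, lands out of bounds for old callers */
    opts->feature_flags = feature_flags;
    /* after: OPTS_SET() writes only if the caller's struct has the field */
    OPTS_SET(opts, feature_flags, feature_flags);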

Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com
2024-02-06 11:52:00 -08:00
Andrii Nakryiko
b24a6277cc libbpf: fix return value for PERF_EVENT __arg_ctx type fix up check
If PERF_EVENT program has __arg_ctx argument with matching
architecture-specific pt_regs/user_pt_regs/user_regs_struct pointer
type, libbpf should still perform type rewrite for old kernels, but not
emit the warning. Fix copy/paste from kernel code where 0 is meant to
signify "no error" condition. For libbpf we need to return "true" to
proceed with type rewrite (which for PERF_EVENT program will be
a canonical `struct bpf_perf_event_data *` type).

Fixes: 9eea8fafe33e ("libbpf: fix __arg_ctx type enforcement for perf_event programs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206002243.1439450-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06 11:52:00 -08:00
Andrii Nakryiko
25fe467af4 ci: allowlist tests validating libbpf's __arg_ctx type rewrite logic
Allowlist test_global_funcs/arg_tag_ctx* and a few of the
verifier_global_subprogs subtests that validate libbpf's logic for
rewriting __arg_ctx global subprog argument types on kernels that don't
natively support __arg_ctx.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-06 10:17:28 -08:00
Andrii Nakryiko
f11758a780 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   ced33f2cfa21a14a292a00e31dc9f85c1bfbda1c
Checkpoint bpf-next commit: 943b043aeecce9accb6d367af47791c633e95e4d
Baseline bpf commit:        577e4432f3ac810049cb7e6b71f4d96ec7c6e894
Checkpoint bpf commit:      577e4432f3ac810049cb7e6b71f4d96ec7c6e894

Andrii Nakryiko (8):
  libbpf: integrate __arg_ctx feature detector into kernel_supports()
  libbpf: fix __arg_ctx type enforcement for perf_event programs
  libbpf: add __arg_trusted and __arg_nullable tag macros
  libbpf: add bpf_core_cast() macro
  libbpf: Call memfd_create() syscall directly
  libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim
    API
  libbpf: Add btf__new_split() API that was declared but not implemented
  libbpf: Add missed btf_ext__raw_data() API

Eduard Zingerman (1):
  libbpf: Remove unnecessary null check in kernel_supports()

Ian Rogers (1):
  libbpf: Add some details for BTF parsing failures

 src/bpf.h             |  2 +-
 src/bpf_core_read.h   | 13 ++++++
 src/bpf_helpers.h     |  2 +
 src/btf.c             | 33 ++++++++++++---
 src/features.c        | 58 +++++++++++++++++++++++++
 src/libbpf.c          | 99 ++++++++++++++-----------------------------
 src/libbpf.map        |  5 ++-
 src/libbpf_internal.h |  2 +
 src/linker.c          |  2 +-
 9 files changed, 140 insertions(+), 76 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
cbb8ba352d sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
95b4beb502 libbpf: Add missed btf_ext__raw_data() API
Another API that was declared in libbpf.map but whose actual implementation
was missing. btf_ext__get_raw_data() was intended as a discouraged alias
of the consistently-named btf_ext__raw_data(), so make this an actuality.

Fixes: 20eccf29e297 ("libbpf: hide and discourage inconsistently named getters")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-5-andrii@kernel.org
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
5b7613e50f libbpf: Add btf__new_split() API that was declared but not implemented
Seems like original commit adding split BTF support intended to add
btf__new_split() API, and even declared it in libbpf.map, but never
added (trivial) implementation. Fix this.

Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-4-andrii@kernel.org
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
245394fb36 libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim API
LIBBPF_API annotation seems missing on libbpf_set_memlock_rlim API, so
add it to make this API callable from libbpf's shared library version.

Fixes: e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
Fixes: ab9a5a05dc48 ("libbpf: fix up few libbpf.map problems")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-3-andrii@kernel.org
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
3b19b1bb55 libbpf: Call memfd_create() syscall directly
Some versions of Android do not implement the memfd_create() wrapper in
their libc, leading to build failures ([0]). On the other
hand, memfd_create() is available as a syscall on quite old kernels
(3.17+, while the bpf() syscall itself is available since 3.18+), so it is
ok to assume syscall availability and call into it with the syscall()
helper to avoid Android-specific workarounds.
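
The direct call looks roughly like this (name string illustrative):

    #include <sys/syscall.h>
    #include <unistd.h>

    int fd = syscall(__NR_memfd_create, "libbpf-placeholder", MFD_CLOEXEC);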

Validated in libbpf-bootstrap's CI ([1]).

  [0] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7701003207/job/20986080319#step:5:83
  [1] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7715988887/job/21031767212?pr=253

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-2-andrii@kernel.org
2024-02-01 15:10:17 -08:00
Eduard Zingerman
7529e0c4c7 libbpf: Remove unnecessary null check in kernel_supports()
After recent changes, Coverity complained about inconsistent null checks
in kernel_supports() function:

    kernel_supports(const struct bpf_object *obj, ...)
    [...]
    // var_compare_op: Comparing obj to null implies that obj might be null
    if (obj && obj->gen_loader)
        return true;

    // var_deref_op: Dereferencing null pointer obj
    if (obj->token_fd)
        return feat_supported(obj->feat_cache, feat_id);
    [...]

- The original null check was introduced by commit [0], which introduced
  a call `kernel_supports(NULL, ...)` in function bump_rlimit_memlock();
- This call was refactored to use `feat_supported(NULL, ...)` in commit [1].

Looking at all places where kernel_supports() is called:

- There is either `obj->...` access before the call;
- Or `obj` comes from `prog->obj` expression, where `prog` comes from
  enumeration of programs in `obj`;
- Or `obj` comes from `prog->obj`, where `prog` is a parameter to one
  of the API functions:
  - bpf_program__attach_kprobe_opts;
  - bpf_program__attach_kprobe;
  - bpf_program__attach_ksyscall.

Assuming correct API usage, it appears that `obj` can never be null when
passed to kernel_supports(). Silence the Coverity warning by removing
redundant null check.

  [0] e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
  [1] d6dd1d49367a ("libbpf: Further decouple feature checking logic from bpf_object")

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240131212615.20112-1-eddyz87@gmail.com
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
688879fb01 libbpf: add bpf_core_cast() macro
Add a bpf_core_cast() macro that wraps the bpf_rdonly_cast() kfunc. It's more
ergonomic than the raw kfunc, as it automatically extracts the btf_id with
bpf_core_type_id_kernel(), and works with type names. It also casts the
result to a (T *) pointer. See the definition of the macro, it's
self-explanatory.

libbpf declares bpf_rdonly_cast() extern as __weak __ksym, so it should be
safe and not conflict with other possible declarations in user code.

But we do have a conflict with current BPF selftests that declare their
externs with the first argument as `void *obj`, while libbpf opts into the
more permissive `const void *obj`. This causes a conflict, so we fix up the
BPF selftests' uses in the same patch.
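
A hedged usage sketch (source pointer illustrative):

    /* p is some untyped kernel pointer obtained elsewhere */
    struct task_struct *task = bpf_core_cast(p, struct task_struct);

    bpf_printk("pid=%d", task->pid);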

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
0303e25be3 libbpf: add __arg_trusted and __arg_nullable tag macros
Add __arg_trusted to annotate global func args that accept trusted
PTR_TO_BTF_ID arguments.

Also add __arg_nullable to combine with __arg_trusted (and maybe other
tags in the future) to force global subprog itself (i.e., callee) to do
NULL checks, as opposed to default non-NULL semantics (and thus caller's
responsibility to ensure non-NULL values).
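
A hedged sketch of the tags on a global subprog (names hypothetical):

    __weak int handle_task(struct task_struct *task __arg_trusted __arg_nullable)
    {
            if (!task) /* callee does the NULL check, per __arg_nullable */
                    return 0;
            return task->pid;
    }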

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-01 15:10:17 -08:00
Ian Rogers
9b306ac9be libbpf: Add some details for BTF parsing failures
As CONFIG_DEBUG_INFO_BTF is off by default, the existing "failed to find
valid kernel BTF" message makes diagnosing the kernel build issue somewhat
cryptic. Add a little more detail in the hope of helping users.

Before:
```
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not accessible:
```
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not readable:
```
libbpf: failed to read kernel BTF from (/sys/kernel/btf/vmlinux): -1
```

Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@mail.gmail.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240125231840.1647951-1-irogers@google.com
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
c57fb75864 libbpf: fix __arg_ctx type enforcement for perf_event programs
Adjust PERF_EVENT type enforcement around __arg_ctx to match exactly
what the kernel is doing.

Fixes: 76ec90a996e3 ("libbpf: warn on unexpected __arg_ctx type when rewriting BTF")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
0b412d1918 libbpf: integrate __arg_ctx feature detector into kernel_supports()
Now that feature detection code is in bpf-next tree, integrate __arg_ctx
kernel-side support into kernel_supports() framework.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
3b09738928 sync: remove NETDEV_XSK_FLAGS_MASK which is not in bpf/bpf-next anymore
This part of the code is not present in either the bpf or bpf-next trees
anymore, so remove it manually.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-29 10:48:12 -08:00
Andrii Nakryiko
5139f12ef1 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c8632acf193beac64bbdaebef013368c480bf74f
Checkpoint bpf-next commit: ced33f2cfa21a14a292a00e31dc9f85c1bfbda1c
Baseline bpf commit:        0a5bd0ffe790511d802e7f40898429a89e2487df
Checkpoint bpf commit:      577e4432f3ac810049cb7e6b71f4d96ec7c6e894

Andrii Nakryiko (1):
  libbpf: Fix faccessat() usage on Android

 src/libbpf_internal.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-29 10:48:12 -08:00
Andrii Nakryiko
830e0d017b libbpf: Fix faccessat() usage on Android
Android's libc implementation errors out with -EINVAL in faccessat() if
passed AT_EACCESS ([0]). This leads to the ridiculous situation of libbpf
refusing to load /sys/kernel/btf/vmlinux on Android ([1]). Fix this by
detecting Android and redefining AT_EACCESS to 0, which is equivalent on
Android.

  [0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50
  [1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250
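
The workaround described above boils down to something like the following
sketch (the idea, not the verbatim patch):

```
#include <fcntl.h>

/* On Android (bionic), faccessat() rejects AT_EACCESS with -EINVAL,
 * and AT_EACCESS is equivalent to 0 there anyway, so redefine it. */
#if defined(__ANDROID__)
#undef AT_EACCESS
#define AT_EACCESS 0
#endif
```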

Fixes: 6a4ab8869d0b ("libbpf: Fix the case of running as non-root with capabilities")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240126220944.2497665-1-andrii@kernel.org
2024-01-29 10:48:12 -08:00
Andrii Nakryiko
fad5d91381 libbpf: make sure linux/kernel.h includes linux/compiler.h
This replicates the upstream kernel setup and brings the READ_ONCE() and
WRITE_ONCE() macros anywhere linux/kernel.h is included, which is an
assumption the libbpf code makes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
8ca30626cc Makefile: add features.o to Makefile
Libbpf got a new source file, features.c, so we need to add it to the
GitHub version of the Makefile as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
274d6037f8 libbpf: add BPF_CALL_REL() macro implementation
Add the BPF_CALL_REL() macro implementation to the include/linux/filter.h
header, which is now used by libbpf code for feature detection.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
0f84f3bef6 ci: regenerate vmlinux.h
Update vmlinux.h for old kernel CI workflows.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
3dea2db84b ci: drop custom patches for fixing upstream kernel issues
All the issues should be fixed upstream already.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
2f81310ec0 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   98e20e5e13d2811898921f999288be7151a11954
Checkpoint bpf-next commit: c8632acf193beac64bbdaebef013368c480bf74f
Baseline bpf commit:        7c5e046bdcb2513f9decb3765d8bf92d604279cf
Checkpoint bpf commit:      0a5bd0ffe790511d802e7f40898429a89e2487df

Andrey Grafin (1):
  libbpf: Apply map_set_def_max_entries() for inner_maps on creation

Andrii Nakryiko (17):
  libbpf: feature-detect arg:ctx tag support in kernel
  libbpf: warn on unexpected __arg_ctx type when rewriting BTF
  libbpf: call dup2() syscall directly
  bpf: Introduce BPF token object
  bpf: Add BPF token support to BPF_MAP_CREATE command
  bpf: Add BPF token support to BPF_BTF_LOAD command
  bpf: Add BPF token support to BPF_PROG_LOAD command
  libbpf: Add bpf_token_create() API
  libbpf: Add BPF token support to bpf_map_create() API
  libbpf: Add BPF token support to bpf_btf_load() API
  libbpf: Add BPF token support to bpf_prog_load() API
  libbpf: Split feature detectors definitions from cached results
  libbpf: Further decouple feature checking logic from bpf_object
  libbpf: Move feature detection code into its own file
  libbpf: Wire up token_fd into feature probing logic
  libbpf: Wire up BPF token support at BPF object level
  libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH
    envvar

Daniel Borkmann (1):
  bpf: Sync uapi bpf.h header for the tooling infra

Dima Tisnek (1):
  libbpf: Correct bpf_core_read.h comment wrt bpf_core_relo struct

Jiri Olsa (2):
  bpf: Add cookie to perf_event bpf_link_info records
  bpf: Store cookies in kprobe_multi bpf_link_info data

Kan Liang (2):
  perf: Add branch stack counters
  perf/x86/intel: Support branch counters logging

Kui-Feng Lee (3):
  bpf: pass btf object id in bpf_map_info.
  bpf: pass attached BTF to the bpf_struct_ops subsystem
  libbpf: Find correct module BTFs for struct_ops maps and progs.

Martin KaFai Lau (1):
  libbpf: Ensure undefined bpf_attr field stays 0

 include/uapi/linux/bpf.h        |  79 +++-
 include/uapi/linux/perf_event.h |  13 +
 src/bpf.c                       |  42 +-
 src/bpf.h                       |  38 +-
 src/bpf_core_read.h             |   2 +-
 src/btf.c                       |  10 +-
 src/elf.c                       |   2 -
 src/features.c                  | 503 +++++++++++++++++++++
 src/libbpf.c                    | 744 ++++++++++++--------------------
 src/libbpf.h                    |  21 +-
 src/libbpf.map                  |   1 +
 src/libbpf_internal.h           |  50 ++-
 src/libbpf_probes.c             |  12 +-
 src/str_error.h                 |   3 +
 14 files changed, 1019 insertions(+), 501 deletions(-)
 create mode 100644 src/features.c

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
0e57fade4e sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
a36646e2b3 libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
To allow an external admin authority to override the default BPF FS
location (/sys/fs/bpf) for implicit BPF token creation, teach libbpf to
recognize the LIBBPF_BPF_TOKEN_PATH envvar. If it is specified and the user
application didn't explicitly specify the bpf_token_path option, it is
treated exactly like the bpf_token_path option, overriding the default
/sys/fs/bpf location and making the BPF token mandatory.
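
For illustration, a hedged sketch of the application side; in practice the
variable is exported by the admin/container manager rather than set by the
application itself, and the object file name is made up:

```
#include <stdlib.h>
#include <bpf/libbpf.h>

int main(void)
{
	/* normally exported externally, e.g. by a container manager;
	 * shown via setenv() purely for illustration */
	setenv("LIBBPF_BPF_TOKEN_PATH", "/sys/fs/bpf", 1);

	struct bpf_object *obj = bpf_object__open_file("prog.bpf.o", NULL);
	if (!obj)
		return 1;
	/* token creation from the envvar path happens during load */
	return bpf_object__load(obj) ? 1 : 0;
}
```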

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-29-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
e1a43809a9 libbpf: Wire up BPF token support at BPF object level
Add BPF token support to BPF object-level functionality.

BPF token is supported by BPF object logic either as an explicitly
provided BPF token from outside (through BPF FS path), or implicitly
(unless prevented through bpf_object_open_opts).

Implicit mode is assumed to be the most common one for user-namespaced
unprivileged workloads. The assumption is that a privileged container
manager sets up a default BPF FS mount point at /sys/fs/bpf with BPF token
delegation options (delegate_{cmds,maps,progs,attachs} mount options).
During loading, the BPF object will attempt to create a BPF token from the
/sys/fs/bpf location and pass it for all relevant operations (currently,
map creation, BTF load, and program load).

In this implicit mode, if BPF token creation fails for whatever reason
(BPF FS is not mounted, the kernel doesn't support BPF token, etc.), this
is not considered an error. The BPF object loading sequence will proceed
with no BPF token.

In explicit BPF token mode, the user provides a custom BPF FS mount point
path. In that case, the BPF object will attempt to create a BPF token from
the provided BPF FS location. If BPF token creation fails, that is
considered a critical error and the BPF object load fails with an error.

Libbpf provides a way to disable implicit BPF token creation, if it
causes any trouble (BPF token is designed to be completely optional and
shouldn't cause any problems even if provided, but in the world of BPF
LSM, custom security logic can be installed that might change the outcome
depending on the presence of a BPF token). To disable libbpf's default BPF
token creation behavior, the user should provide either an invalid
(negative) BPF token FD or an empty bpf_token_path option.

BPF token presence can influence libbpf's feature probing, so if BPF
object has associated BPF token, feature probing is instructed to use
BPF object-specific feature detection cache and token FD.
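
A minimal sketch of the explicit mode from an application's perspective
(.bpf_token_path is the option described above; paths, file names, and the
wrapper function are illustrative):

```
#include <bpf/libbpf.h>

int open_and_load_with_token(void)
{
	/* explicit mode: create a BPF token from a custom BPF FS mount;
	 * if token creation fails, the object load fails as well */
	LIBBPF_OPTS(bpf_object_open_opts, opts,
		.bpf_token_path = "/run/my-bpffs",	/* illustrative path */
	);
	struct bpf_object *obj = bpf_object__open_file("prog.bpf.o", &opts);

	if (!obj)
		return -1;
	/* passing .bpf_token_path = "" instead would disable implicit
	 * token creation from the default /sys/fs/bpf */
	return bpf_object__load(obj);
}
```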

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-26-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
a3b317a9c0 libbpf: Wire up token_fd into feature probing logic
Adjust feature probing callbacks to take into account the optional
token_fd. In unprivileged contexts, some feature detectors would fail to
detect kernel support just because a BPF program, BPF map, or BTF object
can't be loaded due to the privileged nature of those operations. So when a
BPF object is loaded with a BPF token, this token should be used for
feature probing.

This patch sets up support for this scenario, but we don't yet pass a
non-zero token FD. This will be added in the next patch.

We also switched the BPF cookie detector from using a kprobe program to a
tracepoint one, as tracepoint is a somewhat less dangerous BPF program
type and has a higher likelihood of being allowed through a BPF token in
the future. This change has no effect on detection behavior.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-25-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
d42f0b8943 libbpf: Move feature detection code into its own file
It's quite a lot of well isolated code, so it seems like a good
candidate to move it out of libbpf.c to reduce its size.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-24-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
9bf95048b7 libbpf: Further decouple feature checking logic from bpf_object
Add a feat_supported() helper that accepts a feature cache instead of a
bpf_object. This allows low-level code in bpf.c to not know or care about
the higher-level concept of bpf_object, yet still be able to utilize custom
feature checking in cases where a BPF token might influence the outcome.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-23-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
9454419946 libbpf: Split feature detectors definitions from cached results
Split the list of supported feature detectors, with their corresponding
callbacks, from the actual cached supported/missing values. This will allow
more flexible per-token or per-object feature detectors in subsequent
refactorings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-22-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
8082a311d3 libbpf: Add BPF token support to bpf_prog_load() API
Wire through token_fd into bpf_prog_load().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-16-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
ac4a66ea12 libbpf: Add BPF token support to bpf_btf_load() API
Allow the user to specify a token_fd for the bpf_btf_load() API that wraps
the kernel's BPF_BTF_LOAD command. This allows loading BTF from an
unprivileged process, as long as it has a BPF token allowing the
BPF_BTF_LOAD command, which can be created and delegated by a privileged
process.

Wire through the new btf_flags as well, so that the user can provide the
BPF_F_TOKEN_FD flag, if necessary.
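
For illustration, a hedged sketch of the resulting API usage (the wrapper
function is made up; bpf_btf_load() and the opts fields are the ones this
patch describes):

```
#include <bpf/bpf.h>

int load_btf_with_token(const void *btf_data, size_t btf_size, int token_fd)
{
	LIBBPF_OPTS(bpf_btf_load_opts, opts,
		.token_fd = token_fd,
		.btf_flags = BPF_F_TOKEN_FD,	/* must accompany a token FD */
	);

	return bpf_btf_load(btf_data, btf_size, &opts);
}
```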

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-15-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
8002c052f3 libbpf: Add BPF token support to bpf_map_create() API
Add the ability to provide a token_fd for the BPF_MAP_CREATE command
through the bpf_map_create() API.
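
A usage sketch (the wrapper function, map name, sizes, and max_entries are
illustrative; the token_fd opt and BPF_F_TOKEN_FD flag are the ones added
by this series):

```
#include <bpf/bpf.h>

int create_map_with_token(int token_fd)
{
	LIBBPF_OPTS(bpf_map_create_opts, opts,
		.token_fd = token_fd,
		.map_flags = BPF_F_TOKEN_FD,	/* must accompany a token FD */
	);

	/* key/value sizes and max_entries are arbitrary for the sketch */
	return bpf_map_create(BPF_MAP_TYPE_HASH, "demo_map",
			      sizeof(int), sizeof(long), 128, &opts);
}
```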

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-14-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
5cc8482fe2 libbpf: Add bpf_token_create() API
Add a low-level wrapper API for the BPF_TOKEN_CREATE command in the bpf()
syscall.
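
A minimal usage sketch (the wrapper function is made up; it assumes a BPF FS
instance mounted with delegation options at the given path):

```
#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>

int make_token(void)
{
	int bpffs_fd, token_fd;

	bpffs_fd = open("/sys/fs/bpf", O_RDONLY);
	if (bpffs_fd < 0)
		return -1;

	/* NULL opts: defaults are fine for the sketch */
	token_fd = bpf_token_create(bpffs_fd, NULL);
	close(bpffs_fd);
	return token_fd;
}
```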

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-13-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
21fb08cb35 bpf: Add BPF token support to BPF_PROG_LOAD command
Add basic BPF token support to BPF_PROG_LOAD. The BPF_F_TOKEN_FD flag
should be set in the prog_flags field when providing prog_token_fd.

Wire through a set of allowed BPF program types and attach types, derived
from BPF FS at BPF token creation time. Then make sure we perform
bpf_token_capable() checks everywhere it's relevant.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
eb9d10835c bpf: Add BPF token support to BPF_BTF_LOAD command
Accept a BPF token FD in the BPF_BTF_LOAD command to allow BTF data loading
through a delegated BPF token. The BPF_F_TOKEN_FD flag has to be specified
when passing a BPF token FD. Given the BPF_BTF_LOAD command didn't have a
flags field before, we also add a btf_flags field.

BTF loading is a pretty straightforward operation, so as long as the BPF
token is created with allow_cmds granting the BPF_BTF_LOAD command, the
kernel proceeds to parsing BTF data and creating a BTF object.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-6-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
1386c15b7b bpf: Add BPF token support to BPF_MAP_CREATE command
Allow providing a token_fd for the BPF_MAP_CREATE command to allow
controlled BPF map creation from an unprivileged process through a
delegated BPF token. A new BPF_F_TOKEN_FD flag is added, to be specified
together with the BPF token FD for the BPF_MAP_CREATE command.

Wire through a set of allowed BPF map types to the BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds,
allows creating a narrowly focused BPF token (controlled by a privileged
agent) with a restrictive set of BPF maps that an application can attempt
to create.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-5-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
5cd6a8493b bpf: Introduce BPF token object
Add a new kind of BPF kernel object, the BPF token. A BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from a privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations can be performed using the provided BPF token.

This is achieved by mounting a BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

A BPF token itself is just a derivative of BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts a BPF
FS FD, which can be obtained through the open() API by opening a BPF FS
mount point. Currently, a BPF token "inherits" the delegated command, map
type, prog type, and attach type bit sets from BPF FS as is. In the future,
with the BPF token being a separate object with its own FD, we can allow
further restricting a BPF token's allowable set of things, either at
creation time or after the fact, letting the process guard itself
further from unintentionally trying to load undesired kinds of BPF
programs. But for now we keep things simple and just copy the bit sets as is.

When a BPF token is created from a BPF FS mount, we take a reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against the init userns (using
capable()), but are now checked using ns_capable() instead (if a BPF
token is provided). See bpf_token_capable() for details.

Such a setup means that a BPF token in itself is not sufficient to grant BPF
functionality. A user-namespaced process has to *also* have the necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within a user namespace, now
it gains a meaning and allows container managers and sysadmins to have
flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., a container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant the overall ability to use bpf() (plus further
restrict which parts of the bpf() syscall are treated as namespaced).

Note also that the BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
within the BPF FS owning user namespace, rounding out the ns_capable()
story of the BPF token. Also, creating a BPF token in the init user namespace
is currently not supported, given a BPF token doesn't have any effect in the
init user namespace anyway.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Martin KaFai Lau
385ae492fa libbpf: Ensure undefined bpf_attr field stays 0
The commit 9e926acda0c2 ("libbpf: Find correct module BTFs for struct_ops maps and progs.")
sets a newly added field (value_type_btf_obj_fd) to -1 in libbpf when
the caller of libbpf's bpf_map_create did not define this field by
passing a NULL "opts" or passing in an "opts" that does not cover this
new field. OPTS_HAS(opts, field) is used to decide if the field is
defined or not:

	((opts) && opts->sz >= offsetofend(typeof(*(opts)), field))

Once OPTS_HAS has decided the field is not defined, that field should
be set to 0. For this particular new field (value_type_btf_obj_fd),
its corresponding map_flags flag "BPF_F_VTYPE_BTF_OBJ_FD" is not set.
Thus, the kernel does not treat it as an fd field.

Fixes: 9e926acda0c2 ("libbpf: Find correct module BTFs for struct_ops maps and progs.")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240124224418.2905133-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Dima Tisnek
517ff8d823 libbpf: Correct bpf_core_read.h comment wrt bpf_core_relo struct
A past commit ([0]) removed the last vestiges of struct bpf_field_reloc;
it's called struct bpf_core_relo now.

  [0] 28b93c64499a ("libbpf: Clean up and improve CO-RE reloc logging")

Signed-off-by: Dima Tisnek <dimaqq@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240121060126.15650-1-dimaqq@gmail.com
2024-01-26 18:12:29 -05:00
Kui-Feng Lee
0b0dfaf1be libbpf: Find correct module BTFs for struct_ops maps and progs.
Locate the module BTFs for struct_ops maps and progs and pass them to the
kernel. This ensures that the kernel correctly resolves type IDs from the
appropriate module BTFs.

For the map of a struct_ops object, the FD of the module BTF is set to
bpf_map to keep a reference to the module BTF. The FD is passed to the
kernel as value_type_btf_obj_fd when the struct_ops object is loaded.

For a bpf_struct_ops prog, attach_btf_obj_fd of bpf_prog is the FD of a
module BTF in the kernel.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240119225005.668602-13-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-26 18:12:29 -05:00
Kui-Feng Lee
d767ead51f bpf: pass attached BTF to the bpf_struct_ops subsystem
Pass the fd of a btf from the userspace to the bpf() syscall, and then
convert the fd into a btf. The btf is generated from the module that
defines the target BPF struct_ops type.

In order to inform the kernel about the module that defines the target
struct_ops type, the userspace program needs to provide a btf fd for the
respective module's btf. This btf contains essential information on the
types defined within the module, including the target struct_ops type.

A btf fd must be provided to the kernel for struct_ops maps and for the bpf
programs attached to those maps.

In the case of the bpf programs, the attach_btf_obj_fd parameter is passed
as part of the bpf_attr and is converted into a btf. This btf is then
stored in the prog->aux->attach_btf field. Here, we just let the verifier
access attach_btf directly.

In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd
of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a
btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added
for map_flags to indicate that the value of value_type_btf_obj_fd is set.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-26 18:12:29 -05:00
Kui-Feng Lee
d70e2071e2 bpf: pass btf object id in bpf_map_info.
Include btf object id (btf_obj_id) in bpf_map_info so that tools (ex:
bpftools struct_ops dump) know the correct btf from the kernel to look up
type information of struct_ops types.

Since struct_ops types can be defined and registered in a module, the
type information of a struct_ops type is defined in the btf of the
module defining it. The userspace tools need to know which btf is for
the module defining a struct_ops type.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-26 18:12:29 -05:00
Jiri Olsa
fe508381b4 bpf: Store cookies in kprobe_multi bpf_link_info data
Store cookies in the kprobe_multi bpf_link_info data. The cookies
field is optional, and if provided it needs to be an array of
__u64 values of kprobe_multi.count length.

Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240119110505.400573-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Jiri Olsa
de2f366450 bpf: Add cookie to perf_event bpf_link_info records
At the moment we don't store the cookie for perf_event probes,
while we do for the rest of the probes.

Adding cookie fields to struct bpf_link_info perf event
probe records:

  perf_event.uprobe
  perf_event.kprobe
  perf_event.tracepoint
  perf_event.perf_event

And the code to store that in bpf_link_info struct.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20240119110505.400573-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
390c6234eb libbpf: call dup2() syscall directly
We've run into issues with using the dup2() API in a production setting,
where libbpf is linked into a large production environment and ends up
calling unintended custom implementations of dup2(). These custom
implementations don't provide the atomic FD replacement guarantees of the
dup2() syscall, leading to subtle and hard to debug issues.

To prevent this in the future and guarantee that no libc implementation
will do their own custom non-atomic dup2() implementation, call dup2()
syscall directly with syscall(SYS_dup2).

Note that some architectures don't seem to provide dup2 and have dup3
instead. Try to detect that and pick the best syscall.
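
A sketch of the idea (not libbpf's verbatim code):

```
#include <sys/syscall.h>
#include <unistd.h>

/* bypass libc's dup2() wrapper entirely; on architectures without a
 * dup2 syscall, fall back to dup3 with no flags, which is a close
 * equivalent with the same atomic FD replacement behavior */
static int sys_dup2(int oldfd, int newfd)
{
#ifdef __NR_dup2
	return syscall(__NR_dup2, oldfd, newfd);
#else
	return syscall(__NR_dup3, oldfd, newfd, 0);
#endif
}
```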

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240119210201.1295511-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Andrey Grafin
89ca11a79b libbpf: Apply map_set_def_max_entries() for inner_maps on creation
This patch allows bpf_object__load() to auto-create
BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS maps with values
of type BPF_MAP_TYPE_PERF_EVENT_ARRAY.

The previous behaviour created a zero-filled btf_map_def for inner maps and
tried to use it for map creation, but the Linux kernel forbids creating
a BPF_MAP_TYPE_PERF_EVENT_ARRAY map with max_entries=0.
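
A sketch of the kind of BTF-defined map-in-map declaration this fixes (map
names and sizes are illustrative); the inner definition deliberately leaves
max_entries unset, which libbpf now fills in with its usual perf event array
default at creation time:

```
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* inner map template; max_entries is deliberately left unset (0) and
 * libbpf now applies its default (number of CPUs) at creation time */
struct inner_map {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
	__uint(max_entries, 8);
	__type(key, int);
	__array(values, struct inner_map);
} outer_map SEC(".maps");
```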

Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support")
Signed-off-by: Andrey Grafin <conquistador@yandex-team.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20240117130619.9403-1-conquistador@yandex-team.ru
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Daniel Borkmann
4c3742d9c1 bpf: Sync uapi bpf.h header for the tooling infra
Both commit 91051f003948 ("tcp: Dump bound-only sockets in inet_diag.")
and commit 985b8ea9ec7e ("bpf, docs: Fix bpf_redirect_peer header doc")
missed the tooling header sync. Fix it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
f4211a704f libbpf: warn on unexpected __arg_ctx type when rewriting BTF
On kernels that don't support the arg:ctx tag, before adjusting global
subprog BTF information to match the kernel's expected canonical type names,
make sure that the types used by the user are meaningful, and if not, warn
and don't do BTF adjustments.

This is similar to checks that kernel performs, but narrower in scope,
as only a small subset of BPF program types can be accommodated by
libbpf using canonical type names.

Libbpf unconditionally allows `struct pt_regs *` for perf_event program
types, unlike the kernel, which supports that conditionally, depending on
architecture. This is done to keep things simple and not cause unnecessary
false positives. This seems like a minor and harmless deviation, which in
real-world programs will be caught by kernels with arg:ctx tag support
anyway. So, KISS principle.

This logic is hard to test (especially on latest kernels), so manual
testing was performed instead. Libbpf emitted the following warning for
perf_event program with wrong context argument type:

  libbpf: prog 'arg_tag_ctx_perf': subprog 'subprog_ctx_tag' arg#0 is expected to be of `struct bpf_perf_event_data *` type

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240118033143.3384355-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
939ab641b8 libbpf: feature-detect arg:ctx tag support in kernel
Add a feature detector for kernel-side arg:ctx (__arg_ctx) tag support. If
this is detected, libbpf will avoid doing any __arg_ctx-related BTF
rewriting and checks in favor of letting the kernel handle this completely.

The test_global_funcs/ctx_arg_rewrite subtest is adjusted to do the same
feature detection (albeit in a much simpler, though round-about and
inefficient, way) and to skip the tests. This is done to still be able to
execute this test on older kernels (like in libbpf CI).

Note, BPF token series ([0]) does a major refactor and code moving of
libbpf-internal feature detection "framework", so to avoid unnecessary
conflicts we keep newly added feature detection stand-alone with ad-hoc
result caching. Once things settle, there will be a small follow up to
re-integrate everything back and move code into its final place in
newly-added (by BPF token series) features.c file.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=814209&state=*

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240118033143.3384355-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Kan Liang
82ebbd97c8 perf/x86/intel: Support branch counters logging
The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. It can
provide a means to attribute exposed retirement latency to combinations
of events across a block of instructions. It also provides a means of
attributing Timed LBR latencies to events.

The feature is first introduced on SRF/GRR. It is an enhancement of the
ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences
of events on the GP counters. The information is displayed by the order
of counters.

The design proposed in this patch requires that the events which are
logged must be in a group with the event that has LBR. If there is more
than one LBR group, only the counter logging information from the
current (overflowed) group is stored for the perf tool; otherwise the
perf tool cannot know which and when other groups are scheduled,
especially when multiplexing is triggered. The user can ensure it uses
the maximum number of counters that support LBR info (4 by now) by
making the group large enough.

The HW only logs events by the order of counters. The order may be
different from the order of enabling which the perf tool can understand.
When parsing the information of each branch entry, convert the counter
order to the enabled order, and store the enabled order in the extension
space.

Unconditionally reset LBRs for an LBR event group when it's deleted. The
logged counter information is only valid for the current LBR group. If
another LBR group is scheduled later, the information from the stale
LBRs would be otherwise wrongly interpreted.

Add a sanity check in intel_pmu_hw_config(). Disable the feature if other
counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode
is enabled. (For the LBR call stack mode, we cannot simply flush the
LBR, since it will break the call stack. Also, there is no obvious usage
with the call stack mode for now.)

Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch
stack setup.

Expose the maximum number of supported counters and the width of the
counters into the sysfs. The perf tool can use the information to parse
the logged counters in each branch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-5-kan.liang@linux.intel.com
2024-01-26 18:12:29 -05:00
Kan Liang
9705c4c622 perf: Add branch stack counters
Currently, the additional information of a branch entry is stored in a
u64 space. With more and more information added, the space is running
out. For example, the information of occurrences of events will be added
for each branch.

Two places were suggested to append the counters.
https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
One place is right after the flags of each branch entry. It changes the
existing struct perf_branch_entry. The later ARCH specific
implementation has to be really careful to consistently pick
the right struct.
The other place is right after the entire struct perf_branch_stack.
The disadvantage is that the pointer of the extra space has to be
recorded. The common interface perf_sample_save_brstack() has to be
updated.

The latter is much more straightforward, and should be easily understood
and maintained. It is implemented in this patch.

Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate
the event which is recorded in the branch info.

The "u64 counters" may store the occurrences of several events. The
information regarding the number of events/counters and the width of
each counter should be exposed via sysfs as a reference for the perf
tool. Define the branch_counter_nr and branch_counter_width ABI here.
The support will be implemented later in the Intel-specific patch.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-1-kan.liang@linux.intel.com
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
528cb9d3e9 README: update Ubuntu link
Do what [0] proposed to do, but with properly formatted commit message
and Signed-off-by.

  [0] https://github.com/libbpf/libbpf/pull/742

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-25 16:47:44 -08:00
thiagoftsm
f8f9df60e0 Merge branch 'libbpf:master' into master 2024-01-24 12:31:08 +00:00
Andrii Nakryiko
f81eef23b3 ci: skip two tests failing due to kernel bug
Add lwt_reroute and tc_links_ingress to the DENYLIST, as they are currently
broken due to a kernel bug. The fix is under review and should make it into
bpf-next soon.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
feabd96e00 ci: regenerate vmlinux.h
Need bpf_xfrm_state_opts and others.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
1570d568a0 Makefile: bump to v1.4.0 dev version
Bump Github-only Makefile to match 1.4 development version.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
e2203b3057 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   750011e239a50873251c16207b0fe78eabf8577e
Checkpoint bpf-next commit: 98e20e5e13d2811898921f999288be7151a11954
Baseline bpf commit:        bc4fbf022c68967cb49b2b820b465cf90de974b8
Checkpoint bpf commit:      7c5e046bdcb2513f9decb3765d8bf92d604279cf

Alyssa Ross (1):
  libbpf: Skip DWARF sections in linker sanity check

Amritha Nambiar (4):
  netdev-genl: spec: Extend netdev netlink spec in YAML for queue
  netdev-genl: spec: Extend netdev netlink spec in YAML for NAPI
  netdev-genl: spec: Add irq in netdev netlink YAML spec
  netdev-genl: spec: Add PID in netdev netlink YAML spec

Andrii Nakryiko (24):
  bpf: introduce BPF token object
  bpf: add BPF token support to BPF_MAP_CREATE command
  bpf: add BPF token support to BPF_BTF_LOAD command
  bpf: add BPF token support to BPF_PROG_LOAD command
  libbpf: add bpf_token_create() API
  libbpf: add BPF token support to bpf_map_create() API
  libbpf: add BPF token support to bpf_btf_load() API
  libbpf: add BPF token support to bpf_prog_load() API
  bpf: rename MAX_BPF_LINK_TYPE into __MAX_BPF_LINK_TYPE for consistency
  libbpf: split feature detectors definitions from cached results
  libbpf: further decouple feature checking logic from bpf_object
  libbpf: move feature detection code into its own file
  libbpf: wire up token_fd into feature probing logic
  libbpf: wire up BPF token support at BPF object level
  libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH
    envvar
  Revert BPF token-related functionality
  libbpf: add __arg_xxx macros for annotating global func args
  libbpf: make uniform use of btf__fd() accessor inside libbpf
  libbpf: use explicit map reuse flag to skip map creation steps
  libbpf: don't rely on map->fd as an indicator of map being created
  libbpf: use stable map placeholder FDs
  libbpf: move exception callbacks assignment logic into relocation step
  libbpf: move BTF loading step after relocation step
  libbpf: implement __arg_ctx fallback logic

Daniel Xu (1):
  libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

David Vernet (1):
  bpf: Load vmlinux btf for any struct_ops map

Eduard Zingerman (1):
  libbpf: Start v1.4 development cycle

Jakub Kicinski (1):
  tools: ynl: add sample for getting page-pool information

Jamal Hadi Salim (5):
  net/sched: Remove uapi support for rsvp classifier
  net/sched: Remove uapi support for tcindex classifier
  net/sched: Remove uapi support for dsmark qdisc
  net/sched: Remove uapi support for ATM qdisc
  net/sched: Remove uapi support for CBQ qdisc

Jiri Olsa (2):
  libbpf: Add st_type argument to elf_resolve_syms_offsets function
  bpf: Add link_info support for uprobe multi link

Larysa Zaremba (1):
  xdp: Add VLAN tag hint

Mingyi Zhang (1):
  libbpf: Fix NULL pointer dereference in bpf_object__collect_prog_relos

Sergei Trofimovich (1):
  libbpf: Add pr_warn() for EINVAL cases in linker_sanity_check_elf

Stanislav Fomichev (3):
  xsk: Support tx_metadata_len
  xsk: Add TX timestamp and TX checksum offload support
  xsk: Add option to calculate TX checksum in SW

 include/uapi/linux/bpf.h       |  14 +-
 include/uapi/linux/if_xdp.h    |  61 +++-
 include/uapi/linux/netdev.h    |  81 ++++-
 include/uapi/linux/pkt_cls.h   |  47 ---
 include/uapi/linux/pkt_sched.h | 109 ------
 src/bpf_core_read.h            |  32 ++
 src/bpf_helpers.h              |   3 +
 src/elf.c                      |   5 +-
 src/libbpf.c                   | 585 +++++++++++++++++++++++++--------
 src/libbpf.map                 |   3 +
 src/libbpf_internal.h          |  17 +-
 src/libbpf_version.h           |   2 +-
 src/linker.c                   |  27 +-
 13 files changed, 673 insertions(+), 313 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
3102067b4e libbpf: implement __arg_ctx fallback logic
Out of all the special global func arg tag annotations, __arg_ctx is
practically the most immediately useful and most critical to have
working across a multitude of kernel versions, if possible. This would allow
end users to write much simpler code if __arg_ctx semantics worked for
older kernels that don't natively understand btf_decl_tag("arg:ctx") in
verifier logic.

Luckily, it is possible to ensure __arg_ctx works on old kernels through
a bit of extra work done by libbpf, at least in a lot of common cases.

To explain the overall idea, we need to go back to how the context argument
was supported in global funcs before __arg_ctx support was added. This
was done based on special struct name checks in the kernel. E.g., for
BPF_PROG_TYPE_PERF_EVENT the expectation is that an argument of type `struct
bpf_perf_event_data *` marks that argument as PTR_TO_CTX. This is all
good as long as the global function is used from the same BPF program type
only, which is often not the case. If the same subprog has to be called
from, say, kprobe and perf_event program types, there is no single
definition that would satisfy the BPF verifier. The subprog will have a
context argument either for kprobe (if using the bpf_user_pt_regs_t struct
name) or perf_event (with the bpf_perf_event_data struct name), but not both.

This limitation was the reason to add btf_decl_tag("arg:ctx"), making
the actual argument type not important, so that user can just define
"generic" signature:

  __noinline int global_subprog(void *ctx __arg_ctx) { ... }

I won't belabor how libbpf implements subprograms; see the huge
comment next to the bpf_object_relocate_calls() function. The idea is that
each main/entry BPF program gets its own copy of global_subprog's code
appended.

This per-program copy of global subprog code *and* associated func_info
.BTF.ext information, pointing to FUNC -> FUNC_PROTO BTF type chain
allows libbpf to simulate __arg_ctx behavior transparently, even if the
kernel doesn't yet support __arg_ctx annotation natively.

The idea is straightforward: each time we append global subprog's code
and func_info information, we adjust its FUNC -> FUNC_PROTO type
information, if necessary (that is, libbpf can detect the presence of
btf_decl_tag("arg:ctx") just like BPF verifier would do it).

The rest is just mechanical and somewhat painful BTF manipulation code.
It's painful because we need to clone FUNC -> FUNC_PROTO, instead of
reusing it, as same FUNC -> FUNC_PROTO chain might be used by another
main BPF program within the same BPF object, so we can't just modify it
in-place (and cloning BTF types within the same struct btf object is
painful due to constant memory invalidation, see comments in code).
Uploaded BPF object's BTF information has to work for all BPF
programs at the same time.

Once we have FUNC -> FUNC_PROTO clones, we make sure that instead of
using some `void *ctx` parameter definition, we have an expected `struct
bpf_perf_event_data *ctx` definition (as far as BPF verifier and kernel
is concerned), which will mark it as context for BPF verifier. Same
global subprog relocated and copied into another main BPF program will
get different type information according to main program's type. It all
works out in the end in a completely transparent way for end user.

Libbpf internally maintains a program type -> expected context struct name
mapping. Note that not all BPF program types have a named context
struct, so this approach won't work for such programs (just like it
didn't before __arg_ctx). So native __arg_ctx is still important to have
in the kernel to get generic context support across all BPF program types.
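
For illustration (not from the patch), a sketch of the situation this
solves: one generic subprog shared by two program types (function names and
attach points are arbitrary):

```
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* one generic subprog shared by two program types; __arg_ctx marks ctx
 * as the program's context regardless of the declared pointer type */
__noinline int shared_logic(void *ctx __arg_ctx)
{
	return ctx ? 1 : 0;
}

SEC("kprobe/do_nanosleep")
int kp(struct pt_regs *ctx)
{
	return shared_logic(ctx);
}

SEC("perf_event")
int pe(struct bpf_perf_event_data *ctx)
{
	return shared_logic(ctx);
}
```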

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
a4f0740b3d libbpf: move BTF loading step after relocation step
With all the preparations in previous patches done we are ready to
postpone BTF loading and sanitization step until after all the
relocations are performed.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
94470256c1 libbpf: move exception callbacks assignment logic into relocation step
Move the logic of finding and assigning exception callback indices from
BTF sanitization step to program relocations step, which seems more
logical and will unblock moving BTF loading to after relocation step.

Exception callbacks discovery and assignment has no dependency on BTF
being loaded into the kernel, it only uses BTF information. It does need
to happen before subprogram relocations happen, though. Which is why the
split.

No functional changes.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
4d68ea90c2 libbpf: use stable map placeholder FDs
Move map creation to later during BPF object loading by pre-creating
stable placeholder FDs (utilizing memfd_create()). Use dup2()
syscall to then atomically make those placeholder FDs point to real
kernel BPF map objects.

This change allows delaying BPF map creation until after all the BPF
program relocations. That, in turn, allows delaying BTF finalization and
loading into the kernel until after all the relocations as well. We'll take
advantage of the latter in subsequent patches to allow libbpf to adjust
BTF in a way that helps with BPF global function usage.

Clean up a few places where we close map->fd, which now shouldn't
happen, because map->fd should be a valid FD regardless of whether the map
was created or not. Surprisingly and nicely, this simplifies a bunch of
error handling code. If this change doesn't backfire, I'm tempted to
pre-create such stable FDs for other entities (progs, maybe even BTF).
We previously did some manipulations to make gen_loader work with fake
map FDs, with stable map FDs this hack is not necessary for maps (we
still have it for BTF, but I left it as is for now).
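
The mechanics described above, reduced to a standalone sketch (not libbpf's
actual code; names are illustrative):

```
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <bpf/bpf.h>

int create_map_via_placeholder(void)
{
	/* reserve a stable FD number up front */
	int placeholder = memfd_create("placeholder", MFD_CLOEXEC);
	int real;

	if (placeholder < 0)
		return -1;

	/* ... hand out `placeholder` during relocations, etc ... */

	real = bpf_map_create(BPF_MAP_TYPE_ARRAY, "m",
			      sizeof(int), sizeof(int), 1, NULL);
	if (real < 0) {
		close(placeholder);
		return -1;
	}

	/* atomically repoint the stable FD at the real BPF map */
	dup2(real, placeholder);
	close(real);
	return placeholder;	/* same FD number, now a BPF map */
}
```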

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
2ea3d8042f libbpf: don't rely on map->fd as an indicator of map being created
With the upcoming switch to preallocated placeholder FDs for maps,
switch various getters/setters away from checking map->fd. Use the
map_is_created() helper, which detects whether a BPF map can be modified
based on map->obj->loaded state, with a special provision for maps set up
with bpf_map__reuse_fd().

For backwards compatibility, we take map_is_created() into account in
bpf_map__fd() getter as well. This way before bpf_object__load() phase
bpf_map__fd() will always return -1, just as before the changes in
subsequent patches adding stable map->fd placeholders.

We also get rid of all internal uses of bpf_map__fd() getter, as it's
more oriented for uses external to libbpf. The above map_is_created()
check actually interferes with some of the internal uses, if map FD is
fetched through bpf_map__fd().

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
e9ce55197b libbpf: use explicit map reuse flag to skip map creation steps
Instead of inferring whether a map already points to a previously
created/pinned BPF map (which the user can specify with the
bpf_map__reuse_fd() API), use an explicit map->reused flag that is set in
such a case.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
3fb45d3761 libbpf: make uniform use of btf__fd() accessor inside libbpf
It makes future grepping and code analysis a bit easier.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
2e49eb8bf6 net/sched: Remove uapi support for CBQ qdisc
Commit 051d44209842 ("net/sched: Retire CBQ qdisc") retired the CBQ qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
5473fe6aef net/sched: Remove uapi support for ATM qdisc
Commit fb38306ceb9e ("net/sched: Retire ATM qdisc") retired the ATM qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
c04d1b669d net/sched: Remove uapi support for dsmark qdisc
Commit bbe77c14ee61 ("net/sched: Retire dsmark qdisc") retired the dsmark
classifier. Remove UAPI support for it.
Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
717798e2f9 net/sched: Remove uapi support for tcindex classifier
commit 8c710f75256b ("net/sched: Retire tcindex classifier") retired the TC
tcindex classifier.
Remove UAPI for it.  Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
f2c790ca1a net/sched: Remove uapi support for rsvp classifier
commit 265b4da82dbf ("net/sched: Retire rsvp classifier") retired the TC RSVP
classifier.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Mingyi Zhang
c008eb921e libbpf: Fix NULL pointer dereference in bpf_object__collect_prog_relos
An issue occurred while reading an ELF file in libbpf.c during fuzzing:

	Program received signal SIGSEGV, Segmentation fault.
	0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	4206 in libbpf.c
	(gdb) bt
	#0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	#1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706
	#2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437
	#3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497
	#4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16
	#5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one ()
	#6 0x000000000087ad92 in tracing::span::Span::in_scope ()
	#7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir ()
	#8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} ()
	#9 0x00000000005f2601 in main ()
	(gdb)

scn_data was null at this code (tools/lib/bpf/src/libbpf.c):

	if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) {

The scn_data is derived from the code above:

	scn = elf_sec_by_idx(obj, sec_idx);
	scn_data = elf_sec_data(obj, scn);

	relo_sec_name = elf_sec_str(obj, shdr->sh_name);
	sec_name = elf_sec_name(obj, scn);
	if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL
		return -EINVAL;

In certain special scenarios, such as reading a malformed ELF file,
it is possible that scn_data is a null pointer.
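
The message doesn't show the fix itself; as a sketch (not necessarily the
exact upstream diff), the natural shape is an explicit check right after
the lookup:

	scn = elf_sec_by_idx(obj, sec_idx);
	scn_data = elf_sec_data(obj, scn);
	if (!scn_data)
		return -LIBBPF_ERRNO__FORMAT; /* malformed ELF, bail out */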

Signed-off-by: Mingyi Zhang <zhangmingyi5@huawei.com>
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Changye Wu <wuchangye@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231221033947.154564-1-liuxin350@huawei.com
2024-01-04 19:15:17 -05:00
Alyssa Ross
6252a2fdcc libbpf: Skip DWARF sections in linker sanity check
clang can generate (with -g -Wa,--compress-debug-sections) 4-byte
aligned DWARF sections that declare themselves to be 8-byte aligned in
the section header.  Since DWARF sections are dropped during linking
anyway, just skip running the sanity checks on them.

Reported-by: Sergei Trofimovich <slyich@gmail.com>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/bpf/ZXcFRJVKbKxtEL5t@nz.home/
Link: https://lore.kernel.org/bpf/20231219110324.8989-1-hi@alyssa.is
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
c378eff58c libbpf: add __arg_xxx macros for annotating global func args
Add a set of __arg_xxx macros which can be used to augment BPF global
subprogs/functions with extra information for use by BPF verifier.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231215011334.2307144-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
c65b319c04 Revert BPF token-related functionality
This patch includes the following reverts (one conflicting BPF FS
patch and three token patch sets, represented by merge commits):
  - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'";
  - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs";
  - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'";
  - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'".

Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Larysa Zaremba
43e7309228 xdp: Add VLAN tag hint
Implement functionality that enables drivers to expose VLAN tag
to XDP code.

VLAN tag is represented by 2 variables:
- protocol ID, which is passed to bpf code in BE
- VLAN TCI, in host byte order

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20231205210847.28460-10-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
b166b99eed libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
To allow an external admin authority to override the default BPF FS
location (/sys/fs/bpf) for implicit BPF token creation, teach libbpf to
recognize the LIBBPF_BPF_TOKEN_PATH envvar. If it is specified and the user
application didn't explicitly specify either the bpf_token_path or the
bpf_token_fd option, it is treated exactly like the bpf_token_path option,
overriding the default /sys/fs/bpf location and making the BPF token
mandatory.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
5df9eba06a libbpf: wire up BPF token support at BPF object level
Add BPF token support to BPF object-level functionality.

BPF token is supported by BPF object logic either as an explicitly
provided BPF token from outside (through BPF FS path or explicit BPF
token FD), or implicitly (unless prevented through
bpf_object_open_opts).

Implicit mode is assumed to be the most common one for user namespaced
unprivileged workloads. The assumption is that privileged container
manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token
delegation options (delegate_{cmds,maps,progs,attachs} mount options).
BPF object during loading will attempt to create BPF token from
/sys/fs/bpf location, and pass it for all relevant operations
(currently, map creation, BTF load, and program load).

In this implicit mode, if BPF token creation fails due to whatever
reason (BPF FS is not mounted, or kernel doesn't support BPF token,
etc), this is not considered an error. BPF object loading sequence will
proceed with no BPF token.

In explicit BPF token mode, the user explicitly provides either a custom
BPF FS mount point path or creates a BPF token on their own and just passes
the token FD directly. In such a case, the BPF object will either dup() the
token FD (so the caller doesn't have to hold onto it for the entire duration
of the BPF object lifetime) or will attempt to create a BPF token from the
provided BPF FS location. If BPF token creation fails, that is considered
a critical error and BPF object load fails with an error.

Libbpf provides a way to disable implicit BPF token creation, if it
causes any troubles (BPF token is designed to be completely optional and
shouldn't cause any problems even if provided, but in the world of BPF
LSM, custom security logic can be installed that might change the outcome
depending on the presence of a BPF token). To disable libbpf's default BPF
token creation behavior, the user should provide either an invalid BPF
token FD (negative) or an empty bpf_token_path option.

BPF token presence can influence libbpf's feature probing, so if BPF
object has associated BPF token, feature probing is instructed to use
BPF object-specific feature detection cache and token FD.
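
A minimal sketch of the explicit mode (bpf_token_path option as added
in this series; error handling elided):

  LIBBPF_OPTS(bpf_object_open_opts, opts,
      /* create BPF token from a custom BPF FS mount; failure to
       * create the token is then a hard error, unlike implicit mode */
      .bpf_token_path = "/sys/fs/bpf",
  );

  struct bpf_object *obj = bpf_object__open_file("prog.bpf.o", &opts);
  /* bpf_object__load(obj) passes the token to map/BTF/prog loads */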

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
b14daa8b9b libbpf: wire up token_fd into feature probing logic
Adjust feature probing callbacks to take into account optional token_fd.
In unprivileged contexts, some feature detectors would fail to detect
kernel support just because BPF program, BPF map, or BTF object can't be
loaded due to privileged nature of those operations. So when BPF object
is loaded with BPF token, this token should be used for feature probing.

This patch sets up support for this scenario, but we don't yet pass a
non-zero token FD. That will be added in the next patch.

We also switched the BPF cookie detector from using a kprobe program to
a tracepoint one, as tracepoint is a somewhat less dangerous BPF program
type and has a higher likelihood of being allowed through BPF token in the
future. This change has no effect on detection behavior.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
fab327c888 libbpf: move feature detection code into its own file
It's quite a lot of well-isolated code, so it seems like a good
candidate to move it out of libbpf.c to reduce its size.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
feda0728e0 libbpf: further decouple feature checking logic from bpf_object
Add a feat_supported() helper that accepts a feature cache instead of
bpf_object. This allows low-level code in bpf.c to not know or care
about the higher-level concept of bpf_object, yet it will be able to utilize
custom feature checking in cases where BPF token might influence the
outcome.
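
A rough sketch of the resulting internal interface (names per this
series; this is libbpf-internal plumbing, not public API, so treat the
exact signature as an assumption):

  /* cached feature detection results, optionally tied to a token FD */
  struct kern_feature_cache;

  /* a NULL cache is assumed to fall back to the global default cache */
  bool feat_supported(struct kern_feature_cache *cache,
                      enum kern_feature_id feat_id);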

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
11c977ffaf libbpf: split feature detectors definitions from cached results
Split a list of supported feature detectors with their corresponding
callbacks from the actual cached supported/missing values. This will allow
having more flexible per-token or per-object feature detectors in
subsequent refactorings.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Daniel Xu
9d2f8aaf21 libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
=== Motivation ===

Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
writing wrapper to make the verifier happy.

Two alternatives to this approach are:

1. Use the upcoming `preserve_static_offset` [0] attribute to disable
   CO-RE on specific structs.
2. Use broader byte-sized writes to write to bitfields.

(1) is a bit hard to use. It requires specific and not-very-obvious
annotations to bpftool generated vmlinux.h. It's also not generally
available in released LLVM versions yet.

(2) makes the code quite hard to read and write. And especially if
BPF_CORE_READ_BITFIELD() is already being used, it makes more sense
to have an inverse helper for writing.

=== Implementation details ===

Since the logic is a bit non-obvious, I thought it would be helpful
to explain exactly what's going on.

To start, it helps to explain what LSHIFT_U64 (lshift) and RSHIFT_U64
(rshift) are designed to mean. Consider the core of the
BPF_CORE_READ_BITFIELD() algorithm:

        val <<= __CORE_RELO(s, field, LSHIFT_U64);
        val = val >> __CORE_RELO(s, field, RSHIFT_U64);

Basically what happens is we lshift to clear the non-relevant (blank)
higher order bits. Then we rshift to bring the relevant bits (bitfield)
down to LSB position (while also clearing blank lower order bits). To
illustrate:

        Start:    ........XXX......
        Lshift:   XXX......00000000
        Rshift:   00000000000000XXX

where `.` means blank bit, `0` means 0 bit, and `X` means bitfield bit.

After the two operations, the bitfield is ready to be interpreted as a
regular integer.

Next, we want to build an alternative (but more helpful) mental model
on lshift and rshift. That is, to consider:

* rshift as the total number of blank bits in the u64
* lshift as number of blank bits left of the bitfield in the u64

Take a moment to consider why that is true by consulting the above
diagram.

With this insight, we can now define the following relationship:

              bitfield
                 _
                | |
        0.....00XXX0...00
        |      |   |    |
        |______|   |    |
         lshift    |    |
                   |____|
              (rshift - lshift)

That is, we know the number of higher order blank bits is just lshift.
And the number of lower order blank bits is (rshift - lshift).

Finally, we can examine the core of the write side algorithm:

        mask = (~0ULL << rshift) >> lshift;              // 1
        val = (val & ~mask) | ((nval << rpad) & mask);   // 2

1. Compute a mask where the set bits are the bitfield bits. The first
   left shift zeros out exactly the number of blank bits, leaving a
   bitfield sized set of 1s. The subsequent right shift inserts the
   correct amount of higher order blank bits.

2. On the left of the `|`, mask out the bitfield bits. This creates
   0s where the new bitfield bits will go. On the right of the `|`,
   bring nval into the correct bit position by shifting it left by
   rpad (= rshift - lshift, the number of lower order blank bits) and
   mask out any bits that fall outside of the bitfield. Finally, by
   OR'ing the two halves, we get the final set of bits to write back.

[0]: https://reviews.llvm.org/D133361
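
For illustration, a hedged usage sketch pairing the new helper with its
read counterpart (both from bpf_core_read.h; the struct here is
hypothetical and only stands in for a CO-RE-relocatable kernel type):

  struct flags___sketch {
      unsigned int ecn: 2;
  } __attribute__((preserve_access_index));

  static void set_ecn_bit(struct flags___sketch *f)
  {
      /* CO-RE-relocated read of the bitfield as a plain integer */
      unsigned long long old = BPF_CORE_READ_BITFIELD(f, ecn);

      /* read-modify-write using the relocated shifts/masks above */
      BPF_CORE_WRITE_BITFIELD(f, ecn, old | 1);
  }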
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jonathan Lemon <jlemon@aviatrix.com>
Signed-off-by: Jonathan Lemon <jlemon@aviatrix.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/4d3dd215a4fd57d980733886f9c11a45e1a9adf3.1702325874.git.dxu@dxuuu.xyz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-04 19:15:17 -05:00
Sergei Trofimovich
5f68c571c8 libbpf: Add pr_warn() for EINVAL cases in linker_sanity_check_elf
Before the change, the `systemd` build on `i686-linux` failed as:

    $ bpftool gen object src/core/bpf/socket_bind/socket-bind.bpf.o src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    Error: failed to link 'src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o': Invalid argument (22)

After the change it fails as:

    $ bpftool gen object src/core/bpf/socket_bind/socket-bind.bpf.o src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    libbpf: ELF section #9 has inconsistent alignment addr=8 != d=4 in src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    Error: failed to link 'src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o': Invalid argument (22)

Now it's slightly easier to figure out what is wrong with an ELF file.

Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231208215100.435876-1-slyich@gmail.com
2024-01-04 19:15:17 -05:00
David Vernet
235ea85487 bpf: Load vmlinux btf for any struct_ops map
In libbpf, when determining whether we need to load vmlinux btf, we're
currently (among other things) checking whether there is any struct_ops
program present in the object. This works for most realistic struct_ops
maps, as a struct_ops map is of course typically composed of one or more
struct_ops programs. However, that technically need not be the case. A
struct_ops interface could be defined which allows a map to be specified
with one or more non-prog fields, and which provides default behavior
if no struct_ops progs are actually provided otherwise. For sched_ext,
for example, you technically only need to specify the name of the
scheduler in the struct_ops map, with the core scheduler logic providing
default behavior if no prog is actually specified.

If we were to define and try to load such a struct_ops map, we would
crash in libbpf when initializing it as obj->btf_vmlinux will be NULL:

Reading symbols from minimal...
(gdb) r
Starting program: minimal_example
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x000055555558308c in btf__type_cnt (btf=0x0) at btf.c:612
612             return btf->start_id + btf->nr_types;
(gdb) bt
    type_name=0x5555555d99e3 "sched_ext_ops", kind=4) at btf.c:914
    kind=4) at btf.c:942
    type=0x7fffffffe558, type_id=0x7fffffffe548, ...
    data_member=0x7fffffffe568) at libbpf.c:948
    kern_btf=0x0) at libbpf.c:1017
    at libbpf.c:8059

So as to account for such bare-bones struct_ops maps, let's update
obj_needs_vmlinux_btf() to also iterate over an obj's maps and check
whether any of them are struct_ops maps.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20231208061704.400463-1-void@manifault.com
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
400cbd6148 bpf: rename MAX_BPF_LINK_TYPE into __MAX_BPF_LINK_TYPE for consistency
To stay consistent with the naming pattern used for similar cases in BPF
UAPI (__MAX_BPF_ATTACH_TYPE, etc), rename MAX_BPF_LINK_TYPE into
__MAX_BPF_LINK_TYPE.

Also similar to MAX_BPF_ATTACH_TYPE and MAX_BPF_REG, add:

  #define MAX_BPF_LINK_TYPE __MAX_BPF_LINK_TYPE

Not all __MAX_xxx enums have such #define, so I'm not sure if we should
add it or not, but I figured I'll start with a completely backwards
compatible way, and we can drop that, if necessary.

Also adjust a selftest that used MAX_BPF_LINK_TYPE enum.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231206190920.1651226-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
ec1cab73a7 libbpf: add BPF token support to bpf_prog_load() API
Wire through token_fd into bpf_prog_load().
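
A hedged sketch of the resulting call shape (token_fd opts field as
added in this series; insns/insn_cnt/token_fd are assumed to exist in
the caller):

  LIBBPF_OPTS(bpf_prog_load_opts, opts, .token_fd = token_fd);

  int prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "xdp_pass", "GPL",
                              insns, insn_cnt, &opts);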

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-16-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
207b6ebb60 libbpf: add BPF token support to bpf_btf_load() API
Allow user to specify token_fd for bpf_btf_load() API that wraps
kernel's BPF_BTF_LOAD command. This allows loading BTF from unprivileged
process as long as it has BPF token allowing BPF_BTF_LOAD command, which
can be created and delegated by privileged process.
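
A hedged sketch (token_fd opts field as added in this series;
btf_data/btf_size/token_fd are assumed to exist in the caller):

  LIBBPF_OPTS(bpf_btf_load_opts, opts, .token_fd = token_fd);

  int btf_fd = bpf_btf_load(btf_data, btf_size, &opts);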

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-15-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
a23b8ffcf6 libbpf: add BPF token support to bpf_map_create() API
Add ability to provide token_fd for BPF_MAP_CREATE command through
bpf_map_create() API.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-14-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
f8954ca692 libbpf: add bpf_token_create() API
Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.
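
A minimal end-to-end sketch (assuming a delegating BPF FS is mounted at
/sys/fs/bpf; error handling elided):

  int bpffs_fd = open("/sys/fs/bpf", O_RDONLY);
  int token_fd = bpf_token_create(bpffs_fd, NULL); /* NULL opts = defaults */

  /* use the token for an otherwise privileged map creation */
  LIBBPF_OPTS(bpf_map_create_opts, opts, .token_fd = token_fd);
  int map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "demo_map",
                              sizeof(__u32), sizeof(__u64), 16, &opts);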

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-13-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
1ebea57322 bpf: add BPF token support to BPF_PROG_LOAD command
Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of
allowed BPF program types and attach types, derived from BPF FS at BPF
token creation time. Then make sure we perform bpf_token_capable()
checks everywhere where it's relevant.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
544acb9af6 bpf: add BPF token support to BPF_BTF_LOAD command
Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
through delegated BPF token. BTF loading is a pretty straightforward
operation, so as long as BPF token is created with allow_cmds granting
BPF_BTF_LOAD command, kernel proceeds to parsing BTF data and creating
BTF object.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
9abcc5efc8 bpf: add BPF token support to BPF_MAP_CREATE command
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled
BPF map creation from unprivileged process through delegated BPF token.

Wire through a set of allowed BPF map types to BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds,
allows creating a narrowly-focused BPF token (controlled by a privileged
agent) with a restrictive set of BPF maps that an application can attempt
to create.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
33de35fd83 bpf: introduce BPF token object
Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations could be performed using provided BPF token.

This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
FS FD, which can be attained through open() API by opening BPF FS mount
point. Currently, BPF token "inherits" the delegated command, map type,
prog type, and attach type bit sets from BPF FS as is. In the future,
with a BPF token being a separate object with its own FD, we can allow
further restricting a BPF token's allowable set of things, either at
creation time or after the fact, letting the process guard itself
further from unintentionally trying to load undesired kinds of BPF
programs. But for now we keep things simple and just copy bit sets as is.

When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.

Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).

Note also that the BPF_TOKEN_CREATE command itself requires
ns_capable(CAP_BPF) within the BPF FS owning user namespace, rounding out
the ns_capable() story of BPF token.
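
As a hedged illustration, the privileged side might set up such a
delegating mount roughly like this (option names from this series; ':'
is assumed as the separator for values inside a single option):

  /* run by the privileged container manager for the container's userns */
  mount("bpffs", "/sys/fs/bpf", "bpf", 0,
        "delegate_cmds=any:delegate_maps=any:"
        "delegate_progs=any:delegate_attachs=any");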

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Amritha Nambiar
ac9cd25de9 netdev-genl: spec: Add PID in netdev netlink YAML spec
Add support in the netlink spec (netdev.yaml) for the PID of the
NAPI thread. Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147335301.5260.11872351477120434501.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 19:15:17 -05:00
Amritha Nambiar
cfa6e420f4 netdev-genl: spec: Add irq in netdev netlink YAML spec
Add support in the netlink spec (netdev.yaml) for the interrupt number
among the NAPI attributes. Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147334210.5260.18178387869057516983.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 19:15:17 -05:00
Amritha Nambiar
36f30e4c30 netdev-genl: spec: Extend netdev netlink spec in YAML for NAPI
Add support in the netlink spec (netdev.yaml) for NAPI related information.
Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147333119.5260.7050639053080529108.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 19:15:17 -05:00
Amritha Nambiar
e4fcfe7db7 netdev-genl: spec: Extend netdev netlink spec in YAML for queue
Add support in the netlink spec (netdev.yaml) for queue information.
Add code generated from the spec.

Note: The "queue-type" attribute takes values 0 and 1 for rx
and tx queue type respectively.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147330963.5260.2576294626647300472.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 19:15:17 -05:00
Stanislav Fomichev
419eab9ec7 xsk: Add option to calculate TX checksum in SW
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM
to call skb_checksum_help in transmit path. Might be useful
to debugging issues with real hardware. I also use this mode
in the selftests.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Stanislav Fomichev
95134be22e xsk: Add TX timestamp and TX checksum offload support
This change actually defines the (initial) metadata layout
that should be used by AF_XDP userspace (xsk_tx_metadata).
The first field is flags which requests appropriate offloads,
followed by the offload-specific fields. The supported per-device
offloads are exported via netlink (new xsk-flags).

The offloads themselves are still implemented in a bit of a
framework-y fashion that's left from my initial kfunc attempt.
I'm introducing new xsk_tx_metadata_ops which drivers are
supposed to implement. The drivers are also supposed
to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
the right places. Since xsk_tx_metadata_{request,complete}
are static inline, we don't incur any extra overhead doing
indirect calls.

The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- don't need any extra flags to maintain to keep track of what
  offloads are implemented; if the callback is implemented - the offload
  is supported (used by netlink reporting code)

Two offloads are defined right now:
1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata
   area upon completion (tx_timestamp field)

XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes
SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass
metadata pointer).

The struct is forward-compatible and can be extended in the future
by appending more fields.
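
The layout described above looks roughly like this (sketch; the
if_xdp.h uapi header is authoritative):

  struct xsk_tx_metadata {
      __u64 flags;                 /* XDP_TXMD_FLAGS_* requests */
      union {
          struct {
              /* XDP_TXMD_FLAGS_CHECKSUM */
              __u16 csum_start;
              __u16 csum_offset;
          } request;
          struct {
              /* XDP_TXMD_FLAGS_TIMESTAMP */
              __u64 tx_timestamp;
          } completion;
      };
  };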

Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Stanislav Fomichev
2f95d28664 xsk: Support tx_metadata_len
For zerocopy mode, tx_desc->addr can point to an arbitrary offset
and carry some TX metadata in the headroom. For copy mode, there
is no way currently to populate skb metadata.

Introduce new tx_metadata_len umem config option that indicates how many
bytes to treat as metadata. Metadata bytes come prior to tx_desc address
(same as in RX case).

The size of the metadata has mostly the same constraints as XDP:
- less than 256 bytes
- 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte
  timestamp in the completion)
- non-zero

This data is not interpreted in any way right now.
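
A hedged registration sketch (tx_metadata_len as added to the
XDP_UMEM_REG uapi by this patch; umem_area/umem_size/xsk_fd are assumed
to exist in the caller):

  struct xdp_umem_reg mr = {
      .addr = (__u64)(uintptr_t)umem_area,
      .len = umem_size,
      .chunk_size = 4096,
      /* metadata precedes each tx_desc->addr; 8-byte aligned, <256 */
      .tx_metadata_len = sizeof(struct xsk_tx_metadata),
  };
  setsockopt(xsk_fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));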

Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Jiri Olsa
afb384f685 bpf: Add link_info support for uprobe multi link
Adding support to get uprobe_multi link details through the bpf_link_info
interface.

Adding new struct uprobe_multi to struct bpf_link_info to carry
the uprobe_multi link details.

The uprobe_multi.count is passed from user space to denote size
of array fields (offsets/ref_ctr_offsets/cookies). The actual
array size is stored back to uprobe_multi.count (allowing user
to find out the actual array size) and array fields are populated
up to the user passed size.

All the non-array fields (path/count/flags/pid) are always set.
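
A hedged query sketch of the two-call pattern this implies (field names
per this patch; error handling elided):

  struct bpf_link_info info = {};
  __u32 info_len = sizeof(info);
  __u64 offsets[64];

  /* first call: no arrays set, kernel reports the actual count back */
  bpf_link_get_info_by_fd(link_fd, &info, &info_len);
  __u32 cnt = info.uprobe_multi.count;

  /* second call: pass array capacity, kernel fills up to that many */
  memset(&info, 0, sizeof(info));
  info.uprobe_multi.offsets = (__u64)(uintptr_t)offsets;
  info.uprobe_multi.count = cnt < 64 ? cnt : 64;
  bpf_link_get_info_by_fd(link_fd, &info, &info_len);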

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20231125193130.834322-4-jolsa@kernel.org
2024-01-04 19:15:17 -05:00
Jiri Olsa
467dd7bda5 libbpf: Add st_type argument to elf_resolve_syms_offsets function
We need to get offsets for static variables in the following changes,
so make elf_resolve_syms_offsets take an st_type value as an argument
and pass it to elf_sym_iter_new.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20231125193130.834322-2-jolsa@kernel.org
2024-01-04 19:15:17 -05:00
Eduard Zingerman
9c794e5ab4 libbpf: Start v1.4 development cycle
Bump libbpf.map to v1.4.0 to start a new libbpf version cycle.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231123000439.12025-1-eddyz87@gmail.com
2024-01-04 19:15:17 -05:00
Jakub Kicinski
eb40a93a10 tools: ynl: add sample for getting page-pool information
Regenerate the tools/ code after netdev spec changes.

Add sample to query page-pool info in a concise fashion:

$ ./page-pool
    eth0[2]	page pools: 10 (zombies: 0)
		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-04 19:15:17 -05:00
Eduard Zingerman
1baa3e2355 ci: move /dev/kvm permissions setup from test workflow to actions/vmtest.yml
The vmtest action is used by several workflows: test, pahole, ondemand.
At the same time, vmtest action requires valid access rights to /dev/kvm
and is the only action that uses it.
This commit moves /dev/kvm permissions setup from test workflow to
vmtest action, in order to make sure that setup logic is shared by all
workflows that run vmtest.
Should fix CI failures like [1].

[1] https://github.com/libbpf/libbpf/actions/runs/7104762048/job/19340484589

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-12-13 15:50:08 -05:00
Andrii Nakryiko
1b2ae67c1d ci: custom patch to patch out BPF_F_TEST_REG_INVARIANTS flag
Without needing to modify tons of BPF selftests files, make sure we don't
pass BPF_F_TEST_REG_INVARIANTS to kernel, to make BPF selftests work on
4.9 and 5.5 kernels.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-12-05 12:51:08 -05:00
Andrii Nakryiko
20c0a9e3d7 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   155addf0814a92d08fce26a11b27e3315cdba977
Checkpoint bpf-next commit: 750011e239a50873251c16207b0fe78eabf8577e
Baseline bpf commit:        83b9dda8afa4e968d9cce253f390b01c0612a2a5
Checkpoint bpf commit:      bc4fbf022c68967cb49b2b820b465cf90de974b8

Andrii Nakryiko (2):
  bpf: add register bounds sanity checks and sanitization
  bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS

Jordan Rome (1):
  bpf: Add crosstask check to __bpf_get_stack

 include/uapi/linux/bpf.h | 6 ++++++
 1 file changed, 6 insertions(+)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-11-22 16:20:56 -05:00
Andrii Nakryiko
b88b3ac09d sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-11-22 16:20:56 -05:00
Andrii Nakryiko
96ed1c508f bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS
Rename verifier internal flag BPF_F_TEST_SANITY_STRICT to more neutral
BPF_F_TEST_REG_INVARIANTS. This is a follow up to [0].

A few selftests and veristat need to be adjusted in the same patch as
well.

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231112010609.848406-5-andrii@kernel.org/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231117171404.225508-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-22 16:20:56 -05:00
Andrii Nakryiko
7ccc41c138 bpf: add register bounds sanity checks and sanitization
Add simple sanity checks that validate well-formed ranges (min <= max)
across u64, s64, u32, and s32 ranges. Also for cases when the value is
constant (either 64-bit or 32-bit), we validate that ranges and tnums
are in agreement.

These bounds checks are performed at the end of BPF_ALU/BPF_ALU64
operations, on conditional jumps, and for LDX instructions (where subreg
zero/sign extension is probably the most important to check). This
covers most of the interesting cases.

Also, we validate the sanity of the return register when manually
adjusting it for some special helpers.

By default, a sanity violation will trigger a warning in the verifier log
and reset register bounds to "unbounded" ones. But to aid development
and debugging, BPF_F_TEST_SANITY_STRICT flag is added, which will
trigger hard failure of verification with -EFAULT on register bounds
violations. This allows selftests to catch such issues. veristat will
also gain a CLI option to enable this behavior.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20231112010609.848406-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-22 16:20:56 -05:00
Jordan Rome
785a079966 bpf: Add crosstask check to __bpf_get_stack
Currently get_perf_callchain only supports user stack walking for
the current task. Passing the correct *crosstask* param will return
0 frames if the task passed to __bpf_get_stack isn't the current
one, instead of a single incorrect frame/address. This change
passes the correct *crosstask* param but also does a preemptive
check in __bpf_get_stack if the task is current and returns
-EOPNOTSUPP if it is not.

This issue was found using bpf_get_task_stack inside a BPF
iterator ("iter/task"), which iterates over all tasks.
bpf_get_task_stack works fine for fetching kernel stacks
but because get_perf_callchain relies on the caller to know
if the requested *task* is the current one (via *crosstask*)
it was failing in a confusing way.

It might be possible to get user stacks for all tasks utilizing
something like access_process_vm but that requires the bpf
program calling bpf_get_task_stack to be sleepable and would
therefore be a breaking change.

Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Signed-off-by: Jordan Rome <jordalgo@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231108112334.3433136-1-jordalgo@meta.com
2023-11-22 16:20:56 -05:00
Eduard Zingerman
a6b990991c ci: disable sockopt selftest for 5.5 kernel
The following 'sockopt' selftests fail on libbpf CI for kernel 5.5:
- sockopt/getsockopt: read ctx->optlen:FAIL
- sockopt/getsockopt: support smaller ctx->optlen:FAIL
- sockopt/setsockopt: read ctx->level:FAIL
- sockopt/setsockopt: read ctx->optname:FAIL
- sockopt/setsockopt: read ctx->optlen:FAIL
- sockopt/setsockopt: ctx->optlen == -1 is ok:FAIL

Examples of failing CI runs:
- https://github.com/libbpf/libbpf/actions/runs/6961182067
- https://github.com/libbpf/libbpf/actions/runs/6961088131

The failures are strange as all tests were added quite a while ago
(Jun 27 2019) by commit:

  9ec8a4c9489d ("selftests/bpf: add sockopt test")

But they seem to be unrelated to libbpf.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-22 16:20:43 -05:00
Eduard Zingerman
4161e1f41d ci: disable a number of selftests causing CI failures for LATEST kernel
All tests disabled in this commit pass on main kernel CI and fail or
flip/flop on libbpf CI. Failures do not seem to be related to libbpf.
It appears that common theme for all failing tests is that hardware
perf events are not delivered as expected on github CI worker
machines.

Examples of failed CI runs:
- https://github.com/libbpf/libbpf/actions/runs/6961182067
- https://github.com/libbpf/libbpf/actions/runs/6961088131

Fails with the following log:

  test_send_signal_common:FAIL:incorrect result \
    unexpected incorrect result: actual 48 != expected 50

Test mode of operation:
- fork;
- child:
  - install handler for SIGUSR1;
  - send ready message to parent;
  - wait for SIGUSR1 in busy loop;
  - send message '2' (50) to parent if SIGUSR1 occurred;
  - send message '0' (48) to parent if no SIGUSR1 occurred.
- parent:
  - wait for ready message from child;
  - install perf_event or tracepoint bpf program that uses
    bpf_send_signal() to send SIGUSR1;
  - wait for message '0' or '2' from child, '2' is expected for test
    success.

It appears that perf event that should be triggered by parent never
happens, thus message 48 is received by parent and test fails.

Fails with the following log:

  test_and_reset_skel:FAIL:found_vm_exec \
    unexpected found_vm_exec: actual 0 != expected 1

Such log is printed if variables set from BPF program are not set
after some timeout. The program that should set the variable is
SEC("perf_event") int handle_pe(void), it appears that it is never run.

Fails with the following log:

  pe_subtest:FAIL:pe_res1 unexpected pe_res1: actual 0 != expected 1048576

Variable pe_res1 should be triggered by program
SEC("perf_event") int handle_pe(struct pt_regs *ctx),
it appears that it is never run.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-22 16:20:43 -05:00
Eduard Zingerman
93f360cf4b ci: don't set /dev/kvm permissions when CI user is root
s390 tests are executed on a selfhosted runner using the root user;
avoid setting /dev/kvm permissions in that case.
This should fix CI failures like [0].
(Still necessary for x86 tests executed on standard github runners).

[0] https://github.com/libbpf/libbpf/actions/runs/6898545987/job/18768732980?pr=752

Fixes: 168630f852 ("ci: give /dev/kvm 0666 permissions inside CI runner")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-17 15:36:52 -05:00
Eduard Zingerman
5ff0102329 ci: use config.vm for kernel config when present
Recent kernel commit [0] changed selftests config snippets structure
by extracting VM specific options to the file 'config.vm'. This file
has to be used in .github/actions/vmtest/action.yml at step
'Prepare to build BPF selftests', otherwise drivers necessary for e.g.
root file system access are not compiled into the kernel, leading to
CI failures like [1].

[0] b0cf0dcde8ca ("selftests/bpf: Consolidate VIRTIO/9P configs in config.vm file")
[1] https://github.com/libbpf/libbpf/actions/runs/6830439839/job/18578379328?pr=747

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-16 20:25:07 -05:00
Andrii Nakryiko
0c54691bae ci: apply temporary patch to make bpf-next build
Apply fe69a1b1b6ed ("selftests: bpf: xskxceiver: ksft_print_msg: fix
format type error") to make bpf-next build.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-11-13 21:51:02 -05:00
Eduard Zingerman
168630f852 ci: give /dev/kvm 0666 permissions inside CI runner
Recently, libbpf CI runs started failing with the following
error:

    ##[group]vm_init - Starting virtual machine...
    Starting VM with 4 CPUs...
    INFO: /dev/kvm exists
    KVM acceleration can be used
    Could not access KVM kernel module: Permission denied
    qemu-system-x86_64: failed to initialize KVM: Permission denied
    ##[error]Process completed with exit code 2.

E.g. see here [0]. The error happens because the CI user does not have
enough rights to access /dev/kvm. On a regular machine the solution would
be to add the user to the 'kvm' group; however, that would require a
re-login, which is cumbersome to achieve in a CI setting.
Instead, use a recipe described in [1] to make udev set 0666 access
permissions for /dev/kvm.

[0] https://github.com/libbpf/libbpf/actions/runs/6819530119/job/18547589967?pr=746
[1] https://stackoverflow.com/questions/37300811/android-studio-dev-kvm-device-permission-denied/61984745#61984745

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-13 18:21:02 -08:00
Eduard Zingerman
5d4237d52d ci: regenerate vmlinux.h
Regenerate latest vmlinux.h for old kernel CI tests.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-13 18:21:02 -08:00
Eduard Zingerman
fa0e866373 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   0e133a13370389d3894891eafe54fec2c44ad735
Checkpoint bpf-next commit: e80742d917492f10926b46b0caca050c6c9231d6
Baseline bpf commit:        8f8abb863fa5a4cc18955c6a0e17af0ded3e4a76
Checkpoint bpf commit:      83b9dda8afa4e968d9cce253f390b01c0612a2a5

Daniel Borkmann (3):
  netkit, bpf: Add bpf programmable net device
  tools: Sync if_link uapi header
  libbpf: Add link-based API for netkit

Yonghong Song (2):
  libbpf: Fix potential uninitialized tail padding with
    LIBBPF_OPTS_RESET
  bpf: Use named fields for certain bpf uapi structs

 include/uapi/linux/bpf.h     |  37 +++++----
 include/uapi/linux/if_link.h | 141 +++++++++++++++++++++++++++++++++++
 src/bpf.c                    |  16 ++++
 src/bpf.h                    |   5 ++
 src/libbpf.c                 |  39 ++++++++++
 src/libbpf.h                 |  15 ++++
 src/libbpf.map               |   1 +
 src/libbpf_common.h          |  13 ++--
 8 files changed, 246 insertions(+), 21 deletions(-)

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-13 18:21:02 -08:00
Yonghong Song
0fa5ff4f54 bpf: Use named fields for certain bpf uapi structs
Martin and Vadim reported a verifier failure with bpf_dynptr usage.
The issue is mentioned in [1], where Vadim worked around it with a
source change. The below describes what the issue is and why there
is a verification failure.

  int BPF_PROG(skb_crypto_setup) {
    struct bpf_dynptr algo, key;
    ...

    bpf_dynptr_from_mem(..., ..., 0, &algo);
    ...
  }

The bpf program is using vmlinux.h, so we have the following definition in
vmlinux.h:
  struct bpf_dynptr {
        long: 64;
        long: 64;
  };
Note that in uapi header bpf.h, we have
  struct bpf_dynptr {
        long: 64;
        long: 64;
} __attribute__((aligned(8)));

So we lost alignment information for struct bpf_dynptr by using vmlinux.h.
Let us take a look at a simple program below:
  $ cat align.c
  typedef unsigned long long __u64;
  struct bpf_dynptr_no_align {
        __u64 :64;
        __u64 :64;
  };
  struct bpf_dynptr_yes_align {
        __u64 :64;
        __u64 :64;
  } __attribute__((aligned(8)));

  void bar(void *, void *);
  int foo() {
    struct bpf_dynptr_no_align a;
    struct bpf_dynptr_yes_align b;
    bar(&a, &b);
    return 0;
  }
  $ clang --target=bpf -O2 -S -emit-llvm align.c

Look at the generated IR file align.ll:
  ...
  %a = alloca %struct.bpf_dynptr_no_align, align 1
  %b = alloca %struct.bpf_dynptr_yes_align, align 8
  ...

The compiler dictates the alignment for struct bpf_dynptr_no_align is 1 and
the alignment for struct bpf_dynptr_yes_align is 8. So theoretically the
compiler could allocate variable %a with alignment 1, although in reality
the compiler may choose a different alignment by considering other local
variables.

In [1], the verification failure happens because variable 'algo' is allocated
on the stack with alignment 4 (fp-28). But the verifier wants its alignment
to be 8.

To fix the issue, the RFC patch ([1]) tried to add '__attribute__((aligned(8)))'
to struct bpf_dynptr plus other similar structs. Andrii suggested that
we could directly modify uapi struct with named fields like struct 'bpf_iter_num':
  struct bpf_iter_num {
        /* opaque iterator state; having __u64 here allows to preserve correct
         * alignment requirements in vmlinux.h, generated from BTF
         */
        __u64 __opaque[1];
  } __attribute__((aligned(8)));

Indeed, adding named fields for those affected structs in this patch can preserve
alignment when bpf program references them in vmlinux.h. With this patch,
the verification failure in [1] can also be resolved.

  [1] https://lore.kernel.org/bpf/1b100f73-7625-4c1f-3ae5-50ecf84d3ff0@linux.dev/
  [2] https://lore.kernel.org/bpf/20231103055218.2395034-1-yonghong.song@linux.dev/

Cc: Vadim Fedorenko <vadfed@meta.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231104024900.1539182-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-10 13:27:01 -08:00
Yonghong Song
2d5df9f626 libbpf: Fix potential uninitialized tail padding with LIBBPF_OPTS_RESET
Martin reported a libbpf complaint about non-zero-value tail
padding with the LIBBPF_OPTS_RESET macro if struct bpf_netkit_opts is
modified to have a 4-byte tail padding. This only happens with the clang
compiler. The command line is: ./test_progs -t tc_netkit_multi_links
Martin and I did some investigation and found this is indeed the case,
and the following are the investigation details.

Clang:
  clang version 18.0.0
  <I tried clang15/16/17 and they all have similar results>

tools/lib/bpf/libbpf_common.h:
  #define LIBBPF_OPTS_RESET(NAME, ...)                                      \
        do {                                                                \
                memset(&NAME, 0, sizeof(NAME));                             \
                NAME = (typeof(NAME)) {                                     \
                        .sz = sizeof(NAME),                                 \
                        __VA_ARGS__                                         \
                };                                                          \
        } while (0)

  #endif

tools/lib/bpf/libbpf.h:
  struct bpf_netkit_opts {
        /* size of this struct, for forward/backward compatibility */
        size_t sz;
        __u32 flags;
        __u32 relative_fd;
        __u32 relative_id;
        __u64 expected_revision;
        size_t :0;
  };
  #define bpf_netkit_opts__last_field expected_revision
In the above struct bpf_netkit_opts, there is no tail padding.

prog_tests/tc_netkit.c:
  static void serial_test_tc_netkit_multi_links_target(int mode, int target)
  {
        ...
        LIBBPF_OPTS(bpf_netkit_opts, optl);
        ...
        LIBBPF_OPTS_RESET(optl,
                .flags = BPF_F_BEFORE,
                .relative_fd = bpf_program__fd(skel->progs.tc1),
        );
        ...
  }

Let us make the following source change; note that we have a 4-byte
trailing padding now.
  diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
  index 6cd9c501624f..0dd83910ae9a 100644
  --- a/tools/lib/bpf/libbpf.h
  +++ b/tools/lib/bpf/libbpf.h
  @@ -803,13 +803,13 @@ bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
   struct bpf_netkit_opts {
        /* size of this struct, for forward/backward compatibility */
        size_t sz;
  -       __u32 flags;
        __u32 relative_fd;
        __u32 relative_id;
        __u64 expected_revision;
  +       __u32 flags;
        size_t :0;
   };
  -#define bpf_netkit_opts__last_field expected_revision
  +#define bpf_netkit_opts__last_field flags

The clang 18 generated asm code looks like below:
    ;       LIBBPF_OPTS_RESET(optl,
    55e3: 48 8d 7d 98                   leaq    -0x68(%rbp), %rdi
    55e7: 31 f6                         xorl    %esi, %esi
    55e9: ba 20 00 00 00                movl    $0x20, %edx
    55ee: e8 00 00 00 00                callq   0x55f3 <serial_test_tc_netkit_multi_links_target+0x18d3>
    55f3: 48 c7 85 10 fd ff ff 20 00 00 00      movq    $0x20, -0x2f0(%rbp)
    55fe: 48 8b 85 68 ff ff ff          movq    -0x98(%rbp), %rax
    5605: 48 8b 78 18                   movq    0x18(%rax), %rdi
    5609: e8 00 00 00 00                callq   0x560e <serial_test_tc_netkit_multi_links_target+0x18ee>
    560e: 89 85 18 fd ff ff             movl    %eax, -0x2e8(%rbp)
    5614: c7 85 1c fd ff ff 00 00 00 00 movl    $0x0, -0x2e4(%rbp)
    561e: 48 c7 85 20 fd ff ff 00 00 00 00      movq    $0x0, -0x2e0(%rbp)
    5629: c7 85 28 fd ff ff 08 00 00 00 movl    $0x8, -0x2d8(%rbp)
    5633: 48 8b 85 10 fd ff ff          movq    -0x2f0(%rbp), %rax
    563a: 48 89 45 98                   movq    %rax, -0x68(%rbp)
    563e: 48 8b 85 18 fd ff ff          movq    -0x2e8(%rbp), %rax
    5645: 48 89 45 a0                   movq    %rax, -0x60(%rbp)
    5649: 48 8b 85 20 fd ff ff          movq    -0x2e0(%rbp), %rax
    5650: 48 89 45 a8                   movq    %rax, -0x58(%rbp)
    5654: 48 8b 85 28 fd ff ff          movq    -0x2d8(%rbp), %rax
    565b: 48 89 45 b0                   movq    %rax, -0x50(%rbp)
    ;       link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);

At the -O0 level, the clang compiler creates an intermediate copy.
Below, 'flags' is stored with a 4-byte store, leaving the other 4 bytes
in the same 8-byte-aligned storage undefined,
    5629: c7 85 28 fd ff ff 08 00 00 00 movl    $0x8, -0x2d8(%rbp)
and later an 8-byte store copies it into the original zeroed buffer
    5654: 48 8b 85 28 fd ff ff          movq    -0x2d8(%rbp), %rax
    565b: 48 89 45 b0                   movq    %rax, -0x50(%rbp)

This caused a problem as the 4-byte value at [%rbp-0x2dc, %rbp-0x2e0)
may be garbage.

gcc (gcc 11.4) does not have this issue as it zeroes the struct first before
doing assignments:
  ;       LIBBPF_OPTS_RESET(optl,
    50fd: 48 8d 85 40 fc ff ff          leaq    -0x3c0(%rbp), %rax
    5104: ba 20 00 00 00                movl    $0x20, %edx
    5109: be 00 00 00 00                movl    $0x0, %esi
    510e: 48 89 c7                      movq    %rax, %rdi
    5111: e8 00 00 00 00                callq   0x5116 <serial_test_tc_netkit_multi_links_target+0x1522>
    5116: 48 8b 45 f0                   movq    -0x10(%rbp), %rax
    511a: 48 8b 40 18                   movq    0x18(%rax), %rax
    511e: 48 89 c7                      movq    %rax, %rdi
    5121: e8 00 00 00 00                callq   0x5126 <serial_test_tc_netkit_multi_links_target+0x1532>
    5126: 48 c7 85 40 fc ff ff 00 00 00 00      movq    $0x0, -0x3c0(%rbp)
    5131: 48 c7 85 48 fc ff ff 00 00 00 00      movq    $0x0, -0x3b8(%rbp)
    513c: 48 c7 85 50 fc ff ff 00 00 00 00      movq    $0x0, -0x3b0(%rbp)
    5147: 48 c7 85 58 fc ff ff 00 00 00 00      movq    $0x0, -0x3a8(%rbp)
    5152: 48 c7 85 40 fc ff ff 20 00 00 00      movq    $0x20, -0x3c0(%rbp)
    515d: 89 85 48 fc ff ff             movl    %eax, -0x3b8(%rbp)
    5163: c7 85 58 fc ff ff 08 00 00 00 movl    $0x8, -0x3a8(%rbp)
  ;       link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);

It is not clear how to resolve this at the compiler level, as the compiler
generates correct code w.r.t. how unnamed padding is handled by the C
standard. So this patch changed the LIBBPF_OPTS_RESET macro to avoid
uninitialized tail padding. We already know the LIBBPF_OPTS macro works on
both gcc and clang, even with tail padding. So LIBBPF_OPTS_RESET is changed
to be a LIBBPF_OPTS followed by a memcpy(), thus avoiding uninitialized tail
padding.

The below is asm code generated with this patch and with clang compiler:
    ;       LIBBPF_OPTS_RESET(optl,
    55e3: 48 8d bd 10 fd ff ff          leaq    -0x2f0(%rbp), %rdi
    55ea: 31 f6                         xorl    %esi, %esi
    55ec: ba 20 00 00 00                movl    $0x20, %edx
    55f1: e8 00 00 00 00                callq   0x55f6 <serial_test_tc_netkit_multi_links_target+0x18d6>
    55f6: 48 c7 85 10 fd ff ff 20 00 00 00      movq    $0x20, -0x2f0(%rbp)
    5601: 48 8b 85 68 ff ff ff          movq    -0x98(%rbp), %rax
    5608: 48 8b 78 18                   movq    0x18(%rax), %rdi
    560c: e8 00 00 00 00                callq   0x5611 <serial_test_tc_netkit_multi_links_target+0x18f1>
    5611: 89 85 18 fd ff ff             movl    %eax, -0x2e8(%rbp)
    5617: c7 85 1c fd ff ff 00 00 00 00 movl    $0x0, -0x2e4(%rbp)
    5621: 48 c7 85 20 fd ff ff 00 00 00 00      movq    $0x0, -0x2e0(%rbp)
    562c: c7 85 28 fd ff ff 08 00 00 00 movl    $0x8, -0x2d8(%rbp)
    5636: 48 8b 85 10 fd ff ff          movq    -0x2f0(%rbp), %rax
    563d: 48 89 45 98                   movq    %rax, -0x68(%rbp)
    5641: 48 8b 85 18 fd ff ff          movq    -0x2e8(%rbp), %rax
    5648: 48 89 45 a0                   movq    %rax, -0x60(%rbp)
    564c: 48 8b 85 20 fd ff ff          movq    -0x2e0(%rbp), %rax
    5653: 48 89 45 a8                   movq    %rax, -0x58(%rbp)
    5657: 48 8b 85 28 fd ff ff          movq    -0x2d8(%rbp), %rax
    565e: 48 89 45 b0                   movq    %rax, -0x50(%rbp)
    ;       link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);

In the above code, a temporary buffer is zeroed and then has proper values
assigned. Finally, values in the temporary buffer are copied to the original
variable buffer, hence tail padding is guaranteed to be 0.
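
The reworked macro then looks roughly as follows (a sketch of the shape
described above: a zero-initialized temporary built LIBBPF_OPTS-style,
then copied over NAME in full):

  #define LIBBPF_OPTS_RESET(NAME, ...)                                \
      do {                                                            \
          typeof(NAME) ___##NAME = ({                                 \
              memset(&___##NAME, 0, sizeof(NAME));                    \
              (typeof(NAME)) {                                        \
                  .sz = sizeof(NAME),                                 \
                  __VA_ARGS__                                         \
              };                                                      \
          });                                                         \
          memcpy(&NAME, &___##NAME, sizeof(NAME));                    \
      } while (0)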

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20231107201511.2548645-1-yonghong.song@linux.dev
2023-11-10 13:27:01 -08:00
Daniel Borkmann
2cb0236318 libbpf: Add link-based API for netkit
This adds bpf_program__attach_netkit() API to libbpf. Overall it is very
similar to tcx. The API looks as following:

  LIBBPF_API struct bpf_link *
  bpf_program__attach_netkit(const struct bpf_program *prog, int ifindex,
                             const struct bpf_netkit_opts *opts);

The struct bpf_netkit_opts is done in similar way as struct bpf_tcx_opts
for supporting bpf_mprog control parameters. The attach location for the
primary and peer device is derived from the program section "netkit/primary"
and "netkit/peer", respectively.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-4-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-10 13:27:01 -08:00
Daniel Borkmann
cc7f085286 tools: Sync if_link uapi header
Sync if_link uapi header to the latest version as we need the refresher
in tooling for netkit device. Given it's been a while since the last sync
and the diff is fairly big, it has been done as its own commit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-3-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-10 13:27:01 -08:00
Daniel Borkmann
62b1e4905b netkit, bpf: Add bpf programmable net device
This work adds a new, minimal BPF-programmable device called "netkit"
(former PoC code-name "meta") we recently presented at LSF/MM/BPF. The
core idea is that BPF programs are executed within the driver's xmit
routine, which, e.g. in case of containers/Pods, moves BPF processing
closer to the source.

One of the goals was that in case of Pod egress traffic, this allows moving
BPF programs from hostns tcx ingress into the device itself, providing
earlier drop or forward mechanisms, for example, if the BPF program
determines that the skb must be sent out of the node, then a redirect to
the physical device can take place directly without going through per-CPU
backlog queue. This helps to shift processing for such traffic from softirq
to process context, leading to better scheduling decisions/performance (see
measurements in the slides).

In this initial version, the netkit device ships as a pair, but we plan to
extend this further so it can also operate in single device mode. The pair
comes with a primary and a peer device. Only the primary device, typically
residing in hostns, can manage BPF programs for itself and its peer. The
peer device is designated for containers/Pods and cannot attach/detach
BPF programs. Upon the device creation, the user can set the default policy
to 'pass' or 'drop' for the case when no BPF program is attached.

Additionally, the device can be operated in L3 (default) or L2 mode. The
management of BPF programs is done via bpf_mprog, so that multi-attach is
supported right from the beginning with similar API and dependency controls
as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic
attach/detach/query API for multi-progs"). tc BPF compatibility is provided,
so that existing programs can be easily migrated.

Going forward, we plan to use netkit devices in Cilium as the main device
type for connecting Pods. They will be operated in L3 mode in order to
simplify a Pod's neighbor management, and the peer will operate in default
drop mode, so that no traffic leaks out between the time a Pod is
brought up by the CNI plugin and the time programs are attached by the agent.
Additionally, the programs we attach via tcx on the physical devices are
using bpf_redirect_peer() for inbound traffic into netkit device, hence the
latter is also supporting the ndo_get_peer_dev callback. Similarly, we use
bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device
directly. Also, BIG TCP is supported on netkit device. For the follow-up
work in single device mode, we plan to convert Cilium's cilium_host/_net
devices into a single one.

An extensive test suite for checking device operations and the BPF program
and link management API comes as BPF selftests in this series.

Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://github.com/borkmann/iproute2/tree/pr/netkit
Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-10 13:27:01 -08:00
Andrii Nakryiko
3189f70538 docs: attempt to fix .readthedocs.yaml
Seems like we need to update the config ([0],[1]).

  [0] https://blog.readthedocs.com/migrate-configuration-v2/
  [1] https://blog.readthedocs.com/use-build-os-config/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-10-27 14:07:51 -07:00
Yonghong Song
6a5776066c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2147c8d07e1abc8dfc3433ca18eed5295e230ede
Checkpoint bpf-next commit: 0e133a13370389d3894891eafe54fec2c44ad735
Baseline bpf commit:        9ff8d2717fc8f63e5cb226ddbda20649eefa2728
Checkpoint bpf commit:      9ff8d2717fc8f63e5cb226ddbda20649eefa2728

Alexandre Ghiti (1):
  libbpf: Fix syscall access arguments on riscv

Andrii Nakryiko (1):
  libbpf: Don't assume SHT_GNU_verdef presence for SHT_GNU_versym
    section

Daan De Meyer (3):
  bpf: Implement cgroup sockaddr hooks for unix sockets
  libbpf: Add support for cgroup unix socket address hooks
  documentation/bpf: Document cgroup unix socket address hooks

David Vernet (1):
  bpf: Add ability to pin bpf timer to calling CPU

Martynas Pumputis (1):
  bpf: Derive source IP addr via bpf_*_fib_lookup()

 docs/program_types.rst   | 10 ++++++++++
 include/uapi/linux/bpf.h | 27 +++++++++++++++++++++++----
 src/bpf_tracing.h        |  2 --
 src/elf.c                | 16 ++++++++++------
 src/libbpf.c             | 10 ++++++++++
 5 files changed, 53 insertions(+), 12 deletions(-)

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2023-10-26 09:00:01 -07:00
Yonghong Song
acecaf855d sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2023-10-19 11:36:22 -07:00
Andrii Nakryiko
365cefa149 libbpf: Don't assume SHT_GNU_verdef presence for SHT_GNU_versym section
Fix the overly eager assumption that the SHT_GNU_verdef ELF section is
going to be present whenever a binary has a SHT_GNU_versym section. It
seems like either SHT_GNU_verdef or SHT_GNU_verneed can be used, so
failing on a missing SHT_GNU_verdef actually breaks use cases in
production.

One specific reported issue, which was used to manually test this fix,
was trying to attach to `readline` function in BASH binary.

Fixes: bb7fa09399b9 ("libbpf: Support symbol versioning for uprobe")
Reported-by: Liam Wisehart <liamwisehart@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Manu Bretelle <chantr4@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/20231016182840.4033346-1-andrii@kernel.org
2023-10-19 11:36:22 -07:00
Daan De Meyer
f4b6dcfca1 documentation/bpf: Document cgroup unix socket address hooks
Update the documentation to mention the new cgroup unix sockaddr
hooks.

Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-8-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-19 11:36:22 -07:00
Daan De Meyer
748787456b libbpf: Add support for cgroup unix socket address hooks
Add the necessary plumbing to hook up the new cgroup unix sockaddr
hooks into libbpf.

Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-6-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-19 11:36:22 -07:00
Daan De Meyer
8a08d63f29 bpf: Implement cgroup sockaddr hooks for unix sockets
These hooks allow intercepting connect(), getsockname(),
getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
socket hooks get write access to the address length because the
address length is not fixed when dealing with unix sockets and
needs to be modified when a unix socket address is modified by
the hook. Because abstract socket unix addresses start with a
NUL byte, we cannot recalculate the socket address in kernelspace
after running the hook by calculating the length of the unix socket
path using strlen().

These hooks can be used when users want to multiplex syscalls to a
single unix socket across multiple different processes behind the
scenes, by redirecting connect() and other syscalls to process-specific
sockets.

We do not implement support for intercepting bind() because when
using bind() with unix sockets with a pathname address, this creates
an inode in the filesystem which must be cleaned up. If we rewrite
the address, the user might try to clean up the wrong file, leaking
the socket in the filesystem where it is never cleaned up. Until we
figure out a solution for this (and a use case for intercepting bind()),
we opt to not allow rewriting the sockaddr in bind() calls.

We also implement recvmsg() support for connected streams so that
after a connect() that is modified by a sockaddr hook, any corresponding
recvmsg() on the connected socket can also be modified to make the
connected program think it is connected to the "intended" remote.

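For illustration, a minimal skeleton of such a hook could look as
follows (a sketch only; the address rewriting itself is exercised in the
selftests of this series):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("cgroup/connect_unix")
    int connect_unix_prog(struct bpf_sock_addr *ctx)
    {
            /* Inspect (and potentially rewrite) the unix socket
             * address before connect() proceeds: return 1 to let
             * the syscall continue, 0 to reject it.
             */
            return 1;
    }
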
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-19 11:36:22 -07:00
Martynas Pumputis
c9f8eb5310 bpf: Derive source IP addr via bpf_*_fib_lookup()
Extend the bpf_fib_lookup() helper by making it return the source
IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set.

For example, the following snippet can be used to derive the desired
source IP address:

    struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };

    ret = bpf_fib_lookup(skb, &p, sizeof(p),
            BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
    if (ret != BPF_FIB_LKUP_RET_SUCCESS)
        return TC_ACT_SHOT;

    /* the p.ipv4_src now contains the source address */

The inability to derive the proper source address may cause malfunctions
in BPF-based dataplanes for hosts containing netdevs with more than one
routable IP address or for multi-homed hosts.

For example, Cilium implements packet masquerading in BPF. If an
egressing netdev to which the Cilium's BPF prog is attached has
multiple IP addresses, then only one [hardcoded] IP address can be used for
masquerading. This breaks connectivity if any other IP address should have
been selected instead, for example, when public and private addresses
are attached to the same egress interface.

The change was tested with Cilium [1].

Nikolay Aleksandrov helped to figure out the IPv6 addr selection.

[1]: https://github.com/cilium/cilium/pull/28283

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-19 11:36:22 -07:00
David Vernet
1c0358823c bpf: Add ability to pin bpf timer to calling CPU
BPF supports creating high resolution timers using bpf_timer_* helper
functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which
specifies that the timeout should be interpreted as absolute time. It
would also be useful to be able to pin that timer to a core. For
example, you may want to make a subset of cores run without timer
interrupts, and have the timer be invoked only on a single core.

This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag.
When specified, the HRTIMER_MODE_PINNED flag is passed to
hrtimer_start(). A subsequent patch will update selftests to validate.

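For illustration, arming such a pinned timer could look as follows (a
sketch only, assuming "val" is a map value with an already initialized
struct bpf_timer field named "t"):

    /* the callback fires on the CPU that called bpf_timer_start() */
    bpf_timer_start(&val->t, 100000000 /* 100ms */,
                    BPF_F_TIMER_CPU_PIN);
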
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com
2023-10-19 11:36:22 -07:00
Alexandre Ghiti
20c1170ea4 libbpf: Fix syscall access arguments on riscv
Since commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers"), riscv
selects ARCH_HAS_SYSCALL_WRAPPER so let's use the generic implementation
of PT_REGS_SYSCALL_REGS().

Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/bpf/20231004110905.49024-2-bjorn@kernel.org
2023-10-19 11:36:22 -07:00
Yonghong Song
b44eb3a8fa libbpf: fix bpf-checkpoint-commit
The previous sync bpf-checkpoint-commit became invalid
due to an upstream bpf tree force-push. This patch picks
a new valid commit as the bpf-checkpoint-commit so
the sync script can work with newer changes.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2023-10-19 11:36:22 -07:00
Yonghong Song
14648264b1 ci: Regenerate latest vmlinux.h for old kernel CI tests
Without the change, we will have failures like below:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'
      progs/getsockname_unix_prog.c:27:15: error: no member named 'uaddrlen' in 'struct bpf_sock_addr_kern'
              if (sa_kern->uaddrlen != unaddrlen)
                  ~~~~~~~  ^
      1 error generated.
      make: *** [Makefile:605: /home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/getsockname_unix_prog.bpf.o] Error 1
      make: *** Waiting for unfinished jobs....
      Error: Process completed with exit code 2.

  in Kernel 5.5.0 on ubuntu-20.04 + selftests.

Manu Bretelle kindly helped regenerate the vmlinux.h from the latest
bpf-next kernel for me.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2023-10-19 11:36:22 -07:00
Song Liu
e26b84dc33 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   45ee73a0722b9e1d0b7a524d06756291b13b5912
Checkpoint bpf-next commit: 2147c8d07e1abc8dfc3433ca18eed5295e230ede
Baseline bpf commit:        57eb5e1c5c57972c95e8efab6bc81b87161b0b07
Checkpoint bpf commit:      4cb893e89221be9c791e43cab6a8e937cd57e17f

Hengqi Chen (3):
  libbpf: Resolve symbol conflicts at the same offset for uprobe
  libbpf: Support symbol versioning for uprobe
  libbpf: Allow Golang symbols in uprobe secdef

Jiri Olsa (2):
  bpf: Add missed value to kprobe_multi link info
  bpf: Add missed value to kprobe perf link info

Kumar Kartikeya Dwivedi (2):
  libbpf: Refactor bpf_object__reloc_code
  libbpf: Add support for custom exception callbacks

Martin Kelly (8):
  libbpf: Refactor cleanup in ring_buffer__add
  libbpf: Switch rings to array of pointers
  libbpf: Add ring_buffer__ring
  libbpf: Add ring__producer_pos, ring__consumer_pos
  libbpf: Add ring__avail_data_size
  libbpf: Add ring__size
  libbpf: Add ring__map_fd
  libbpf: Add ring__consume

 include/uapi/linux/bpf.h |   2 +
 src/elf.c                | 139 ++++++++++++++++++++++++++---
 src/libbpf.c             | 188 ++++++++++++++++++++++++++++++++-------
 src/libbpf.h             |  73 +++++++++++++++
 src/libbpf.map           |   7 ++
 src/ringbuf.c            |  85 +++++++++++++++---
 6 files changed, 439 insertions(+), 55 deletions(-)

Signed-off-by: Song Liu <song@kernel.org>
2023-10-02 11:17:48 -07:00
Hengqi Chen
9a3a2e9303 libbpf: Allow Golang symbols in uprobe secdef
Golang symbols in ELF files are different from C/C++ ones
in that they contain special characters like '*', '(' and ')'.
With generics, things get more complicated; there are
symbols like:

  github.com/cilium/ebpf/internal.(*Deque[go.shape.interface { Format(fmt.State, int32); TypeName() string;github.com/cilium/ebpf/btf.copy() github.com/cilium/ebpf/btf.Type}]).Grow

We match such symbols using `%m[^\n]` in sscanf; this
excludes only the newline character, which typically does not
appear in ELF symbols. This should work in most use cases and
also works for unicode letters in identifiers. If newlines do
show up in ELF symbols, users can still attach to such a
symbol by specifying bpf_uprobe_opts::func_name.

A working example can be found at this repo ([0]).

  [0]: https://github.com/chenhengqi/libbpf-go-symbols

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230929155954.92448-1-hengqi.chen@gmail.com
2023-10-02 11:17:48 -07:00
Jiri Olsa
96d70a52ad bpf: Add missed value to kprobe perf link info
Add a missed value to the info of kprobes attached through perf links,
to hold the stats of missed kprobe handler executions.

The kprobe's missed counter gets incremented when the kprobe handler
is not executed due to another kprobe running on the same CPU.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-4-jolsa@kernel.org
2023-10-02 11:17:48 -07:00
Jiri Olsa
de02cb1697 bpf: Add missed value to kprobe_multi link info
Add a missed value to the kprobe_multi link info to hold the stats of
missed kprobe_multi probes.

The missed counter gets incremented when fprobe fails the recursion
check or there's no rethook available for return probe. In either
case the attached bpf program is not executed.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-3-jolsa@kernel.org
2023-10-02 11:17:48 -07:00
Martin Kelly
b520bcd7d8 libbpf: Add ring__consume
Add ring__consume to consume a single ringbuffer, analogous to
ring_buffer__consume.

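Together with the other ring__* additions below, usage could look
roughly like this (a sketch only, assuming "rb" is an initialized
struct ring_buffer * with at least one registered ringbuffer):

    struct ring *r = ring_buffer__ring(rb, 0);

    if (r) {
            printf("size=%zu avail=%zu cons=%lu prod=%lu fd=%d\n",
                   ring__size(r), ring__avail_data_size(r),
                   ring__consumer_pos(r), ring__producer_pos(r),
                   ring__map_fd(r));
            ring__consume(r); /* consume records from this ring only */
    }
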
Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-14-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
6413c2d063 libbpf: Add ring__map_fd
Add ring__map_fd to get the file descriptor underlying a given
ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-12-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
cd3fe56c75 libbpf: Add ring__size
Add ring__size to get the total size of a given ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-10-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
3e675ed6ab libbpf: Add ring__avail_data_size
Add ring__avail_data_size for querying the currently available data in
the ringbuffer, similar to the BPF_RB_AVAIL_DATA flag in
bpf_ringbuf_query. This is racy during ongoing operations but is still
useful for overall information on how a ringbuffer is behaving.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-8-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
2ad16b970a libbpf: Add ring__producer_pos, ring__consumer_pos
Add APIs to get the producer and consumer position for a given
ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-6-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
a20576f5f2 libbpf: Add ring_buffer__ring
Add a new function ring_buffer__ring, which exposes struct ring * to the
user, representing a single ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-4-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
bfa471bc85 libbpf: Switch rings to array of pointers
Switch rb->rings to be an array of pointers instead of a contiguous
block. This allows for each ring pointer to be stable after
ring_buffer__add is called, which allows us to expose struct ring * to
the user without gotchas. Without this change, the realloc in
ring_buffer__add could invalidate a struct ring *, making it unsafe to
give to the user.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-3-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
64f2b4ab49 libbpf: Refactor cleanup in ring_buffer__add
Refactor the cleanup code in ring_buffer__add to use a unified err_out
label. This reduces code duplication, as well as plugging a potential
leak if mmap_sz != (__u64)(size_t)mmap_sz (currently this would miss
unmapping tmp because ringbuf_unmap_ring isn't called).

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-2-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Hengqi Chen
cd91ca8f99 libbpf: Support symbol versioning for uprobe
In the current implementation, we assume that a symbol found in the
.dynsym section has a version suffix and use it to compare with the
symbol the user supplied. According to the spec ([0]), this assumption
is incorrect: the version info of dynamic symbols is stored in the
.gnu.version and .gnu.version_d sections of ELF objects. For example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
      706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
      2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
      2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

In this case, specifying pthread_rwlock_wrlock@@GLIBC_2.34 or
pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't
work, because the qualified name does NOT match `pthread_rwlock_wrlock`
(without version suffix) in the .dynsym section.

This commit implements symbol versioning for dynsym and allows users to
specify symbols in the following forms:
  - func
  - func@LIB_VERSION
  - func@@LIB_VERSION

In case of symbol conflicts, we error out; users should resolve this by
specifying a qualified name.

  [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html

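For illustration, attaching to a specific version could then look like
this (a sketch only; "prog" and the library path are illustrative):

    LIBBPF_OPTS(bpf_uprobe_opts, opts,
                .func_name = "pthread_rwlock_wrlock@GLIBC_2.2.5");
    struct bpf_link *link;

    link = bpf_program__attach_uprobe_opts(prog, -1 /* any pid */,
                "/lib/x86_64-linux-gnu/libc.so.6",
                0 /* offset resolved from func_name */, &opts);
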
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230918024813.237475-3-hengqi.chen@gmail.com
2023-10-02 11:17:48 -07:00
Hengqi Chen
df9cd9f69c libbpf: Resolve symbol conflicts at the same offset for uprobe
Dynamic symbols in shared library may have the same name, for example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
     706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
    2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
    2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

Currently, users can't attach a uprobe to pthread_rwlock_wrlock because
there are two symbols named pthread_rwlock_wrlock and both have global
binding, which libbpf considers a conflict.

Since both of them are at the same offset, we can accept one of them
harmlessly. Note that we already do this in elf_resolve_syms_offsets.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230918024813.237475-2-hengqi.chen@gmail.com
2023-10-02 11:17:48 -07:00
Kumar Kartikeya Dwivedi
713d1f5a83 libbpf: Add support for custom exception callbacks
Add support to libbpf to append exception callbacks when loading a
program. The exception callback is found by discovering the declaration
tag 'exception_callback:<value>' and finding the callback in the value
of the tag.

The process is done in two steps. First, for each main program, the
bpf_object__sanitize_and_load_btf function finds and marks its
corresponding exception callback as defined by the declaration tag on
it. Second, bpf_object__reloc_code is modified to append the indicated
exception callback at the end of the instruction iteration (since the
exception callback will never be appended in that loop, as it is not
directly referenced).

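For illustration, the declaration-tag convention could be used roughly
like this (a sketch only; names are illustrative, and the main program
would typically raise the exception via bpf_throw() somewhere):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* hypothetical exception callback, referenced by the tag below */
    static __noinline int my_exception_cb(__u64 cookie)
    {
            return 0;
    }

    SEC("tc")
    __attribute__((btf_decl_tag("exception_callback:my_exception_cb")))
    int prog_main(struct __sk_buff *ctx)
    {
            return 0;
    }
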
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-16-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-02 11:17:48 -07:00
Kumar Kartikeya Dwivedi
998213a1e3 libbpf: Refactor bpf_object__reloc_code
Refactor bpf_object__append_subprog_code out of bpf_object__reloc_code
to be able to reuse it to append subprog related code for the exception
callback to the main program.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-15-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-02 11:17:48 -07:00
Andrii Nakryiko
56069cda78 ci: denylist empty_skb temporarily
The fix is in the bpf tree and needs to be merged into bpf-next, on
which libbpf CI is tested.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
aadf88d4f6 ci: remove outdated temporary patches
Remove the patches; they don't apply and are not needed anymore.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
10da3d2384 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   9e3b47abeb8f76c39c570ffc924ac0b35f132274
Checkpoint bpf-next commit: 45ee73a0722b9e1d0b7a524d06756291b13b5912
Baseline bpf commit:        23d775f12dcd23d052a4927195f15e970e27ab26
Checkpoint bpf commit:      57eb5e1c5c57972c95e8efab6bc81b87161b0b07

Andrii Nakryiko (1):
  libbpf: Add basic BTF sanity validation

Ravi Bangoria (1):
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC

Stanislav Fomichev (2):
  bpf: expose information about supported xdp metadata kfunc
  bpf: Clarify error expectations from bpf_clone_redirect

Yonghong Song (2):
  libbpf: Add __percpu_kptr macro definition
  bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated

 include/uapi/linux/bpf.h        |  13 ++-
 include/uapi/linux/netdev.h     |  16 ++++
 include/uapi/linux/perf_event.h |   3 +-
 src/bpf_helpers.h               |   1 +
 src/btf.c                       | 160 ++++++++++++++++++++++++++++++++
 5 files changed, 190 insertions(+), 3 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
d2838b2be3 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-15 15:57:14 -07:00
Stanislav Fomichev
aa44abfdd2 bpf: Clarify error expectations from bpf_clone_redirect
Commit 151e887d8ff9 ("veth: Fixing transmit return status for dropped
packets") exposed the fact that bpf_clone_redirect is capable of
returning raw NET_XMIT_XXX return codes.

This is in the conflict with its UAPI doc which says the following:
"0 on success, or a negative error in case of failure."

Update the UAPI to reflect the fact that bpf_clone_redirect can
return positive error numbers, but don't explicitly define
their meaning.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230911194731.286342-1-sdf@google.com
2023-09-15 15:57:14 -07:00
Stanislav Fomichev
6070b1bdcf bpf: expose information about supported xdp metadata kfunc
Add new xdp-rx-metadata-features member to netdev netlink
which exports a bitmask of supported kfuncs. Most of the patch
is autogenerated (headers), the only relevant part is netdev.yaml
and the changes in netdev-genl.c to marshal into netlink.

Example output on veth:

$ ip link add veth0 type veth peer name veth1 # ifndex == 12
$ ./tools/net/ynl/samples/netdev 12

Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12
   veth1[12]    xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 15:57:14 -07:00
Yonghong Song
6f30f1a00a bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr'
can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
Also make changes in selftests/bpf/test_bpftool_synctypes.py
and the libbpf_str selftest to fix resulting test errors.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230827152837.2003563-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 15:57:14 -07:00
Yonghong Song
332198af03 libbpf: Add __percpu_kptr macro definition
Add __percpu_kptr macro definition in bpf_helpers.h.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230827152800.1998492-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
2dbdd3b564 libbpf: Add basic BTF sanity validation
Implement a simple and straightforward BTF sanity check when parsing BTF
data. Right now it's very basic and just validates that all the string
offsets and type IDs are within valid range. For FUNC we also check that
it points to FUNC_PROTO kinds.

Even with such simple checks it fixes a bunch of crashes found by OSS
fuzzer ([0]-[5]) and will allow fuzzer to make further progress.

Some other invariants will be checked in follow up patches (like
ensuring there is no infinite type loops), but this seems like a good
start already.

Adding FUNC -> FUNC_PROTO check revealed that one of selftests has
a problem with FUNC pointing to VAR instead, so fix it up in the same
commit.

  [0] https://github.com/libbpf/libbpf/issues/482
  [1] https://github.com/libbpf/libbpf/issues/483
  [2] https://github.com/libbpf/libbpf/issues/485
  [3] https://github.com/libbpf/libbpf/issues/613
  [4] https://github.com/libbpf/libbpf/issues/618
  [5] https://github.com/libbpf/libbpf/issues/619

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Song Liu <song@kernel.org>
Closes: https://github.com/libbpf/libbpf/issues/617
Link: https://lore.kernel.org/bpf/20230825202152.1813394-1-andrii@kernel.org
2023-09-15 15:57:14 -07:00
Ravi Bangoria
d8a4b198da perf/mem: Introduce PERF_MEM_LVLNUM_UNC
Older API PERF_MEM_LVL_UNC can be replaced by PERF_MEM_LVLNUM_UNC.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230725150206.184-2-ravi.bangoria@amd.com
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
5fc0677111 ci: update list of tests/subtests for 5.5 kernel
Some tests can't succeed on 5.5, which is very old.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-07 09:11:51 -07:00
Daniel Müller
295b5726f0 Introduce pull request template
This change introduces a pull request template that hopefully helps
prevent more libbpf-specific pull requests that should really be
submitted to the BPF mailing list from being opened against this
repository. Recent examples include [0] [1].

[0] https://github.com/libbpf/libbpf/pull/712
[1] https://github.com/libbpf/libbpf/pull/723

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-09-05 11:08:57 -07:00
Andrii Nakryiko
5a46421ad8 ci: deny newly added tc_bpf/tc_bpf_non_root for 5.5
It doesn't work on 5.5 and was just recently introduced as a new subtest
of an already existing test. Add the subtest to the denylist.

Also clean up the old denylist, leaving only the "exception" entries
relative to the ALLOWLIST.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
942a0b8056 Makefile: silence GCC's bogus complaint about possible NULL in printf
GCC started complaining that some of libbpf's pr_warn() statements might
be passing NULL for the map name. The map name is never NULL for a
non-NULL map pointer, so this is a false positive which triggers build
failures. Silence the format-overflow warning altogether to avoid this
in the future as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
fcc940e6b2 Makefile: add elf.c to a list of built files
Libbpf now has one more .c file; make sure the GitHub Makefile builds it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
2e6b54e5ea sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   0a55264cf966fb95ebf9d03d9f81fa992f069312
Checkpoint bpf-next commit: 9e3b47abeb8f76c39c570ffc924ac0b35f132274
Baseline bpf commit:        23d775f12dcd23d052a4927195f15e970e27ab26
Checkpoint bpf commit:      23d775f12dcd23d052a4927195f15e970e27ab26

Andrii Nakryiko (1):
  libbpf: fix signedness determination in CO-RE relo handling logic

Daniel Xu (1):
  libbpf: Add bpf_object__unpin()

Hao Luo (1):
  libbpf: Free btf_vmlinux when closing bpf_object

Jiri Olsa (15):
  bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum
  bpf: Add multi uprobe link
  bpf: Add cookies support for uprobe_multi link
  bpf: Add pid filter support for uprobe_multi link
  libbpf: Add uprobe_multi attach type and link names
  libbpf: Move elf_find_func_offset* functions to elf object
  libbpf: Add elf_open/elf_close functions
  libbpf: Add elf symbol iterator
  libbpf: Add elf_resolve_syms_offsets function
  libbpf: Add elf_resolve_pattern_offsets function
  libbpf: Add bpf_link_create support for multi uprobes
  libbpf: Add bpf_program__attach_uprobe_multi function
  libbpf: Add support for u[ret]probe.multi[.s] program sections
  libbpf: Add uprobe multi link detection
  libbpf: Add uprobe multi link support to bpf_program__attach_usdt

 include/uapi/linux/bpf.h |  22 +-
 src/bpf.c                |  11 +
 src/bpf.h                |  11 +-
 src/elf.c                | 440 +++++++++++++++++++++++++++++++++++++++
 src/libbpf.c             | 404 ++++++++++++++++++-----------------
 src/libbpf.h             |  52 +++++
 src/libbpf.map           |   2 +
 src/libbpf_internal.h    |  21 ++
 src/relo_core.c          |   2 +-
 src/usdt.c               | 116 +++++++----
 10 files changed, 853 insertions(+), 228 deletions(-)
 create mode 100644 src/elf.c

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
b4c8def45f libbpf: fix signedness determination in CO-RE relo handling logic
Extracting btf_int_encoding() is only meaningful for BTF_KIND_INT, so we
need to check that first before inferring signedness.

Closes: https://github.com/libbpf/libbpf/issues/704
Reported-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230824000016.2658017-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-25 11:51:28 -07:00
Daniel Xu
62a186ea68 libbpf: Add bpf_object__unpin()
For bpf_object__pin_programs() there is bpf_object__unpin_programs().
Likewise bpf_object__unpin_maps() for bpf_object__pin_maps().

But there is no bpf_object__unpin() for bpf_object__pin(). Adding the
former adds symmetry to the API.

It's also convenient for cleanup in application code. It's an API I
would've used, had it been available, for a repro I was writing earlier.

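For illustration (a sketch only; the bpffs path is illustrative):

    if (!bpf_object__pin(obj, "/sys/fs/bpf/myobj")) {
            /* ... pinned maps and programs in use ... */
            bpf_object__unpin(obj, "/sys/fs/bpf/myobj");
    }
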
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/b2f9d41da4a350281a0b53a804d11b68327e14e5.1692832478.git.dxu@dxuuu.xyz
2023-08-25 11:51:28 -07:00
Hao Luo
a687461867 libbpf: Free btf_vmlinux when closing bpf_object
I hit a memory leak when testing bpf_program__set_attach_target().
Basically, set_attach_target() may allocate btf_vmlinux, for example,
when setting the attach target for bpf_iter programs. But btf_vmlinux
is freed only in bpf_object_load(), which means that if we only open
the bpf object but do not load it, setting the attach target may leak
btf_vmlinux.

So let's free btf_vmlinux in bpf_object__close() anyway.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230822193840.1509809-1-haoluo@google.com
2023-08-25 11:51:28 -07:00
Jiri Olsa
74188c1740 libbpf: Add uprobe multi link support to bpf_program__attach_usdt
Adding support for usdt_manager_attach_usdt to use uprobe_multi
link to attach to usdt probes.

The uprobe_multi support is detected before the usdt program is
loaded and its expected_attach_type is set accordingly.

If uprobe_multi support is detected the usdt_manager_attach_usdt
gathers uprobes info and calls bpf_program__attach_uprobe to
create all needed uprobes.

If uprobe_multi support is not detected the old behaviour stays.

Also adding usdt.s program section for sleepable usdt probes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-18-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
60cf42249b libbpf: Add uprobe multi link detection
Adding uprobe-multi link detection. It will be used later in
bpf_program__attach_usdt function to check and use uprobe_multi
link over standard uprobe links.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-17-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
bc829bac06 libbpf: Add support for u[ret]probe.multi[.s] program sections
Adding support for several uprobe_multi program sections
to allow auto attach of multi_uprobe programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-16-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
9f76dd6dd0 libbpf: Add bpf_program__attach_uprobe_multi function
Adding the bpf_program__attach_uprobe_multi function that
allows attaching multiple uprobes with an uprobe_multi link.

The user can specify uprobes with direct arguments:

  binary_path/func_pattern/pid

or with struct bpf_uprobe_multi_opts opts argument fields:

  const char **syms;
  const unsigned long *offsets;
  const unsigned long *ref_ctr_offsets;
  const __u64 *cookies;

Users can specify 2 mutually exclusive sets of inputs:

 1) use only path/func_pattern/pid arguments

 2) use path/pid with allowed combinations of:
    syms/offsets/ref_ctr_offsets/cookies/cnt

    - syms and offsets are mutually exclusive
    - ref_ctr_offsets and cookies are optional

Any other usage results in error.

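For illustration (a sketch only; the binary path and pattern are
illustrative):

    LIBBPF_OPTS(bpf_uprobe_multi_opts, opts);
    struct bpf_link *link;

    link = bpf_program__attach_uprobe_multi(prog, -1 /* any pid */,
                "/lib/x86_64-linux-gnu/libc.so.6",
                "pthread_*" /* func_pattern */, &opts);
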
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-15-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
cd21cc08cc libbpf: Add bpf_link_create support for multi uprobes
Adding new uprobe_multi struct to bpf_link_create_opts object
to pass multiple uprobe data to link_create attr uapi.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-14-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
c7ef3a169e libbpf: Add elf_resolve_pattern_offsets function
Adding the elf_resolve_pattern_offsets function that looks up
offsets for symbols specified by the pattern argument.

The 'pattern' argument allows wildcards ('*' and '?' are supported).

Offsets are returned in an allocated array, together with its
size, which needs to be released by the caller.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-13-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
91fd655644 libbpf: Add elf_resolve_syms_offsets function
Adding the elf_resolve_syms_offsets function that looks up
offsets for symbols specified in the syms array argument.

Offsets are returned in an allocated array of size 'cnt'
that needs to be released by the caller.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-12-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
b7ec9d9669 libbpf: Add elf symbol iterator
Adding an elf symbol iterator object (and accompanying functions) that
follows the open-coded iterator pattern, to ease iterating over elf
object symbols.

The idea is to iterate single symbol section with:

  struct elf_sym_iter iter;
  struct elf_sym *sym;

  if (elf_sym_iter_new(&iter, elf, binary_path, SHT_DYNSYM))
        goto error;

  while ((sym = elf_sym_iter_next(&iter))) {
        ...
  }

I considered opening the elf inside the iterator and iterating all symbol
sections, but then it gets more complicated wrt user checks for when
the next section is processed.

The plus side is that we don't need an 'exit' function, because the
caller/user is in charge of that.

The iterated symbol object returned from the elf_sym_iter_next function
is placed inside struct elf_sym_iter, so no extra allocation or
argument is needed.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
1f8929293e libbpf: Add elf_open/elf_close functions
Adding elf_open/elf_close functions and using them in the
elf_find_func_offset_from_file function. They will be
used in following changes to save some common code.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-10-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
0cd5b05f53 libbpf: Move elf_find_func_offset* functions to elf object
Adding a new elf object that will contain elf-related functions.
There's no functional change.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
a1c2e05c4f libbpf: Add uprobe_multi attach type and link names
Adding new uprobe_multi attach type and link names,
so the functions can resolve the new values.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
c1a12134bd bpf: Add pid filter support for uprobe_multi link
Adding support to specify a pid for the uprobe_multi link, so the
uprobes are created only for the task with the given pid value.

We use the consumer.filter callback for that, so the task gets
filtered during the uprobe installation.

We still need to check the task during runtime in the uprobe handler,
because the handler could get executed if there's another system
wide consumer on the same uprobe (thanks Oleg for the insight).

Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
12466f75db bpf: Add cookies support for uprobe_multi link
Adding support to specify a cookies array for the uprobe_multi link.

The cookies array shares indexes and length with the other uprobe_multi
arrays (offsets/ref_ctr_offsets).

The cookies[i] value defines the cookie for the i-th uprobe and will be
returned by the bpf_get_attach_cookie helper when called from an ebpf
program hooked to that specific uprobe.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
ba4a10d764 bpf: Add multi uprobe link
Adding a new multi uprobe link that allows attaching a bpf program
to multiple uprobes.

Uprobes to attach are specified via new link_create uprobe_multi
union:

  struct {
    __aligned_u64   path;
    __aligned_u64   offsets;
    __aligned_u64   ref_ctr_offsets;
    __u32           cnt;
    __u32           flags;
  } uprobe_multi;

Uprobes are defined for a single binary specified in path and multiple
calling sites specified in the offsets array, with optional reference
counters specified in the ref_ctr_offsets array. All specified arrays
have a length of 'cnt'.

The 'flags' field supports a single bit for now, which marks the uprobe
as a return probe.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
8765ef8276 bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum
Switching the BPF_F_KPROBE_MULTI_RETURN macro to an anonymous enum,
so it shows up in vmlinux.h. There's no functional change
compared to having this as a macro.

Acked-by: Yafang Shao <laoar.shao@gmail.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
6a91da19fe fuzz: use https-based URL for elfutils
For environments behind proxies, having an https:// URL for git pulls
is more convenient.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-24 14:14:18 -07:00
Andrii Nakryiko
383198dc49 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   a3e7e6b17946f48badce98d7ac360678a0ea7393
Checkpoint bpf-next commit: 0a55264cf966fb95ebf9d03d9f81fa992f069312
Baseline bpf commit:        496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9
Checkpoint bpf commit:      23d775f12dcd23d052a4927195f15e970e27ab26

Alan Maguire (1):
  bpf: sync tools/ uapi header with

Arnaldo Carvalho de Melo (1):
  tools headers uapi: Sync linux/fcntl.h with the kernel sources

Daniel Borkmann (5):
  bpf: Add generic attach/detach/query API for multi-progs
  bpf: Add fd-based tcx multi-prog infra with link support
  libbpf: Add opts-based attach/detach/query API for tcx
  libbpf: Add link-based API for tcx
  libbpf: Add helper macro to clear opts structs

Daniel Xu (1):
  netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link

Dave Marchevsky (1):
  libbpf: Support triple-underscore flavors for kfunc relocation

Jiri Olsa (1):
  bpf: Add support for bpf_get_func_ip helper for uprobe program

Lorenz Bauer (1):
  bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign

Maciej Fijalkowski (1):
  xsk: add new netlink attribute dedicated for ZC max frags

Magnus Karlsson (2):
  selftests/xsk: transmit and receive multi-buffer packets
  selftests/xsk: add basic multi-buffer test

Marco Vedovati (1):
  libbpf: Set close-on-exec flag on gzopen

Sergey Kacheev (1):
  libbpf: Use local includes inside the library

Stanislav Fomichev (1):
  ynl: regenerate all headers

Yafang Shao (2):
  bpf: Support ->fill_link_info for kprobe_multi
  bpf: Support ->fill_link_info for perf_event

Yonghong Song (1):
  bpf: Support new sign-extension load insns

 include/uapi/linux/bpf.h    | 128 +++++++++++++++++++++++++++++++-----
 include/uapi/linux/fcntl.h  |   5 ++
 include/uapi/linux/if_xdp.h |   9 +++
 include/uapi/linux/netdev.h |   4 +-
 src/bpf.c                   | 127 ++++++++++++++++++++++++-----------
 src/bpf.h                   |  97 +++++++++++++++++++++++----
 src/bpf_tracing.h           |   2 +-
 src/libbpf.c                |  94 +++++++++++++++++++++-----
 src/libbpf.h                |  18 ++++-
 src/libbpf.map              |   2 +
 src/libbpf_common.h         |  16 +++++
 src/netlink.c               |   5 ++
 src/usdt.bpf.h              |   4 +-
 13 files changed, 423 insertions(+), 88 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-21 13:27:45 -07:00
Andrii Nakryiko
839c08a6d8 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-21 13:27:45 -07:00
Dave Marchevsky
6d704c7ffd libbpf: Support triple-underscore flavors for kfunc relocation
The function signature of kfuncs can change at any time due to their
intentional lack of stability guarantees. As kfuncs become more widely
used, BPF program writers will need facilities to support calling
different versions of a kfunc from a single BPF object. Consider this
simplified example based on a real scenario we ran into at Meta:

  /* initial kfunc signature */
  int some_kfunc(void *ptr)

  /* Oops, we need to add some flag to modify behavior. No problem,
    change the kfunc. flags = 0 retains original behavior */
  int some_kfunc(void *ptr, long flags)

If the initial version of the kfunc is deployed on some portion of the
fleet and the new version on the rest, a fleetwide service that uses
some_kfunc will currently need to load different BPF programs depending
on which some_kfunc is available.

Luckily CO-RE provides a facility to solve a very similar problem,
struct definition changes, by allowing program writers to declare
my_struct___old and my_struct___new, with ___suffix being considered a
'flavor' of the non-suffixed name and being ignored by
bpf_core_type_exists and similar calls.

This patch extends the 'flavor' facility to the kfunc extern
relocation process. BPF program writers can now declare

  extern int some_kfunc___old(void *ptr)
  extern int some_kfunc___new(void *ptr, int flags)

then test which version of the kfunc exists with bpf_ksym_exists.
Relocation and verifier's dead code elimination will work in concert as
expected, allowing this pattern:

  if (bpf_ksym_exists(some_kfunc___old))
    some_kfunc___old(ptr);
  else
    some_kfunc___new(ptr, 0);

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230817225353.2570845-1-davemarchevsky@fb.com
2023-08-21 13:27:45 -07:00
Marco Vedovati
20699ecf61 libbpf: Set close-on-exec flag on gzopen
Enable the close-on-exec flag when using gzopen. This is especially important
for multithreaded programs making use of libbpf, where a fork + exec could
race with libbpf library calls, potentially resulting in a file descriptor
leaked to the new process. This got missed in 59842c5451fe ("libbpf: Ensure
libbpf always opens files with O_CLOEXEC").

Fixes: 59842c5451fe ("libbpf: Ensure libbpf always opens files with O_CLOEXEC")
Signed-off-by: Marco Vedovati <marco.vedovati@crowdstrike.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230810214350.106301-1-martin.kelly@crowdstrike.com
2023-08-21 13:27:45 -07:00
Jiri Olsa
cd85f34103 bpf: Add support for bpf_get_func_ip helper for uprobe program
Adding support for the bpf_get_func_ip helper for uprobe programs to
return the probed address for both uprobes and return uprobes.

We discussed this in [1] and agreed that uprobe can have special use
of bpf_get_func_ip helper that differs from kprobe.

The kprobe bpf_get_func_ip returns:
  - the address of the function if the probe is attached on function
    entry, for both kprobe and return kprobe
  - 0 if the probe is not attached on function entry

The uprobe bpf_get_func_ip returns:
  - the address of the probe for both uprobe and return uprobe

The reason for this semantic change is that the kernel can't really tell
if the probed user space address is a function entry.

The uprobe program is actually a kprobe type program attached as a
uprobe. One of the consequences of this design is that uprobes do not
have their own set of helpers, but share them with kprobes.

As we need different functionality for the bpf_get_func_ip helper for
uprobes, I'm adding a bool value to bpf_trace_run_ctx, so the helper can
detect that it's executed in uprobe context and call specific code.

The is_uprobe bool is set as true in bpf_prog_run_array_sleepable, which
is currently used only for executing bpf programs in uprobe.

Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe
to address that it's only used for uprobes and that it sets the
run_ctx.is_uprobe as suggested by Yafang Shao.

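For illustration, usage from a uprobe program could look like this (a
sketch only):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    SEC("uprobe")
    int BPF_UPROBE(probe_entry)
    {
            /* with this change: the probed user-space address,
             * for both entry and return uprobes
             */
            __u64 ip = bpf_get_func_ip(ctx);

            bpf_printk("probed addr: %llx", ip);
            return 0;
    }
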
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
[1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-21 13:27:45 -07:00
Sergey Kacheev
26e32f542b libbpf: Use local includes inside the library
In our monorepo, we try to minimize special processing when importing
(aka vendor) third-party source code. Ideally, we try to import
directly from the repositories with the code without changing it, we
try to stick to the source code dependency instead of the artifact
dependency. In the current situation, a patch has to be made for
libbpf to fix the includes in bpf headers so that they work directly
from libbpf/src.

Signed-off-by: Sergey Kacheev <s.kacheev@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/CAJVhQqUg6OKq6CpVJP5ng04Dg+z=igevPpmuxTqhsR3dKvd9+Q@mail.gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Xu
c5f64030de netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
This commit adds support for enabling IP defrag using pre-existing
netfilter defrag support. Basically all the flag does is bump a refcnt
while the link is active. Checks are also added to ensure the prog
requesting defrag support is run _after_ the netfilter defrag hooks.

We also take care to avoid any issues w.r.t. module unloading -- while
defrag is active on a link, the module is prevented from unloading.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Yonghong Song
3d0e1c5a3a bpf: Support new sign-extension load insns
Add interpreter/jit support for new sign-extension load insns
which adds a new mode (BPF_MEMSX).
Also add verifier support to recognize these insns and to
do proper verification with new insns. In verifier, besides
to deduce proper bounds for the dst_reg, probed memory access
is also properly handled.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Lorenz Bauer
36cabf8a4a bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy,
which configures SO_REUSEPORT on its sockets. In turn, we're blocked
from removing TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the
bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
one of the reuseport sockets is selected by hash. This could cause
dispatch to the "wrong" socket:

    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regards to refcounting as bpf_sk_assign will put both
refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straightforward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.

Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign")
Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@cilium.io>
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-21 13:27:45 -07:00
Stanislav Fomichev
1180ab4066 ynl: regenerate all headers
Also add support to pass topdir to ynl-regen.sh (Jakub) and call
it from the makefile to update the UAPI headers.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230727163001.3952878-4-sdf@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21 13:27:45 -07:00
Arnaldo Carvalho de Melo
e6ab647970 tools headers uapi: Sync linux/fcntl.h with the kernel sources
To get the changes in:

  96b2b072ee62be8a ("exportfs: allow exporting non-decodeable file handles to userspace")

Those changes don't add anything that is handled by existing hard-coded
tables or table generation scripts.

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZK11P5AwRBUxxutI@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 13:27:45 -07:00
Alan Maguire
71d8eadb90 bpf: sync tools/ uapi header with
Seeing the following:

Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'

...so sync the tools version, which was missing some list_node/rb_tree fields.

Fixes: c3c510ce431c ("bpf: Add 'owner' field to bpf_{list,rb}_node")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20230719162257.20818-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
488031955d libbpf: Add helper macro to clear opts structs
Add a small and generic LIBBPF_OPTS_RESET() helper macro which clears an
opts structure and reinitializes its .sz member with the structure
size. Additionally, the user can pass option-specific data to
reinitialize via varargs.

I found this very useful when developing selftests, but it is also generic
enough as a macro next to the existing LIBBPF_OPTS() which hides the .sz
initialization, too.

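For illustration (a sketch only; "prog_fd" is illustrative):

    LIBBPF_OPTS(bpf_prog_attach_opts, opta);

    /* ... first use of opta ... */

    /* clear and reinitialize, keeping .sz set to the struct size */
    LIBBPF_OPTS_RESET(opta,
            .flags = BPF_F_REPLACE,
            .replace_prog_fd = prog_fd,
    );
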
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230719140858.13224-6-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
0fadd4ba39 libbpf: Add link-based API for tcx
Implement tcx BPF link support for libbpf.

The bpf_program__attach_fd() API has been refactored slightly in order to
pass a bpf_link_create_opts pointer as input.

A new bpf_program__attach_tcx() has been added on top of this which allows for
passing all relevant data via extensible struct bpf_tcx_opts.

The program sections tcx/ingress and tcx/egress correspond to the hook locations
for tc ingress and egress, respectively.

For concrete usage examples, see the extensive selftests that have been
developed as part of this series.

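For illustration (a sketch only; "prog" and "ifindex" are illustrative):

    LIBBPF_OPTS(bpf_tcx_opts, opts);
    struct bpf_link *link;

    link = bpf_program__attach_tcx(prog, ifindex, &opts);
    if (!link)
            err = -errno;
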
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-5-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
bb5d7c1be8 libbpf: Add opts-based attach/detach/query API for tcx
Extend libbpf attach opts and add a new detach opts API so this can be used
to add/remove fd-based tcx BPF programs. The old-style bpf_prog_detach() and
bpf_prog_detach2() APIs are refactored to reuse the new bpf_prog_detach_opts()
internally.

The bpf_prog_query_opts() API has been extended to handle the new
link_ids, link_attach_flags and revision fields.

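As a hedged sketch of querying a tcx ingress chain (ifindex and buffer
sizes are arbitrary):

  __u32 prog_ids[16] = {}, link_ids[16] = {};
  LIBBPF_OPTS(bpf_prog_query_opts, opts,
          .prog_ids = prog_ids,
          .link_ids = link_ids,
          .prog_cnt = 16,
  );

  if (!bpf_prog_query_opts(ifindex, BPF_TCX_INGRESS, &opts))
          printf("revision %llu, %u programs\n",
                 (unsigned long long)opts.revision, opts.prog_cnt);
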
For concrete usage examples, see the extensive selftests that have been
developed as part of this series.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
b064c40d94 bpf: Add fd-based tcx multi-prog infra with link support
This work refactors and adds a lightweight extension ("tcx") to the tc BPF
ingress and egress data path side to allow BPF program management based on
fds via the bpf() syscall through the newly added generic multi-prog API.
The main goal behind this work, which we presented at LPC [0] last year and
updated at LSF/MM/BPF this year [3], is to support the long-awaited BPF link
functionality for tc BPF programs, which allows for a model of safe ownership
and program detachment.

Given the rise in tc BPF users in cloud-native environments, this becomes
necessary to avoid hard-to-debug incidents caused either by stale leftover
programs or by 3rd-party applications accidentally stepping on each other's
toes.
As a recap, a BPF link represents the attachment of a BPF program to a BPF
hook point. The BPF link holds a single reference to keep the BPF program alive.
Moreover, hook points do not reference a BPF link, only the application's
fd or pinning does. A BPF link holds meta-data specific to attachment and
implements operations for link creation, (atomic) BPF program update,
detachment and introspection. The motivation for BPF links for tc BPF programs
is multi-fold, for example:

  - From Meta: "It's especially important for applications that are deployed
    fleet-wide and that don't "control" hosts they are deployed to. If such
    application crashes and no one notices and does anything about that, BPF
    program will keep running draining resources or even just, say, dropping
    packets. We at FB had outages due to such permanent BPF attachment
    semantics. With fd-based BPF link we are getting a framework, which allows
    safe, auto-detachable behavior by default, unless application explicitly
    opts in by pinning the BPF link." [1]

  - From the Cilium side, the tc BPF programs we attach to host-facing veth
    devices and phys devices build the core datapath for Kubernetes Pods, and
    they implement forwarding, load-balancing, policy, EDT-management, etc,
    within BPF. Currently there is no concept of 'safe' ownership; e.g., we've
    recently experienced hard-to-debug issues in a user's staging environment
    where another Kubernetes application using tc BPF attached to the same
    prio/handle of cls_bpf accidentally wiped all Cilium-based BPF programs
    from underneath it. The goal is to establish a clear/safe ownership model
    via links which cannot accidentally be overridden. [0,2]

BPF links for tc can co-exist with non-link attachments, and the semantics are
in line also with XDP links: BPF links cannot replace other BPF links, BPF
links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
lastly only non-BPF links can replace non-BPF links. In case of Cilium, this
would solve mentioned issue of safe ownership model as 3rd party applications
would not be able to accidentally wipe Cilium programs, even if they are not
BPF link aware.

Earlier attempts [4] have tried to integrate BPF links into core tc machinery
to solve this for cls_bpf, which was intrusive to the generic tc kernel API,
with extensions only specific to cls_bpf, and suboptimal/complex since cls_bpf
could also be wiped from the qdisc. Locking a tc BPF program in place this way
gets into layering hacks, given that the two object models are vastly
different.

We instead implemented the tcx (tc 'express') layer, which is an fd-based tc
BPF attach API, so that the BPF link implementation blends in naturally,
similar to other link types which are fd-based, and without the need to change
core tc internal APIs. BPF programs for tc can then be successively migrated
from classic cls_bpf to the new tc BPF link without needing to change the
program's source code; adjusting the BPF loader mechanics for attaching is
sufficient.

For the current tc framework there is no change in behavior with this patch,
and neither does it touch tc core kernel APIs. The gist of this patch is that
the ingress and egress hooks gain a lightweight, qdisc-less extension for BPF
to attach its tc BPF programs, in other words, a minimal entry point for tc
BPF. The name tcx was suggested in discussions of earlier revisions of this
work as a good fit, and to more easily differentiate between the classic
cls_bpf attachment and the fd-based one.

For the ingress and egress tcx points, the device holds a cache-friendly array
with program pointers which is separated from control-plane (slow-path) data.
Earlier versions of this work used priority to determine ordering and
expression of dependencies, similar to classic tc, but it was challenged that
for something more future-proof a better user experience is required. This
resulted in the design and development of the generic attach/detach/query API
for multi-progs. See the prior patch with its discussion on the API design.
tcx is the first user, and later we plan to integrate others as well; for
example, one candidate is multi-prog support for XDP, which would benefit and
have the same 'look and feel' from an API perspective.

The goal with tcx is maximum compatibility with existing tc BPF programs, so
that they don't need to be rewritten. Compatibility to call into classic
tcf_classify() is also provided, in order to allow successive migration or for
both to cleanly co-exist where needed, given it's all one logical tc layer and
tcx plus classic tc cls/act build one logical overall processing pipeline.

tcx supports the simplified return codes TCX_NEXT, which is non-terminating
(go to the next program), and the terminating ones TCX_PASS, TCX_DROP and
TCX_REDIRECT. The fd-based API is behind a static key, so that the code is not
entered when unused. The struct tcx_entry's program array is currently static,
but could be made dynamic at a point in the future if necessary. The a/b
pair-swap design has been chosen so that detachment requires no allocations,
which could otherwise fail.

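A minimal tcx program using these return codes might look as follows (a
sketch, not taken from this series):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("tcx/ingress")
  int tcx_pass_through(struct __sk_buff *skb)
  {
          return TCX_NEXT; /* non-terminating: continue with the next program */
  }

  char LICENSE[] SEC("license") = "GPL";
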
The work has been tested with tc-testing selftest suite which all passes, as
well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB.

Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
of this work.

  [0] https://lpc.events/event/16/contributions/1353/
  [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
  [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
  [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
  [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
d7e583a6ea bpf: Add generic attach/detach/query API for multi-progs
This adds a generic layer called bpf_mprog which can be reused by different
attachment layers to enable multi-program attachment and dependency resolution.
In-kernel users of bpf_mprog don't need to care about the dependency
resolution internals; they can consume it with a few API calls.

The initial idea of having a generic API sparked out of discussion [0] from an
earlier revision of this work, where tc's priority was reused and exposed via
BPF uapi as a way to coordinate dependencies among tc BPF programs, as is the
case for classic tc BPF. The feedback was that priority provides a bad user
experience and is hard to use [1], e.g.:

  I cannot help but feel that priority logic copy-paste from old tc, netfilter
  and friends is done because "that's how things were done in the past". [...]
  Priority gets exposed everywhere in uapi all the way to bpftool when it's
  right there for users to understand. And that's the main problem with it.

  The user doesn't want to and doesn't need to be aware of it, but uapi forces
  them to pick the priority. [...] Your cover letter [0] example proves that in
  real life different services pick the same priority. They simply don't know
  any better. Priority is an unnecessary magic that apps _have_ to pick, so
  they just copy-paste and everyone ends up using the same.

The course of the discussion showed more and more the need for a generic,
reusable API where the "same look and feel" can be applied to various other
program types beyond just tc BPF. For example, XDP today does not have
multi-program support in the kernel, and there was also interest in this API
for improving management of cgroup program types. Such a common multi-program
management concept is useful for BPF management daemons or user-space BPF
applications coordinating internally about their attachments.

Both from the Cilium and Meta side [2], we've collected the following
requirements for a generic attach/detach/query API for multi-progs, which
have been implemented as part of this work (a brief usage sketch follows
the list):

  - Support prog-based attach/detach and link API
  - Dependency directives (can also be combined):
    - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none}
      - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user
        space application does not need CAP_SYS_ADMIN to retrieve foreign fds
        via bpf_*_get_fd_by_id()
      - BPF_F_LINK flag as {prog,link} toggle
      - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and
        BPF_F_AFTER will just append for attaching
      - Enforced only at attach time
    - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their
      own infra for replacing their internal prog
    - If no flags are set, then it's default append behavior for attaching
  - Internal revision counter and optionally being able to pass expected_revision
  - User space application can query current state with revision, and pass it
    along for attachment to assert current state before doing updates
  - Query also gets extension for link_ids array and link_attach_flags:
    - prog_ids are always filled with program IDs
    - link_ids are filled with link IDs when link was used, otherwise 0
    - {prog,link}_attach_flags for holding {prog,link}-specific flags
  - Must be easy to integrate/reuse for in-kernel users

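The promised usage sketch, via libbpf's opts API (prog_fd, other_prog_fd and
ifindex are placeholders):

  int err;

  LIBBPF_OPTS(bpf_prog_attach_opts, opts,
          .flags = BPF_F_BEFORE,          /* insert in front of relative_fd */
          .relative_fd = other_prog_fd,
          .expected_revision = 3,         /* optional: assert chain state */
  );

  err = bpf_prog_attach_opts(prog_fd, ifindex, BPF_TCX_INGRESS, &opts);
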
The uapi-side changes needed for supporting bpf_mprog are rather minimal,
consisting of the addition of the attachment flags and revision counter, and
of expanding the existing union with a relative_{fd,id} member.

The bpf_mprog framework consists of a bpf_mprog_entry object which holds
an array of bpf_mprog_fp (fast-path structures). The bpf_mprog_cp
(control-path structure) is part of bpf_mprog_bundle. Both have been
separated, so that the fast path gets efficient packing of bpf_prog pointers
for maximum cache efficiency. Also, an array has been chosen instead of a
linked list or other structures to remove unnecessary indirections for a fast
point-to-entry in tc for BPF.

The bpf_mprog_entry comes as a pair via bpf_mprog_bundle, so that in case of
updates the peer bpf_mprog_entry is populated and then just swapped, which
avoids additional allocations that could otherwise fail, for example, in the
detach case. The bpf_mprog_{fp,cp} arrays are currently static, but they could
be converted to dynamic allocation at a point in the future if necessary.
Locking is deferred to the in-kernel user of bpf_mprog; for example, tcx,
which uses this API in the next patch, piggybacks on rtnl.

An extensive test suite checking all aspects of this API for prog-based
attach/detach and the link API comes as BPF selftests in this series.

Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog
management.

  [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net
  [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com
  [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Magnus Karlsson
071630384b selftests/xsk: add basic multi-buffer test
Add the first basic multi-buffer test that sends a stream of 9K
packets and validates that they are received at the other end. In
order to enable sending and receiving multi-buffer packets, code that
sets the MTU is introduced as well as modifications to the XDP
programs so that they signal that they are multi-buffer enabled.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-20-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Magnus Karlsson
658b107d4d selftests/xsk: transmit and receive multi-buffer packets
Add the ability to send and receive packets that are larger than the
size of a umem frame, using the AF_XDP/XDP multi-buffer support. There
are three pieces of code that need to be changed to achieve this: the
Rx path, the Tx path, and the validation logic.

Both the Rx and Tx paths could previously only deal with a single fragment
per packet. The Tx path is extended with a new function called
pkt_nb_frags() that can be used to retrieve the number of fragments a
packet will consume. We then create that many fragments in a loop, filling
the first N-1 to the max size limit to use the buffer space efficiently,
and the Nth one with whatever data is left. This goes on until we have
filled in at most BATCH_SIZE worth of descriptors and fragments. If we
detect that the next packet would exceed BATCH_SIZE fragments sent, we do
not send this packet and finish the batch. This packet is instead sent in
the next iteration of BATCH_SIZE fragments.

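To make the fragment accounting concrete, a small self-contained sketch
(frame and packet sizes are arbitrary; nb_frags() is a hypothetical
stand-in for pkt_nb_frags()):

  #include <stdio.h>

  /* number of umem frames a packet of pkt_len bytes consumes */
  static unsigned int nb_frags(unsigned int pkt_len, unsigned int frame_size)
  {
          return (pkt_len + frame_size - 1) / frame_size;
  }

  int main(void)
  {
          printf("%u\n", nb_frags(9000, 4096)); /* a 9K packet needs 3 frames */
          return 0;
  }
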
For Rx, we loop over all fragments we receive as usual, but for every
descriptor that we receive we call a new validation function called
is_frag_valid() to validate the consistency of this fragment. The code
then checks if the packet continues in the next frame. If so, it loops
over the next packet and performs the same validation. Once we have
received the last fragment of the packet we also call the function
is_pkt_valid() to validate the packet as a whole. If we get to the end
of the batch and are not at the end of the current packet, we back out
the partial packet and end the loop. Once we get into the receive loop
next time, we start over from the beginning of that packet. This keeps
the code simpler at the cost of some performance.

The validation function is_frag_valid() checks that the sequence and
packet numbers are correct at the start and end of each fragment.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-19-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Maciej Fijalkowski
8ae70bcbdf xsk: add new netlink attribute dedicated for ZC max frags
Introduce a new netlink attribute, NETDEV_A_DEV_XDP_ZC_MAX_SEGS, that
carries the maximum number of fragments the underlying ZC driver is able
to handle on the TX side. It is included in the netlink response only when
the driver supports ZC. Any value higher than 1 implies multi-buffer ZC
support on the underlying device.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-11-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Yafang Shao
4cd8e50d37 bpf: Support ->fill_link_info for perf_event
By introducing support for ->fill_link_info to the perf_event link, users
gain the ability to inspect it using `bpftool link show`. While the current
approach involves accessing this information via `bpftool perf show`,
consolidating link information for all link types in one place offers
greater convenience. Additionally, this patch extends support to the
generic perf event, which is not currently accommodated by
`bpftool perf show`. Only the perf type and config are exposed to
userspace; other attributes such as sample_period and sample_freq are
ignored. It's important to note that if the kptr_restrict setting does not
permit it, the probed address will not be exposed, maintaining security.

A new enum bpf_perf_event_type is introduced to help the user understand
which struct is relevant.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-9-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Yafang Shao
b89ede420b bpf: Support ->fill_link_info for kprobe_multi
With the addition of support for fill_link_info to the kprobe_multi link,
users will gain the ability to inspect it conveniently using the
`bpftool link show`. This enhancement provides valuable information to the
user, including the count of probed functions and their respective
addresses. It's important to note that if the kptr_restrict setting does
not permit it, the probed addresses will not be exposed, ensuring security.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
thiagoftsm
360a2fd909 Merge branch 'libbpf:master' into master 2023-07-12 12:10:00 +00:00
Andrii Nakryiko
05f94ddbb8 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c628747cc8800cf6d33d09f7f42c8b6f91e64dc7
Checkpoint bpf-next commit: a3e7e6b17946f48badce98d7ac360678a0ea7393
Baseline bpf commit:        496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9
Checkpoint bpf commit:      496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9

Andrii Nakryiko (1):
  libbpf: Fix realloc API handling in zero-sized edge cases

John Sanpe (1):
  libbpf: Remove HASHMAP_INIT static initialization helper

 src/hashmap.h | 10 ----------
 src/libbpf.c  | 15 ++++++++++++---
 src/usdt.c    |  5 ++++-
 3 files changed, 16 insertions(+), 14 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-11 10:03:25 -07:00
John Sanpe
bf88aaa6fe libbpf: Remove HASHMAP_INIT static initialization helper
Remove the wrong HASHMAP_INIT. It's not used anywhere in libbpf.

Signed-off-by: John Sanpe <sanpeqf@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230711070712.2064144-1-sanpeqf@gmail.com
2023-07-11 10:03:25 -07:00
Andrii Nakryiko
f117080307 libbpf: Fix realloc API handling in zero-sized edge cases
realloc() and reallocarray() can return either NULL or a special non-NULL
pointer if their size argument is zero. This requires a bit more care to
handle the NULL-as-valid-result situation differently from the
NULL-as-error case. This has caused real issues before ([0]), and just
recently bit again in production when performing bpf_program__attach_usdt().

This patch fixes 4 places that do or potentially could suffer from this
mishandling of NULL, including the reported USDT-related one.

There are many other places where realloc()/reallocarray() is used and
NULL is always treated as an error value, but all those have guarantees
that their size is always non-zero, so those spots don't need any extra
handling.

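The hazard, as a minimal sketch:

  /* realloc(p, 0) may legally return NULL without it being an error */
  void *tmp = realloc(arr, new_cnt * sizeof(*arr));

  if (!tmp && new_cnt) /* NULL is an error only for a non-zero request */
          return -ENOMEM;
  arr = tmp;
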
  [0] d08ab82f59d5 ("libbpf: Fix double-free when linker processes empty sections")

Fixes: 999783c8bbda ("libbpf: Wire up spec management and other arch-independent USDT logic")
Fixes: b63b3c490eee ("libbpf: Add bpf_program__set_insns function")
Fixes: 697f104db8a6 ("libbpf: Support custom SEC() handlers")
Fixes: b12688267280 ("libbpf: Change the order of data and text relocations.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230711024150.1566433-1-andrii@kernel.org
2023-07-11 10:03:25 -07:00
thiagoftsm
8b905090e8 Merge branch 'libbpf:master' into master 2023-07-10 22:36:12 +00:00
Andrii Nakryiko
6c020e6c47 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   856fe03d929205b4c8c8fa51296342cd85592e3f
Checkpoint bpf-next commit: c628747cc8800cf6d33d09f7f42c8b6f91e64dc7
Baseline bpf commit:        496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9
Checkpoint bpf commit:      496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9

Andrii Nakryiko (1):
  libbpf: only reset sec_def handler when necessary

 src/libbpf.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-10 14:24:42 -07:00
Andrii Nakryiko
1743bd1e40 libbpf: only reset sec_def handler when necessary
Don't reset the recorded sec_def handler unconditionally on
bpf_program__set_type(). There are two situations where this is wrong.

First, if the program type didn't actually change. In that case the
original SEC handler should work just fine.

Second, a catch-all custom SEC handler is supposed to work with any BPF
program type and SEC() annotation, so it also doesn't make sense to reset it.

This patch fixes both issues. This was reported recently in the context of
breaking the perf tool, which uses a custom catch-all handler for its fancy
BPF prologue generation logic.

  [0] https://lore.kernel.org/linux-perf-users/ab865e6d-06c5-078e-e404-7f90686db50d@amd.com/

Fixes: d6e6286a12e7 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call")
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230707231156.1711948-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-10 14:24:42 -07:00
Andrii Nakryiko
a2258003f2 ci: install headers before building selftests
Ensure latest kernel headers are available. Similar to [0].

  [0] https://github.com/libbpf/ci/pull/102

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-07 18:55:44 -07:00
Andrii Nakryiko
add1aac281 ci: add kprobe_multi_bench_attach to DENYLIST
It is suspected of causing kernel crashes in libbpf CI, which we don't
see in kernel-patches CI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-07 18:55:44 -07:00
Andrii Nakryiko
ea27ebcffd sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   25085b4e9251c77758964a8e8651338972353642
Checkpoint bpf-next commit: 856fe03d929205b4c8c8fa51296342cd85592e3f
Baseline bpf commit:        ad96f1c9138e0897bee7f7c5e54b3e24f8b62f57
Checkpoint bpf commit:      496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9

Andrea Terzolo (1):
  libbpf: Skip modules BTF loading when CAP_SYS_ADMIN is missing

Florian Westphal (1):
  libbpf: Add netfilter link attach helper

Jackie Liu (2):
  libbpf: Cross-join available_filter_functions and kallsyms for
    multi-kprobes
  libbpf: Use available_filter_functions_addrs with multi-kprobes

 src/bpf.c      |   8 ++
 src/bpf.h      |   6 ++
 src/libbpf.c   | 216 ++++++++++++++++++++++++++++++++++++++++++++++---
 src/libbpf.h   |  15 ++++
 src/libbpf.map |   1 +
 5 files changed, 233 insertions(+), 13 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-07 18:55:44 -07:00
Jackie Liu
b9c4ad5468 libbpf: Use available_filter_functions_addrs with multi-kprobes
Now that the kernel provides a new available_filter_functions_addrs file
which can help us avoid the need to cross-validate
available_filter_functions and kallsyms, we can improve the efficiency of
multi-attach kprobes. For example, on my device, the start time of the
sample program [1]:

$ sudo ./funccount "tcp_*"

before   after
1.2s     1.0s

  [1]: https://github.com/JackieLiu1/ketones/tree/master/src/funccount

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230705091209.3803873-2-liu.yun@linux.dev
2023-07-07 18:55:44 -07:00
Jackie Liu
732c4c6df2 libbpf: Cross-join available_filter_functions and kallsyms for multi-kprobes
When using regular expression matching with "kprobe multi", it scans all
the functions under "/proc/kallsyms" that can be matched. However, not all
of them can be traced by kprobe_multi. If any one of the functions fails
to be traced, it will result in the failure of all of them. The best
approach is to filter out the functions that cannot be traced, to ensure
proper tracing of the remaining ones.

Closes: https://lore.kernel.org/oe-kbuild-all/202307030355.TdXOHklM-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230705091209.3803873-1-liu.yun@linux.dev
2023-07-07 18:55:44 -07:00
Florian Westphal
6bec18258c libbpf: Add netfilter link attach helper
Add a new API function: bpf_program__attach_netfilter.

It takes a BPF program (netfilter type) and a pointer to an option struct
that contains the desired attachment parameters (protocol family, priority,
hook location, ...).

It returns a pointer to a 'bpf_link' structure or NULL on error.

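A hedged usage sketch (hook and priority values are placeholders):

  LIBBPF_OPTS(bpf_netfilter_opts, opts,
          .pf = NFPROTO_IPV4,
          .hooknum = NF_INET_LOCAL_IN,
          .priority = -128,
  );

  struct bpf_link *link = bpf_program__attach_netfilter(prog, &opts);
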
Next patch adds new netfilter_basic test that uses this function to
attach a program to a few pf/hook/priority combinations.

v2: change name and use bpf_link_create.

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/CAEf4BzZrmUv27AJp0dDxBDMY_B8e55-wLs8DUKK69vCWsCG_pQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/CAEf4BzZ69YgrQW7DHCJUT_X+GqMq_ZQQPBwopaJJVGFD5=d5Vg@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230628152738.22765-2-fw@strlen.de
2023-07-07 18:55:44 -07:00
Andrea Terzolo
3f33f9a6b8 libbpf: Skip modules BTF loading when CAP_SYS_ADMIN is missing
If during CO-RE relocations libbpf is not able to find the target type
in the running kernel BTF, it searches for it in modules' BTF.
The downside of this approach is that loading modules' BTF requires
CAP_SYS_ADMIN, and this prevents BPF applications from running with more
granular capabilities (e.g. CAP_BPF) when they don't need to search for
types in modules' BTF.

This patch skips the modules' BTF loading phase by default when
CAP_SYS_ADMIN is missing.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Co-developed-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/CAGQdkDvYU_e=_NX+6DRkL_-TeH3p+QtsdZwHkmH0w3Fuzw0C4w@mail.gmail.com
Link: https://lore.kernel.org/bpf/20230626093614.21270-1-andreaterzolo3@gmail.com
2023-07-07 18:55:44 -07:00
Manu Bretelle
ec6f716eda ci: Add bpf_nf/{xdp,tc-bpf}-ct to denylist for x86
This test is consistently failing on x86 for unknown reasons.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2023-06-17 00:07:28 +00:00
Manu Bretelle
3c7fcfe0ce sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   fcf1fa29c8ea75bf104c35ce29b65ce2ba6a6a9d
Checkpoint bpf-next commit: 25085b4e9251c77758964a8e8651338972353642
Baseline bpf commit:        f726e03564ef4e754dd93beb54303e2e1671049e
Checkpoint bpf commit:      ad96f1c9138e0897bee7f7c5e54b3e24f8b62f57

Andrii Nakryiko (2):
  libbpf: Ensure libbpf always opens files with O_CLOEXEC
  libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()

Florian Westphal (1):
  bpf: netfilter: Add BPF_NETFILTER bpf_attach_type

JP Kobryn (1):
  libbpf: Change var type in datasec resize func

Louis DeLosSantos (1):
  bpf: Add table ID to bpf_fib_lookup BPF helper

 include/uapi/linux/bpf.h | 22 +++++++++++++++++++---
 src/btf.c                |  2 +-
 src/libbpf.c             | 26 +++++++++++++-------------
 src/libbpf_probes.c      |  4 +++-
 src/usdt.c               |  5 ++---
 5 files changed, 38 insertions(+), 21 deletions(-)

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2023-06-17 00:07:28 +00:00
Manu Bretelle
ef3e2ef82a sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2023-06-17 00:07:28 +00:00
Florian Westphal
45188d0d01 bpf: netfilter: Add BPF_NETFILTER bpf_attach_type
Andrii Nakryiko writes:

 And we currently don't have an attach type for NETLINK BPF link.
 Thankfully it's not too late to add it. I see that link_create() in
 kernel/bpf/syscall.c just bypasses attach_type check. We shouldn't
 have done that. Instead we need to add BPF_NETLINK attach type to enum
 bpf_attach_type. And wire all that properly throughout the kernel and
 libbpf itself.

This adds BPF_NETFILTER and uses it.  This breaks uapi but this
wasn't in any non-rc release yet, so it should be fine.

v2: check link_attach prog type in link_create too

Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs")
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzZ69YgrQW7DHCJUT_X+GqMq_ZQQPBwopaJJVGFD5=d5Vg@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230605131445.32016-1-fw@strlen.de
2023-06-17 00:07:28 +00:00
Louis DeLosSantos
f02ec78083 bpf: Add table ID to bpf_fib_lookup BPF helper
Add ability to specify routing table ID to the `bpf_fib_lookup` BPF
helper.

A new field `tbid` is added to `struct bpf_fib_lookup`, which is used as
the parameter block for the `bpf_fib_lookup` BPF helper.

When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` and
`BPF_FIB_LOOKUP_TBID` flags, the `tbid` field in `struct bpf_fib_lookup`
will be used as the table ID for the FIB lookup.

If the `tbid` does not exist, the FIB lookup will fail with
`BPF_FIB_LKUP_RET_NOT_FWDED`.

The `tbid` field becomes a union over the vlan related output fields
in `struct bpf_fib_lookup` and will be zeroed immediately after usage.

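From the BPF side, usage could look roughly like this (the table ID and
address setup are hypothetical and mostly elided):

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  SEC("tc")
  int fib_via_table(struct __sk_buff *skb)
  {
          struct bpf_fib_lookup p = {
                  .family  = 2,   /* AF_INET */
                  .ifindex = skb->ingress_ifindex,
                  .tbid    = 100, /* hypothetical per-container table */
          };

          /* real users also fill in src/dst addresses before the call */
          long ret = bpf_fib_lookup(skb, &p, sizeof(p),
                                    BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID);
          if (ret != BPF_FIB_LKUP_RET_SUCCESS)
                  return TC_ACT_OK;
          /* p.smac/p.dmac now hold the next-hop addresses */
          return TC_ACT_OK;
  }

  char LICENSE[] SEC("license") = "GPL";
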
This functionality is useful in containerized environments.

For instance, if a CNI wants to dictate the next-hop for traffic leaving
a container it can create a container-specific routing table and perform
a fib lookup against this table in a "host-net-namespace-side" TC program.

This also brings `ip rule`-like functionality to the TC layer, allowing an
eBPF program to pick a routing table based on some aspect of the sk_buff.

As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN
datapath.

When egress traffic leaves a Pod an eBPF program attached by Cilium will
determine which VRF the egress traffic should target, and then perform a
FIB lookup in a specific table representing this VRF's FIB.

Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v2-1-0a31c22c748c@gmail.com
2023-06-17 00:07:28 +00:00
Andrii Nakryiko
fa1a18d38b libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()
Improve the bpf_map__reuse_fd() logic and ensure that the dup'ed map FD is
"good" (>= 3) and has the O_CLOEXEC flag set. Use fcntl(F_DUPFD_CLOEXEC)
for that, similarly to the ensure_good_fd() helper we already use in
low-level APIs that work with the bpf() syscall.

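The underlying pattern, sketched:

  #include <fcntl.h>

  /* duplicate fd to the lowest free FD >= 3, with O_CLOEXEC already set */
  int new_fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
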
Suggested-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230525221311.2136408-2-andrii@kernel.org
2023-06-17 00:07:28 +00:00
Andrii Nakryiko
ba7a44da68 libbpf: Ensure libbpf always opens files with O_CLOEXEC
Make sure that libbpf code always gets an FD with the O_CLOEXEC flag set,
regardless of whether the file is opened through open() or fopen(). For the
latter this means adding "e" to the mode string, which has been supported
since the pretty ancient glibc v2.7.

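In practice this amounts to one-liners like:

  FILE *f = fopen("/proc/kallsyms", "re"); /* "e" sets O_CLOEXEC on the FD */
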
Also drop the outdated TODO comment in usdt.c, which was already completed.

Suggested-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230525221311.2136408-1-andrii@kernel.org
2023-06-17 00:07:28 +00:00
Manu Bretelle
cb23f981c3 ci: Dump kconfig before running tests
This helps troubleshooting by validating what the Kconfig of the testing
environment is.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2023-06-15 14:04:53 -07:00
Daniel Müller
f7eb43b90f ci: add fix for sockopt sub-tests
Sockopt sub-tests currently don't honor denylisting properly. Fix them.
Upstream fix was found at [0].

[0] https://lore.kernel.org/bpf/20230525232248.640465-1-deso@posteo.net/T/#u

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
9710829e78 ci: Gracefully handle test names with spaces inside
Cherry pick of pieces of f909f8bf110d ("ci: temporarily disable
test_btf_dump_case") from vmtest to handle spaces in test names
properly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
JP Kobryn
e021ccbd7d libbpf: Change var type in datasec resize func
This changes a local variable type that stores a new array id to match
the return type of btf__add_array().

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230525001323.8554-1-inwardvessel@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
0755b497cf ci: add fix for multi-kprobe as temporary patch
This fixes 39d954200bf6 ("fprobe: Skip exit_handler if entry_handler
returns !0"), which causes multiple multi-kprobe tests to fail. Upstream
fix was found at [0].

[0] https://lore.kernel.org/all/168100731160.79534.374827110083836722.stgit@devnote2/#r

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
c4ffdf1e72 ci: Adjust allow/deny lists for most recent sync
Adjust the allow & deny lists for use after the most recent sync with
upstream.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
c850306199 ci: Regenerate latest vmlinux.h for old kernel CI tests.
CI will fail without it.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
fb6998382d libbpf: Bump version to v1.3 in Makefile
Bump LIBBPF_MINOR_VERSION to 3 for v1.3 dev cycle.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
9aea1da2bb sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2ddade322925641ee2a75f13665c51f2e74d7791
Checkpoint bpf-next commit: fcf1fa29c8ea75bf104c35ce29b65ce2ba6a6a9d
Baseline bpf commit:        71b547f561247897a0a14f3082730156c0533fed
Checkpoint bpf commit:      f726e03564ef4e754dd93beb54303e2e1671049e

Alexey Dobriyan (1):
  ELF: fix all "Elf" typos

Andrii Nakryiko (4):
  libbpf: fix offsetof() and container_of() to work with CO-RE
  libbpf: Start v1.3 development cycle
  bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
  libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd

Florian Westphal (1):
  tools: bpftool: print netfilter link info

JP Kobryn (1):
  libbpf: Add capability for resizing datasec maps

Jiri Olsa (1):
  libbpf: Store zero fd to fd_array for loader kfunc relocation

Kenjiro Nakayama (1):
  libbpf: Fix comment about arc and riscv arch in bpf_tracing.h

Martin KaFai Lau (1):
  libbpf: btf_dump_type_data_check_overflow needs to consider
    BTF_MEMBER_BITFIELD_SIZE

 include/uapi/linux/bpf.h |  24 +++++++
 src/bpf.c                |  17 ++++-
 src/bpf.h                |  18 ++++-
 src/bpf_helpers.h        |  15 +++--
 src/bpf_tracing.h        |   3 +-
 src/btf_dump.c           |  22 +++++-
 src/gen_loader.c         |  14 ++--
 src/libbpf.c             | 140 ++++++++++++++++++++++++++++++++++++---
 src/libbpf.h             |  18 ++++-
 src/libbpf.map           |   5 ++
 src/libbpf_probes.c      |   1 +
 src/libbpf_version.h     |   2 +-
 src/usdt.c               |   2 +-
 13 files changed, 246 insertions(+), 35 deletions(-)

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
JP Kobryn
8b4e1b39a4 libbpf: Add capability for resizing datasec maps
This patch updates bpf_map__set_value_size() so that if the given map is
memory-mapped, it will attempt to resize the mapped region. Initial
contents of the mapped region are preserved. BTF is not required, but
after the mapping is resized an attempt is made to adjust the associated
BTF information if the following criteria are met:
 - BTF info is present
 - the map is a datasec
 - the final variable in the datasec is an array

... the resulting BTF info will be updated so that the final array
variable is associated with a new BTF array type sized to cover the
requested size.

Note that the initial resizing of the memory mapped region can succeed
while the subsequent BTF adjustment can fail. In this case, BTF info is
dropped from the map by clearing the key and value type.

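A hedged usage sketch (the object, map name and size are placeholders; the
call happens on a still-unloaded bpf_object):

  struct bpf_map *map = bpf_object__find_map_by_name(obj, ".data.my_array");

  /* grows the mmap'd region and, where possible, the trailing BTF array */
  int err = bpf_map__set_value_size(map, new_size);
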
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230524004537.18614-2-inwardvessel@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
a50544ef45 libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd
Add path_fd support for the bpf_obj_pin() and bpf_obj_get() operations
(through their opts-based variants). This allows taking advantage of the
new kernel-side support for O_PATH-based pin/get location specification.

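A hedged sketch (bpffs_fd is assumed to be an FD referring to a BPF FS
directory):

  LIBBPF_OPTS(bpf_obj_pin_opts, opts, .path_fd = bpffs_fd);

  /* pins map_fd at "my_map" relative to bpffs_fd, like the *at() syscalls */
  int err = bpf_obj_pin_opts(map_fd, "my_map", &opts);
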
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230523170013.728457-4-andrii@kernel.org
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
bfb0454244 bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
The current UAPI of the BPF_OBJ_PIN and BPF_OBJ_GET commands of the bpf()
syscall forces users to specify the pinning location as a string-based
absolute or relative (to the current working directory) path. This has
various implications related to security (e.g., symlink-based attacks) and
forces BPF FS to be exposed in the file system, which can cause races with
other applications.

One of the pieces of feedback we got from folks working heavily with
containers was that the inability to use purely FD-based location
specification was an unfortunate limitation and hindrance for the
BPF_OBJ_PIN and BPF_OBJ_GET commands. This patch closes this oversight,
adding a path_fd field to the BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following
the conventions established by the *at() syscalls for dirfd + pathname
combinations.

This now allows interesting possibilities like working with a detached BPF
FS mount (e.g., to perform multiple pinnings without running the risk of
someone interfering with them), and generally makes pinning/getting more
secure and not prone to races and/or security attacks.

This is demonstrated by a selftest added in subsequent patch that takes
advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
creating detached BPF FS mount, pinning, and then getting BPF map out of
it, all while never exposing this private instance of BPF FS to outside
worlds.

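A hedged sketch of that flow, assuming glibc's wrappers for the new mount
API (available since glibc 2.36):

  #include <sys/mount.h>

  int fs_fd = fsopen("bpf", 0);                        /* new BPF FS instance */
  fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); /* create superblock */
  int mnt_fd = fsmount(fs_fd, 0, 0);                   /* detached mount FD */
  /* mnt_fd can now serve as path_fd for BPF_OBJ_PIN / BPF_OBJ_GET */
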
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20230523170013.728457-4-andrii@kernel.org
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
79811cad50 libbpf: Start v1.3 development cycle
Bump libbpf.map to v1.3.0 to start a new libbpf version cycle.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230523170013.728457-3-andrii@kernel.org
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Jiri Olsa
4bb0b0ca09 libbpf: Store zero fd to fd_array for loader kfunc relocation
When moving some of the test kfuncs to bpf_testmod I hit an issue where
some of the kfuncs that the object uses are in a module and some in vmlinux.

The problem is that both vmlinux and module kfuncs get allocated a
btf_fd_idx index into fd_array, but we store the BTF fd value into it only
for the module's kfunc, not for vmlinux's, because it's zero.

Then after the program is loaded we check if fd_array[btf_fd_idx] != 0
and close the fd.

When the object has kfuncs from both vmlinux and a module, the fd from
fd_array[btf_fd_idx] from the previous load will be left in there for
vmlinux's kfunc, so we close an unrelated fd (of the program we just
loaded, in my case).

Fix this by storing zero to fd_array[btf_fd_idx] for vmlinux kfuncs, so
that we won't close a stale fd.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
ac42790129 libbpf: fix offsetof() and container_of() to work with CO-RE
It seems like __builtin_offsetof() doesn't preserve CO-RE field
relocations properly. So if the offsetof() macro is defined through
__builtin_offsetof(), CO-RE-enabled BPF code using container_of() will be
subtly and silently broken.

To avoid this problem, redefine offsetof() and container_of() in a form
that works with CO-RE relocations more reliably.

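The reworked definitions look roughly like this (avoiding the builtin for
the reason above):

  #ifndef offsetof
  #define offsetof(TYPE, MEMBER)  ((unsigned long)&((TYPE *)0)->MEMBER)
  #endif

  #ifndef container_of
  #define container_of(ptr, type, member)                      \
          ({                                                   \
                  void *__mptr = (void *)(ptr);                \
                  ((type *)(__mptr - offsetof(type, member))); \
          })
  #endif
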
Fixes: 5fbc220862fc ("tools/libpf: Add offsetof/container_of macro in bpf_helpers.h")
Reported-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230509065502.2306180-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Kenjiro Nakayama
6a6cf6dcdc libbpf: Fix comment about arc and riscv arch in bpf_tracing.h
To make the comments about the arc and riscv arches in bpf_tracing.h
accurate, this patch fixes the comment about arc and adds one for riscv.

Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230504035443.427927-1-nakayamakenjiro@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Martin KaFai Lau
b9711e7015 libbpf: btf_dump_type_data_check_overflow needs to consider BTF_MEMBER_BITFIELD_SIZE
The btf_dump/struct_data selftest is failing with:

  [...]
  test_btf_dump_struct_data:FAIL:unexpected return value dumping fs_context unexpected unexpected return value dumping fs_context: actual -7 != expected 264
  [...]

The reason is in btf_dump_type_data_check_overflow(). It does not use
BTF_MEMBER_BITFIELD_SIZE from the struct's member (btf_member). Instead,
it is using the enum size which is 4. It had been working till the recent
commit 4e04143c869c ("fs_context: drop the unused lsm_flags member")
removed an integer member which also removed the 4 bytes padding at the
end of the fs_context. Missing this 4 bytes padding exposed this bug. In
particular, when btf_dump_type_data_check_overflow() reaches the member
'phase', -E2BIG is returned.

The fix is to pass bit_sz to btf_dump_type_data_check_overflow(), which
then performs a different size check when bit_sz is not zero.

The current fs_context:

[3600] ENUM 'fs_context_purpose' encoding=UNSIGNED size=4 vlen=3
	'FS_CONTEXT_FOR_MOUNT' val=0
	'FS_CONTEXT_FOR_SUBMOUNT' val=1
	'FS_CONTEXT_FOR_RECONFIGURE' val=2
[3601] ENUM 'fs_context_phase' encoding=UNSIGNED size=4 vlen=7
	'FS_CONTEXT_CREATE_PARAMS' val=0
	'FS_CONTEXT_CREATING' val=1
	'FS_CONTEXT_AWAITING_MOUNT' val=2
	'FS_CONTEXT_AWAITING_RECONF' val=3
	'FS_CONTEXT_RECONF_PARAMS' val=4
	'FS_CONTEXT_RECONFIGURING' val=5
	'FS_CONTEXT_FAILED' val=6
[3602] STRUCT 'fs_context' size=264 vlen=21
	'ops' type_id=3603 bits_offset=0
	'uapi_mutex' type_id=235 bits_offset=64
	'fs_type' type_id=872 bits_offset=1216
	'fs_private' type_id=21 bits_offset=1280
	'sget_key' type_id=21 bits_offset=1344
	'root' type_id=781 bits_offset=1408
	'user_ns' type_id=251 bits_offset=1472
	'net_ns' type_id=984 bits_offset=1536
	'cred' type_id=1785 bits_offset=1600
	'log' type_id=3621 bits_offset=1664
	'source' type_id=42 bits_offset=1792
	'security' type_id=21 bits_offset=1856
	's_fs_info' type_id=21 bits_offset=1920
	'sb_flags' type_id=20 bits_offset=1984
	'sb_flags_mask' type_id=20 bits_offset=2016
	's_iflags' type_id=20 bits_offset=2048
	'purpose' type_id=3600 bits_offset=2080 bitfield_size=8
	'phase' type_id=3601 bits_offset=2088 bitfield_size=8
	'need_free' type_id=67 bits_offset=2096 bitfield_size=1
	'global' type_id=67 bits_offset=2097 bitfield_size=1
	'oldapi' type_id=67 bits_offset=2098 bitfield_size=1

Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230428013638.1581263-1-martin.lau@linux.dev
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Alexey Dobriyan
4c484d662c ELF: fix all "Elf" typos
ELF is an acronym and therefore should be spelled in all caps.

I left one exception at Documentation/arm/nwfpe/nwfpe.rst, which looks like
it was written in the first person.

Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Florian Westphal
1c9aa4791a tools: bpftool: print netfilter link info
Dump protocol family, hook and priority value:
$ bpftool link
2: netfilter  prog 14
        ip input prio -128
        pids install(3264)
5: netfilter  prog 14
        ip6 forward prio 21
        pids a.out(3387)
9: netfilter  prog 14
        ip prerouting prio 123
        pids a.out(5700)
10: netfilter  prog 14
        ip input prio 21
        pids test2(5701)

v2: Quentin Monnet suggested to also add 'bpftool net' support:

$ bpftool net
xdp:

tc:

flow_dissector:

netfilter:

        ip prerouting prio 21 prog_id 14
        ip input prio -128 prog_id 14
        ip input prio 21 prog_id 14
        ip forward prio 21 prog_id 14
        ip output prio 21 prog_id 14
        ip postrouting prio 21 prog_id 14

'bpftool net' only dumps netfilter link type, links are sorted by protocol
family, hook and priority.

v5: fix bpf ci failure: libbpf needs small update to prog_type_name[]
    and probe_prog_load helper.
v4: don't fail with -EOPNOTSUPP in libbpf probe_prog_load, update
    prog_type_name[] with "netfilter" entry (bpf ci)
v3: fix bpf.h copy, 'reserved' member was removed (Alexei)
    use p_err, not fprintf (Quentin)

Suggested-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/eeeaac99-9053-90c2-aa33-cc1ecb1ae9ca@isovalent.com/
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-6-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
3f591a6610 git: make .gitattributes compatible with git-archive-all action
As reported by Quentin, using a GitHub Action to archive all submodules
(e.g., for retsnoop release packaging) is impacted by it not supporting the
"<glob>/" pattern in .gitattributes. Use "<glob>/**" instead.

  [0] https://github.com/anakryiko/retsnoop/pull/42#issuecomment-1560797837

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-05-25 13:14:58 -07:00
Evgeny Vereshchagin
532293bdf4 fuzz: bump elfutils to 0.189
The elfutils project has fixed several issues found by fuzz targets, so
this should help prevent the libbpf fuzz target from running into them.
Signed-off-by: Evgeny Vereshchagin <evvers@ya.ru>
2023-05-12 14:29:41 -07:00
thiagoftsm
dd7dd01114 Merge branch 'libbpf:master' into master 2023-05-04 16:40:08 +00:00
Song Liu
fbd60dbff5 ci: Fix test_progs failure
Fix the test_progs failure xdp_bonding/xdp_bonding_redirect_multi caused
by a missing commit (in bpf, but not in bpf-next yet).

Signed-off-by: Song Liu <song@kernel.org>
2023-04-20 12:01:06 -07:00
Song Liu
44b0bc9ad7 ci: Regenerate latest vmlinux.h for old kernel CI tests.
CI fails without it.

Signed-off-by: Song Liu <song@kernel.org>
2023-04-19 16:15:07 -07:00
Song Liu
f0e39b4946 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   4ca13d1002f37c10038ff4ed3cfdc70dbe049d60
Checkpoint bpf-next commit: 2ddade322925641ee2a75f13665c51f2e74d7791
Baseline bpf commit:        a6f6a95f25803500079513780d11a911ce551d76
Checkpoint bpf commit:      71b547f561247897a0a14f3082730156c0533fed

Andrii Nakryiko (9):
  libbpf: Don't enforce unnecessary verifier log restrictions on libbpf
    side
  bpf: Add log_true_size output field to return necessary log buffer
    size
  libbpf: Wire through log_true_size returned from kernel for
    BPF_PROG_LOAD
  libbpf: Wire through log_true_size for bpf_btf_load() API
  libbpf: misc internal libbpf clean ups around log fixup
  libbpf: report vmlinux vs module name when dealing with ksyms
  libbpf: improve handling of unresolved kfuncs
  libbpf: move bpf_for(), bpf_for_each(), and bpf_repeat() into
    bpf_helpers.h
  libbpf: mark bpf_iter_num_{new,next,destroy} as __weak

Arnaldo Carvalho de Melo (1):
  tools include UAPI: Synchronize linux/fcntl.h with the kernel sources

Dave Marchevsky (1):
  bpf: Introduce opaque bpf_refcount struct and add btf_record plumbing

Herbert Xu (1):
  macvlan: Add netlink attribute for broadcast cutoff

Lorenzo Bianconi (1):
  xdp: add xdp_set_features_flag utility routine

 include/uapi/linux/bpf.h     |  16 +++++-
 include/uapi/linux/fcntl.h   |   1 +
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/netdev.h  |   2 +
 src/bpf.c                    |  17 +++---
 src/bpf.h                    |  22 +++++--
 src/bpf_helpers.h            | 103 +++++++++++++++++++++++++++++++++
 src/libbpf.c                 | 107 ++++++++++++++++++++++++++++-------
 8 files changed, 237 insertions(+), 32 deletions(-)

Signed-off-by: Song Liu <song@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
294c85e9b3 libbpf: mark bpf_iter_num_{new,next,destroy} as __weak
Mark bpf_iter_num_{new,next,destroy}() kfuncs declared for
bpf_for()/bpf_repeat() macros as __weak to allow users to feature-detect
their presence and guard bpf_for()/bpf_repeat() loops accordingly for
backwards compatibility with old kernels.

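This enables feature detection along these lines (a sketch using libbpf's
bpf_ksym_exists() helper):

  if (bpf_ksym_exists(bpf_iter_num_new)) {
          /* kernel has numeric iterators; bpf_for()/bpf_repeat() are usable */
  } else {
          /* fall back to bounded loops on older kernels */
  }
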
Now that libbpf supports kfunc calls poisoning and better reporting of
unresolved (but called) kfuncs, declaring number iterator kfuncs in
bpf_helpers.h won't degrade user experience and won't cause unnecessary
kernel feature dependencies.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
2293c20f82 libbpf: move bpf_for(), bpf_for_each(), and bpf_repeat() into bpf_helpers.h
To make it easier for bleeding-edge BPF applications, such as sched_ext,
to utilize open-coded iterators, move bpf_for(), bpf_for_each(), and
bpf_repeat() macros from selftests/bpf-internal bpf_misc.h helper, to
libbpf-provided bpf_helpers.h header.

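For illustration, a hedged sketch of the macros inside a BPF program body:

  int i, sum = 0;

  bpf_for(i, 0, 100) {    /* open-coded iterator: i runs 0..99 */
          sum += i;
  }

  bpf_repeat(16) {        /* body executes up to 16 times */
          /* ... */
  }
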
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
e6cc30f445 libbpf: improve handling of unresolved kfuncs
Currently, libbpf leaves a `call #0` instruction for __weak unresolved
kfuncs, which might lead to confusing verifier log situations, where an
invalid `call #0` will be treated as successfully validated.

We can do better. Libbpf already has an established mechanism for poisoning
instructions that failed some form of resolution (e.g., a CO-RE relocation
or a BPF map set to not be auto-created). Libbpf doesn't fail them
outright, to allow users to guard them through other means, and as long as
the BPF verifier can prove that such poisoned instructions can never be
reached, this doesn't constitute an invalid BPF program. If the user didn't
guard such code, libbpf will extract a few pieces of information to tie
such poisoned instructions back to additional details about what entity
wasn't resolved (e.g., the BPF map name, or CO-RE relocation information).

__weak unresolved kfuncs fit this model well, so this patch extends
libbpf with poisoning and log fixup logic for kfunc calls.

Note, this poisoning is done only for kfunc *calls*, not kfunc address
resolution (ldimm64 instructions). The former cannot be ever valid, if
reached, so it's safe to poison them. The latter is a valid mechanism to
check if __weak kfunc ksym was resolved, and do necessary guarding and
work arounds based on this result, supported in most recent kernels. As
such, libbpf keeps such ldimm64 instructions as loading zero, never
poisoning them.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
6fd310547d libbpf: report vmlinux vs module name when dealing with ksyms
Currently libbpf always reports "kernel" as the source of a ksym's BTF
type, which is ambiguous given that a ksym's BTF can come from either
vmlinux or kernel module BTFs. Make this explicit and log the module name
if the BTF used is from a kernel module.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
0db753a9f8 libbpf: misc internal libbpf clean ups around log fixup
Normalize internal constants, field names, and comments related to log
fixup. Also add explicit `ext_idx` alias for relocation where relocation
is pointing to extern description for additional information.

No functional changes, just a clean up before subsequent additions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Dave Marchevsky
44f59ec077 bpf: Introduce opaque bpf_refcount struct and add btf_record plumbing
A 'struct bpf_refcount' is added to the set of opaque uapi/bpf.h types
meant for use in BPF programs. Similarly to other opaque types like
bpf_spin_lock and bpf_rbtree_node, the verifier needs to know where in
user-defined struct types a bpf_refcount can be located, so necessary
btf_record plumbing is added to enable this. bpf_refcount is sized to
hold a refcount_t.

Similarly to bpf_spin_lock, the offset of a bpf_refcount is cached in
btf_record as refcount_off in addition to being in the field array.
Caching refcount_off makes sense for this field because further patches
in the series will modify functions that take local kptrs (e.g.
bpf_obj_drop) to change their behavior if the type they're operating on
is refcounted. So enabling fast "is this type refcounted?" checks is
desirable.

No such verifier behavior changes are introduced in this patch, just
logic to recognize 'struct bpf_refcount' in btf_record.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
2f01564c50 libbpf: Wire through log_true_size for bpf_btf_load() API
Similar to what we did for bpf_prog_load() in the previous patch, wire
returning of the log_true_size value from the kernel back to the user
through an OPTS out field.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230406234205.323208-17-andrii@kernel.org
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
c2fe7adb33 libbpf: Wire through log_true_size returned from kernel for BPF_PROG_LOAD
Add output-only log_true_size field to bpf_prog_load_opts to return
bpf_attr->log_true_size value back from bpf() syscall.

Note that we have to drop the const modifier from opts in
bpf_prog_load(). This could potentially cause a compilation error for
some users. But the usual practice is to define bpf_prog_load_opts
as a local variable next to the bpf_prog_load() call and pass a pointer
to it, so const vs non-const makes no difference and won't even come up
in most (if not all) cases.

There are no runtime or ABI backwards/forward compatibility issues at
all. If a user provides an old struct bpf_prog_load_opts, libbpf won't
set the new field. If an old libbpf is provided a new
bpf_prog_load_opts, nothing will happen either, as the old libbpf
doesn't yet know about this new field.

Adding a new variant of bpf_prog_load() just for this seems like
unnecessary overkill. Corroborating evidence is the fact that the
entire selftests/bpf code base required no adjustment whatsoever.
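
A minimal sketch of how a caller might use the new out field (program
details are illustrative, and insns/insn_cnt are assumed to be prepared
elsewhere):

  #include <stdio.h>
  #include <bpf/bpf.h>

  int load_with_size_hint(const struct bpf_insn *insns, size_t insn_cnt)
  {
          char log[256];
          LIBBPF_OPTS(bpf_prog_load_opts, opts,
                  .log_buf = log,
                  .log_size = sizeof(log),
                  .log_level = 1,
          );
          int fd;

          fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "xdp_prog", "GPL",
                             insns, insn_cnt, &opts);
          if (fd < 0 && opts.log_true_size > opts.log_size)
                  /* log was truncated; log_true_size is the exact
                   * buffer size needed to fit the whole log */
                  fprintf(stderr, "need a %u-byte log buffer\n",
                          opts.log_true_size);
          return fd;
  }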

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230406234205.323208-16-andrii@kernel.org
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
88004dd87a bpf: Add log_true_size output field to return necessary log buffer size
Add output-only log_true_size and btf_log_true_size fields to the
BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. They return
the size of the log buffer necessary to fit all the log contents at the
specified log_level. This is very useful for BPF loader libraries like
libbpf to be able to size the log buffer correctly, but could be used
directly by users as well, if necessary.

This patch plumbs all this through the code, taking into account the
actual bpf_attr size provided by the user to determine whether these new
fields are expected by users. And if they are, they are set from the
kernel on return.

We refactor the btf_parse() function to accommodate this, moving attr
and uattr handling inside it. The rest is very straightforward code,
which is split from the log accounting changes in the previous patch to
make it simpler to review logic vs UAPI changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-13-andrii@kernel.org
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
a22abb9c85 libbpf: Don't enforce unnecessary verifier log restrictions on libbpf side
Enforcing these restrictions on the libbpf side basically prevents any
forward compatibility. And either way we just return -EINVAL, which
would otherwise be returned from the bpf() syscall anyway.

Similarly, drop the enforcement of a non-NULL log_buf when log_level >
0; this soon won't be a requirement anymore.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-5-andrii@kernel.org
2023-04-19 16:15:07 -07:00
Herbert Xu
2c0c927a38 macvlan: Add netlink attribute for broadcast cutoff
Make the broadcast cutoff configurable through netlink.  Note
that macvlan is weird because there is no central device for
us to configure (the lowerdev could be anything).  So all the
options are duplicated over what could be thousands of child
devices.

IFLA_MACVLAN_BC_QUEUE_LEN took the approach of taking the maximum
of all child device settings.  This is unnecessary as we could
simply store the option in the port device and take the last
child device that gets updated as the value to use.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
d9d17f6d71 git: add .gitattributes file ignoring assets/ during archiving
We don't need to archive assets/ subdir when packaging libbpf sources in
retsnoop and veristat repos. Mark assets/ as export-ignore to skip it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-04-01 15:36:54 -07:00
Daniel Müller
3783577161 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   226bc6ae6405c46a6e9865835c36a1d45fc0b3bf
Checkpoint bpf-next commit: 4ca13d1002f37c10038ff4ed3cfdc70dbe049d60
Baseline bpf commit:        915efd8a446b74442039d31689d5d863caf82517
Checkpoint bpf commit:      a6f6a95f25803500079513780d11a911ce551d76

Andrii Nakryiko (1):
  libbpf: disassociate section handler on explicit
    bpf_program__set_type() call

Arnaldo Carvalho de Melo (1):
  tools include UAPI: Synchronize linux/fcntl.h with the kernel sources

Eduard Zingerman (1):
  libbpf: Fix double-free when linker processes empty sections

JP Kobryn (1):
  libbpf: Ensure print callback usage is thread-safe

Jakub Kicinski (1):
  ynl: broaden the license even more

Lorenzo Bianconi (1):
  xdp: add xdp_set_features_flag utility routine

 include/uapi/linux/fcntl.h  |  1 +
 include/uapi/linux/netdev.h |  4 +++-
 src/libbpf.c                | 10 +++++++---
 src/libbpf.h                |  2 ++
 src/linker.c                | 14 +++++++++++++-
 5 files changed, 26 insertions(+), 5 deletions(-)

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Jakub Kicinski
75c14163b9 ynl: broaden the license even more
I relicensed Netlink spec code to GPL-2.0 OR BSD-3-Clause but
we still put a slightly different license on the uAPI header
than the rest of the code. Use the Linux-syscall-note on all
the specs and all generated code. It's moot for kernel code,
but should not hurt. This way the licenses match everywhere.

Cc: Chuck Lever <chuck.lever@oracle.com>
Fixes: 37d9df224d1e ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Lorenzo Bianconi
056e9bcc19 xdp: add xdp_set_features_flag utility routine
Introduce the xdp_set_features_flag utility routine in order to
dynamically update xdp_features according to the dynamic hw
configuration via ethtool (e.g. changing the number of hw rx/tx queues).
Add xdp_clear_features_flag() in order to clear all xdp_feature flags.

Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Arnaldo Carvalho de Melo
14ae9422db tools include UAPI: Synchronize linux/fcntl.h with the kernel sources
To pick up the changes in:

  6fd7353829cafc40 ("mm/memfd: add F_SEAL_EXEC")

That doesn't add or change any perf tools functionality, only addresses
these build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/fcntl.h' differs from latest version at 'include/uapi/linux/fcntl.h'
  diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Andrii Nakryiko
3fd6eebb2d libbpf: disassociate section handler on explicit bpf_program__set_type() call
If the user explicitly overrides a program's type with the
bpf_program__set_type() API call, we need to disassociate whatever
SEC_DEF handler libbpf determined initially based on the program's
SEC() definition, as it's not going to be valid anymore and could lead
to crashes and/or confusing failures.

Also, fix up the bpf_prog_test_load() helper in selftests/bpf, which
force-sets the program type (even though that's completely unnecessary;
this is quite a legacy piece of code) and thus should expect
auto-attach to not work, yet one of the tests explicitly relies on
auto-attach for testing.

Instead, force-set the program type only if it differs from the desired
one.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230327185202.1929145-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Eduard Zingerman
4218389b1e libbpf: Fix double-free when linker processes empty sections
A double-free error in bpf_linker__free() was reported by James
Hilliard. The error is caused by misuse of realloc() in extend_sec().
The error occurs when two files with empty sections of the same name
are linked:
- when first file is processed:
  - extend_sec() calls realloc(dst->raw_data, dst_align_sz)
    with dst->raw_data == NULL and dst_align_sz == 0;
  - dst->raw_data is set to a special pointer to a memory block of
    size zero;
- when second file is processed:
  - extend_sec() calls realloc(dst->raw_data, dst_align_sz)
    with dst->raw_data == <special pointer> and dst_align_sz == 0;
  - realloc() "frees" dst->raw_data special pointer and returns NULL;
  - extend_sec() exits with -ENOMEM, and the old dst->raw_data value
    is preserved (it is now invalid);
  - eventually, bpf_linker__free() attempts to free dst->raw_data again.

This patch fixes the bug by avoiding -ENOMEM exit for dst_align_sz == 0.
The fix was suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>.

Reported-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: James Hilliard <james.hilliard1@gmail.com>
Link: https://lore.kernel.org/bpf/CADvTj4o7ZWUikKwNTwFq0O_AaX+46t_+Ca9gvWMYdWdRtTGeHQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230328004738.381898-3-eddyz87@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
JP Kobryn
ae32d7169d libbpf: Ensure print callback usage is thread-safe
This patch prevents races on the print function pointer, allowing the
libbpf_set_print() function to become thread-safe.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230325010845.46000-1-inwardvessel@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Andrii Nakryiko
b362bb6e10 ci: update libbpf/ci references to use "main"
Seems like the default branch was renamed s/master/main/; update libbpf
CI references so they don't fail.

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-27 10:45:47 -07:00
Andrii Nakryiko
f8cd00f613 ci: fallback to llvm-16 and clang-16 again
Seems like upstream LLVM/Clang packaging still has issues with
llvm/clang 17. Fall back to 16 again, for now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-23 13:10:17 -07:00
Andrii Nakryiko
dc4e7076ad sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b8a2e3f93d412114a1539ea97b59b3e6ed6e1f9a
Checkpoint bpf-next commit: 226bc6ae6405c46a6e9865835c36a1d45fc0b3bf
Baseline bpf commit:        a33a6eaa19d3af261e8708bfc8ba62020703117f
Checkpoint bpf commit:      915efd8a446b74442039d31689d5d863caf82517

Alexei Starovoitov (5):
  libbpf: Fix relocation of kfunc ksym in ld_imm64 insn.
  libbpf: Introduce bpf_ksym_exists() macro.
  libbpf: Fix ld_imm64 copy logic for ksym in light skeleton.
  libbpf: Rename RELO_EXTERN_VAR/FUNC.
  libbpf: Support kfunc detection in light skeleton.

Daniel Müller (1):
  libbpf: Ignore warnings about "inefficient alignment"

Kui-Feng Lee (5):
  bpf: Create links for BPF struct_ops maps.
  libbpf: Create a bpf_link in bpf_map__attach_struct_ops().
  bpf: Update the struct_ops of a bpf_link.
  libbpf: Update a bpf_link with another struct_ops.
  libbpf: Use .struct_ops.link section to indicate a struct_ops with a
    link.

Liu Pan (1):
  libbpf: Explicitly call write to append content to file

Sreevani Sreejith (1):
  bpf, docs: Libbpf overview documentation

 docs/index.rst           |  25 +++--
 docs/libbpf_overview.rst | 228 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h |  33 +++++-
 src/bpf.c                |   8 +-
 src/bpf.h                |   3 +-
 src/bpf_gen_internal.h   |   4 +-
 src/bpf_helpers.h        |   5 +
 src/gen_loader.c         |  48 ++++----
 src/libbpf.c             | 235 +++++++++++++++++++++++++++++----------
 src/libbpf.h             |   1 +
 src/libbpf.map           |   1 +
 src/zip.c                |   6 +
 12 files changed, 501 insertions(+), 96 deletions(-)
 create mode 100644 docs/libbpf_overview.rst

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
465a73051d libbpf: Use .struct_ops.link section to indicate a struct_ops with a link.
Flag that a struct_ops is to back a bpf_link by putting it in the
".struct_ops.link" section.  Once it is flagged, the created
struct_ops can be used to create a bpf_link or to update a bpf_link
that has been backed by another struct_ops.
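
A minimal sketch of the section usage, assuming a hypothetical (and
deliberately incomplete) congestion-control struct_ops; a real one
would have to fill in all mandatory ops:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("struct_ops/my_ssthresh")
  __u32 BPF_PROG(my_ssthresh, struct sock *sk)
  {
          return 2; /* illustrative only */
  }

  /* ".struct_ops.link" (vs plain ".struct_ops") marks it link-backed */
  SEC(".struct_ops.link")
  struct tcp_congestion_ops my_ops = {
          .ssthresh = (void *)my_ssthresh,
          .name = "my_ops",
  };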

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-8-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
e51cdaaca0 libbpf: Update a bpf_link with another struct_ops.
Introduce bpf_link__update_map(), which allows atomically updating the
underlying struct_ops implementation for a given struct_ops BPF link.

Also add old_map_fd to struct bpf_link_update_opts to handle
BPF_F_REPLACE feature.
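
A minimal sketch of the atomic swap, assuming two struct_ops maps from
an illustrative skeleton:

  struct bpf_link *link;
  int err;

  link = bpf_map__attach_struct_ops(skel->maps.ops_v1);
  if (!link)
          return -errno;

  /* later: atomically switch the link to a new implementation */
  err = bpf_link__update_map(link, skel->maps.ops_v2);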

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-7-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
055cbdcc9f bpf: Update the struct_ops of a bpf_link.
Improve the BPF_LINK_UPDATE command of bpf() to allow conveniently
switching between different struct_ops on a single bpf_link. This
enables smoother transitions from one struct_ops to another.

The struct_ops maps passed along with BPF_LINK_UPDATE must have the
BPF_F_LINK flag.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-6-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
c6893dccd9 libbpf: Create a bpf_link in bpf_map__attach_struct_ops().
bpf_map__attach_struct_ops() was creating a dummy bpf_link as a
placeholder, but now it is constructing an authentic one by calling
bpf_link_create() if the map has the BPF_F_LINK flag.

You can flag a struct_ops map with BPF_F_LINK by calling
bpf_map__set_map_flags().
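
A minimal sketch of flagging the map before the object is loaded (map
name is illustrative):

  struct bpf_map *map = bpf_object__find_map_by_name(obj, "my_ops");

  bpf_map__set_map_flags(map, bpf_map__map_flags(map) | BPF_F_LINK);
  /* ... bpf_object__load(obj) ... */
  struct bpf_link *link = bpf_map__attach_struct_ops(map);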

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-5-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
077bf73900 bpf: Create links for BPF struct_ops maps.
Make bpf_link support struct_ops.  Previously, struct_ops were always
used alone without any associated links. Upon updating its value, a
struct_ops would be activated automatically. Yet other BPF program
types required making a bpf_link with their instances before they
could become active. Now, however, you can create an inactive
struct_ops and create a link to activate it later.

With bpf_links, struct_ops behaves similarly to other BPF program
types. You can pin/unpin them through their links, and a struct_ops is
deactivated when its link is removed, whereas previously someone had
to delete its value for it to be deactivated.

bpf_links are responsible for registering their associated
struct_ops. Only a struct_ops that has the BPF_F_LINK flag set can be
used to create a bpf_link, while a struct_ops without this flag
behaves in the same manner as before and is registered upon updating
its value.

BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it used to
craft the links for BPF struct_ops programs, but also to create links
for BPF struct_ops maps themselves.  Since the links of BPF struct_ops
programs are only used to create trampolines internally, they are
never seen in other contexts. Thus, they can be reused for struct_ops
themselves.

To maintain a reference to the map supporting this link, we add
bpf_struct_ops_link as an additional type. The pointer of the map is
RCU and won't be necessary until later in the patchset.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-4-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
68cd7cd386 libbpf: Support kfunc detection in light skeleton.
Teach gen_loader to find {btf_id, btf_obj_fd} of kernel variables and kfuncs
and populate corresponding ld_imm64 and bpf_call insns.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230321203854.3035-4-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
a5464a5b0e libbpf: Rename RELO_EXTERN_VAR/FUNC.
The RELO_EXTERN_VAR/FUNC names are not correct anymore. RELO_EXTERN_VAR
represents a ksym symbol in an ld_imm64 insn, which can point to a
kernel variable or a kfunc. Rename RELO_EXTERN_VAR->RELO_EXTERN_LD64
and RELO_EXTERN_FUNC->RELO_EXTERN_CALL to match what they actually
represent.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230321203854.3035-2-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Liu Pan
753e4d07d1 libbpf: Explicitly call write to append content to file
append_to_file() writes data to an fd by calling "vdprintf"; in most
implementations of the standard library, the data is ultimately written
by the writev syscall. But "uprobe_events"/"kprobe_events" do not allow
segmented writes, so switch the "append_to_file" function to an
explicit write() call.

Signed-off-by: Liu Pan <patteliu@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230320030720.650-1-patteliu@gmail.com
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
5b45c90c49 libbpf: Fix ld_imm64 copy logic for ksym in light skeleton.
Unlike normal libbpf, the light skeleton 'loader' program does a
btf_find_by_name_kind() call at run-time to find a ksym in the kernel
and populate its {btf_id, btf_obj_fd} pair in the ld_imm64 insn. To
avoid doing the search multiple times for the same ksym, it remembers
the first patched ld_imm64 insn and copies {btf_id, btf_obj_fd} from it
into subsequent ld_imm64 insns. Fix a bug in the copying logic, which
may incorrectly clear the BPF_PSEUDO_BTF_ID flag.

Also replace the always-true if (btf_obj_fd >= 0) check with an
unconditional JMP_JA to clarify the code.

Fixes: d995816b77eb ("libbpf: Avoid reload of imm for weak, unresolved, repeating ksym")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230319203014.55866-1-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Sreevani Sreejith
2db620d982 bpf, docs: Libbpf overview documentation
This patch documents overview of libbpf, including its features for
developing BPF programs.

Signed-off-by: Sreevani Sreejith <ssreevani@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230315195405.2051559-1-ssreevani@meta.com
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
c401b96718 libbpf: Introduce bpf_ksym_exists() macro.
Introduce the bpf_ksym_exists() macro that can be used by BPF programs
to detect at load time whether a particular ksym (either a variable or
a kfunc) is present in the kernel.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317201920.62030-4-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
fd28ca4b5b libbpf: Fix relocation of kfunc ksym in ld_imm64 insn.
void *p = kfunc; -> generates an ld_imm64 insn.
kfunc() -> generates a bpf_call insn.

libbpf patches the bpf_call insn correctly, while only the btf_id part
of ld_imm64 is set in the former case. This means that pointers to
kfuncs in modules are not patched correctly and the verifier rejects
loading of such programs due to btf_id being out of range. Fix libbpf
to patch ld_imm64 for kfuncs as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317201920.62030-3-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Daniel Müller
c722f76593 libbpf: Ignore warnings about "inefficient alignment"
Some consumers of libbpf compile the code base with different warnings
enabled. In a report for perf, for example, -Wpacked was set, which
caused warnings about "inefficient alignment" to be emitted on a subset
of supported architectures.

With this change we silence specifically those warnings, as we
intentionally work with packed structs.

This is a similar resolution as in b2f10cd4e805 ("perf cpumap: Fix alignment
for masks in event encoding").

Fixes: 1eebcb60633f ("libbpf: Implement basic zip archive parsing support")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/bpf/CA+G9fYtBnwxAWXi2+GyNByApxnf_DtP1-6+_zOKAdJKnJBexjg@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230315171550.1551603-1-deso@posteo.net
2023-03-23 13:10:17 -07:00
David Vernet
b5e9722ec2 ci: Regenerate latest vmlinux.h for old kernel CI tests.
CI will fail without it.

Signed-off-by: David Vernet <void@manifault.com>
2023-03-15 13:18:34 -07:00
David Vernet
7fdf16de6d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   db55174d05ee6bed9d0583ba08e99c891ef0ed05
Checkpoint bpf-next commit: b8a2e3f93d412114a1539ea97b59b3e6ed6e1f9a
Baseline bpf commit:        d900f3d20cc3169ce42ec72acc850e662a4d4db2
Checkpoint bpf commit:      a33a6eaa19d3af261e8708bfc8ba62020703117f

Andrii Nakryiko (1):
  bpf: implement numbers iterator

Daniel Müller (1):
  libbpf: Fix theoretical u32 underflow in find_cd() function

Jakub Kicinski (1):
  ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause

Jesus Sanchez-Palencia (1):
  libbpf: Revert poisoning of strlcpy

Menglong Dong (1):
  libbpf: Add support to set kprobe/uprobe attach mode

Michael Weiß (1):
  bpf: Fix a typo for BPF_F_ANY_ALIGNMENT in bpf.h

Puranjay Mohan (2):
  libbpf: Refactor parse_usdt_arg() to re-use code
  libbpf: USDT arm arg parsing support

Ross Zwisler (1):
  bpf: use canonical ftrace path

 include/uapi/linux/bpf.h    |  18 +++-
 include/uapi/linux/netdev.h |   2 +-
 src/libbpf.c                |  48 ++++++++-
 src/libbpf.h                |  50 ++++++---
 src/libbpf_internal.h       |   4 +-
 src/usdt.c                  | 196 ++++++++++++++++++++++--------------
 src/zip.c                   |   3 +-
 7 files changed, 219 insertions(+), 102 deletions(-)

Signed-off-by: David Vernet <void@manifault.com>
2023-03-15 13:18:34 -07:00
David Vernet
faae78aac4 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: David Vernet <void@manifault.com>
2023-03-15 13:18:34 -07:00
Jesus Sanchez-Palencia
950cffc036 libbpf: Revert poisoning of strlcpy
This reverts commit 6d0c4b11e743 ("libbpf: Poison strlcpy()").

It added the pragma poison directive to libbpf_internal.h to protect
against accidental usage of strlcpy but ended up breaking the build for
toolchains based on libcs which provide the strlcpy() declaration from
string.h (e.g. uClibc-ng). The include order which causes the issue is:

    string.h,
    from libbpf_common.h:12,
    from libbpf.h:20,
    from libbpf_internal.h:26,
    from strset.c:9:

Fixes: 6d0c4b11e743 ("libbpf: Poison strlcpy()")
Signed-off-by: Jesus Sanchez-Palencia <jesussanp@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309004836.2808610-1-jesussanp@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-15 13:18:34 -07:00
Jakub Kicinski
bdc7c5e217 ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
I was intending to make all the Netlink Spec code BSD-3-Clause
to ease the adoption but it appears that:
 - I fumbled the uAPI and used "GPL WITH uAPI note" there
 - it gives people pause as they expect GPL in the kernel
As suggested by Chuck, re-license under the dual license. This gives us
the benefit of full BSD freedom while fulfilling the broad "kernel is
under GPL" expectations.

Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/
Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 13:18:34 -07:00
Ross Zwisler
e8107c3959 bpf: use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many comments and samples in the bpf code still refer to this older
debugfs path, so let's update them to avoid confusion.  There are a few
spots where the bpf code explicitly checks both tracefs and debugfs
(tools/bpf/bpftool/tracelog.c and tools/lib/api/fs/fs.c) and I've left
those alone so that the tools can continue to work with both paths.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230313205628.1058720-2-zwisler@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-15 13:18:34 -07:00
Michael Weiß
c5be1b0770 bpf: Fix a typo for BPF_F_ANY_ALIGNMENT in bpf.h
Fix s/BPF_PROF_LOAD/BPF_PROG_LOAD/ typo in the documentation comment
for BPF_F_ANY_ALIGNMENT in bpf.h.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309133823.944097-1-michael.weiss@aisec.fraunhofer.de
2023-03-15 13:18:34 -07:00
Andrii Nakryiko
32d34a9415 bpf: implement numbers iterator
Implement the first open-coded iterator type over a range of integers.

Its public API consists of:
  - bpf_iter_num_new() constructor, which accepts the [start, end) range
    (that is, start is inclusive, end is exclusive).
  - bpf_iter_num_next(), which keeps returning a read-only pointer to int
    until the range is exhausted, at which point NULL is returned.
    If bpf_iter_num_next() keeps being called after this, NULL will be
    persistently returned.
  - bpf_iter_num_destroy() destructor, which needs to be called at some
    point to clean up the iterator state. The BPF verifier enforces that
    the iterator destructor is called at some point before the BPF
    program exits.

Note that `start = end = X` is a valid combination to set up an empty
iterator. bpf_iter_num_new() will return 0 (success) for any such
combination.

If bpf_iter_num_new() detects an invalid combination of input arguments,
it returns an error and resets the iterator state to, effectively, an
empty iterator, so any subsequent call to bpf_iter_num_next() will keep
returning NULL.

BPF verifier has no knowledge that returned integers are in the
[start, end) value range, as both `start` and `end` are not statically
known and enforced: they are runtime values.

While the implementation is pretty trivial, some care needs to be taken
to avoid overflows and underflows. Subsequent selftests will validate
correctness of [start, end) semantics, especially around extremes
(INT_MIN and INT_MAX).

Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can
be specified.

bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded
BPF loops and the bpf_loop() helper, and is the basis for implementing
ergonomic BPF loops with no statically known or verified bounds.
Subsequent patches implement the bpf_for() macro, demonstrating how this
can be wrapped into something that works and feels like a normal for()
loop in the C language.
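
A minimal sketch of both usage styles, assuming the iterator kfuncs are
declared __ksym and `n`/`sum` are illustrative locals:

  struct bpf_iter_num it;
  int *v, i, sum = 0;

  /* raw constructor/next/destructor API */
  bpf_iter_num_new(&it, 0, n);
  while ((v = bpf_iter_num_next(&it)))
          sum += *v;
  bpf_iter_num_destroy(&it);

  /* equivalent, via the bpf_for() macro from a subsequent patch */
  bpf_for(i, 0, n)
          sum += i;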

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-15 13:18:34 -07:00
Puranjay Mohan
aab5f194e1 libbpf: USDT arm arg parsing support
Parsing of USDT arguments is architecture-specific; on arm it is
relatively easy since registers used are r[0-10], fp, ip, sp, lr,
pc. Format is slightly different compared to aarch64; forms are

- "size @ [ reg, #offset ]" for dereferences, for example
  "-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]"
- "size @ reg" for register values; for example
  "-4@r0"
- "size @ #value" for raw values; for example
  "-8@#1"

Add support for parsing USDT arguments for ARM architecture.

To test the above changes QEMU's virt[1] board with cortex-a15
CPU was used. libbpf-bootstrap's usdt example[2] was modified to attach
to a test program with DTRACE_PROBE1/2/3/4... probes to test different
combinations.

[1] https://www.qemu.org/docs/master/system/arm/virt.html
[2] https://github.com/libbpf/libbpf-bootstrap/blob/master/examples/c/usdt.bpf.c

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-3-puranjay12@gmail.com
2023-03-15 13:18:34 -07:00
Puranjay Mohan
c5fe344018 libbpf: Refactor parse_usdt_arg() to re-use code
The parse_usdt_arg() function is defined differently for each
architecture but the last part of the function is repeated
verbatim for each architecture.

Refactor parse_usdt_arg() to fill the arg_sz and then do the repeated
post-processing in parse_usdt_spec().

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-2-puranjay12@gmail.com
2023-03-15 13:18:34 -07:00
Daniel Müller
232f42135a libbpf: Fix theoretical u32 underflow in find_cd() function
Coverity reported a potential underflow of the offset variable used in
the find_cd() function. Switch to using a signed 64 bit integer for the
representation of offset to make sure we can never underflow.

Fixes: 1eebcb60633f ("libbpf: Implement basic zip archive parsing support")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307215504.837321-1-deso@posteo.net
2023-03-15 13:18:34 -07:00
Menglong Dong
cc7177624f libbpf: Add support to set kprobe/uprobe attach mode
By default, libbpf will attach the kprobe/uprobe BPF program in the
latest mode supported by the kernel. In this patch, we add support for
letting users manually attach kprobe/uprobe in legacy or perf mode.

There are 3 modes supported by the kernel to attach kprobe/uprobe:

  LEGACY: create perf event in legacy way and don't use bpf_link
  PERF: create perf event with perf_event_open() and don't use bpf_link
  LINK: create perf event with perf_event_open() and use bpf_link

Users now can manually choose the mode with
bpf_program__attach_uprobe_opts()/bpf_program__attach_kprobe_opts().

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com/
Link: https://lore.kernel.org/bpf/20230306064833.7932-2-imagedong@tencent.com
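
A minimal sketch of forcing legacy mode (the kernel symbol name is
illustrative):

  LIBBPF_OPTS(bpf_kprobe_opts, opts,
          .attach_mode = PROBE_ATTACH_MODE_LEGACY,
  );
  struct bpf_link *link;

  link = bpf_program__attach_kprobe_opts(prog, "do_unlinkat", &opts);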
2023-03-15 13:18:34 -07:00
Daniel Müller
cf46d44f0a sync: Add section about need for Makefile adjustments
When performing a sync with the kernel repository using the
sync-kernel.sh script, it may be necessary to manually adjust the
library's Makefile if:
- new source files were added upstream
- new public headers were added upstream

This change adds a new section to `SYNC.md` to spell out this need.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 13:06:39 -08:00
Daniel Müller
a41e6ef325 Stop running l4lb_all test on 5.5.0
The l4lb_all/l4lb_noinline_dynptr test does not run on kernel 5.5.0,
because required functionality is missing there. Do not allow running it.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Daniel Müller
c2495832ce libbpf: Properly build zip.o
The sync script does not seem to automatically add files newly added to
the kernel repo build to the local Makefile. Do that now.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Daniel Müller
bfb1e97426 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c8ee37bde4021a275d2e4f33bd48d54912bb00c4
Checkpoint bpf-next commit: db55174d05ee6bed9d0583ba08e99c891ef0ed05
Baseline bpf commit:        2d311f480b52eeb2e1fd432d64b78d82952c3808
Checkpoint bpf commit:      d900f3d20cc3169ce42ec72acc850e662a4d4db2

Alexei Starovoitov (1):
  bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.

Daniel Müller (3):
  libbpf: Implement basic zip archive parsing support
  libbpf: Introduce elf_find_func_offset_from_file() function
  libbpf: Add support for attaching uprobes to shared objects in APKs

Joanne Koong (3):
  bpf: Add skb dynptrs
  bpf: Add xdp dynptrs
  bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr

Tero Kristo (1):
  bpf: Add support for absolute value BPF timers

Viktor Malik (3):
  libbpf: Remove unnecessary ternary operator
  libbpf: Remove several dead assignments
  libbpf: Cleanup linker_append_elf_relos

 include/uapi/linux/bpf.h |  33 +++-
 src/bpf_helpers.h        |   2 +-
 src/btf.c                |   2 -
 src/libbpf.c             | 149 ++++++++++++++----
 src/linker.c             |  11 +-
 src/relo_core.c          |   3 -
 src/zip.c                | 328 +++++++++++++++++++++++++++++++++++++++
 src/zip.h                |  47 ++++++
 8 files changed, 529 insertions(+), 46 deletions(-)
 create mode 100644 src/zip.c
 create mode 100644 src/zip.h

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Daniel Müller
a468b16788 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Alexei Starovoitov
6c673bb00b bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.
__kptr was meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
The concept felt useful, but didn't get much traction,
since bpf_rdonly_cast() was added soon after and bpf programs received
a simpler way to access PTR_UNTRUSTED kernel pointers
without going through restrictive __kptr usage.

Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
their intended usage.
The main goal of __kptr_untrusted was to read/write such pointers
directly while bpf_kptr_xchg was a mechanism to access refcnted
kernel pointers. The next patch will allow RCU protected __kptr access
with direct read. At that point __kptr_untrusted will be deprecated.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Tero Kristo
b6c58f7619 bpf: Add support for absolute value BPF timers
Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start()
to start an absolute value timer instead of the default relative value.
This makes the timer expire at an exact point in time, instead of a time
with latencies induced by both the BPF and timer subsystems.
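
A minimal sketch of arming an absolute timer (map-value layout and
callback are illustrative):

  struct map_val {
          struct bpf_timer t;
  };

  /* fire exactly 1s from now on the CLOCK_BOOTTIME axis */
  bpf_timer_init(&val->t, &timer_map, CLOCK_BOOTTIME);
  bpf_timer_set_callback(&val->t, timer_cb);
  bpf_timer_start(&val->t, bpf_ktime_get_boot_ns() + 1000000000ULL,
                  BPF_F_TIMER_ABS);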

Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Daniel Müller
db26142ffb libbpf: Add support for attaching uprobes to shared objects in APKs
This change adds support for attaching uprobes to shared objects located
in APKs, which is relevant for Android systems where various libraries
may reside in APKs. To make that happen, we extend the syntax of the
"binary path" argument being attached to, matching that supported by
various Android tools:
  <archive>!/<binary-in-archive>

For example:
  /system/app/test-app/test-app.apk!/lib/arm64-v8a/libc++_shared.so

APKs need to be specified via full path, i.e., we do not attempt to
resolve mere file names by searching system directories.
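
A minimal sketch of attaching with the new path syntax (the function
name is illustrative):

  LIBBPF_OPTS(bpf_uprobe_opts, opts,
          .func_name = "_ZNSt6__ndk19to_stringEi",
          .retprobe = true,
  );
  struct bpf_link *link;

  link = bpf_program__attach_uprobe_opts(prog, -1 /* any pid */,
          "/system/app/test-app/test-app.apk!/lib/arm64-v8a/libc++_shared.so",
          0 /* offset resolved from func_name */, &opts);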

We cannot currently test this functionality end-to-end in an automated
fashion, because it relies on an Android system being present, but there
is no support for that in CI. I have tested the functionality manually,
by creating a libbpf program containing a uretprobe, attaching it to a
function inside a shared object inside an APK, and verifying the sanity
of the returned values.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-4-deso@posteo.net
2023-03-06 09:47:37 -08:00
Daniel Müller
47eb62005a libbpf: Introduce elf_find_func_offset_from_file() function
This change splits the elf_find_func_offset() function in two:
elf_find_func_offset(), which now accepts an already opened Elf object
instead of a path to a file that is to be opened, as well as
elf_find_func_offset_from_file(), which opens a binary based on a
path and then invokes elf_find_func_offset() on the Elf object. Having
this split in responsibilities will allow us to call
elf_find_func_offset() from other code paths on Elf objects that did not
necessarily come from a file on disk.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-3-deso@posteo.net
2023-03-06 09:47:37 -08:00
Daniel Müller
9ca6f946cd libbpf: Implement basic zip archive parsing support
This change implements support for reading zip archives, including
opening an archive, finding an entry based on its path and name in it,
and closing it.
The code was copied from https://github.com/iovisor/bcc/pull/4440, which
implements similar functionality for bcc. The author confirmed that he
is fine with this usage and the corresponding relicensing. I adjusted it
to adhere to libbpf coding standards.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Michał Gregorczyk <michalgr@meta.com>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-2-deso@posteo.net
2023-03-06 09:47:37 -08:00
Viktor Malik
87695e9723 libbpf: Cleanup linker_append_elf_relos
Clang Static Analyser (scan-build) reports some unused symbols and dead
assignments in the linker_append_elf_relos function. Clean these up.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/c5c8fe9f411b69afada8399d23bb048ef2a70535.1677658777.git.vmalik@redhat.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Viktor Malik
3706449b1b libbpf: Remove several dead assignments
Clang Static Analyzer (scan-build) reports several dead assignments in
libbpf where the assigned value is unconditionally overridden by another
value before it is read. Remove these assignments.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/5503d18966583e55158471ebbb2f67374b11bf5e.1677658777.git.vmalik@redhat.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Viktor Malik
4c75268933 libbpf: Remove unnecessary ternary operator
Coverity reports that the first check of 'err' in bpf_object__init_maps
is always false as 'err' is initialized to 0 at that point. Remove the
unnecessary ternary operator.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/78a3702f2ea9f32a84faaae9b674c56269d330a7.1677658777.git.vmalik@redhat.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Joanne Koong
3fe3cccb06 bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
The user must pass in a buffer to store the contents of the data slice
if a direct pointer to the data cannot be obtained.

For skb and xdp type dynptrs, these two APIs are the only way to obtain
a data slice. However, for other types of dynptrs, there is no
difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.

For skb type dynptrs, the data is copied into the user provided buffer
if any of the data is not in the linear portion of the skb. For xdp type
dynptrs, the data is copied into the user provided buffer if the data is
between xdp frags.

If the skb is cloned and a call to bpf_dynptr_slice_rdwr is made, then
the skb will be uncloned (see bpf_unclone_prologue()).

Please note that any bpf_dynptr_write() automatically invalidates any prior
data slices of the skb dynptr. This is because the skb may be cloned or
may need to pull its paged buffer into the head. As such, any
bpf_dynptr_write() will automatically have its prior data slices
invalidated, even if the write is to data in the skb head of an uncloned
skb. Please note as well that any other helper calls that change the
underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
slices of the skb dynptr as well, for the same reasons.
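
A minimal sketch of reading a header through an skb dynptr in a tc
program (kfunc externs shown for self-containedness; the offset is
illustrative):

  extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags,
                                 struct bpf_dynptr *ptr) __ksym;
  extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, __u32 offset,
                                void *buffer, __u32 buffer__szk) __ksym;

  struct iphdr buf, *ip;
  struct bpf_dynptr ptr;

  if (bpf_dynptr_from_skb(skb, 0, &ptr))
          return TC_ACT_OK;
  /* direct pointer into linear data when possible, else a copy in buf */
  ip = bpf_dynptr_slice(&ptr, sizeof(struct ethhdr), &buf, sizeof(buf));
  if (!ip)
          return TC_ACT_OK;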

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Joanne Koong
0c5b5b5d91 bpf: Add xdp dynptrs
Add xdp dynptrs, which are dynptrs whose underlying pointer points
to an xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of xdp->data and xdp->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For reads and writes on the dynptr, this includes reading/writing
from/to and across fragments. Data slices through the bpf_dynptr_data
API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() should be used.

For examples of how xdp dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Joanne Koong
d16fc1f0f5 bpf: Add skb dynptrs
Add skb dynptrs, which are dynptrs whose underlying pointer points
to a skb. The dynptr acts on skb data. skb dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of skb->data and skb->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For bpf prog types that don't support writes on skb data, the dynptr is
read-only (bpf_dynptr_write() will return an error).

For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
interfaces, reading and writing from/to data in the head as well as from/to
non-linear paged buffers is supported. Data slices through the
bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used.

For examples of how skb dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Andrii Nakryiko
37922c6fb2 sync: add sync process documentation at SYNC.md
Explain sync setup expectations, necessary steps, common gotchas and
necessary manual adjustments.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-02-28 09:22:25 -08:00
Yonghong Song
19cd9a1d4b sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   951bce29c8988209cc359e1fa35a4aaa35542fd5
Checkpoint bpf-next commit: c8ee37bde4021a275d2e4f33bd48d54912bb00c4
Baseline bpf commit:        3a70e0d4c9d74cb00f7c0ec022f5599f9f7ba07d
Checkpoint bpf commit:      2d311f480b52eeb2e1fd432d64b78d82952c3808

Ilya Leoshkevich (1):
  libbpf: Document bpf_{btf,link,map,prog}_get_info_by_fd()

Puranjay Mohan (1):
  libbpf: Fix arm syscall regs spec in bpf_tracing.h

Rob Herring (1):
  perf: Add perf_event_attr::config3

Tariq Toukan (1):
  netdev-genl: fix repeated typo oflloading -> offloading

Tiezhu Yang (1):
  libbpf: Use struct user_pt_regs to define __PT_REGS_CAST() for
    LoongArch

Yonghong Song (1):
  libbpf: Fix bpf_xdp_query() in old kernels

 include/uapi/linux/netdev.h     |  2 +-
 include/uapi/linux/perf_event.h |  3 ++
 src/bpf.h                       | 69 ++++++++++++++++++++++++++++++---
 src/bpf_tracing.h               |  3 ++
 src/netlink.c                   |  8 +++-
 5 files changed, 78 insertions(+), 7 deletions(-)

Signed-off-by: Yonghong Song <yhs@fb.com>
2023-02-28 09:17:25 -08:00
Tariq Toukan
a6c64dbfa2 netdev-genl: fix repeated typo oflloading -> offloading
Fix a repeated copy/paste typo.

Fixes: d3d854fd6a1d ("netdev-genl: create a simple family for netdev stuff")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-28 09:17:25 -08:00
Yonghong Song
0d7ac28818 libbpf: Fix bpf_xdp_query() in old kernels
Commit 04d58f1b26a4 ("libbpf: add API to get XDP/XSK supported features")
added feature_flags to struct bpf_xdp_query_opts. If a user uses
bpf_xdp_query_opts with the feature_flags member, bpf_xdp_query()
will check whether the 'netdev' family exists in the kernel.
If it does not exist, bpf_xdp_query() will return -ENOENT.

But the 'netdev' family does not exist in old kernels, as it was
introduced in the same patch set as commit 04d58f1b26a4.
So an old kernel with a newer libbpf won't work properly with
the bpf_xdp_query() api call.

To fix this issue, if the return value of
libbpf_netlink_resolve_genl_family_id() is -ENOENT, bpf_xdp_query()
will just return 0, skipping the rest of xdp feature query.
This preserves backward compatibility.
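
A minimal sketch of a query that now degrades gracefully on old
kernels (ifindex is illustrative):

  LIBBPF_OPTS(bpf_xdp_query_opts, opts);
  int err;

  err = bpf_xdp_query(ifindex, 0, &opts);
  /* on kernels without the 'netdev' family, err is 0 and
   * opts.feature_flags simply stays 0 instead of the whole
   * call failing with -ENOENT */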

Fixes: 04d58f1b26a4 ("libbpf: add API to get XDP/XSK supported features")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230227224943.1153459-1-yhs@fb.com
2023-02-28 09:17:25 -08:00
Ilya Leoshkevich
3fdc11b883 libbpf: Document bpf_{btf,link,map,prog}_get_info_by_fd()
Replace the short informal description with the proper doc comments.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230220234958.764997-1-iii@linux.ibm.com
2023-02-28 09:17:25 -08:00
Puranjay Mohan
e198fdc928 libbpf: Fix arm syscall regs spec in bpf_tracing.h
The syscall register definitions for ARM in bpf_tracing.h don't define
the fifth parameter for syscalls. Because of this, some KPROBES-based
selftests fail to compile for the ARM architecture.

Define the fifth parameter that is passed in the R5 register (uregs[4]).

Fixes: 3a95c42d65d5 ("libbpf: Define arm syscall regs spec in bpf_tracing.h")
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230223095346.10129-1-puranjay12@gmail.com
2023-02-28 09:17:25 -08:00
Tiezhu Yang
e114bd2657 libbpf: Use struct user_pt_regs to define __PT_REGS_CAST() for LoongArch
LoongArch provides struct user_pt_regs instead of struct pt_regs
to userspace; use struct user_pt_regs to define __PT_REGS_CAST()
to fix the following build error:

     CLNG-BPF [test_maps] loop1.bpf.o
  progs/loop1.c:22:9: error: incomplete definition of type 'struct pt_regs'
                                  m = PT_REGS_RC(ctx);
                                      ^~~~~~~~~~~~~~~
  tools/testing/selftests/bpf/tools/include/bpf/bpf_tracing.h:493:41: note: expanded from macro 'PT_REGS_RC'
  #define PT_REGS_RC(x) (__PT_REGS_CAST(x)->__PT_RC_REG)
                         ~~~~~~~~~~~~~~~~~^
  tools/testing/selftests/bpf/tools/include/bpf/bpf_helper_defs.h:20:8: note: forward declaration of 'struct pt_regs'
  struct pt_regs;
         ^
  1 error generated.
  make: *** [Makefile:572: tools/testing/selftests/bpf/loop1.bpf.o] Error 1
  make: Leaving directory 'tools/testing/selftests/bpf'

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1677235015-21717-2-git-send-email-yangtiezhu@loongson.cn
2023-02-28 09:17:25 -08:00
Rob Herring
bb0f8b32a5 perf: Add perf_event_attr::config3
Arm SPEv1.2 adds another 64-bits of event filtering control. As the
existing perf_event_attr::configN fields are all used up for SPE PMU, an
additional field is needed. Add a new 'config3' field.

Tested-by: James Clark <james.clark@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220825-arm-spe-v8-7-v4-7-327f860daf28@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-02-28 09:17:25 -08:00
Andrii Nakryiko
f9106f6bac ci: start using llvm-17 now
LLVM 17 problems were fixed upstream, so switch to using latest v17 in CI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-02-24 14:11:17 -08:00
Yonghong Song
7ef34fa945 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   6c20822fada1b8adb77fa450d03a0d449686a4a9
Checkpoint bpf-next commit: 951bce29c8988209cc359e1fa35a4aaa35542fd5
Baseline bpf commit:        6c20822fada1b8adb77fa450d03a0d449686a4a9
Checkpoint bpf commit:      3a70e0d4c9d74cb00f7c0ec022f5599f9f7ba07d

Ilya Leoshkevich (2):
  libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()

Martin KaFai Lau (1):
  bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup

 include/uapi/linux/bpf.h |  6 ++++++
 src/bpf.c                | 20 ++++++++++++++++++++
 src/bpf.h                |  9 +++++++++
 src/btf.c                |  8 ++++----
 src/libbpf.c             | 14 +++++++-------
 src/libbpf.map           |  5 +++++
 src/netlink.c            |  2 +-
 src/ringbuf.c            |  4 ++--
 8 files changed, 54 insertions(+), 14 deletions(-)

Signed-off-by: Yonghong Song <yhs@fb.com>
2023-02-21 22:27:55 -08:00
Yonghong Song
7cfc12cb41 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Yonghong Song <yhs@fb.com>
2023-02-21 22:27:55 -08:00
Martin KaFai Lau
c16cae9381 bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.

In a use case that does not manage the neigh table
and requires bpf_fib_lookup() only to look up a fib to
decide whether it needs to redirect or not, the bpf prog can
depend solely on bpf_redirect_neigh() to look up the
neigh. That also keeps the neigh entries fresh and connected.

This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always calls
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is also skipped when SKIP_NEIGH is set, because
bpf_redirect_neigh() will figure out the smac as well.
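
A minimal sketch of the intended pattern in a tc program (address
setup is abbreviated and illustrative):

  struct bpf_fib_lookup params = {
          .family = AF_INET,
          .ifindex = ctx->ingress_ifindex,
          /* ... addresses filled in from the packet ... */
  };
  int rc;

  rc = bpf_fib_lookup(ctx, &params, sizeof(params),
                      BPF_FIB_LOOKUP_SKIP_NEIGH);
  if (rc == BPF_FIB_LKUP_RET_SUCCESS)
          /* neigh (and smac) get resolved here instead */
          return bpf_redirect_neigh(params.ifindex, NULL, 0, 0);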

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
2023-02-21 22:27:55 -08:00
Ilya Leoshkevich
768164af0e libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
Use the new type-safe wrappers around bpf_obj_get_info_by_fd().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-3-iii@linux.ibm.com
2023-02-21 22:27:55 -08:00
Ilya Leoshkevich
30f6bc3c0a libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
These are type-safe wrappers around bpf_obj_get_info_by_fd(). They
found one problem in selftests, and are also useful for adding
Memory Sanitizer annotations.
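
A minimal sketch of the type-safe wrapper (prog_fd is assumed to be a
loaded program's fd):

  struct bpf_prog_info info = {};
  __u32 info_len = sizeof(info);
  int err;

  err = bpf_prog_get_info_by_fd(prog_fd, &info, &info_len);
  if (!err)
          printf("prog id: %u\n", info.id);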

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-2-iii@linux.ibm.com
2023-02-21 22:27:55 -08:00
Song Liu
ea28429902 ci: Remove xdp_info from ALLOWLIST-5.5.0
xdp_info now depends on newer functionality. Let's skip it for the
5.5.0 kernel.

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Song Liu
34212c94a6 ci: regenerate vmlinux.h
Regenerate latest vmlinux.h for old kernel CI tests.

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Song Liu
6f1c8eddb2 sync: Add netdev.h from kernel tree
Add netdev.h to include/uapi/linux to make the build succeed.

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Song Liu
4b492df97e sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   a5f6b9d577eba18601c14bba2dbff4a9b76af962
Checkpoint bpf-next commit: 6c20822fada1b8adb77fa450d03a0d449686a4a9
Baseline bpf commit:        e8c8fd9b8393d7064152c8806f5ac446d760a23e
Checkpoint bpf commit:      6c20822fada1b8adb77fa450d03a0d449686a4a9

Dave Marchevsky (1):
  bpf: Add basic bpf_rb_{root,node} support

Florian Lehner (1):
  bpf: fix typo in header for bpf_perf_prog_read_value

Grant Seltzer (2):
  libbpf: Fix malformed documentation formatting
  libbpf: Add documentation to map pinning API functions

Hao Xiang (1):
  libbpf: Correctly set the kernel code version in Debian kernel.

Ilya Leoshkevich (4):
  libbpf: Simplify barrier_var()
  libbpf: Fix unbounded memory access in bpf_usdt_arg()
  libbpf: Fix BPF_PROBE_READ{_STR}_INTO() on s390x
  libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()

Jon Doron (1):
  libbpf: Add sample_period to creation options

Lorenzo Bianconi (3):
  libbpf: add the capability to specify netlink proto in
    libbpf_netlink_send_recv
  libbpf: add API to get XDP/XSK supported features
  libbpf: Always use libbpf_err to return an error in bpf_xdp_query()

Randy Dunlap (1):
  Documentation: bpf: correct spelling

Tiezhu Yang (1):
  tools/bpf: Use tab instead of white spaces to sync bpf.h

 docs/libbpf_naming_convention.rst |   6 +-
 include/uapi/linux/bpf.h          |  17 ++++-
 src/bpf_core_read.h               |   4 +-
 src/bpf_helpers.h                 |   2 +-
 src/libbpf.c                      |  46 ++----------
 src/libbpf.h                      |  97 +++++++++++++++++++++---
 src/libbpf_probes.c               |  83 +++++++++++++++++++++
 src/netlink.c                     | 118 +++++++++++++++++++++++++++---
 src/nlattr.c                      |   2 +-
 src/nlattr.h                      |  12 +++
 src/usdt.bpf.h                    |   5 +-
 11 files changed, 321 insertions(+), 71 deletions(-)

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Song Liu
24476fe699 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Dave Marchevsky
d74065659a bpf: Add basic bpf_rb_{root,node} support
This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to
BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new
types, and adds bpf_rb_root_free function for freeing bpf_rb_root in
map_values.

structs bpf_rb_root and bpf_rb_node are opaque types meant to
obscure structs rb_root_cached rb_node, respectively.

btf_struct_access will prevent BPF programs from touching these special
fields automatically now that they're recognized.

btf_check_and_fixup_fields now groups list_head and rb_root together as
"graph root" fields and {list,rb}_node as "graph node", and does same
ownership cycle checking as before. Note that this function does _not_
prevent ownership type mixups (e.g. rb_root owning list_node) - that's
handled by btf_parse_graph_root.

After this patch, a bpf program can have a struct bpf_rb_root in a
map_value, but not add anything to nor do anything useful with it.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Ilya Leoshkevich
418962b686 libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()
The code assumes that everything that comes after the nlmsgerr is nlattrs.
When calculating their size, it does not account for the initial
nlmsghdr. This may lead to accessing uninitialized memory.

Fixes: bbf48c18ee0c ("libbpf: add error reporting in XDP")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-8-iii@linux.ibm.com
2023-02-17 17:17:27 -08:00
Jon Doron
8c8243a409 libbpf: Add sample_period to creation options
Add an option to set when the perf buffer should wake up. By default,
the perf buffer is signaled for every event that is pushed to it.

In case of a high throughput of events it is more efficient to wake
up only once X events are ready to be read.

That way the application can wake up once and drain the entire perf buffer.
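
An illustrative user-space sketch (handle_event and handle_lost are
hypothetical callbacks defined elsewhere):

    LIBBPF_OPTS(perf_buffer_opts, opts, .sample_period = 64);
    struct perf_buffer *pb = perf_buffer__new(map_fd, 8 /* pages per CPU */,
                                              handle_event, handle_lost,
                                              NULL, &opts);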

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230207081916.3398417-1-arilou@gmail.com
2023-02-17 17:17:27 -08:00
Lorenzo Bianconi
6333ea6a3a libbpf: Always use libbpf_err to return an error in bpf_xdp_query()
In order to properly set errno, rely on the libbpf_err() utility routine
in bpf_xdp_query() to return an error to the caller.

Fixes: 04d58f1b26a4 ("libbpf: add API to get XDP/XSK supported features")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/827d40181f9f90fb37702f44328e1614df7c0503.1675768112.git.lorenzo@kernel.org
2023-02-17 17:17:27 -08:00
Hao Xiang
855bf91055 libbpf: Correctly set the kernel code version in Debian kernel.
In a previous commit, the Ubuntu kernel code version was correctly set
by retrieving the information from /proc/version_signature.

commit <5b3d72987701d51bf31823b39db49d10970f5c2d>
(libbpf: Improve LINUX_VERSION_CODE detection)

The /proc/version_signature file isn't present in at least the older
versions of Debian distributions (e.g., Debian 9, 10). The Debian
kernel has a similar issue where the release information from the
uname() syscall doesn't give the kernel code version that matches what
the kernel actually expects. Below is example content from Debian 10.

release: 4.19.0-23-amd64
version: #1 SMP Debian 4.19.269-1 (2022-12-20) x86_64

Debian reports incorrect kernel version in utsname::release returned
by uname() syscall, which in older kernels (Debian 9, 10) leads to
kprobe BPF programs failing to load due to the version check mismatch.

Fortunately, the correct kernel code version is present in the
utsname::version returned by the uname() syscall in Debian kernels.
This change adds another get-kernel-version function to handle Debian,
in addition to the previously added one that handles Ubuntu. Some minor
refactoring is also done to make the code more readable.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230203234842.2933903-1-hao.xiang@bytedance.com
2023-02-17 17:17:27 -08:00
Florian Lehner
5e0270f66e bpf: fix typo in header for bpf_perf_prog_read_value
Fix a simple typo in the documentation for bpf_perf_prog_read_value.

Signed-off-by: Florian Lehner <dev@der-flo.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230203121439.25884-1-dev@der-flo.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-17 17:17:27 -08:00
Lorenzo Bianconi
547881e04e libbpf: add API to get XDP/XSK supported features
Extend the bpf_xdp_query() routine in order to get the XDP/XSK supported
features of a netdev over the route netlink interface.
Extend the libbpf netlink implementation in order to support the
netlink_generic protocol.
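
A hedged user-space sketch of reading the feature flags back (ifindex
is a placeholder):

    LIBBPF_OPTS(bpf_xdp_query_opts, opts);

    if (!bpf_xdp_query(ifindex, 0, &opts))
        printf("XDP feature flags: 0x%llx\n",
               (unsigned long long)opts.feature_flags);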

Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Marek Majtyka <alardam@gmail.com>
Signed-off-by: Marek Majtyka <alardam@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/a72609ef4f0de7fee5376c40dbf54ad7f13bfb8d.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Lorenzo Bianconi
41b96a8c08 libbpf: add the capability to specify netlink proto in libbpf_netlink_send_recv
This is a preliminary patch to introduce netlink_generic protocol
support to libbpf.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/7878a54667e74afeec3ee519999c044bd514b44c.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Tiezhu Yang
700d755151 tools/bpf: Use tab instead of white spaces to sync bpf.h
Just silence the following build warning:

Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1675319486-27744-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Ilya Leoshkevich
981da2b380 libbpf: Fix BPF_PROBE_READ{_STR}_INTO() on s390x
BPF_PROBE_READ_INTO() and BPF_PROBE_READ_STR_INTO() should map to
bpf_probe_read() and bpf_probe_read_str() respectively in order to work
correctly on architectures with !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-24-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Ilya Leoshkevich
23898cf858 libbpf: Fix unbounded memory access in bpf_usdt_arg()
Loading programs that use bpf_usdt_arg() on s390x fails with:

    ; if (arg_num >= BPF_USDT_MAX_ARG_CNT || arg_num >= spec->arg_cnt)
    128: (79) r1 = *(u64 *)(r10 -24)      ; frame1: R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
    129: (25) if r1 > 0xb goto pc+83      ; frame1: R1_w=scalar(umax=11,var_off=(0x0; 0xf))
    ...
    ; arg_spec = &spec->args[arg_num];
    135: (79) r1 = *(u64 *)(r10 -24)      ; frame1: R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
    ...
    ; switch (arg_spec->arg_type) {
    139: (61) r1 = *(u32 *)(r2 +8)
    R2 unbounded memory access, make sure to bounds check any such access

The reason is that, even though the C code enforces that
arg_num < BPF_USDT_MAX_ARG_CNT, the verifier cannot propagate this
constraint to the arg_spec assignment yet. Help it by forcing r1 back
to the stack after the comparison.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-23-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Ilya Leoshkevich
7285d529cf libbpf: Simplify barrier_var()
Use a single "+r" constraint instead of the separate "=r" and "0".
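
For reference, the change amounts to roughly the following (sketched
from the description, not copied from the patch):

    /* before: output constraint plus matching input constraint */
    #define barrier_var(var) asm volatile("" : "=r"(var) : "0"(var))

    /* after: single read-write constraint */
    #define barrier_var(var) asm volatile("" : "+r"(var))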

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-22-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Randy Dunlap
dd460a52bc Documentation: bpf: correct spelling
Correct spelling problems for Documentation/bpf/ as reported
by codespell.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230128195046.13327-1-rdunlap@infradead.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Grant Seltzer
44c1d381ff libbpf: Add documentation to map pinning API functions
This adds documentation for the following API functions (a short usage
sketch follows the list):

- bpf_map__set_pin_path()
- bpf_map__pin_path()
- bpf_map__is_pinned()
- bpf_map__pin()
- bpf_map__unpin()
- bpf_object__pin_maps()
- bpf_object__unpin_maps()
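
A minimal usage sketch (obj and the map name "my_map" are hypothetical):

    struct bpf_map *map = bpf_object__find_map_by_name(obj, "my_map");

    bpf_map__set_pin_path(map, "/sys/fs/bpf/my_map");
    if (!bpf_map__is_pinned(map))
        bpf_map__pin(map, NULL); /* NULL: use the pin_path set above */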

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024225.520685-1-grantseltzer@gmail.com
2023-02-17 17:17:27 -08:00
Grant Seltzer
522fe6f721 libbpf: Fix malformed documentation formatting
This fixes the doxygen format documentation above the
user_ring_buffer__* APIs. There has to be a newline
before the @brief, otherwise doxygen won't render them
for libbpf.readthedocs.org.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024749.522278-1-grantseltzer@gmail.com
2023-02-17 17:17:27 -08:00
Andrii Nakryiko
04aafdf9c9 ci: replicate BPF CI changes for clang installation
Add the ability to install a specified version of Clang. This replicates
what was done in https://github.com/libbpf/ci/pull/86.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-02-09 16:49:06 -08:00
Andrii Nakryiko
416620416f sync: sync include/uapi/linux/openat2.h
As reported in [0], we are missing openat2.h in libbpf-local UAPI
headers. Sync it and adjust sync script to keep syncing it going
forward.

  [0] https://github.com/libbpf/libbpf/issues/649

Closes: https://github.com/libbpf/libbpf/issues/649
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-31 16:46:18 -08:00
Joanne Koong
6b4a3f3131 ci: Update default llvm version to 17
Currently, CI is unable to locate llvm-16 on
aarch64/gcc, aarch64/llvm-16, and s390x/gcc [0].

This change upgrades the default llvm version to 17.

[0] https://github.com/kernel-patches/bpf/actions/runs/4040302668

Signed-off-by: Joanne Koong <joannekoong@gmail.com>
2023-01-30 17:20:12 -08:00
Dave Marchevsky
d73ecc91e1 Add patch fixing s390 issues
Signed-off-by: Dave Marchevsky <davemarchevsky@gmail.com>
2023-01-26 11:23:52 -08:00
Andrii Nakryiko
c2e797c8de ci: temporarily denylist decap_sanity test
It mysteriously fails in CI; don't run it for now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
f99818dd1a libbpf: regenerate vmlinux.h
Regenerate latest vmlinux.h for old kernel CI tests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
b2e29a1026 libbpf: bump version to v1.2 in Makefile
Bump LIBBPF_MINOR_VERSION to 2 for v1.2 dev cycle.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
e398e7eaf4 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   7b43df6c6ec38c9097420902a1c8165c4b25bf70
Checkpoint bpf-next commit: a5f6b9d577eba18601c14bba2dbff4a9b76af962
Baseline bpf commit:        54c3f1a81421f85e60ae2eaae7be3727a09916ee
Checkpoint bpf commit:      e8c8fd9b8393d7064152c8806f5ac446d760a23e

Alexei Starovoitov (1):
  libbpf: Restore errno after pr_warn.

Andrii Nakryiko (24):
  libbpf: start v1.2 development cycle
  libbpf: Add support for fetching up to 8 arguments in kprobes
  libbpf: Add 6th argument support for x86-64 in bpf_tracing.h
  libbpf: Fix arm and arm64 specs in bpf_tracing.h
  libbpf: Complete mips spec in bpf_tracing.h
  libbpf: Complete powerpc spec in bpf_tracing.h
  libbpf: Complete sparc spec in bpf_tracing.h
  libbpf: Complete riscv arch spec in bpf_tracing.h
  libbpf: Fix and complete ARC spec in bpf_tracing.h
  libbpf: Complete LoongArch (loongarch) spec in bpf_tracing.h
  libbpf: Add BPF_UPROBE and BPF_URETPROBE macro aliases
  libbpf: Improve syscall tracing support in bpf_tracing.h
  libbpf: Define x86-64 syscall regs spec in bpf_tracing.h
  libbpf: Define i386 syscall regs spec in bpf_tracing.h
  libbpf: Define s390x syscall regs spec in bpf_tracing.h
  libbpf: Define arm syscall regs spec in bpf_tracing.h
  libbpf: Define arm64 syscall regs spec in bpf_tracing.h
  libbpf: Define mips syscall regs spec in bpf_tracing.h
  libbpf: Define powerpc syscall regs spec in bpf_tracing.h
  libbpf: Define sparc syscall regs spec in bpf_tracing.h
  libbpf: Define riscv syscall regs spec in bpf_tracing.h
  libbpf: Define arc syscall regs spec in bpf_tracing.h
  libbpf: Define loongarch syscall regs spec in bpf_tracing.h
  libbpf: Clean up now not needed __PT_PARM{1-6}_SYSCALL_REG defaults

Changbin Du (1):
  libbpf: Return -ENODATA for missing btf section

Daniel T. Lee (1):
  libbpf: Fix invalid return address register in s390

David Vernet (1):
  libbpf: Support sleepable struct_ops.s section

Hengqi Chen (1):
  libbpf: Add LoongArch support to bpf_tracing.h

Ludovic L'Hours (1):
  libbpf: Fix map creation flags sanitization

Menglong Dong (1):
  libbpf: Replace '.' with '_' in legacy kprobe event name

Rong Tao (1):
  libbpf: Poison strlcpy()

Stanislav Fomichev (1):
  bpf: Introduce device-bound XDP programs

Xin Liu (2):
  libbpf: fix errno is overwritten after being closed.
  libbpf: Added the description of some API functions

Ziyang Xuan (1):
  bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()

 include/uapi/linux/bpf.h |  12 ++
 src/bpf_tracing.h        | 320 ++++++++++++++++++++++++++++++++++-----
 src/btf.c                |   2 +-
 src/libbpf.c             |  10 +-
 src/libbpf.h             |  29 +++-
 src/libbpf.map           |   3 +
 src/libbpf_internal.h    |   5 +-
 src/libbpf_version.h     |   2 +-
 8 files changed, 341 insertions(+), 42 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
c93ba3907f sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
David Vernet
112479afb7 libbpf: Support sleepable struct_ops.s section
In a prior change, the verifier was updated to support sleepable
BPF_PROG_TYPE_STRUCT_OPS programs. A caller could set the program as
sleepable with bpf_program__set_flags(), but it would be more ergonomic
and more in line with other sleepable program types if we supported
suffixing a struct_ops section name with .s to indicate that it's
sleepable.
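
An illustrative sketch (the struct_ops member name my_op is
hypothetical; BPF_PROG comes from bpf/bpf_tracing.h):

    SEC("struct_ops.s/my_op")
    int BPF_PROG(my_op, int arg)
    {
        /* sleepable context */
        return 0;
    }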

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
004ed7120b libbpf: Clean up now not needed __PT_PARM{1-6}_SYSCALL_REG defaults
Each architecture supports at least 6 syscall argument registers, so now
that specs for each architecture are defined in bpf_tracing.h, remove
unnecessary macro overrides, which previously were required to keep
existing BPF_KSYSCALL() uses compiling and working.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-26-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
97740e5103 libbpf: Define loongarch syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-24-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
ef191974b3 libbpf: Define arc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-23-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
ed66fb297d libbpf: Define riscv syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com> # RISC-V
Link: https://lore.kernel.org/bpf/20230120200914.3008030-22-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
2c58ba33fb libbpf: Define sparc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-21-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
9a6f8da473 libbpf: Define powerpc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Note that the 7th arg is supported on the 32-bit powerpc architecture,
but not on powerpc64.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-20-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
a005bb2ff8 libbpf: Define mips syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-19-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
7f627a6202 libbpf: Define arm64 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
We need PT_REGS_PARM1_[CORE_]SYSCALL macro overrides, similarly to
s390x, due to orig_x0 not being present in UAPI's pt_regs, so we need to
utilize BPF CO-RE and a custom pt_regs___arm64 definition.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-18-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
a095b4f04d libbpf: Define arm syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-17-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
bd6e1ec311 libbpf: Define s390x syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Note that we need custom overrides for PT_REGS_PARM1_[CORE_]SYSCALL
macros due to the need to use BPF CO-RE and custom local pt_regs
definitions to fetch orig_gpr2, which stores the 1st argument.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-16-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
b2d8a8d269 libbpf: Define i386 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-15-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
df16188dc2 libbpf: Define x86-64 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Remove now unnecessary overrides of PT_REGS_PARM5_[CORE_]SYSCALL macros.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-14-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
672401ae09 libbpf: Improve syscall tracing support in bpf_tracing.h
Set up generic support in bpf_tracing.h for tracing up to 7 syscall
arguments with BPF_KSYSCALL, which seems to be the limit according to
the syscall(2) manpage. Also change the way that the syscall convention
is specified to be more explicit. Subsequent patches will adjust and
define proper per-architecture syscall conventions.

__PT_PARM1_SYSCALL_REG through __PT_PARM6_SYSCALL_REG are added
temporarily to keep everything working before each architecture has its
syscall reg table defined. They will be removed afterwards.
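
An illustrative BPF_KSYSCALL sketch (the handler name is hypothetical):

    SEC("ksyscall/unlinkat")
    int BPF_KSYSCALL(handle_unlinkat, int dfd, const char *pathname, int flag)
    {
        bpf_printk("unlinkat dfd=%d", dfd);
        return 0;
    }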

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-13-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
ed8b4c90ea libbpf: Add BPF_UPROBE and BPF_URETPROBE macro aliases
Add BPF_UPROBE and BPF_URETPROBE macros, aliased to BPF_KPROBE and
BPF_KRETPROBE, respectively. This makes uprobe-based BPF program code
much less confusing, especially to people new to tracing, at no cost in
terms of maintainability. We'll use these macros in selftests in a
subsequent patch.
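
A hedged uprobe sketch (the libc path and handler name are illustrative):

    SEC("uprobe//usr/lib64/libc.so.6:malloc")
    int BPF_UPROBE(malloc_entry, size_t size)
    {
        bpf_printk("malloc(%lu)", (unsigned long)size);
        return 0;
    }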

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-11-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
2094e1b37e libbpf: Complete LoongArch (loongarch) spec in bpf_tracing.h
Add PARM6 through PARM8 definitions. Add kernel docs link describing ABI
for LoongArch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-10-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
c978366c38 libbpf: Fix and complete ARC spec in bpf_tracing.h
Add PARM6 through PARM8 definitions. Also fix the frame pointer (FP)
register definition and leave a link to where to find the ABI spec.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-9-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
9db84de5f0 libbpf: Complete riscv arch spec in bpf_tracing.h
Add PARM6 through PARM8 definitions for the RISC-V (riscv) arch. Leave
a link to the ABI doc for future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com> # RISC-V
Link: https://lore.kernel.org/bpf/20230120200914.3008030-8-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
ffbc84cf6f libbpf: Complete sparc spec in bpf_tracing.h
Add PARM6 definition for sparc architecture. Leave a link to calling
convention documentation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-7-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
7b86294a90 libbpf: Complete powerpc spec in bpf_tracing.h
Add definitions of PARM6 through PARM8 for the powerpc architecture.
Also add a link to the function call sequence documentation for future
reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-6-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
31e29d9346 libbpf: Complete mips spec in bpf_tracing.h
Add registers for PARM6 through PARM8. Add a link to an ABI. We don't
distinguish between O32, N32, and N64, so document that we assume N64
right now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-5-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
1b48e879a4 libbpf: Fix arm and arm64 specs in bpf_tracing.h
Remove invalid support for PARM5 on 32-bit arm, as per ABI. Add three
more argument registers for arm64. Also leave links to ABI specs for
future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-4-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
4759a83309 libbpf: Add 6th argument support for x86-64 in bpf_tracing.h
Add r9 as the register containing the 6th argument on the x86-64
architecture, as per its ABI. Also add a link to a page describing the
ABI for easier future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-3-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
c94a3fd806 libbpf: Add support for fetching up to 8 arguments in kprobes
Add BPF_KPROBE() and PT_REGS_PARMx() support for up to 8 arguments, if
the target architecture supports this. Currently all architectures are
limited to only 5 register-placed arguments, which is limiting even on
x86-64.

This patch adds generic macro machinery to support up to 8 arguments
both when explicitly fetching it from pt_regs through PT_REGS_PARMx()
macros, as well as more ergonomic access in BPF_KPROBE().

Also, for the i386 architecture we no longer have to define fake PARM4
and PARM5 definitions; they will be generically substituted, just like
PARM6 through PARM8.

Subsequent patches will fill out architecture-specific definitions,
where appropriate.
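
A hedged sketch of how this looks in practice (the kernel function and
handler names are hypothetical, on an arch with 7+ param registers
defined):

    SEC("kprobe/some_kernel_func")
    int BPF_KPROBE(trace_it, long a1, long a2, long a3, long a4,
                   long a5, long a6, long a7)
    {
        return 0;
    }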

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-2-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Stanislav Fomichev
7d68fca99c bpf: Introduce device-bound XDP programs
New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way
to associate a netdev with a BPF program at load time.

netdevsim checks are dropped in favor of a generic check in dev_xdp_attach.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 20:44:09 -08:00
Ziyang Xuan
ed09f7e65b bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()
Add ipip6 and ip6ip decap support for bpf_skb_adjust_room().
Main use case is for using cls_bpf on ingress hook to decapsulate
IPv4 over IPv6 and IPv6 over IPv4 tunnel packets.

Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the
new IP header version after decapsulating the outer IP header.
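
A rough sketch (not from the original patch) of decapsulating
IPv4-in-IPv6 on TC ingress:

    /* strip the outer IPv6 header; the inner packet is IPv4 */
    int err = bpf_skb_adjust_room(skb, -(__s32)sizeof(struct ipv6hdr),
                                  BPF_ADJ_ROOM_MAC,
                                  BPF_F_ADJ_ROOM_DECAP_L3_IPV4);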

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 20:44:09 -08:00
Menglong Dong
49e950dcfa libbpf: Replace '.' with '_' in legacy kprobe event name
'.' is not allowed in the event name of a kprobe. Therefore, we will
get an EINVAL if the kernel function name has a '.' in the legacy
kprobe attach case, such as 'icmp_reply.constprop.0'.

In order to handle this case, we need to replace the '.' with another
char in gen_kprobe_legacy_event_name(), and I use '_' for this purpose.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com
2023-01-25 20:44:09 -08:00
Ludovic L'Hours
ce8d078ac7 libbpf: Fix map creation flags sanitization
As the BPF_F_MMAPABLE flag is now conditionally set (by map_is_mmapable),
it should not be toggled but disabled if not supported by the kernel.

Fixes: 4fcac46c7e10 ("libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars")
Signed-off-by: Ludovic L'Hours <ludovic.lhours@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230108182018.24433-1-ludovic.lhours@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 20:44:09 -08:00
Rong Tao
42d77b062c libbpf: Poison strlcpy()
Commit 9fc205b413b3 ("libbpf: Add sane strncpy alternative and use
it internally") introduced libbpf_strlcpy(), so add strlcpy() to a poison
list to prevent accidental use of it.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/tencent_5695A257C4D16B4413036BA1DAACDECB0B07@qq.com
2023-01-25 20:44:09 -08:00
Changbin Du
d572b6359e libbpf: Return -ENODATA for missing btf section
As discussed before, returning -ENODATA (No data available) would be more
meaningful than -ENOENT (No such file or directory).

Suggested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221231151436.6541-1-changbin.du@gmail.com
2023-01-25 20:44:09 -08:00
Hengqi Chen
f758104b07 libbpf: Add LoongArch support to bpf_tracing.h
Add PT_REGS macros for LoongArch ([0]).

  [0]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/bpf/20221231100757.3177034-1-hengqi.chen@gmail.com
2023-01-25 20:44:09 -08:00
Alexei Starovoitov
b92963bbe2 libbpf: Restore errno after pr_warn.
pr_warn calls into a user-provided callback, which can clobber errno, so
`errno = saved_errno` should happen after pr_warn.

Fixes: 07453245620c ("libbpf: fix errno is overwritten after being closed.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 20:44:09 -08:00
Xin Liu
34fadd0fbe libbpf: Added the description of some API functions
Currently, many API functions are not described in the documentation.
Add API descriptions for the following four API functions:
  - libbpf_set_print;
  - bpf_object__open;
  - bpf_object__load;
  - bpf_object__close.

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224112058.12038-1-liuxin350@huawei.com
2023-01-25 20:44:09 -08:00
Daniel T. Lee
09f1324bd7 libbpf: Fix invalid return address register in s390
There is currently an invalid register mapping for the s390 return
address register. As the manual [1] states, the return address can be
found at r14. In bpf_tracing.h, the s390 registers were named
gprs (general purpose registers). This commit fixes the problem by
correcting the mistyped mapping.

[1]: https://uclibc.org/docs/psABI-s390x.pdf#page=14

Fixes: 3cc31d794097 ("libbpf: Normalize PT_REGS_xxx() macro definitions")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224071527.2292-7-danieltimlee@gmail.com
2023-01-25 20:44:09 -08:00
Xin Liu
8cd371816b libbpf: fix errno is overwritten after being closed.
In the ensure_good_fd function, if the fcntl function succeeds but
the close function fails, ensure_good_fd returns a normal fd and
sets errno, which may confuse users. The close failure is not a
serious problem, and the correct FD has already been handed over to the
upper-layer application. Let's restore errno here.

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Link: https://lore.kernel.org/r/20221223133618.10323-1-liuxin350@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
7d075a739e libbpf: start v1.2 development cycle
Bump current version for new development cycle to v1.2.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221221180049.853365-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 20:44:09 -08:00
Quentin Monnet
3423d5e7cd sync: Remove "git format-patch" signature (version) from cover letter
When syncing with the kernel, the script generates a cover letter for
the latest changes using "git format-patch". Unless specified otherwise,
it uses a signature (as in, email footer signature) which defaults to
the Git version in use, and ends up in the commit logs. This doesn't
bring any useful information in there: let's get rid of this version
number.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
2023-01-04 09:23:49 -08:00
Daniel Müller
e3a40329bb ci: Add patch setting CONFIG_FUNCTION_ERROR_INJECTION in CI
Similar to what we did for vmtest [0], libbpf needs the patch setting
CONFIG_FUNCTION_ERROR_INJECTION in CI. Add it.

[0] https://github.com/kernel-patches/vmtest/pull/181

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-12-21 12:11:32 -08:00
thiagoftsm
a16e904d6c Merge branch 'libbpf:master' into master 2022-12-21 16:15:11 +00:00
Andrii Nakryiko
6597330c45 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   0e43662e61f2569500ab83b8188c065603530785
Checkpoint bpf-next commit: 7b43df6c6ec38c9097420902a1c8165c4b25bf70
Baseline bpf commit:        f506439ec3dee11e0e77b0a1f3fb3eec22c97873
Checkpoint bpf commit:      54c3f1a81421f85e60ae2eaae7be3727a09916ee

Changbin Du (1):
  libbpf: Show error info about missing ".BTF" section

Christian Ehrig (1):
  bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key()

Khem Raj (1):
  libbpf: Fix build warning on ref_ctr_off for 32-bit architectures

 include/uapi/linux/bpf.h | 4 ++++
 src/btf.c                | 1 +
 src/libbpf.c             | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-20 22:23:18 -08:00
Andrii Nakryiko
2e287cd201 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-20 22:23:18 -08:00
Changbin Du
49bd40e869 libbpf: Show error info about missing ".BTF" section
Show the real problem instead of just saying "No such file or directory".

Now it will print the info below:
libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux
Error: failed to load BTF from /home/changbin/work/linux/vmlinux: No such file or directory

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221217223509.88254-2-changbin.du@gmail.com
2022-12-20 22:23:18 -08:00
Khem Raj
f7dba2c313 libbpf: Fix build warning on ref_ctr_off for 32-bit architectures
Clang warns on 32-bit ARM about this comparison:

libbpf.c:10497:18: error: result of comparison of constant 4294967296 with expression of type 'size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
        if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS))
            ~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Typecast ref_ctr_off to __u64 in the check conditional; it is false on
32-bit anyway.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221219191526.296264-1-raj.khem@gmail.com
2022-12-20 22:23:18 -08:00
Christian Ehrig
41ac436073 bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key()
This patch allows removing TUNNEL_KEY from the tunnel flags bitmap
when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY
flag. On egress, the resulting tunnel header will not contain a tunnel
key if the protocol and implementation support it.

At the moment bpf_tunnel_key wants a user to specify a numeric tunnel
key. This will wrap the inner packet into a tunnel header with the key
bit and value set accordingly. This is problematic when using a tunnel
protocol that supports optional tunnel keys and a receiving tunnel
device that is not expecting packets with the key bit set. The receiver
won't decapsulate and drop the packet.

RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is
useful. It allows for generating packets that can be decapsulated by
a GRE tunnel device not operating in collect metadata mode or not
expecting the key bit set.
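
A hedged usage sketch for a TC egress program (dst_ip is a
placeholder):

    struct bpf_tunnel_key key = {};

    key.remote_ipv4 = dst_ip; /* note: no key.tunnel_id set */
    bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
                           BPF_F_ZERO_CSUM_TX | BPF_F_NO_TUNNEL_KEY);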

Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221218051734.31411-1-cehrig@cloudflare.com
2022-12-20 22:23:18 -08:00
Andrii Nakryiko
75987cc295 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b148c8b9b926e257a59c8eb2cd6fa3adfd443254
Checkpoint bpf-next commit: 0e43662e61f2569500ab83b8188c065603530785
Baseline bpf commit:        4121d4481b72501aa4d22680be4ea1096d69d133
Checkpoint bpf commit:      f506439ec3dee11e0e77b0a1f3fb3eec22c97873

Andrii Nakryiko (1):
  libbpf: Fix btf_dump's packed struct determination

 src/btf_dump.c | 33 ++++++---------------------------
 1 file changed, 6 insertions(+), 27 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-15 14:34:52 -08:00
Andrii Nakryiko
b9f1a06c70 libbpf: Fix btf_dump's packed struct determination
Fix a bug in btf_dump's logic of determining if a given struct type is
packed or not. The notion of "natural alignment" is not needed and is
even harmful in this case, so drop it altogether. The biggest difference
in btf_is_struct_packed() compared to its original implementation is
that we don't really use btf__align_of() to determine the overall
alignment of a struct type (because it could be 1 for both packed and
non-packed structs, depending on specific field definitions), and just
use each field's actual alignment to calculate whether any field
requires packing or the struct's overall size necessitates packing.

Add two simple test cases that demonstrate the difference this change
would make.

Fixes: ea2ce1ba99aa ("libbpf: Fix BTF-to-C converter's padding logic")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20221215183605.4149488-1-andrii@kernel.org
2022-12-15 14:34:52 -08:00
Andrii Nakryiko
30554b08fe sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   706819495921ddad6b3780140b9d9e9293b6dedc
Checkpoint bpf-next commit: b148c8b9b926e257a59c8eb2cd6fa3adfd443254
Baseline bpf commit:        e931a173a685fe213127ae5aa6b7f2196c1d875d
Checkpoint bpf commit:      4121d4481b72501aa4d22680be4ea1096d69d133

Andrii Nakryiko (4):
  libbpf: Fix single-line struct definition output in btf_dump
  libbpf: Handle non-standardly sized enums better in BTF-to-C dumper
  libbpf: Fix btf__align_of() by taking into account field offsets
  libbpf: Fix BTF-to-C converter's padding logic

Eyal Birger (1):
  tools: add IFLA_XFRM_COLLECT_METADATA to uapi/linux/if_link.h

Kumar Kartikeya Dwivedi (1):
  bpf: Rework process_dynptr_func

Timo Hunziker (1):
  libbpf: Parse usdt args without offset on x86 (e.g. 8@(%rsp))

Xin Liu (1):
  libbpf: Optimized return value in libbpf_strerror when errno is libbpf
    errno

 include/uapi/linux/bpf.h     |   8 +-
 include/uapi/linux/if_link.h |   1 +
 src/btf.c                    |  13 +++
 src/btf_dump.c               | 214 +++++++++++++++++++++++++++--------
 src/libbpf_errno.c           |  16 ++-
 src/usdt.c                   |   8 ++
 6 files changed, 204 insertions(+), 56 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
b0ff8e90f7 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
0b80970cb6 libbpf: Fix BTF-to-C converter's padding logic
Turns out that btf_dump API doesn't handle a bunch of tricky corner
cases, as reported by Per, and further discovered using his testing
Python script ([0]).

This patch revamps btf_dump's padding logic significantly, making it
more correct and also avoiding unnecessary explicit padding where the
compiler would pad naturally. This overall topic turned out to be very
tricky and subtle; there are lots of subtle corner cases. The comments
in the code try to give some clues, but the comments themselves are
supposed to be paired with a good understanding of C alignment and
padding rules, plus some experimentation to figure out subtle things
like whether `long :0;` means that the struct is now forced to be
long-aligned (it's not, turns out).

Anyways, Per's script, while not completely correct in some known
situations, doesn't show any obvious cases where this logic breaks, so
this is a nice improvement over the previous state of this logic.

Some selftests had to be adjusted to accommodate better use of natural
alignment rules, eliminating some unnecessary padding, or changing it to
`type: 0;` alignment markers.

Note also that when we are in between bitfields, we emit an explicit
bit size, while otherwise we use `: 0`; this feels much more natural in
practice.

Next patch will add few more test cases, found through randomized Per's
script.

  [0] https://lore.kernel.org/bpf/85f83c333f5355c8ac026f835b18d15060725fcb.camel@ericsson.com/

Reported-by: Per Sundström XP <per.xp.sundstrom@ericsson.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-6-andrii@kernel.org
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
58b164237a libbpf: Fix btf__align_of() by taking into account field offsets
btf__align_of() is supposed to return the alignment requirement of
a requested BTF type. For STRUCT/UNION it doesn't always return the
correct value, because it calculates alignment only based on field
types. But for packed structs this is not enough; we need to also check
field offsets and struct size. If a field offset isn't aligned according
to the field type's natural alignment, then the struct must be packed.
Similarly, if the struct size is not a multiple of the struct's natural
alignment, then the struct must be packed as well.

This patch fixes this issue precisely by additionally checking these
conditions.
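
A small illustrative case (not from the patch) where field types alone
would suggest alignment 4, but the packed layout requires reporting 1:

    struct s {
        char c;
        int x; /* offset 1, not 4-aligned -> struct must be packed */
    } __attribute__((packed));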

Fixes: 3d208f4ca111 ("libbpf: Expose btf__align_of() API")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-5-andrii@kernel.org
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
e6e0e3fd85 libbpf: Handle non-standardly sized enums better in BTF-to-C dumper
Turns out C allows forcing an enum to be 1-byte or 8-byte explicitly
using mode(byte) or mode(word), respectively. Linux sources use this in
some cases. This is important to handle correctly, as enum size
determines the layout of corresponding fields in a struct that use that
enum type. And if the enum size is incorrect, this will lead to an
invalid struct layout. So add mode(byte) and mode(word) attribute
support to btf_dump APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-3-andrii@kernel.org
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
db11704944 libbpf: Fix single-line struct definition output in btf_dump
btf_dump APIs emit unnecessary tabs when emitting struct/union
definition that fits on the single line. Before this patch we'd get:

struct blah {<tab>};

This patch fixes this and makes sure that we get more natural:

struct blah {};

Fixes: 44a726c3f23c ("bpftool: Print newline before '}' for struct with padding only fields")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-2-andrii@kernel.org
2022-12-14 22:09:00 -08:00
Xin Liu
8d719b0c08 libbpf: Optimized return value in libbpf_strerror when errno is libbpf errno
This is a small improvement in libbpf_strerror. When libbpf_strerror
is used to obtain the system error description, if the length of the
buf is insufficient, libbpf_strerror returns ERANGE and sets errno to
ERANGE.

However, this processing is not performed when the error code
customized by libbpf is obtained. Make some minor improvements here,
return -ERANGE and set errno to ERANGE when buf is not enough for
custom description.

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221210082045.233697-1-liuxin350@huawei.com
2022-12-14 22:09:00 -08:00
Kumar Kartikeya Dwivedi
6b90604fa7 bpf: Rework process_dynptr_func
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type
for use in callback state, because in case of user ringbuf helpers,
there is no dynptr on the stack that is passed into the callback. To
reflect such a state, a special register type was created.

However, some checks have been bypassed incorrectly during the addition
of this feature. First, arg_types with the MEM_UNINIT flag, which
initialize a dynptr, must be rejected for such a register type.
Secondly, in the future, there are plans to add dynptr helpers that
operate on the dynptr itself and may change its offset and other
properties.

In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed
to such helpers, however the current code simply returns 0.

The rejection for helpers that release the dynptr is already handled.

For fixing this, we take a step back and rework existing code in a way
that will allow fitting in all classes of helpers and have a coherent
model for dealing with the variety of use cases in which dynptr is used.

First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together
with a DYNPTR_TYPE_* constant that denotes the only type it accepts.

Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this
fact. To make the distinction clear, use MEM_RDONLY flag to indicate
that the helper only operates on the memory pointed to by the dynptr,
not the dynptr itself. In C parlance, it would be equivalent to taking
the dynptr as a pointer-to-const argument.

When either of these flags are not present, the helper is allowed to
mutate both the dynptr itself and also the memory it points to.
Currently, the read only status of the memory is not tracked in the
dynptr, but it would be trivial to add this support inside dynptr state
of the register.

With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
better reflect its usage, it can no longer be passed to helpers that
initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.

A note to reviewers is that in code that does mark_stack_slots_dynptr,
and unmark_stack_slots_dynptr, we implicitly rely on the fact that
PTR_TO_STACK reg is the only case that can reach that code path, as one
cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In
both cases such helpers won't be setting that flag.

The next patch will add a couple of selftest cases to make sure this
doesn't break.

Fixes: 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper")
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-14 22:09:00 -08:00
Timo Hunziker
74244c5bd7 libbpf: Parse usdt args without offset on x86 (e.g. 8@(%rsp))
Parse USDT arguments like "8@(%rsp)" on x86. These are emitted by
SystemTap. The argument syntax is similar to the existing "memory
dereference case" but with the offset left out, as it's zero (i.e., read
the value from the address in the register). We treat it the same
as the "memory dereference case", but set the offset to 0.

I've tested that this fixes the "unrecognized arg #N spec: 8@(%rsp).."
error I've run into when attaching to a probe with such an argument.
Attaching and reading the correct argument values works.

Something similar might be needed for the other supported
architectures.

  [0] Closes: https://github.com/libbpf/libbpf/issues/559

Signed-off-by: Timo Hunziker <timo.hunziker@gmx.ch>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221203123746.2160-1-timo.hunziker@eclipso.ch
2022-12-14 22:09:00 -08:00
Eyal Birger
da08611c65 tools: add IFLA_XFRM_COLLECT_METADATA to uapi/linux/if_link.h
Needed for XFRM metadata tests.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20221203084659.1837829-4-eyal.birger@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
1e479aec4f ci: don't run test_maps in libbpf CI
It crashes often and doesn't really test libbpf much.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-07 09:28:07 -08:00
Andrii Nakryiko
8846dc7a20 ci: fix Ubuntu version for kernel tests and pahole workflows
Having too new a build environment in workflows that build selftests on
the host but run them in a separate QEMU image can lead to problems
with the runtime linker complaining about missing new-enough versions
of glibc and other dependencies.

Until we update the images, pin the Ubuntu version to ubuntu-20.04 to
mitigate this.

Suggested-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-05 11:52:11 -08:00
Andrii Nakryiko
eb9b5c567d sync: regenerate vmlinux.h
Update checked in vmlinux.h for 5.5 and 4.9 kernels.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
be8f15bb93 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   5b1d640800de7fe02d68bf592d9d101de24c87f2
Checkpoint bpf-next commit: 706819495921ddad6b3780140b9d9e9293b6dedc
Baseline bpf commit:        47df8a2f78bc34ff170d147d05b121f84e252b85
Checkpoint bpf commit:      e931a173a685fe213127ae5aa6b7f2196c1d875d

Alexei Starovoitov (1):
  selftests/bpf: Workaround for llvm nop-4 bug

Andrii Nakryiko (2):
  libbpf: Ignore hashmap__find() result explicitly in btf_dump
  libbpf: Avoid enum forward-declarations in public API in C++ mode

Donald Hunter (1):
  docs/bpf: Add table of BPF program types to libbpf docs

Hou Tao (4):
  libbpf: Use page size as max_entries when probing ring buffer map
  libbpf: Handle size overflow for ringbuf mmap
  libbpf: Handle size overflow for user ringbuf mmap
  libbpf: Check the validity of size in user_ring_buffer__reserve()

Ji Rongfeng (1):
  bpf: Update bpf_{g,s}etsockopt() documentation

 docs/index.rst           |   3 +
 docs/program_types.rst   | 203 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h |  23 +++--
 src/bpf.h                |   7 ++
 src/btf_dump.c           |   2 +-
 src/libbpf.c             |   3 +-
 src/libbpf_probes.c      |   2 +-
 src/ringbuf.c            |  26 +++--
 8 files changed, 250 insertions(+), 19 deletions(-)
 create mode 100644 docs/program_types.rst

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
2bf5ed3a48 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
0fbf777e0b libbpf: Avoid enum forward-declarations in public API in C++ mode
C++ enum forward declarations are fundamentally not compatible with pure
C enum definitions, and so libbpf's use of `enum bpf_stats_type;`
forward declaration in libbpf/bpf.h public API header is causing C++
compilation issues.

More details can be found in [0], but it comes down to C++ supporting
enum forward declaration only with explicitly specified backing type:

  enum bpf_stats_type: int;

In C (and I believe it's a GCC extension also), such forward declaration
is simply:

  enum bpf_stats_type;

Further, in Linux UAPI this enum is defined in a pure C way:

  enum bpf_stats_type { BPF_STATS_RUN_TIME = 0 };

And even though in both cases backing type is int, which can be
confirmed by looking at DWARF information, for C++ compiler actual enum
definition and forward declaration are incompatible.

To eliminate this problem, for C++ mode define input argument as int,
which makes the enum unnecessary in the libbpf public header. This
solves the issue and, as demonstrated by the next patch, doesn't cause
any unwanted compiler warnings, at least with the default warnings
setting.

  [0] https://stackoverflow.com/questions/42766839/c11-enum-forward-causes-underlying-type-mismatch
  [1] Closes: https://github.com/libbpf/libbpf/issues/249
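
The approach boils down to something like the following in the public
header (sketched from the description; bpf_enable_stats() is the libbpf
API that takes this enum):

    #ifdef __cplusplus
    /* forward-declared enums are problematic in C++; accept int */
    LIBBPF_API int bpf_enable_stats(int type);
    #else
    LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type);
    #endif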

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221130200013.2997831-1-andrii@kernel.org
2022-12-02 22:12:29 -08:00
Hou Tao
4d21c979ce libbpf: Check the validity of size in user_ring_buffer__reserve()
The top two bits of size are used as busy and discard flags, so reject
a reservation that has any of these special bits set in the size. With
the addition of the validity check, there is also no need to check
whether or not total_size has overflowed.
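
The check amounts to something like this (sketched from the
description):

    /* the top two bits are reserved as busy/discard flags */
    if (size & (BPF_RINGBUF_BUSY_BIT | BPF_RINGBUF_DISCARD_BIT)) {
        errno = E2BIG;
        return NULL;
    }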

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221116072351.1168938-5-houtao@huaweicloud.com
2022-12-02 22:12:29 -08:00
Hou Tao
11ad834557 libbpf: Handle size overflow for user ringbuf mmap
Similar to the overflow problem of ringbuf mmap, in user_ringbuf_map()
2 * max_entries may overflow u32 when mapping the writable region.

Fix it by casting the size of the writable mmap region to a __u64 and
checking whether or not there will be an overflow during mmap.

Fixes: b66ccae01f1d ("bpf: Add libbpf logic for user-space ring buffer")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221116072351.1168938-4-houtao@huaweicloud.com
2022-12-02 22:12:29 -08:00
Hou Tao
f056d1bd54 libbpf: Handle size overflow for ringbuf mmap
The maximum size of a ringbuf is 2GB on an x86-64 host, so
2 * max_entries will overflow u32 when mapping the producer page and
data pages. Casting max_entries to size_t alone is not enough, because
for a 32-bit application on a 64-bit kernel the size of the read-only
mmap region could also overflow size_t.

So fix it by casting the size of the read-only mmap region into a __u64
and checking whether or not there will be overflow during mmap.
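
A minimal sketch of the check (illustrative; page_size, max_entries,
pr_warn() and libbpf_err() are libbpf-internal names):

  __u64 mmap_sz;

  /* compute the size in 64 bits, then verify it still fits into
   * size_t, which may be 32-bit even on a 64-bit kernel */
  mmap_sz = page_size + 2 * (__u64)max_entries;
  if (mmap_sz != (__u64)(size_t)mmap_sz) {
          pr_warn("ringbuf: ring buffer size (%u) is too big\n", max_entries);
          return libbpf_err(-E2BIG);
  }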

Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221116072351.1168938-3-houtao@huaweicloud.com
2022-12-02 22:12:29 -08:00
Hou Tao
b822a139e3 libbpf: Use page size as max_entries when probing ring buffer map
Use the page size as max_entries when probing the ring buffer map,
otherwise the probe may fail on a host with a 64KB page size (e.g., an
ARM64 host).

After the fix, the output of "bpftool feature" on such a host will be
correct.

Before :
    eBPF map_type ringbuf is NOT available
    eBPF map_type user_ringbuf is NOT available

After :
    eBPF map_type ringbuf is available
    eBPF map_type user_ringbuf is available
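
A minimal sketch of the probe change (illustrative; map parameters as
set up in libbpf_probes.c):

  case BPF_MAP_TYPE_RINGBUF:
  case BPF_MAP_TYPE_USER_RINGBUF:
          key_size = 0;
          value_size = 0;
          /* max_entries must be a page-aligned power of 2; the page
           * size itself satisfies that on any host, including 64KB ones */
          max_entries = sysconf(_SC_PAGE_SIZE);
          break;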

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221116072351.1168938-2-houtao@huaweicloud.com
2022-12-02 22:12:29 -08:00
Ji Rongfeng
a5b4a53781 bpf: Update bpf_{g,s}etsockopt() documentation
* append missing optnames to the end
* simplify bpf_getsockopt()'s doc

Signed-off-by: Ji Rongfeng <SikoJobs@outlook.com>
Link: https://lore.kernel.org/r/DU0P192MB15479B86200B1216EC90E162D6099@DU0P192MB1547.EURP192.PROD.OUTLOOK.COM
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-02 22:12:29 -08:00
Donald Hunter
e84419ff5a docs/bpf: Add table of BPF program types to libbpf docs
Extend the libbpf documentation with a table of program types,
attach points and ELF section names.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221121121734.98329-1-donald.hunter@gmail.com
2022-12-02 22:12:29 -08:00
Alexei Starovoitov
ca515c0dda selftests/bpf: Workaround for llvm nop-4 bug
Currently LLVM fails to recognize .data.* as a data section and defaults
to the .text section. Later the BPF backend tries to emit a 4-byte NOP
instruction, which doesn't exist in the BPF ISA, and aborts.
The fix for LLVM is pending:
https://reviews.llvm.org/D138477

While waiting for the fix, let's work around the linked_list test case
by using the .bss.* prefix, which is properly recognized by LLVM as a
BSS section.

Fix libbpf to support the .bss. prefix and adjust tests.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
95959419a7 libbpf: Ignore hashmap__find() result explicitly in btf_dump
Coverity is reporting that btf_dump_name_dups() doesn't check the return
result of the hashmap__find() call. This is intentional, so make it
explicit with a (void) cast.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221117192824.4093553-1-andrii@kernel.org
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
3c659715ec sync: fix sync scripts commit_signature function
After recent lint changes, the commit_signature() function now gets an
optional array of paths as multiple arguments, instead of the entire
array as its second argument. So adjust commit_signature() to handle
this correctly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 21:04:03 -08:00
Andrii Nakryiko
f46b17ef0e sync: add Signed-off-by for auto-generated sync commits
Now that we enforce Signed-off-by on every commit, make sure that
auto-generated sync commits also get correct Signed-off-by tags.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 20:51:21 -08:00
Evgeny Vereshchagin
1596a09b5d oss-fuzz: bump elfutils
to make it less likely for the libbpf fuzz target to run into
elfutils bugs that have been fixed upstream since two new fuzz
targets were added there back in April.

Signed-off-by: Evgeny Vereshchagin <evvers@ya.ru>
2022-11-18 13:54:40 -08:00
Kui-Feng Lee
5322b8e76c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b548b17a93fd18357a5a6f535c10c1e68719ad32
Checkpoint bpf-next commit: 5b1d640800de7fe02d68bf592d9d101de24c87f2
Baseline bpf commit:        9cbd48d5fa14e4c65f8580de16686077f7cea02b
Checkpoint bpf commit:      47df8a2f78bc34ff170d147d05b121f84e252b85

David Michael (1):
  libbpf: Fix uninitialized warning in btf_dump_dump_type_data

Jiri Olsa (1):
  libbpf: Use correct return pointer in attach_raw_tp

Kang Minchul (3):
  libbpf: checkpatch: Fixed code alignments in btf.c
  libbpf: Fixed various checkpatch issues in libbpf.c
  libbpf: checkpatch: Fixed code alignments in ringbuf.c

Kumar Kartikeya Dwivedi (1):
  bpf: Support bpf_list_head in map values

 include/uapi/linux/bpf.h | 10 +++++++++
 src/btf.c                |  5 +++--
 src/btf_dump.c           |  2 +-
 src/libbpf.c             | 47 +++++++++++++++++++++++++---------------
 src/ringbuf.c            |  4 ++--
 5 files changed, 45 insertions(+), 23 deletions(-)

--
2.30.2
2022-11-18 13:53:39 -08:00
Kui-Feng Lee
15bbaabed8 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-11-18 13:53:39 -08:00
Jiri Olsa
eb77c7210b libbpf: Use correct return pointer in attach_raw_tp
We need to pass '*link' to the final libbpf_get_error call,
because that one holds the return value, not 'link'.

Fixes: 4fa5bcfe07f7 ("libbpf: Allow BPF program auto-attach handlers to bail out")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221114145257.882322-1-jolsa@kernel.org
2022-11-18 13:53:39 -08:00
Kumar Kartikeya Dwivedi
2557efc8e1 bpf: Support bpf_list_head in map values
Add support on the map side to parse, recognize, verify, and build a
metadata table for a new special field of the type struct bpf_list_head.
To parameterize the bpf_list_head for a certain value type and the
list_node member it will accept in that value type, we use BTF
declaration tags.

The definition of bpf_list_head in a map value will be done as follows:

struct foo {
	struct bpf_list_node node;
	int data;
};

struct map_value {
	struct bpf_list_head head __contains(foo, node);
};

Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.

The 'contains' annotation is a BTF declaration tag of the form
"contains:name:node", where the name is used to look up the type in the
map BTF, with its kind hardcoded to BTF_KIND_STRUCT during the lookup.
The node part names the member in this type that has the type struct
bpf_list_node, which is actually used for linking into the linked list.
For now, the 'kind' part is implicit and hardcoded as struct.
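
For illustration, such an annotation can be provided by a macro along
these lines (a sketch matching the example above):

  #define __contains(name, node) \
          __attribute__((btf_decl_tag("contains:" #name ":" #node)))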

This allows building intrusive linked lists in BPF, using container_of
to obtain pointer to entry, while being completely type safe from the
perspective of the verifier. The verifier knows exactly the type of the
nodes, and knows that list helpers return that type at some fixed offset
where the bpf_list_node member used for this list exists. The verifier
also uses this information to disallow adding types that are not
accepted by a certain list.

For now, no elements can be added to such lists. Support for that is
coming in future patches, hence draining and freeing items is done with
a TODO that will be resolved in a future patch.

Note that the bpf_list_head_free function moves the list out to a local
variable under the lock and releases it, doing the actual draining of
the list items outside the lock. While this helps with not holding the
lock for too long pessimizing other concurrent list operations, it is
also necessary for deadlock prevention: unless every function called in
the critical section would be notrace, a fentry/fexit program could
attach and call bpf_map_update_elem again on the map, leading to the
same lock being acquired if the key matches and lead to a deadlock.
While this requires some special effort on part of the BPF programmer to
trigger and is highly unlikely to occur in practice, it is always better
if we can avoid such a condition.

While notrace would prevent this, doing the draining outside the lock
has advantages of its own, hence it is used to also fix the deadlock
related problem.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-18 13:53:39 -08:00
Kang Minchul
9781b9eced libbpf: checkpatch: Fixed code alignments in ringbuf.c
Fixed some checkpatch issues in ringbuf.c

Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-4-tegongkang@gmail.com
2022-11-18 13:53:39 -08:00
Kang Minchul
4c3b53d09c libbpf: Fixed various checkpatch issues in libbpf.c
Fixed the following checkpatch issues:

WARNING: Block comments use a trailing */ on a separate line
+        * other BPF program's BTF object */

WARNING: Possible repeated word: 'be'
+        * name. This is important to be be able to find corresponding BTF

ERROR: switch and case should be at the same indent
+       switch (ext->kcfg.sz) {
+               case 1: *(__u8 *)ext_val = value; break;
+               case 2: *(__u16 *)ext_val = value; break;
+               case 4: *(__u32 *)ext_val = value; break;
+               case 8: *(__u64 *)ext_val = value; break;
+               default:

ERROR: trailing statements should be on next line
+               case 1: *(__u8 *)ext_val = value; break;

ERROR: trailing statements should be on next line
+               case 2: *(__u16 *)ext_val = value; break;

ERROR: trailing statements should be on next line
+               case 4: *(__u32 *)ext_val = value; break;

ERROR: trailing statements should be on next line
+               case 8: *(__u64 *)ext_val = value; break;

ERROR: code indent should use tabs where possible
+                }$

WARNING: please, no spaces at the start of a line
+                }$

WARNING: Block comments use a trailing */ on a separate line
+        * for faster search */

ERROR: code indent should use tabs where possible
+^I^I^I^I^I^I        &ext->kcfg.is_signed);$

WARNING: braces {} are not necessary for single statement blocks
+       if (err) {
+               return err;
+       }

ERROR: code indent should use tabs where possible
+^I^I^I^I        sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);$

Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-3-tegongkang@gmail.com
2022-11-18 13:53:39 -08:00
Kang Minchul
7b18ff1212 libbpf: checkpatch: Fixed code alignments in btf.c
Fixed some checkpatch issues in btf.c

Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-2-tegongkang@gmail.com
2022-11-18 13:53:39 -08:00
David Michael
c975797ebe libbpf: Fix uninitialized warning in btf_dump_dump_type_data
GCC 11.3.0 fails to compile btf_dump.c due to the following error,
which seems to originate in btf_dump_struct_data, where the returned
value would be uninitialized if btf_vlen returns zero.

btf_dump.c: In function ‘btf_dump_dump_type_data’:
btf_dump.c:2363:12: error: ‘err’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 2363 |         if (err < 0)
      |            ^
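
A reduced illustration of the problem (hypothetical stand-in for
btf_dump_struct_data): if the member loop never executes, err is
returned uninitialized unless it is explicitly set to 0.

  static int dump_members(int vlen)
  {
          int err = 0;    /* the fix: don't leave err uninitialized */
          int i;

          for (i = 0; i < vlen; i++)
                  err = 0; /* emit member i, updating err */
          return err;      /* read even when vlen == 0 */
  }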

Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: David Michael <fedora.dm0@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/87zgcu60hq.fsf@gmail.com
2022-11-18 13:53:39 -08:00
Manu Bretelle
9167308b4a ci: remove s390x-self-hosted-builder from libbpf/libbpf
Those were moved to libbpf/ci: https://github.com/libbpf/ci/tree/master/rootfs/s390x-self-hosted-builder

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-11-16 13:58:37 -08:00
Manu Bretelle
7049d3a2ea ci: Use s390x label to schedule workflows on s390x
The runners are having their labels uniformized across architecture.
z15 is being removed in favor of s390x.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-11-16 13:55:31 -08:00
Andrii Nakryiko
ea931ec6c5 ci: drop LGTM integration
LGTM is deprecated, remove it. We have CodeQL now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 12:17:40 -08:00
Andrii Nakryiko
3a73d6f865 readme: replace LGTM badge with CodeQL badge
LGTM is going to be removed, CodeQL is supposed to be a replacement.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 12:17:40 -08:00
Andrii Nakryiko
7b0891ac6b ci: build libbpf with more versions of clang and gcc
Add a few more versions of clang and gcc used to compile-test libbpf.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 12:16:17 -08:00
Andrii Nakryiko
c80f12f7f6 ci: fix Debian builds due to pkg-config dependency change
Seems like we need the pkgconfig dependency instead of pkg-config.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 11:25:17 -08:00
Andrii Nakryiko
3b6093fd43 sync: start syncing include/uapi/linux/fcntl.h UAPI header
Libbpf relies on the F_DUPFD_CLOEXEC constant coming from the fcntl.h
UAPI header, so we need to sync it along with other UAPI headers. Also
update the sync script to keep doing this automatically going forward.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 10:56:59 -08:00
Andrii Nakryiko
8d358ab948 sync: make LIBBPF_PATHS and LIBBPF_VIEW_PATHS into real array variables
Use correct Bash syntax to define these two variables as arrays.
Drop the shellcheck opt-out for unquoted use of the arrays.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-14 21:42:37 -08:00
Andrii Nakryiko
971ad8f8d0 sync: fix sync script's use of bash array variables
Don't wrap LIBBPF_PATHS[@] and LIBBPF_VIEW_PATHS[@] in quotes when
passing it to git commands. Not clear how it worked before, but
something recently broke. Either git commands became stricter or
something.

But either way, we do want to pass each element of LIBBPF_PATHS or
LIBBPF_VIEW_PATHS as separate command line arguments, so putting them in
quotes doesn't make sense, as that makes them look like a single
argument to git.

So drop all the quotes around these arrays. The only place where it's
still needed is in commit_signature call, as we do want to pass array as
single arg ($2) and then internally we unfold it into multiple command
line arguments.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
2ed27f9e63 ci: update vmlinux.h
Update vmlinux.h to get latest enums for some of selftests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
4bdbb7ea28 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   62c69e89e81bfbdb9a87ae3e0599dcc6aacf786b
Checkpoint bpf-next commit: b548b17a93fd18357a5a6f535c10c1e68719ad32
Baseline bpf commit:        e7b09357453a99e6f9e74c39e9ca1363c22c0b96
Checkpoint bpf commit:      9cbd48d5fa14e4c65f8580de16686077f7cea02b

Alan Maguire (1):
  libbpf: Btf dedup identical struct test needs check for nested
    structs/arrays

Andrii Nakryiko (2):
  libbpf: clean up and refactor BTF fixup step
  libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars

Anshuman Khandual (4):
  perf: Add system error and not in transaction branch types
  perf: Extend branch type classification
  perf: Capture branch privilege information
  perf: Add PERF_BR_NEW_ARCH_[N] map for BRBE on arm64 platform

Eduard Zingerman (4):
  libbpf: Resolve enum fwd as full enum64 and vice versa
  libbpf: Hashmap interface update to allow both long and void*
    keys/values
  libbpf: Resolve unambiguous forward declarations
  libbpf: Hashmap.h update to fix build issues using LLVM14

Martin KaFai Lau (1):
  bpf: Add hwtstamp field for the sockops prog

Namhyung Kim (1):
  perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY

Ravi Bangoria (3):
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL

Sandipan Das (1):
  perf/core: Add speculation info to branch entries

Xu Kuohai (1):
  libbpf: Avoid allocating reg_name with sscanf in parse_usdt_arg()

Yonghong Song (2):
  bpf: Implement cgroup storage available to non-cgroup-attached bpf
    progs
  libbpf: Support new cgroup local storage

 include/uapi/linux/bpf.h        |  51 +++++-
 include/uapi/linux/perf_event.h |  57 ++++++-
 src/btf.c                       | 267 ++++++++++++++++++++++----------
 src/btf_dump.c                  |  15 +-
 src/hashmap.c                   |  18 +--
 src/hashmap.h                   |  91 +++++++----
 src/libbpf.c                    | 196 ++++++++++++++---------
 src/libbpf_probes.c             |   1 +
 src/strset.c                    |  18 +--
 src/usdt.c                      |  44 +++---
 10 files changed, 511 insertions(+), 247 deletions(-)

--
2.30.2
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
4978cf9cd8 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-11-12 18:24:12 -08:00
Martin KaFai Lau
00fc9f407c bpf: Add hwtstamp field for the sockops prog
The bpf-tc prog has already been able to access the
skb_hwtstamps(skb)->hwtstamp.  This patch extends the same hwtstamp
access to the sockops prog.

In sockops, the skb is also available to the bpf prog during the
BPF_SOCK_OPS_PARSE_HDR_OPT_CB event. There is a use case where the
hwtstamp is useful to the sockops prog to better measure the one-way
delay when the sender has put the tx timestamp in the tcp header option.
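
A sketch of reading the new field from a sockops program (hypothetical
example; assumes <linux/bpf.h> and <bpf/bpf_helpers.h>, and the
skb_hwtstamp field this patch adds to struct bpf_sock_ops):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("sockops")
  int measure_owd(struct bpf_sock_ops *skops)
  {
          if (skops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB) {
                  /* compare against the tx timestamp carried in a tcp
                   * header option to estimate one-way delay */
                  bpf_printk("rx hwtstamp: %llu", skops->skb_hwtstamp);
          }
          return 1;
  }

  char LICENSE[] SEC("license") = "GPL";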

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev
2022-11-12 18:24:12 -08:00
Eduard Zingerman
e1b34c589d libbpf: Hashmap.h update to fix build issues using LLVM14
A fix for the LLVM compilation error while building bpftool.
Replaces the expression:

  _Static_assert((p) == NULL || ...)

with the expression:

  _Static_assert((__builtin_constant_p((p)) ? (p) == NULL : 0) || ...)

When "p" is not a constant the former is not considered to be a
constant expression by LLVM 14.

The error was introduced in the following patch-set: [1].
The error was reported here: [2].

  [1] https://lore.kernel.org/bpf/20221109142611.879983-1-eddyz87@gmail.com/
  [2] https://lore.kernel.org/all/202211110355.BcGcbZxP-lkp@intel.com/

Reported-by: kernel test robot <lkp@intel.com>
Fixes: c302378bc157 ("libbpf: Hashmap interface update to allow both long and void* keys/values")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221110223240.1350810-1-eddyz87@gmail.com
2022-11-12 18:24:12 -08:00
Eduard Zingerman
7583310911 libbpf: Resolve unambiguous forward declarations
Resolve forward declarations that don't take part in type graph
comparisons if the declaration name is unambiguous. Example:

CU #1:

struct foo;              // standalone forward declaration
struct foo *some_global;

CU #2:

struct foo { int x; };
struct foo *another_global;

The `struct foo` from CU #1 is not a part of any definition that is
compared against another definition while `btf_dedup_struct_types`
processes structural types. After `btf_dedup_struct_types`, the BTF
looks as follows:

[1] STRUCT 'foo' size=4 vlen=1 ...
[2] INT 'int' size=4 ...
[3] PTR '(anon)' type_id=1
[4] FWD 'foo' fwd_kind=struct
[5] PTR '(anon)' type_id=4

This commit adds a new pass, `btf_dedup_resolve_fwds`, that maps such
forward declarations to structs or unions with an identical name when
the name is not ambiguous.

The pass is positioned before `btf_dedup_ref_types` so that types
[3] and [5] could be merged as a same type after [1] and [4] are merged.
The final result for the example above looks as follows:

[1] STRUCT 'foo' size=4 vlen=1
	'x' type_id=2 bits_offset=0
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] PTR '(anon)' type_id=1

For defconfig kernel with BTF enabled this removes 63 forward
declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
The running time of `btf__dedup` function is increased by about 3%.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221109142611.879983-3-eddyz87@gmail.com
2022-11-12 18:24:12 -08:00
Eduard Zingerman
4a65c5d888 libbpf: Hashmap interface update to allow both long and void* keys/values
An update of libbpf's hashmap interface from a void*-to-void* mapping
to a polymorphic one, allowing both long and void* keys and values.

This simplifies many use cases in libbpf, as hashmaps there are mostly
integer-to-integer.

Perf copies hashmap implementation from libbpf and has to be
updated as well.

Changes to libbpf, selftests/bpf and perf are packed as a single
commit to avoid compilation issues with any future bisect.

The polymorphic interface is achieved by hiding the hashmap interface
functions behind auxiliary macros that take care of the necessary
type casts, for example:

    #define hashmap_cast_ptr(p)						\
	({								\
		_Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\
			       #p " pointee should be a long-sized integer or a pointer"); \
		(long *)(p);						\
	})

    bool hashmap_find(const struct hashmap *map, long key, long *value);

    #define hashmap__find(map, key, value) \
		hashmap_find((map), (long)(key), hashmap_cast_ptr(value))

- the hashmap__find macro casts the key and value parameters to long
  and long* respectively
- hashmap_cast_ptr ensures that the value pointer points to memory of
  the appropriate size.

This hack was suggested by Andrii Nakryiko in [1].
This is a follow up for [2].

[1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/
[2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com
2022-11-12 18:24:12 -08:00
Eduard Zingerman
3a387f5a8f libbpf: Resolve enum fwd as full enum64 and vice versa
Changes de-duplication logic for enums in the following way:
- update btf_hash_enum to ignore the size and kind fields, to get
  ENUM and ENUM64 types into the same hash bucket;
- update btf_compat_enum to consider enum fwd to be compatible with
  full enum64 (and vice versa);

This allows BTF de-duplication in the following case:

    // CU #1
    enum foo;

    struct s {
      enum foo *a;
    } *x;

    // CU #2
    enum foo {
      x = 0xfffffffff // big enough to force enum64
    };

    struct s {
      enum foo *a;
    } *y;

De-duplicated BTF prior to this commit:

    [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
    	'x' val=68719476735ULL
    [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
        encoding=(none)
    [3] STRUCT 's' size=8 vlen=1
    	'a' type_id=4 bits_offset=0
    [4] PTR '(anon)' type_id=1
    [5] PTR '(anon)' type_id=3
    [6] STRUCT 's' size=8 vlen=1
    	'a' type_id=8 bits_offset=0
    [7] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
    [8] PTR '(anon)' type_id=7
    [9] PTR '(anon)' type_id=6

De-duplicated BTF after this commit:

    [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
    	'x' val=68719476735ULL
    [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
        encoding=(none)
    [3] STRUCT 's' size=8 vlen=1
    	'a' type_id=4 bits_offset=0
    [4] PTR '(anon)' type_id=1
    [5] PTR '(anon)' type_id=3

Enum forward declarations in C do not provide information about the
range of enumeration values. Thus the `btf_type->size` field is
meaningless for forward enum declarations. In fact, GCC does not
encode size in DWARF for forward enum declarations (but dwarves sets
the enumeration size to a default value of `sizeof(int) * 8` when the
size is not specified; see dwarf_loader.c:die__create_new_enumeration).

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221101235413.1824260-1-eddyz87@gmail.com
2022-11-12 18:24:12 -08:00
Ravi Bangoria
a2eba90326 perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices, but its
name is a bit ambiguous and also not generic enough to cover cxl.cache
and cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/f6268268-b4e9-9ed6-0453-65792644d953@amd.com
2022-11-12 18:24:12 -08:00
Yonghong Song
7106ebe768 libbpf: Support new cgroup local storage
Add support for new cgroup local storage.

Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042856.673989-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-12 18:24:12 -08:00
Yonghong Song
3c6d127e50 bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Similar to sk/inode/task storage, implement similar cgroup local storage.

There already exists a local storage implementation for cgroup-attached
bpf programs.  See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
bpf_get_local_storage(). But there are use cases where non-cgroup-attached
bpf progs want to access cgroup local storage data. For example, a tc
egress prog has access to the sk and cgroup. It is possible to use sk
local storage to emulate cgroup local storage by storing data in the
socket, but this is wasteful, as there could be lots of sockets belonging
to a particular cgroup. Alternatively, a separate map can be created with
the cgroup id as the key, but this will introduce additional overhead to
manipulate the new map.
A cgroup local storage, similar to existing sk/inode/task storage,
should help for this use case.

The life-cycle of storage is managed with the life-cycle of the
cgroup struct.  i.e. the storage is destroyed along with the owning cgroup
with a call to bpf_cgrp_storage_free() when cgroup itself
is deleted.

The userspace map operations can be done by using a cgroup fd as a key
passed to the lookup, update and delete operations.

Typically, the following code is used to get the current cgroup:
    struct task_struct *task = bpf_get_current_task_btf();
    ... task->cgroups->dfl_cgrp ...
and in structure task_struct definition:
    struct task_struct {
        ....
        struct css_set __rcu            *cgroups;
        ....
    }
With a sleepable program, accessing task->cgroups is not protected by
rcu_read_lock. So the current implementation only supports non-sleepable
programs; supporting sleepable programs will be the next step, together
with adding rcu_read_lock protection for rcu-tagged structures.

Since the map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for the old
cgroup local storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE
is used for cgroup storage available to non-cgroup-attached bpf programs.
The old cgroup storage supports the bpf_get_local_storage() helper to
get the cgroup data. The new cgroup storage helper bpf_cgrp_storage_get()
can provide similar functionality. While the old cgroup storage
pre-allocates storage memory, the new mechanism can also pre-allocate
with a user space bpf_map_update_elem() call to avoid potential run-time
memory allocation failures. Therefore, the new cgroup storage can provide
all the functionality of the old one. So in uapi bpf.h, the old
BPF_MAP_TYPE_CGROUP_STORAGE is aliased to
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED, to indicate that the old cgroup
storage can be deprecated since the new one provides the same
functionality.
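
A sketch of using the new map type from a non-cgroup-attached program
(hypothetical example; assumes vmlinux.h plus <bpf/bpf_helpers.h> and
the bpf_cgrp_storage_get() helper introduced by this work):

  struct {
          __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
          __uint(map_flags, BPF_F_NO_PREALLOC);
          __type(key, int);
          __type(value, long);
  } cgrp_data SEC(".maps");

  SEC("tp_btf/sys_enter")
  int count_syscalls(void *ctx)
  {
          struct task_struct *task = bpf_get_current_task_btf();
          long *val;

          /* look up (or create) this cgroup's storage and bump a counter */
          val = bpf_cgrp_storage_get(&cgrp_data, task->cgroups->dfl_cgrp, 0,
                                     BPF_LOCAL_STORAGE_GET_F_CREATE);
          if (val)
                  __sync_fetch_and_add(val, 1);
          return 0;
  }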

Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-12 18:24:12 -08:00
Alan Maguire
6ebbbacb5c libbpf: Btf dedup identical struct test needs check for nested structs/arrays
When examining module BTF, it is common to see core kernel structures
such as sk_buff, net_device duplicated in the module.  After adding
debug messaging to BTF it turned out that much of the problem
was down to the identical struct test failing during deduplication;
sometimes the compiler adds identical structs.  However, it turns out
that type ids of identical struct members can sometimes also differ,
even when the containing structs are still identical.

To take an example, for struct sk_buff, debug messaging revealed
that the identical struct matching was failing for the anon
struct "headers"; specifically for the first field:

__u8       __pkt_type_offset[0]; /*   128     0 */

Looking at the code in BTF deduplication, we have code that guards
against the possibility of identical struct definitions, down to
type ids, and identical array definitions.  However in this case
we have a struct which is being defined twice but does not have
identical type ids since each duplicate struct has separate type
ids for the above array member.   A similar problem (though not
observed) could occur for struct-in-struct.

The solution is to make the "identical struct" test check members
not just for matching ids, but to also check if they in turn are
identical structs or arrays.

The results of doing this are quite dramatic (for some modules
at least); I see the number of type ids drop from around 10000
to just over 1000 in one module for example.

For testing, use the latest pahole or apply [1]; otherwise dedups
can fail for the reasons described there.

Also fix return type of btf_dedup_identical_arrays() as
suggested by Andrii to match boolean return type used
elsewhere.

Fixes: efdd3eb8015e ("libbpf: Accommodate DWARF/compiler bug with duplicated structs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@oracle.com

[1] https://lore.kernel.org/bpf/1666364523-9648-1-git-send-email-alan.maguire
2022-11-12 18:24:12 -08:00
Xu Kuohai
1bb7a8349a libbpf: Avoid allocating reg_name with sscanf in parse_usdt_arg()
The reg_name in parse_usdt_arg() is used to hold a register name, which
is short enough to be held in a 16-byte array, so we can define
reg_name as char reg_name[16] to avoid dynamically allocating it
with sscanf.
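
A minimal sketch of the idea (the format string is simplified, not the
exact USDT argument spec):

  char reg_name[16];
  int arg_sz;

  /* a bounded %15[...] conversion fills the fixed-size buffer directly,
   * so there is no %m-allocated string that would need free() */
  if (sscanf(arg_str, " %d @ %15[a-z0-9]", &arg_sz, reg_name) != 2)
          return -EINVAL;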

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221018145538.2046842-1-xukuohai@huaweicloud.com
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
3cd45b660c libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars
Teach libbpf to not add BPF_F_MMAPABLE flag unnecessarily for ARRAY maps
that are backing data sections, if such data sections don't expose any
variables to user-space. Exposed variables are those that have
STB_GLOBAL or STB_WEAK ELF binding and correspond to BTF VAR's
BTF_VAR_GLOBAL_ALLOCATED linkage.

The overall idea is that if some data section doesn't have any variable that
is exposed through BPF skeleton, then there is no reason to make such
BPF array mmapable. Making BPF array mmapable is not a free no-op
action, because BPF verifier doesn't allow users to put special objects
(such as BPF spin locks, RB tree nodes, linked list nodes, kptrs, etc;
anything that has a sensitive internal state that should not be modified
arbitrarily from user space) into mmapable arrays, as there is no way to
prevent user space from corrupting such sensitive state through direct
memory access through memory-mapped region.

By making sure that libbpf doesn't add BPF_F_MMAPABLE flag to BPF array
maps corresponding to data sections that only have static variables
(which are not supposed to be visible to user space according to libbpf
and BPF skeleton rules), users now can have spinlocks, kptrs, etc in
either default .bss/.data sections or custom .data.* sections (assuming
there are no global variables in such sections).

The only possible hiccup with this approach is the need to use global
variables during BPF static linking, even if they are not intended to be
shared with user space through the BPF skeleton. To allow such scenarios,
extend libbpf's STV_HIDDEN ELF visibility attribute handling to
variables. Libbpf is already treating global hidden BPF subprograms as
static subprograms and adjusts BTF accordingly to make BPF verifier
verify such subprograms as static subprograms with preserving entire BPF
verifier state between subprog calls. This patch teaches libbpf to treat
global hidden variables as static ones and adjust BTF information
accordingly as well. This allows to share variables between multiple
object files during static linking, but still keep them internal to BPF
program and not get them exposed through BPF skeleton.
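
A sketch of that escape hatch (assuming the __hidden macro from
bpf_helpers.h, which expands to visibility("hidden")):

  /* visible to other BPF object files at static-link time, but treated
   * like a static variable afterwards: not exposed via the skeleton, so
   * the containing .data map stays non-mmapable */
  __hidden int shared_counter;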

Note, that if the user has some advanced scenario where they absolutely
need BPF_F_MMAPABLE flag on .data/.bss/.rodata BPF array map despite
only having static variables, they still can achieve this by forcing it
through explicit bpf_map__set_map_flags() API.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20221019002816.359650-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
0e195e4597 libbpf: clean up and refactor BTF fixup step
Refactor libbpf's BTF fixup step during the BPF object open phase. The
only functional change is that we now ignore BTF_VAR_GLOBAL_EXTERN
variables during fixup, not just BTF_VAR_STATIC ones. This shouldn't
cause any change in behavior, as there shouldn't be any extern variables
in data sections for a valid BPF object anyway.

Otherwise it's just collapsing two functions that have no reason to be
separate, and switching find_elf_var_offset() helper to return entire
symbol pointer, not just its offset. This will be used by next patch to
get ELF symbol visibility.

While refactoring, also "normalize" debug messages inside
btf_fixup_datasec() to follow general libbpf style and print out data
section name consistently, where it's available.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221019002816.359650-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-12 18:24:12 -08:00
Ravi Bangoria
08830e9d2f perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
PERF_MEM_SNOOPX_PEER is defined only in tools uapi header. Although
it's used only by perf tool, not defining it in kernel header can
create problems in future.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-8-ravi.bangoria@amd.com
2022-11-12 18:24:12 -08:00
Ravi Bangoria
1022f26d04 perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
Introduce PERF_MEM_LVLNUM_EXTN_MEM, which can be used to indicate
accesses to extension memory like CXL etc. PERF_MEM_LVL_IO can be used
for IO accesses but cannot distinguish between local and remote IO.
Introduce a new field, PERF_MEM_LVLNUM_IO, which can be clubbed with
PERF_MEM_REMOTE_REMOTE to indicate remote IO accesses.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-2-ravi.bangoria@amd.com
2022-11-12 18:24:12 -08:00
Namhyung Kim
b4ca1f6407 perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY
There's no in-tree user anymore.  Let's get rid of it.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908214104.3851807-3-namhyung@kernel.org
2022-11-12 18:24:12 -08:00
Anshuman Khandual
fd71ca941b perf: Add PERF_BR_NEW_ARCH_[N] map for BRBE on arm64 platform
BRBE captured branch types will overflow perf_branch_entry.type and generic
branch types in perf_branch_entry.new_type. So override each available arch
specific branch type in the following manner to comprehensively process all
reported branch types in BRBE.

  PERF_BR_ARM64_FIQ            PERF_BR_NEW_ARCH_1
  PERF_BR_ARM64_DEBUG_HALT     PERF_BR_NEW_ARCH_2
  PERF_BR_ARM64_DEBUG_EXIT     PERF_BR_NEW_ARCH_3
  PERF_BR_ARM64_DEBUG_INST     PERF_BR_NEW_ARCH_4
  PERF_BR_ARM64_DEBUG_DATA     PERF_BR_NEW_ARCH_5

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-5-anshuman.khandual@arm.com
2022-11-12 18:24:12 -08:00
Anshuman Khandual
a14b39bd31 perf: Capture branch privilege information
Platforms like arm64 could capture privilege level information for all the
branch records. Hence this adds a new element in the struct branch_entry to
record the privilege level information, which could be requested through a
new event.attr.branch_sample_type based flag PERF_SAMPLE_BRANCH_PRIV_SAVE.
This flag helps user choose whether privilege information is captured.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-4-anshuman.khandual@arm.com
2022-11-12 18:24:12 -08:00
Anshuman Khandual
ade228b8f0 perf: Extend branch type classification
branch_entry.type has now run out of space to accommodate further branch
type classification. This will prevent the perf branch stack
implementation on arm64 (via BRBE) from capturing all available branch
types. Extending this bit field, i.e. branch_entry.type [4 bits], is not
an option as it will break the user space ABI for both little and big
endian perf tools.

Extend branch classification with a new field branch_entry.new_type via a
new branch type PERF_BR_EXTEND_ABI in branch_entry.type. Perf tools which
could decode PERF_BR_EXTEND_ABI, will then parse branch_entry.new_type as
well.

branch_entry.new_type is a 4-bit field which can hold up to 16 branch
types. The first three branch types will hold various generic page
faults, followed by five architecture-specific branch types, which can
be overridden by the platform for specific use cases. These
architecture-specific branch types get overridden on the arm64 platform
for the BRBE implementation.

New generic branch types

 - PERF_BR_NEW_FAULT_ALGN
 - PERF_BR_NEW_FAULT_DATA
 - PERF_BR_NEW_FAULT_INST

New arch specific branch types

 - PERF_BR_NEW_ARCH_1
 - PERF_BR_NEW_ARCH_2
 - PERF_BR_NEW_ARCH_3
 - PERF_BR_NEW_ARCH_4
 - PERF_BR_NEW_ARCH_5

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-3-anshuman.khandual@arm.com
2022-11-12 18:24:12 -08:00
Anshuman Khandual
41ab246bdf perf: Add system error and not in transaction branch types
This expands generic branch type classification by adding two more
entries, i.e. system error and not in transaction. This also updates the
x86 implementation to process X86_BR_NO_TX records as appropriate. This
changes the branch types reported to user space on the x86 platform, but
it should not be a problem. The possible scenarios and impacts are
enumerated here.

 --------------------------------------------------------------------------
 | kernel | perf tool |                     Impact                        |
 --------------------------------------------------------------------------
 |   old  |    old    |  Works as before                                  |
 --------------------------------------------------------------------------
 |   old  |    new    |  PERF_BR_UNKNOWN is processed                     |
 --------------------------------------------------------------------------
 |   new  |    old    |  PERF_BR_NO_TX is blocked via old PERF_BR_MAX     |
 --------------------------------------------------------------------------
 |   new  |    new    |  PERF_BR_NO_TX is recognized                      |
 --------------------------------------------------------------------------

When PERF_BR_NO_TX is blocked via the old PERF_BR_MAX (new kernel with
an old perf tool), the user space might throw up a warning complaining
about unrecognized branch types being reported, but that's expected. The
PERF_BR_SERROR & PERF_BR_NO_TX branch types will be used for the BRBE
implementation on the arm64 platform.

PERF_BR_NO_TX complements the 'abort' and 'in_tx' elements in
perf_branch_entry, which represent other transaction states for a given
branch record, completing the transaction state classification.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-2-anshuman.khandual@arm.com
2022-11-12 18:24:12 -08:00
Sandipan Das
d918025bc8 perf/core: Add speculation info to branch entries
Add a new "spec" bitfield to branch entries for providing speculation
information. This will be populated using hints provided by branch sampling
features on supported hardware. The following cases are covered:

  * No branch speculation information is available
  * Branch is speculative but taken on the wrong path
  * Branch is non-speculative but taken on the correct path
  * Branch is speculative and taken on the correct path

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/834088c302faf21c7b665031dd111f424e509a64.1660211399.git.sandipan.das@amd.com
2022-11-12 18:24:12 -08:00
Daniel Müller
918d7712c0 ci: Make sure to keep ci/diffs/ directory around
Commit 837664758d ("ci: Allow usage of .patch patches") removed the
ci/diffs/.do_not_use_dot_patch_here marker file. Given that we currently
have no CI patches present and that git does not track (empty)
directories, ci/diffs/ got removed. That's fine functionality-wise, but
it makes for a bit of a discoverability hurdle. Add back a marker file
to keep the directory around.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-11-08 08:33:47 -08:00
Daniel Müller
4a84a7619f ci: Provide KBUILD_OUTPUT to actions asking for it
As of https://github.com/libbpf/ci/pull/67 a bunch of actions honor
KBUILD_OUTPUT. Doing so will make it possible to separate source code
from build artifacts, which in turn may allow us to support incremental
kernel compilation in CI down the line.
Irrespective of these future changes, actions pertaining to the kernel
build now ask for an additional input defining where to store or expect
build artifacts. Provide it.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-11-07 11:02:01 -08:00
Daniel Müller
837664758d ci: Allow usage of .patch patches
With https://github.com/libbpf/ci/pull/68 merged we can now keep the
.patch extension for patches and don't have to worry about forgetting
the rename to .diff.
Remove the marker file reminding us of that need.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-11-07 11:00:56 -08:00
Daniel Müller
11bf829873 ci: Remove no longer needed patches
Patch "selftests/bpf: Fix OOB write in test_verifier" has made it to the
bpf branch (after originally landing on bpf-next). Remove it from CI, as
it is no longer necessary.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-11-07 11:00:56 -08:00
Matteo Croce
c97b16d96c ci: enable shellcheck linter
Run the shellcheck linter in a GitHub action,
as in https://github.com/libbpf/ci/pull/61

Signed-off-by: Matteo Croce <teknoraver@meta.com>
2022-10-27 16:46:38 -07:00
Matteo Croce
1c17672353 shellcheck: fix errors
Signed-off-by: Matteo Croce <teknoraver@meta.com>
2022-10-27 16:46:38 -07:00
Tobias Waldekranz
68e6f83f22 Makefile: Fix cross-compilation for 32-bit targets
Determining the correct library installation path (lib vs. lib64)
using uname(1) breaks in cross-compilation scenarios where word widths
differ between the host and target system.

Instead, source the information from the compiler's '-dumpmachine'
option (supported by both GCC and Clang).

We call this the "host" architecture, using the same nomenclature as
Autotools (--host configure option).

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
2022-10-18 17:33:04 -07:00
grantseltzer
383ffb79a6 Add documentation badge to README
This adds a documentation badge that links to libbpf.readthedocs.org.
When rendered on GitHub, it will display the status of the docs build.
Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
2022-10-17 13:55:59 -07:00
David Vernet
50315fd763 README: Fix Arch packaging link
libbpf is now packaged as part of the core repository, not the extra
repository. Fix the current link, which returns a 404.

Signed-off-by: David Vernet <void@manifault.com>
2022-10-17 13:17:40 -07:00
Andrii Nakryiko
534a2c6f53 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   87dbdc230d162bf9ee1ac77c8ade178b6b1e199e
Checkpoint bpf-next commit: 62c69e89e81bfbdb9a87ae3e0599dcc6aacf786b
Baseline bpf commit:        60240bc26114543fcbfcd8a28466e67e77b20388
Checkpoint bpf commit:      e7b09357453a99e6f9e74c39e9ca1363c22c0b96

Andrii Nakryiko (1):
  bpf: explicitly define BPF_FUNC_xxx integer values

Eduard Zingerman (1):
  bpftool: Print newline before '}' for struct with padding only fields

Kui-Feng Lee (2):
  bpf: Parameterize task iterators.
  bpf: Handle bpf_link_info for the parameterized task BPF iterators.

Roberto Sassu (5):
  libbpf: Fix LIBBPF_1.0.0 declaration in libbpf.map
  libbpf: Introduce bpf_get_fd_by_id_opts and
    bpf_map_get_fd_by_id_opts()
  libbpf: Introduce bpf_prog_get_fd_by_id_opts()
  libbpf: Introduce bpf_btf_get_fd_by_id_opts()
  libbpf: Introduce bpf_link_get_fd_by_id_opts()

Shung-Hsi Yu (3):
  libbpf: Use elf_getshdrnum() instead of e_shnum
  libbpf: Deal with section with no data gracefully
  libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()

Xin Liu (1):
  libbpf: Fix overrun in netlink attribute iteration

Xu Kuohai (2):
  libbpf: Fix use-after-free in btf_dump_name_dups
  libbpf: Fix memory leak in parse_usdt_arg()

 include/uapi/linux/bpf.h | 442 ++++++++++++++++++++-------------------
 src/bpf.c                |  48 ++++-
 src/bpf.h                |  16 ++
 src/btf_dump.c           |  35 +++-
 src/libbpf.c             |  22 +-
 src/libbpf.map           |   6 +-
 src/nlattr.c             |   2 +-
 src/usdt.c               |  11 +-
 8 files changed, 347 insertions(+), 235 deletions(-)

--
2.30.2
2022-10-17 13:13:02 -07:00
Shung-Hsi Yu
3a3ef0c1d0 libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()
When there are no program sections, obj->programs is left unallocated,
and find_prog_by_sec_insn()'s search lands on &obj->programs[0] == NULL,
which will cause a null-pointer dereference in the following access to
prog->sec_idx.

Guard the search with obj->nr_programs similar to what's being done in
__bpf_program__iter() to prevent null-pointer access from happening.
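
A minimal sketch of the guard (illustrative):

  /* no program sections: obj->programs is NULL, so bail out early
   * instead of searching an unallocated array */
  if (!obj->nr_programs)
          return NULL;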

Fixes: db2b8b06423c ("libbpf: Support CO-RE relocations for multi-prog sections")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221012022353.7350-4-shung-hsi.yu@suse.com
2022-10-17 13:13:02 -07:00
Shung-Hsi Yu
3ee4823fcb libbpf: Deal with section with no data gracefully
The ELF section data pointer returned by libelf may be NULL (if the
section has SHT_NOBITS), so null-check the section data pointer before
attempting to copy the license and kversion sections.

Fixes: cb1e5e961991 ("bpf tools: Collect version and license from ELF sections")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221012022353.7350-3-shung-hsi.yu@suse.com
2022-10-17 13:13:02 -07:00
Shung-Hsi Yu
7412775110 libbpf: Use elf_getshdrnum() instead of e_shnum
This commit replaces e_shnum with the elf_getshdrnum() helper to fix two
oss-fuzz-reported heap-buffer overflows in __bpf_object__open. Both
reports are incorrectly marked as fixed while still being reproducible
in the latest libbpf.

  # clusterfuzz-testcase-minimized-bpf-object-fuzzer-5747922482888704
  libbpf: loading object 'fuzz-object' from buffer
  libbpf: sec_cnt is 0
  libbpf: elf: section(1) .data, size 0, link 538976288, flags 2020202020202020, type=2
  libbpf: elf: section(2) .data, size 32, link 538976288, flags 202020202020ff20, type=1
  =================================================================
  ==13==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000c0 at pc 0x0000005a7b46 bp 0x7ffd12214af0 sp 0x7ffd12214ae8
  WRITE of size 4 at 0x6020000000c0 thread T0
  SCARINESS: 46 (4-byte-write-heap-buffer-overflow-far-from-bounds)
      #0 0x5a7b45 in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3414:24
      #1 0x5733c0 in bpf_object_open /src/libbpf/src/libbpf.c:7223:16
      #2 0x5739fd in bpf_object__open_mem /src/libbpf/src/libbpf.c:7263:20
      ...

The issue lies in libbpf's direct use of the e_shnum field in the ELF
header as the section header count, whereas libelf implements extra
logic: when e_shnum == 0 && e_shoff != 0, it uses the sh_size member of
the initial section header as the real section header count (part of
the ELF spec, to accommodate the situation where the section header
count is larger than SHN_LORESERVE).

The above inconsistency led to libbpf writing into a zero-entry calloc
area. So instead of using e_shnum directly, use the elf_getshdrnum()
helper provided by libelf to retrieve the section header count into
sec_cnt.
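
A minimal sketch of the replacement (illustrative; pr_warn() and
LIBBPF_ERRNO__FORMAT are libbpf names, elf_getshdrnum() is from libelf):

  size_t sec_cnt;

  if (elf_getshdrnum(elf, &sec_cnt)) {
          pr_warn("elf: failed to get the number of sections: %s\n",
                  elf_errmsg(-1));
          return -LIBBPF_ERRNO__FORMAT;
  }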

Fixes: 0d6988e16a12 ("libbpf: Fix section counting logic")
Fixes: 25bbbd7a444b ("libbpf: Remove assumptions about uniqueness of .rodata/.data/.bss maps")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40868
Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40957
Link: https://lore.kernel.org/bpf/20221012022353.7350-2-shung-hsi.yu@suse.com
2022-10-17 13:13:02 -07:00
Xu Kuohai
881a10980b libbpf: Fix memory leak in parse_usdt_arg()
In the arm64 version of parse_usdt_arg(), when sscanf returns 2, reg_name
is allocated but not freed. Fix it.

Fixes: 0f8619929c57 ("libbpf: Usdt aarch64 arg parsing support")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-3-xukuohai@huaweicloud.com
2022-10-17 13:13:02 -07:00
Xu Kuohai
54caf920db libbpf: Fix use-after-free in btf_dump_name_dups
ASAN reports an use-after-free in btf_dump_name_dups:

ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928
READ of size 2 at 0xffff927006db thread T0
    #0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614)
    #1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127
    #2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143
    #3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212
    #4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525
    #5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552
    #6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567
    #7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912
    #8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798
    #9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282
    #10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236
    #11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #15 0xaaaab5d65990  (test_progs+0x185990)

0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0)
freed by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    #6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032
    #7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232
    #8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #12 0xaaaab5d65990  (test_progs+0x185990)

previously allocated by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    #6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070
    #7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102
    #8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162
    #9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #13 0xaaaab5d65990  (test_progs+0x185990)

The reason is that the key stored in the hash table name_map is a string
address, and the string memory is allocated by realloc(). When the
memory is resized by a later realloc(), the old memory may be freed, so
the address stored in name_map references freed memory, causing the
use-after-free.

Fix it by storing a duplicated string address in name_map.

Fixes: 919d2b1dbb07 ("libbpf: Allow modification of BTF and add btf__add_str API")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-2-xukuohai@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
0d6c47523c libbpf: Introduce bpf_link_get_fd_by_id_opts()
Introduce bpf_link_get_fd_by_id_opts(), for symmetry with
bpf_map_get_fd_by_id_opts(), to let the caller pass the newly introduced
data structure bpf_get_fd_by_id_opts. Keep the existing
bpf_link_get_fd_by_id(), and call bpf_link_get_fd_by_id_opts() with NULL as
opts argument, to prevent setting open_flags.

Currently, the kernel does not support non-zero open_flags for
bpf_link_get_fd_by_id_opts(), and a call with them will result in an error
returned by the bpf() system call. The caller should always pass zero
open_flags.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-6-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
998282f179 libbpf: Introduce bpf_btf_get_fd_by_id_opts()
Introduce bpf_btf_get_fd_by_id_opts(), for symmetry with
bpf_map_get_fd_by_id_opts(), to let the caller pass the newly introduced
data structure bpf_get_fd_by_id_opts. Keep the existing
bpf_btf_get_fd_by_id(), and call bpf_btf_get_fd_by_id_opts() with NULL as
opts argument, to prevent setting open_flags.

Currently, the kernel does not support non-zero open_flags for
bpf_btf_get_fd_by_id_opts(), and a call with them will result in an error
returned by the bpf() system call. The caller should always pass zero
open_flags.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-5-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
d6d1ec5b25 libbpf: Introduce bpf_prog_get_fd_by_id_opts()
Introduce bpf_prog_get_fd_by_id_opts(), for symmetry with
bpf_map_get_fd_by_id_opts(), to let the caller pass the newly introduced
data structure bpf_get_fd_by_id_opts. Keep the existing
bpf_prog_get_fd_by_id(), and call bpf_prog_get_fd_by_id_opts() with NULL as
opts argument, to prevent setting open_flags.

Currently, the kernel does not support non-zero open_flags for
bpf_prog_get_fd_by_id_opts(), and a call with them will result in an error
returned by the bpf() system call. The caller should always pass zero
open_flags.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-4-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
a719cae6aa libbpf: Introduce bpf_get_fd_by_id_opts and bpf_map_get_fd_by_id_opts()
Define a new data structure called bpf_get_fd_by_id_opts, with the member
open_flags, to be used by callers of the _opts variants of
bpf_*_get_fd_by_id() to specify the permissions needed for the file
descriptor to be obtained.

Also, introduce bpf_map_get_fd_by_id_opts(), to let the caller pass a
bpf_get_fd_by_id_opts structure.

Finally, keep the existing bpf_map_get_fd_by_id(), and call
bpf_map_get_fd_by_id_opts() with NULL as opts argument, to request
read-write permissions (current behavior).
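
A usage sketch (map_id is a placeholder; BPF_F_RDONLY requests a
read-only fd):

  LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts,
          .open_flags = BPF_F_RDONLY,
  );
  int fd = bpf_map_get_fd_by_id_opts(map_id, &opts);
  if (fd < 0)
          return fd; /* -errno-style error code */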

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-3-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
07024c87de libbpf: Fix LIBBPF_1.0.0 declaration in libbpf.map
Add the missing LIBBPF_0.8.0 at the end of the LIBBPF_1.0.0 declaration,
similarly to other version declarations.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-2-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Andrii Nakryiko
19ef40cee6 bpf: explicitly define BPF_FUNC_xxx integer values
Historically enum bpf_func_id's BPF_FUNC_xxx enumerators relied on
implicit sequential values being assigned by compiler. This is
convenient, as new BPF helpers are always added at the very end, but it
also has its downsides, some of them being:

  - with over 200 helpers now, it's very hard to know each helper's ID,
    which is often important when working with BPF assembly (e.g., when
    dumping raw BPF instructions with the llvm-objdump -d command). It's
    possible to work around this by looking into vmlinux.h, dumping
    /sys/kernel/btf/vmlinux, looking at the libbpf-provided
    bpf_helper_defs.h, etc. But it always feels like an unnecessary step,
    and one should be able to quickly figure this out from the UAPI header.

  - when backporting and cherry-picking only some BPF helpers onto older
    kernels it's important to be able to skip some enum values for helpers
    that weren't backported, but preserve absolute integer IDs to keep BPF
    helper IDs stable so that BPF programs stay portable across upstream
    and backported kernels.

While neither problem is insurmountable, they come up frequently enough
and are annoying enough to warrant improving the situation. And for
backporting, the problem can easily go unnoticed for a while, especially
if the backport is done by people not very familiar with the BPF
subsystem overall.

Anyways, it's easy to fix this by making sure that the __BPF_FUNC_MAPPER
macro provides explicit helper IDs. Unfortunately, that would potentially
break existing users that use the UAPI-exposed __BPF_FUNC_MAPPER and are
expected to pass a macro that accepts only a symbolic helper identifier
(e.g., map_lookup_elem for the bpf_map_lookup_elem() helper).

As such, we need to introduce a new macro (___BPF_FUNC_MAPPER) which
specifies both the identifier and the integer ID, but in such a way as to
allow the existing __BPF_FUNC_MAPPER to be expressed in terms of the new
___BPF_FUNC_MAPPER macro. And that's what this patch does. To avoid
duplication and allow __BPF_FUNC_MAPPER to stay *exactly* the same,
___BPF_FUNC_MAPPER accepts arbitrary "context" arguments, which can be
used to pass any extra macros, arguments, and whatnot. In our case we
use this to pass the original user-provided macro that expects a single
argument, and __BPF_FUNC_MAPPER uses its own three-argument
__BPF_FUNC_MAPPER_APPLY intermediate macro to impedance-match the new and
old "callback" macros.
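
Condensed to three helpers, the resulting pattern looks roughly like
this (a sketch, not the full UAPI header):

  #define ___BPF_FUNC_MAPPER(FN, ctx...)     \
          FN(unspec, 0, ##ctx)               \
          FN(map_lookup_elem, 1, ##ctx)      \
          FN(map_update_elem, 2, ##ctx)

  /* impedance-match old single-argument callback macros */
  #define __BPF_FUNC_MAPPER_APPLY(name, value, FN) FN(name),
  #define __BPF_FUNC_MAPPER(FN) ___BPF_FUNC_MAPPER(__BPF_FUNC_MAPPER_APPLY, FN)

  /* enum bpf_func_id now gets explicit integer values */
  #define __BPF_ENUM_FN(x, y) BPF_FUNC_ ## x = y,
  enum bpf_func_id {
          ___BPF_FUNC_MAPPER(__BPF_ENUM_FN)
          __BPF_FUNC_MAX_ID,
  };
  #undef __BPF_ENUM_FN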

Once we resolve this, we use the new ___BPF_FUNC_MAPPER to define enum
bpf_func_id with explicit values. The other users of __BPF_FUNC_MAPPER
in the kernel (namely in kernel/bpf/disasm.c) are kept exactly the same,
both as a demonstration that backwards compat works and to avoid
unnecessary code churn.

Note that the new ___BPF_FUNC_MAPPER() doesn't forcefully insert a comma
between values, as that might not be appropriate in all possible cases
where ___BPF_FUNC_MAPPER might be used. This doesn't reduce
usability, as it's trivial to insert that comma inside the "callback" macro.

To validate all the manually specified IDs are exactly right, we used
BTF to compare before and after values:

  $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > after.txt
  $ git stash # stash UAPI changes
  $ make -j90
  ... re-building kernel without UAPI changes ...
  $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > before.txt
  $ diff -u before.txt after.txt
  --- before.txt  2022-10-05 10:48:18.119195916 -0700
  +++ after.txt   2022-10-05 10:46:49.446615025 -0700
  @@ -1,4 +1,4 @@
  -[14576] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
  +[9560] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
          'BPF_FUNC_unspec' val=0
          'BPF_FUNC_map_lookup_elem' val=1
          'BPF_FUNC_map_update_elem' val=2

As can be seen from the diff above, the only thing that changed was the
resulting BTF type ID of the ENUM bpf_func_id, not any of the enumerators,
their names, or integer values.

The only other place that needed fixing was scripts/bpf_doc.py, used to
generate man pages and the bpf_helper_defs.h header for libbpf and
selftests. That script is tightly coupled to the exact shape of the
___BPF_FUNC_MAPPER macro definition, so it had to be trivially adapted.

Cc: Quentin Monnet <quentin@isovalent.com>
Reported-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221006042452.2089843-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-17 13:13:02 -07:00
Eduard Zingerman
3d3ff49213 bpftool: Print newline before '}' for struct with padding only fields
btf_dump_emit_struct_def attempts to print empty structures on a
single line, e.g. `struct empty {}`. However, it has to account for
the case where the struct has no regular fields but does have padding
fields. In such a case `vlen` is zero, but the size is non-zero.

E.g. here is struct bpf_timer from vmlinux.h before this patch:

 struct bpf_timer {
 	long: 64;
 	long: 64;};

And after this patch:

 struct bpf_timer {
 	long: 64;
 	long: 64;
 };

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221001104425.415768-1-eddyz87@gmail.com
2022-10-17 13:13:02 -07:00
Xin Liu
3745a20b28 libbpf: Fix overrun in netlink attribute iteration
I accidentally found that the change in commit 1045b03e07d8 ("netlink: fix
overrun in attribute iteration") was never synchronized to the function
`nla_ok` in tools/lib/bpf/nlattr.c. This patch brings the same fix
over.

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220930090708.62394-1-liuxin350@huawei.com
2022-10-17 13:13:02 -07:00
Kui-Feng Lee
b9e909dd41 bpf: Handle bpf_link_info for the parameterized task BPF iterators.
Add new fields to bpf_link_info that users can query through
bpf_obj_get_info_by_fd().

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-3-kuifeng@fb.com
2022-10-17 13:13:02 -07:00
Kui-Feng Lee
73c0c44b67 bpf: Parameterize task iterators.
Allow creating an iterator that loops through resources of one
thread/process.

Previously, people could only create iterators to loop through all
resources of files, vmas, and tasks in the system, even though they were
interested in only the resources of a specific task or process. By
passing the additional parameters, people can now create an iterator to
go through all resources or only the resources of one task.
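
A rough libbpf-side usage sketch (prog and target_pid are placeholders;
the task fields of union bpf_iter_link_info come from this series):

  union bpf_iter_link_info linfo = {};
  LIBBPF_OPTS(bpf_iter_attach_opts, opts);
  struct bpf_link *link;

  linfo.task.pid = target_pid;    /* walk only this process's resources */
  opts.link_info = &linfo;
  opts.link_info_len = sizeof(linfo);
  link = bpf_program__attach_iter(prog, &opts);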

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-2-kuifeng@fb.com
2022-10-17 13:13:02 -07:00
Daniel Müller
abde7fb314 Remove lru_bug from DENYLIST-latest.s390x
The comment associated with the entry is a bit confusing. It stemmed
from the test being denylisted on bpf, but not bpf-next in the past.
Regardless, by now said change has propagated to both trees, so we no
longer need to carry around this deny list entry here.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-10-12 09:29:08 -07:00
Manu Bretelle
63389d32f6 ci: remove mkrootfs from libbpf/libbpf
This is being moved to libbpf/ci instead: https://github.com/libbpf/ci/pull/44

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-10-11 09:14:31 -07:00
Frantisek Sumsal
59080bd06c ci: use CodeQL instead of LGTM
As LGTM is going to be shut down by EOY[0], let's move the code scanning to
CodeQL as recommended. Thanks to GH integration the results from such
scans will be shown both in the respective PR and in the Security ->
Code Scanning tab[1].

[0] https://github.blog/2022-08-15-the-next-step-for-lgtm-com-github-code-scanning/
[1] https://github.com/libbpf/libbpf/security/code-scanning
2022-10-10 16:31:14 -07:00
Daniel Müller
8b0b41f812 Remove travis-ci symlink
With https://github.com/libbpf/ci/pull/41 merged we no longer require
the travis-ci symlink in this repository. Remove it. Also, it turns out
we still have a few locations referencing travis-ci/ instead of ci/.
Convert those.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-10-07 11:54:39 -07:00
chantra
6bd5b40bcd ci: install wget package on s390x runners
`wget` is installed by default in GH-hosted runners, but not on the
self-hosted s390x runners. It is used in
[`get-linux-source`](79c799d6fb/get-linux-source/checkout_latest_kernel.sh (L32))
to download the kernel source faster than a git fetch.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-10-06 14:24:50 -07:00
chantra
6cd8907a4a ci: update actions-runner to 2.298.2 on s390x
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-10-06 14:24:50 -07:00
chantra
fa2875be8a ci: install zstd on s390x runners
zstd is installed by [default in GH
runners](https://github.com/actions/runner-images/blob/main/images/linux/Ubuntu2004-Readme.md).

Having it installed on the s390x runners as well means we can start
leveraging it when uploading artifacts: it has a better compression
ratio and is multithreaded.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-10-06 14:24:50 -07:00
chantra
27a93eae7c [s390x][ci] Force replacing workers when a worker already exists with the
same name.

This essentially aligns with what is done in
0f2883e196/entrypoint.sh (L90-L91)

The issue at hand manifested on the s390x host when restarting a runner
while GH still had an existing runner with the same name.
The logic defaulted to not replacing it, so the runner would be
started with some defaults, which means the name would change and the
labels would be lost, making the runner unusable (while still running): https://gist.github.com/chantra/ef0bd3e0c9e35bb82619636acf2f7c98

By replacing the existing runner, we no longer get into that state.
2022-10-04 10:52:07 -07:00
thiagoftsm
dac1c4b6a8 Merge branch 'libbpf:master' into master 2022-09-29 21:37:39 +00:00
Andrii Nakryiko
1714037104 vmtest: regenerate latest vmlinux.h
Update checked in vmlinux.h for 5.5 kernel tests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
d598cb20c7 libbpf: bump version to 1.1.0
Bump LIBBPF_MINOR_VERSION to 1 for v1.1.0.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
ce321d6fd4 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   e34cfee65ec891a319ce79797dda18083af33a76
Checkpoint bpf-next commit: 87dbdc230d162bf9ee1ac77c8ade178b6b1e199e
Baseline bpf commit:        14b20b784f59bdd95f6f1cfb112c9818bcec4d84
Checkpoint bpf commit:      60240bc26114543fcbfcd8a28466e67e77b20388

Andrii Nakryiko (3):
  libbpf: Fix crash if SEC("freplace") programs don't have
    attach_prog_fd set
  libbpf: restore memory layout of bpf_object_open_opts
  libbpf: Don't require full struct enum64 in UAPI headers

Benjamin Tissoires (1):
  libbpf: add map_get_fd_by_id and map_delete_elem in light skeleton

Daniel Borkmann (1):
  libbpf: Remove gcc support for bpf_tail_call_static for now

David Vernet (3):
  bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type
  bpf: Add bpf_user_ringbuf_drain() helper
  bpf: Add libbpf logic for user-space ring buffer

Hao Luo (2):
  bpf: Introduce cgroup iter
  bpf: Add CGROUP prefix to cgroup_iter_order

James Hilliard (1):
  libbpf: Add GCC support for bpf_tail_call_static

Jiri Olsa (1):
  bpf: Return value in kprobe get_func_ip only for entry address

Jon Doron (1):
  libbpf: Fix the case of running as non-root with capabilities

Pu Lehui (1):
  bpf, cgroup: Reject prog_attach_flags array when effective query

Quentin Monnet (1):
  bpf: Fix a few typos in BPF helpers documentation

Shmulik Ladkani (2):
  bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for
    bpf progs
  bpf: Support getting tunnel flags

Stanislav Fomichev (1):
  bpf: update bpf_{g,s}et_retval documentation

Tao Chen (1):
  libbpf: Support raw BTF placed in the default search path

Wang Yufen (1):
  libbpf: Add pathname_concat() helper

Xin Liu (2):
  libbpf: Clean up legacy bpf maps declaration in bpf_helpers
  libbpf: Fix NULL pointer exception in API btf_dump__dump_type_data

Yonghong Song (3):
  bpf: Update descriptions for helpers bpf_get_func_arg[_cnt]()
  libbpf: Add new BPF_PROG2 macro
  libbpf: Improve BPF_PROG2 macro code quality and description

 include/uapi/linux/bpf.h | 139 +++++++++++++++++---
 src/bpf_helpers.h        |  12 --
 src/bpf_tracing.h        | 107 ++++++++++++++++
 src/btf.c                |  32 ++---
 src/btf.h                |  25 +++-
 src/btf_dump.c           |   2 +-
 src/libbpf.c             | 106 ++++++++-------
 src/libbpf.h             | 111 +++++++++++++++-
 src/libbpf.map           |  10 ++
 src/libbpf_probes.c      |   1 +
 src/libbpf_version.h     |   2 +-
 src/ringbuf.c            | 271 +++++++++++++++++++++++++++++++++++++++
 src/skel_internal.h      |  23 ++++
 src/usdt.c               |   2 +-
 14 files changed, 731 insertions(+), 112 deletions(-)

--
2.30.2
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
0f5b3a10ae sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-09-27 15:23:45 -07:00
Pu Lehui
5859c59e50 bpf, cgroup: Reject prog_attach_flags array when effective query
Attach flags are only valid for progs attached at this cgroup layer,
but not for effective progs. For querying with EFFECTIVE flags,
exporting attach flags does not make sense. So for an effective query,
we reject the prog_attach_flags array and don't need to populate it.
We also limit attach_flags to output 0 during an effective query.

Fixes: b79c9fc9551b ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20220921104604.2340580-2-pulehui@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
85f8b7c4dc libbpf: Don't require full struct enum64 in UAPI headers
Drop the requirement for system-wide kernel UAPI headers to provide full
struct btf_enum64 definition. This is an unexpected requirement that
slipped into libbpf 1.0 and put unnecessary pressure ([0]) on users to have
a bleeding-edge kernel UAPI header from unreleased Linux 6.0.

To achieve this, we forward-declare struct btf_enum64. But that's not
enough, as there is the btf_enum64_value() helper that expects to know the
layout of struct btf_enum64. So we get a bit creative, reinterpreting
the memory layout as an array of __u32 and accessing the lo32/hi32
fields as array elements. An alternative would be a local pointer
variable to an anonymous struct with exactly the same layout as
struct btf_enum64, but that gets us into C++ compiler errors complaining
about invalid type casts. So play it safe, if ugly.
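
The resulting accessor, roughly (assuming the kernel's
{ name_off, val_lo32, val_hi32 } field order):

  static inline __u64 btf_enum64_value(const struct btf_enum64 *e)
  {
          /* struct btf_enum64 may only be forward-declared, so read
           * its layout as an array of __u32 */
          const __u32 *e64 = (const __u32 *)e;

          return ((__u64)e64[2] << 32) | e64[1];
  }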

  [0] Closes: https://github.com/libbpf/libbpf/issues/562

Fixes: d90ec262b35b ("libbpf: Add enum64 support for btf_dump")
Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/bpf/20220927042940.147185-1-andrii@kernel.org
2022-09-27 15:23:45 -07:00
Jon Doron
9da0dcb621 libbpf: Fix the case of running as non-root with capabilities
When running rootless with special capabilities like
FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH, the "access" API will not
properly check whether there really is access to a file or not.

From the access man page:
"
The check is done using the calling process's real UID and GID, rather
than the effective IDs as is done when actually attempting an operation
(e.g., open(2)) on the file.  Similarly, for the root user, the check
uses the set of permitted capabilities  rather than the set of effective
capabilities; ***and for non-root users, the check uses an empty set of
capabilities.***
"

What that means is that for non-root users the access API will not do
the proper validation of whether the process really has permission to a
file or not.

To resolve this, this patch replaces all the access() API calls with
faccessat() with the AT_EACCESS flag.
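
The replacement pattern, sketched:

  /* before: checks real UID/GID (and, for non-root, an empty
   * capability set), so it can wrongly deny access */
  if (access(path, R_OK) == 0) { /* ... */ }

  /* after: AT_EACCESS checks effective IDs, matching what an
   * actual open(2) would do */
  if (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS) == 0) { /* ... */ }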

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com
2022-09-27 15:23:45 -07:00
Jiri Olsa
82c4054376 bpf: Return value in kprobe get_func_ip only for entry address
Change the return value of the kprobe version of bpf_get_func_ip
to zero if the attach address is not the function's
entry point.

For kprobes attached in the middle of a function we can't easily
get to the function address, especially now with CONFIG_X86_KERNEL_IBT
support.

If the user cares about the current IP for kprobes attached within the
function body, they can get it with PT_REGS_IP(ctx).
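
For illustration (function name and offset are hypothetical):

  SEC("kprobe/do_sys_openat2+8")          /* not the function entry */
  int BPF_KPROBE(mid_func_probe)
  {
          __u64 entry_ip = bpf_get_func_ip(ctx); /* now returns 0 here */
          __u64 cur_ip = PT_REGS_IP(ctx);        /* actual attach address */

          return 0;
  }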

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
b3a117773d libbpf: restore memory layout of bpf_object_open_opts
When the attach_prog_fd field was removed in libbpf 1.0 and replaced with
a `long: 0` placeholder, it actually shifted all the subsequent fields by
8 bytes. This is due to `long: 0` promising to adjust the next field's
offset to a long-aligned offset. But in this case we were already
long-aligned, as pin_root_path is a pointer, so `long: 0` had no effect
and thus didn't fill the gap created by the removed attach_prog_fd.

A non-zero bitfield should have been used instead. I validated this using
pahole. Originally the kconfig field was at offset 40. With `long: 0` it's
at offset 32, which is wrong. With this change it's back at offset 40.
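
The layout fix, sketched (fields abridged from libbpf.h):

  struct bpf_object_open_opts {
          size_t sz;
          const char *object_name;
          bool relaxed_maps;
          const char *pin_root_path;
          __u32 :32;      /* was attach_prog_fd; a non-zero-width
                           * bitfield actually preserves the hole */
          const char *kconfig;
          /* ... */
  };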

While technically libbpf 1.0 is allowed to break backwards
compatibility, and applications should have been recompiled against
libbpf 1.0 headers, given how trivial it is to preserve the memory
layout, let's fix this.

Reported-by: Grant Seltzer Richman <grantseltzer@gmail.com>
Fixes: 146bf811f5ac ("libbpf: remove most other deprecated high-level APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220923230559.666608-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-09-27 15:23:45 -07:00
Wang Yufen
fc2577c54c libbpf: Add pathname_concat() helper
Move the snprintf and length check into a common helper,
pathname_concat(), to make the code simpler.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1663828124-10437-1-git-send-email-wangyufen@huawei.com
2022-09-27 15:23:45 -07:00
Tao Chen
0420f75dbc libbpf: Support raw BTF placed in the default search path
Currently, the default vmlinux files at '/boot/vmlinux-*',
'/lib/modules/*/vmlinux-*' etc. are parsed with 'btf__parse_elf()' to
extract BTF. It is possible that these files are actually raw BTF files
similar to /sys/kernel/btf/vmlinux. So parse these files with
'btf__parse' which tries both raw format and ELF format.

This might be useful in some scenarios where users put their custom BTF
into known locations and don't want to specify the btf_custom_path option.
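
For example (the path is illustrative):

  /* btf__parse() handles both a raw BTF blob and an ELF file
   * carrying a .BTF section */
  struct btf *btf = btf__parse("/boot/vmlinux-6.1.0", NULL);
  if (!btf)
          return -errno;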

Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/3f59fb5a345d2e4f10e16fe9e35fbc4c03ecaa3e.1662999860.git.chentao.kernel@linux.alibaba.com
2022-09-27 15:23:45 -07:00
Yonghong Song
aa25f218b4 libbpf: Improve BPF_PROG2 macro code quality and description
Commit 34586d29f8df ("libbpf: Add new BPF_PROG2 macro") added the BPF_PROG2
macro for trampoline-based programs with struct arguments. Andrii
made a few suggestions to improve code quality and description.
This patch implements these suggestions, including a better internal
macro name, a consistent usage pattern for __builtin_choose_expr(),
a simpler macro definition for always-inline func arguments, and a
better macro description.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220910025214.1536510-1-yhs@fb.com
2022-09-27 15:23:45 -07:00
David Vernet
9e9bf46c92 bpf: Add libbpf logic for user-space ring buffer
Now that all of the logic is in place in the kernel to support user-space
produced ring buffers, we can add the user-space logic to libbpf. This
patch therefore adds the following public symbols to libbpf:

struct user_ring_buffer *
user_ring_buffer__new(int map_fd,
		      const struct user_ring_buffer_opts *opts);
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);
void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample);
void user_ring_buffer__free(struct user_ring_buffer *rb);

A user-space producer must first create a struct user_ring_buffer * object
with user_ring_buffer__new(), and can then reserve samples in the
ring buffer using one of the following two symbols:

void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);

With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring
buffer will be returned if sufficient space is available in the buffer.
user_ring_buffer__reserve_blocking() provides similar semantics, but will
block for up to 'timeout_ms' in epoll_wait if there is insufficient space
in the buffer. The kernel guarantees that this function will
receive at least one event notification per invocation of
bpf_user_ringbuf_drain(), provided that at least one sample is drained and
the BPF program did not pass the BPF_RB_NO_WAKEUP flag to
bpf_user_ringbuf_drain().

Once a sample is reserved, it must either be committed to the ring buffer
with user_ring_buffer__submit(), or discarded with
user_ring_buffer__discard().
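
A minimal producer flow (map_fd and struct my_sample are placeholders):

  struct user_ring_buffer *rb;
  struct my_sample *s;

  rb = user_ring_buffer__new(map_fd, NULL);
  if (!rb)
          return -errno;
  s = user_ring_buffer__reserve(rb, sizeof(*s));
  if (!s) {
          user_ring_buffer__free(rb);
          return -errno;  /* e.g. -ENOSPC if the buffer is full */
  }
  s->value = 42;
  user_ring_buffer__submit(rb, s); /* or user_ring_buffer__discard(rb, s) */
  user_ring_buffer__free(rb);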

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-27 15:23:45 -07:00
David Vernet
28903eb40e bpf: Add bpf_user_ringbuf_drain() helper
In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which
will allow user-space applications to publish messages to a ring buffer
that is consumed by a BPF program in kernel-space. In order for this
map-type to be useful, it will require a BPF helper function that BPF
programs can invoke to drain samples from the ring buffer, and invoke
callbacks on those samples. This change adds that capability via a new BPF
helper function:

bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx,
                       u64 flags)

BPF programs may invoke this function to run callback_fn() on a series of
samples in the ring buffer. callback_fn() has the following signature:

long callback_fn(struct bpf_dynptr *dynptr, void *context);

Samples are provided to the callback in the form of struct bpf_dynptr *'s,
which the program can read using BPF helper functions for querying
struct bpf_dynptr's.
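
A sketch of the BPF side (the map and sample struct are placeholders):

  static long handle_sample(struct bpf_dynptr *dynptr, void *ctx)
  {
          struct my_sample s;

          if (bpf_dynptr_read(&s, sizeof(s), dynptr, 0, 0))
                  return 1;       /* stop draining */
          /* ... process s ... */
          return 0;               /* continue with the next sample */
  }

  /* somewhere in a program: drain pending samples, user_ringbuf
   * being a BPF_MAP_TYPE_USER_RINGBUF map */
  bpf_user_ringbuf_drain(&user_ringbuf, handle_sample, NULL, 0);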

In order to support bpf_user_ringbuf_drain(), a new PTR_TO_DYNPTR register
type is added to the verifier to reflect a dynptr that was allocated by
a helper function and passed to a BPF program. Unlike PTR_TO_STACK
dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR
dynptrs need not use reference tracking, as the BPF helper is trusted to
properly free the dynptr before returning. The verifier currently only
supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL.

Note that while the corresponding user-space libbpf logic will be added
in a subsequent patch, this patch does contain an implementation of the
.map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This
.map_poll() callback guarantees that an epoll-waiting user-space
producer will receive at least one event notification whenever at least
one sample is drained in an invocation of bpf_user_ringbuf_drain(),
provided that the function is not invoked with the BPF_RB_NO_WAKEUP
flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup
notification is sent even if no sample was drained.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com
2022-09-27 15:23:45 -07:00
David Vernet
8138aa78bd bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type
We want to support a ringbuf map type where samples are published from
user-space, to be consumed by BPF programs. BPF currently supports a
kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF
map type.  We'll need to define a new map type for user-space -> kernel,
as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply
to a user-space producer ring buffer, and we'll want to add one or
more helper functions that would not apply for a kernel-producer
ring buffer.

This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type
definition. The map type is useless in its current form, as there is no
way to access or use it for anything until we add one or more BPF helpers. A
follow-on patch will therefore add a new helper function that allows BPF
programs to run callbacks on samples that are published to the ring
buffer.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
2022-09-27 15:23:45 -07:00
Xin Liu
8ac9773f52 libbpf: Fix NULL pointer exception in API btf_dump__dump_type_data
We found that the function btf_dump__dump_type_data can be called by the
user as an API, but in this function the `opts` parameter may be used
as a null pointer. This causes `opts->indent_str` to trigger a NULL
pointer exception.
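
The defensive pattern, roughly (OPTS_GET is libbpf's internal accessor
for optional opts fields):

  /* tolerate opts == NULL by falling back to defaults */
  d->typed_dump->indent_lvl = OPTS_GET(opts, indent_level, 0);
  if (OPTS_GET(opts, indent_str, NULL))
          libbpf_strlcpy(d->typed_dump->indent_str, opts->indent_str,
                         sizeof(d->typed_dump->indent_str));
  else
          d->typed_dump->indent_str[0] = '\t';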

Fixes: 2ce8450ef5a3 ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts")
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Weibin Kong <kongweibin2@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220917084809.30770-1-liuxin350@huawei.com
2022-09-27 15:23:45 -07:00
Xin Liu
b63791cbde libbpf: Clean up legacy bpf maps declaration in bpf_helpers
Legacy BPF map declarations are no longer supported in libbpf v1.0 [0].
Only BTF-defined maps are supported starting from v1.0, so it is time to
remove the definition of bpf_map_def in bpf_helpers.h.

  [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220913073643.19960-1-liuxin350@huawei.com
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
0ff6d28aec libbpf: Fix crash if SEC("freplace") programs don't have attach_prog_fd set
Fix a SIGSEGV caused by libbpf trying to find the attach type in vmlinux
BTF for freplace programs. It's wrong to search in vmlinux BTF, and libbpf
doesn't even mark vmlinux BTF as required for freplace programs, so
trying to search anything in obj->vmlinux_btf might cause a NULL
dereference if nothing else in the BPF object requires vmlinux BTF.

Instead, error out if an freplace (EXT) program doesn't specify
attach_prog_fd at load time.

Fixes: 91abb4a6d79d ("libbpf: Support attachment of BPF tracing programs to kernel modules")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-3-andrii@kernel.org
2022-09-27 15:23:45 -07:00
Daniel Borkmann
861364fa45 libbpf: Remove gcc support for bpf_tail_call_static for now
This reverts commit 14e5ce79943a ("libbpf: Add GCC support for
bpf_tail_call_static"). The reason is that GCC invented its own BPF asm
syntax, which does not conform with the LLVM one, and going forward this
would be more painful to maintain here and in other areas of the library.
Thus remove it; the ask to the GCC folks is to align with the LLVM one and
use the exact same syntax.

Fixes: 14e5ce79943a ("libbpf: Add GCC support for bpf_tail_call_static")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Hilliard <james.hilliard1@gmail.com>
Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
2022-09-27 15:23:45 -07:00
Yonghong Song
21ec5ca723 libbpf: Add new BPF_PROG2 macro
To support struct arguments in trampoline-based programs,
the existing BPF_PROG doesn't work any more, since
the type size is needed to find out whether a parameter
takes one or two registers. So this patch adds a new
BPF_PROG2 macro to support such trampoline programs.

The idea was suggested by Andrii. For example, if the
to-be-traced function has a signature like
  typedef struct {
       void *x;
       int t;
  } sockptr;
  int blah(sockptr x, char y);

In the new BPF_PROG2 macro, the arguments can be
represented as
  __bpf_prog_call(
     ({ union {
          struct { __u64 x, y; } ___z;
          sockptr x;
        } ___tmp = { .___z = { ctx[0], ctx[1] }};
        ___tmp.x;
     }),
     ({ union {
          struct { __u8 x; } ___z;
          char y;
        } ___tmp = { .___z = { ctx[2] }};
        ___tmp.y;
     }));
In the above, the values stored on the stack are properly
assigned to the actual argument-type values by using 'union'
magic. Note that the macro also works even if no arguments
have struct types.

Note that the new BPF_PROG2 works for both llvm16 and pre-llvm16
compilers: llvm16 supports the bpf target passing struct values
up to 16 bytes in size by value, while pre-llvm16 passes them
by reference by storing the values on the stack. With static functions
taking struct arguments marked always-inline, the compiler is able
to optimize away the additional stack saving of struct values.
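
Usage then looks like this (tracing the example function above):

  SEC("fentry/blah")
  int BPF_PROG2(test_fentry, sockptr, x, char, y)
  {
          /* x and y arrive with their real types, regardless of how
           * many registers each one occupies */
          return 0;
  }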

Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152707.2079473-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Yonghong Song
255690da57 bpf: Update descriptions for helpers bpf_get_func_arg[_cnt]()
Now, instead of the number of arguments, the number of registers
holding argument values is stored in the trampoline. Update
the descriptions of the bpf_get_func_arg[_cnt]() helpers. Previous
programs without struct arguments should continue to work
as usual.

Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152657.2078805-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Shmulik Ladkani
b1753eaf3b bpf: Support getting tunnel flags
Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters
(id, ttl, tos, local and remote) but does not expose ip_tunnel_info's
tun_flags to the BPF program.

It makes sense to expose tun_flags to the BPF program.

Assume for example multiple GRE tunnels maintained on a single GRE
interface in collect_md mode. The program expects origins to initiate
over GRE, however different origins use different GRE characteristics
(e.g. some prefer to use GRE checksum, some do not; some pass a GRE key,
some do not, etc..).

A BPF program getting tun_flags can therefore remember the relevant
flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote. In
the reply path, the program can use 'bpf_skb_set_tunnel_key' in order
to correctly reply to the remote, using similar characteristics, based
on the stored tunnel flags.

Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key. If
specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags.

Decided to use the existing unused 'tunnel_ext' as the storage for the
'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout.

Also, the following has been considered during the design:

  1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy
     and place into the new 'tunnel_flags' field. This has 2 drawbacks:

     - The BPF_F_yyy flags are from *set_tunnel_key* enumeration space,
       e.g. BPF_F_ZERO_CSUM_TX. It is awkward that it is "returned" into
       tunnel_flags from a *get_tunnel_key* call.
     - Not all "interesting" TUNNEL_xxx flags can be mapped to existing
       BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy
       flags just for purposes of the returned tunnel_flags.

  2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only
     "interesting" flags. That's ok, but the drawback is that what's
     "interesting" for my use case might be limiting for other use cases.

Therefore I decided to expose what's in key.tun_flags *as is*, which seems
most flexible. The BPF user can just choose to ignore bits he's not
interested in. The TUNNEL_xxx are also UAPI, so no harm exposing them
back in the get_tunnel_key call.
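
A reader-side sketch (TUNNEL_CSUM comes from the UAPI if_tunnel.h flags
mentioned above):

  struct bpf_tunnel_key key = {};
  int err;

  err = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_FLAGS);
  if (!err && (key.tunnel_flags & TUNNEL_CSUM)) {
          /* remote initiated with GRE checksum; mirror it on reply */
  }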

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831144010.174110-1-shmulik.ladkani@gmail.com
2022-09-27 15:23:45 -07:00
James Hilliard
eeb2bc4061 libbpf: Add GCC support for bpf_tail_call_static
The bpf_tail_call_static function is currently not defined unless
using clang >= 8.

To support bpf_tail_call_static on GCC we can check if __clang__ is
not defined to enable bpf_tail_call_static.

We need to use GCC assembly syntax when the compiler does not define
__clang__ as LLVM inline assembly is not fully compatible with GCC.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220829210546.755377-1-james.hilliard1@gmail.com
2022-09-27 15:23:45 -07:00
Quentin Monnet
a11587cc01 bpf: Fix a few typos in BPF helpers documentation
Address a few typos in the documentation for the BPF helper functions.
They were reported by Jakub [0], who ran spell checkers on the generated
man page [1].

[0] https://lore.kernel.org/linux-man/d22dcd47-023c-8f52-d369-7b5308e6c842@gmail.com/T/#mb02e7d4b7fb61d98fa914c77b581184e9a9537af
[1] https://lore.kernel.org/linux-man/eb6a1e41-c48e-ac45-5154-ac57a2c76108@gmail.com/T/#m4a8d1b003616928013ffcd1450437309ab652f9f

v3: Do not copy unrelated (and breaking) elements to tools/ header
v2: Turn a ',' into a ';'

Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220825220806.107143-1-quentin@isovalent.com
2022-09-27 15:23:45 -07:00
Benjamin Tissoires
7fb6138fae libbpf: add map_get_fd_by_id and map_delete_elem in light skeleton
This allows better control over maps from the kernel when
preloading eBPF programs.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220824134055.1328882-8-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Hao Luo
c918b3e724 bpf: Add CGROUP prefix to cgroup_iter_order
bpf_cgroup_iter_order is globally visible, but the entries do not have
a CGROUP prefix. As requested by Andrii, put a CGROUP prefix in the names
in bpf_cgroup_iter_order.

This patch fixes two previous commits: one introduced the API and
the other uses the API in bpf selftest (that is, the selftest
cgroup_hierarchical_stats).

I tested this patch via the following command:

  test_progs -t cgroup,iter,btf_dump

Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
Fixes: 88886309d2e8 ("selftests/bpf: add a selftest for cgroup hierarchical stats collection")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220825223936.1865810-1-haoluo@google.com
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
2022-09-27 15:23:45 -07:00
Hao Luo
981001bf46 bpf: Introduce cgroup iter
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:

 - walking a cgroup's descendants in pre-order.
 - walking a cgroup's descendants in post-order.
 - walking a cgroup's ancestors.
 - processing only the given cgroup.

When attaching a cgroup_iter, one can set a cgroup for the iter_link
created by the attachment. This cgroup is passed as a file descriptor
or cgroup id and serves as the starting point of the walk. If no
cgroup is specified, the starting point will be the cgroup v2 root.

For walking descendants, one can specify the order: either pre-order or
post-order. For walking ancestors, the walk starts at the specified
cgroup and ends at the root.

One can also terminate the walk early by returning 1 from the iter
program.
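
Attaching, sketched with libbpf (cg_fd and prog are placeholders; the
order constants use the CGROUP-prefixed names from the fix-up commit
above):

  union bpf_iter_link_info linfo = {};
  LIBBPF_OPTS(bpf_iter_attach_opts, opts);
  struct bpf_link *link;

  linfo.cgroup.cgroup_fd = cg_fd;
  linfo.cgroup.order = BPF_CGROUP_ITER_DESCENDANTS_PRE;
  opts.link_info = &linfo;
  opts.link_info_len = sizeof(linfo);
  link = bpf_program__attach_iter(prog, &opts);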

Note that because walking cgroup hierarchy holds cgroup_mutex, the iter
program is called with cgroup_mutex held.

Currently only one session is supported, which means that, depending on
the volume of data the bpf program intends to send to user space, the
number of cgroups that can be walked is limited. For example, given the
current buffer size of 8 * PAGE_SIZE, if the program sends 64B of data for
each cgroup, and assuming PAGE_SIZE is 4kb, the total number of cgroups
that can be walked is 512. This is a limitation of cgroup_iter. If the
output data is larger than the kernel buffer size, then after all data in
the kernel buffer is consumed by user space, the subsequent read() syscall
will signal EOPNOTSUPP. To work around this, the user may have to update
their program to reduce the volume of data sent to output, for
example by skipping some uninteresting cgroups. In the future, we may
extend bpf_iter flags to allow customizing the buffer size.

Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Stanislav Fomichev
ee7d295f83 bpf: update bpf_{g,s}et_retval documentation
* replace 'syscall' with 'upper layers', still mention that it's being
  exported via syscall errno
* describe what happens in set_retval(-EPERM) + return 1
* describe what happens with bind's 'return 3'

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220823222555.523590-5-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Shmulik Ladkani
94d69cc07f bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progs
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely
replaces the flow-dissector logic with custom dissection logic. This
forces implementors to write programs that handle dissection for any
flows expected in the namespace.

It makes sense for flow-dissector BPF programs to just augment the
dissector with custom logic (e.g. dissecting certain flows or custom
protocols), while enjoying the broad capabilities of the standard
dissector for any other traffic.

Introduce the BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF
programs may return this to indicate that no dissection was made and
that fallback to the standard dissector is requested.
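
A sketch of an augmenting dissector (ETH_P_CUSTOM is a placeholder
protocol):

  SEC("flow_dissector")
  int augment_dissector(struct __sk_buff *skb)
  {
          if (skb->protocol != bpf_htons(ETH_P_CUSTOM))
                  /* let the standard dissector handle it */
                  return BPF_FLOW_DISSECTOR_CONTINUE;

          /* custom dissection for our protocol ... */
          return BPF_OK;
  }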

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com
2022-09-27 15:23:45 -07:00
Mikhail Tuzikov
12a41a80c5 Adding network diag utils into actions-runner-libbpf container 2022-09-27 11:06:30 -07:00
Daniel Müller
10a32130e7 Clean up local allow/deny lists
Now that we are including the upstream allow/deny lists we can remove
any duplicates from our local lists. While at it, we also add some usdt
tests to the denylist, which are currently failing. This is the same
step we took in the vmtest repository [0].

[0] https://github.com/kernel-patches/vmtest/pull/133

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-06 15:01:05 -07:00
Daniel Müller
fad270918d Use deny/allow lists from upstream
So far we have relied on allow/deny lists maintained in this repository
to decide which tests to explicitly include/exclude from running in CI.
With recent changes [0] this information is now available in upstream
Linux.
As such, this change switches us over to using the upstream allow/deny
lists in addition to the local ones. We unconditionally honor the
upstream lists for all kernel versions.

[0] https://lore.kernel.org/bpf/165893461358.29339.11641967418379627671.git-patchwork-notify@kernel.org/T/#m2a97b0ea9ef0ddee7a53bbf7919e3f324b233937

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-06 15:01:05 -07:00
Daniel Müller
c091b07808 Fix comment: WHITELIST -> ALLOWLIST
Commit 693de729d0 ("Rename blacklists and whitelists") renamed the
black and white lists but missed the adjustment of a comment,
referencing a file name. Update it accordingly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-06 14:07:51 -07:00
Daniel Müller
efd33720cd Set KERNEL and REPO_ROOT environment variable for run-qemu action
With an upcoming change we would like to invoke bpftool checks from the
run-qemu action (https://github.com/libbpf/ci/pull/37). This action
requires two environment variables, KERNEL and REPO_ROOT, set in order
to function.
Make sure to set them now. Long term we should probably make them
explicit input arguments instead of implicit global state, but there are
many more such instances that we need to clean up.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-01 11:00:13 -07:00
Daniel Müller
9aedff8d03 Provide kernel-root argument to run-qemu action
With https://github.com/libbpf/ci/pull/36 merged the run-qemu action now
accepts an additional argument, `kernel-root`.
Provide it to the action with the value appropriate for this repository.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-01 10:36:35 -07:00
Daniel Müller
51e63f7229 Explicitly provide kernel-root argument to prepare-rootfs action
Let's make the "kernel-root" explicit when using the prepare-rootfs
action, instead of relying on the default, .kernel.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-29 11:14:39 -07:00
chantra
c53af98d1a [s390x][runner] update action runner to 2.296.0 (latest) 2022-08-27 17:14:28 -07:00
chantra
2c44349e09 [s390x][runners] Use consistent runner name across restarts
Currently, the runner name is taken from the docker container's
hostname.
This changes across restarts, causing the runner name to change across
restarts too.

This uses the host name to keep a consistent name.
2022-08-27 17:14:28 -07:00
Daniel Müller
58361243ec Fix sourcing of helpers.sh in coverity workflow
The path to the helpers.sh script to source was put one level too deep
by cfbd763ef8 ("Use foldable helpers where applicable") and the
GITHUB_ACTION_PATH variable is not actually defined in a workflow.

Fix up both issues.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-26 11:30:12 -07:00
Andrii Nakryiko
c32e1cf948 README: add dark background logo image
Add auto-selectable libbpf logo for light and dark themes.

Suggested-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-08-24 22:09:09 -07:00
Andrii Nakryiko
c4f44c7c11 assets: add libbpf logo images
Add three layouts of libbpf logos (sparse, compact, sideways) with three
color variants (light bg, dark bg, monochrome).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-08-24 21:51:42 -07:00
Daniel Müller
a7a525d47a Rename test_progs_noalu function to test_progs_no_alu32
As a follow up to 66b788c1a4 ("Factor out test_progs_noalu function")
and taking into account feedback [0], this change renames the
test_progs_noalu function to test_progs_no_alu32, to stay closer to the
name of the binary being invoked.

[0] https://github.com/kernel-patches/vmtest/pull/124#discussion_r953175641

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-24 08:08:21 -07:00
Daniel Müller
cfbd763ef8 Use foldable helpers where applicable
As discussed at some earlier point in time, some of the actions/workflow
logic does not use our foldable helpers despite being able to. Switch
them over.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-23 12:04:38 -07:00
109 changed files with 96964 additions and 83022 deletions

1
.gitattributes vendored Normal file
View File

@@ -0,0 +1 @@
assets/** export-ignore

3
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,3 @@
Thank you for considering a contribution!
Please note that the `libbpf` authoritative source code is developed as part of bpf-next Linux source tree under tools/lib/bpf subdirectory and is periodically synced to Github. As such, all the libbpf changes should be sent to BPF mailing list, please don't open PRs here unless you are changing Github-specific parts of libbpf (e.g., Github-specific Makefile).

View File

@@ -18,9 +18,10 @@ runs:
steps:
- shell: bash
run: |
echo "::group::Setup Env"
source $GITHUB_ACTION_PATH/../../../ci/vmtest/helpers.sh
foldable start "Setup Env"
sudo apt-get install -y qemu-kvm zstd binutils-dev elfutils libcap-dev libelf-dev libdw-dev python3-docutils
echo "::endgroup::"
foldable end
- shell: bash
run: |
export KERNEL=${{ inputs.kernel }}

View File

@@ -8,9 +8,26 @@ source ${THISDIR}/helpers.sh
foldable start prepare_selftests "Building selftests"
LLVM_VER=16
LIBBPF_PATH="${REPO_ROOT}"
llvm_default_version() {
echo "16"
}
llvm_latest_version() {
echo "17"
}
LLVM_VERSION=$(llvm_default_version)
if [[ "${LLVM_VERSION}" == $(llvm_latest_version) ]]; then
REPO_DISTRO_SUFFIX=""
else
REPO_DISTRO_SUFFIX="-${LLVM_VERSION}"
fi
echo "deb https://apt.llvm.org/focal/ llvm-toolchain-focal${REPO_DISTRO_SUFFIX} main" \
| sudo tee /etc/apt/sources.list.d/llvm.list
PREPARE_SELFTESTS_SCRIPT=${THISDIR}/prepare_selftests-${KERNEL}.sh
if [ -f "${PREPARE_SELFTESTS_SCRIPT}" ]; then
(cd "${REPO_ROOT}/${REPO_PATH}/tools/testing/selftests/bpf" && ${PREPARE_SELFTESTS_SCRIPT})
@@ -23,10 +40,11 @@ else
fi
cd ${REPO_ROOT}/${REPO_PATH}
make headers
make \
CLANG=clang-${LLVM_VER} \
LLC=llc-${LLVM_VER} \
LLVM_STRIP=llvm-strip-${LLVM_VER} \
CLANG=clang-${LLVM_VERSION} \
LLC=llc-${LLVM_VERSION} \
LLVM_STRIP=llvm-strip-${LLVM_VERSION} \
VMLINUX_BTF="${VMLINUX_BTF}" \
VMLINUX_H=${VMLINUX_H} \
-C "${REPO_ROOT}/${REPO_PATH}/tools/testing/selftests/bpf" \

View File

@@ -1,3 +1,5 @@
# shellcheck shell=bash
# $1 - start or end
# $2 - fold identifier, no spaces
# $3 - fold section description

View File

@@ -1,3 +1,5 @@
#!/bin/bash
printf "all:\n\ttouch bpf_testmod.ko\n\nclean:\n" > bpf_testmod/Makefile
printf "all:\n\ttouch bpf_test_no_cfi.ko\n\nclean:\n" > bpf_test_no_cfi/Makefile

View File

@@ -1,3 +1,5 @@
#!/bin/bash
printf "all:\n\ttouch bpf_testmod.ko\n\nclean:\n" > bpf_testmod/Makefile
printf "all:\n\ttouch bpf_test_no_cfi.ko\n\nclean:\n" > bpf_test_no_cfi/Makefile

File diff suppressed because it is too large

View File

@@ -6,7 +6,7 @@ runs:
- id: variables
run: |
export REPO_ROOT=$GITHUB_WORKSPACE
export CI_ROOT=$REPO_ROOT/travis-ci
export CI_ROOT=$REPO_ROOT/ci
# this is somewhat ugly, but that is the easiest way to share this code with
# arch specific docker
echo 'echo ::group::Env setup' > /tmp/ci_setup
@@ -16,7 +16,7 @@ runs:
echo export PROJECT_NAME='libbpf' >> /tmp/ci_setup
echo export AUTHOR_EMAIL="$(git log -1 --pretty=\"%aE\")" >> /tmp/ci_setup
echo export REPO_ROOT=$GITHUB_WORKSPACE >> /tmp/ci_setup
echo export CI_ROOT=$REPO_ROOT/travis-ci >> /tmp/ci_setup
echo export CI_ROOT=$REPO_ROOT/ci >> /tmp/ci_setup
echo export VMTEST_ROOT=$CI_ROOT/vmtest >> /tmp/ci_setup
echo 'echo ::endgroup::' >> /tmp/ci_setup
shell: bash

View File

@@ -16,11 +16,28 @@ inputs:
runs:
using: "composite"
steps:
# Allow CI user to access /dev/kvm (via qemu) w/o group change/relogin
# by changing permissions set by udev.
- name: Set /dev/kvm permissions
shell: bash
run: |
if [ -e /dev/kvm ]; then
echo "/dev/kvm exists"
if [ $(id -u) != 0 ]; then
echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' \
| sudo tee /etc/udev/rules.d/99-kvm4all.rules > /dev/null
sudo udevadm control --reload-rules
sudo udevadm trigger --name-match=kvm
fi
else
echo "/dev/kvm does not exist"
fi
# setup environment
- name: Setup environment
uses: libbpf/ci/setup-build-env@master
uses: libbpf/ci/setup-build-env@main
with:
pahole: ${{ inputs.pahole }}
arch: ${{ inputs.arch }}
# 1. download CHECKPOINT kernel source
- name: Get checkpoint commit
shell: bash
@@ -28,41 +45,45 @@ runs:
cat CHECKPOINT-COMMIT
echo "CHECKPOINT=$(cat CHECKPOINT-COMMIT)" >> $GITHUB_ENV
- name: Get kernel source at checkpoint
uses: libbpf/ci/get-linux-source@master
uses: libbpf/ci/get-linux-source@main
with:
repo: 'https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git'
rev: ${{ env.CHECKPOINT }}
dest: '${{ github.workspace }}/.kernel'
- name: Patch kernel source
uses: libbpf/ci/patch-kernel@master
uses: libbpf/ci/patch-kernel@main
with:
patches-root: '${{ github.workspace }}/travis-ci/diffs'
patches-root: '${{ github.workspace }}/ci/diffs'
repo-root: '.kernel'
- name: Prepare to build BPF selftests
shell: bash
run: |
echo "::group::Prepare building selftest"
source $GITHUB_ACTION_PATH/../../../ci/vmtest/helpers.sh
foldable start "Prepare building selftest"
cd .kernel
cat tools/testing/selftests/bpf/config \
tools/testing/selftests/bpf/config.${{ inputs.arch }} > .config
# this file might or might not exist depending on kernel version
cat tools/testing/selftests/bpf/config.vm >> .config || :
make olddefconfig && make prepare
cd -
echo "::endgroup::"
foldable end
# 2. if kernel == LATEST, build kernel image from tree
- name: Build kernel image
if: ${{ inputs.kernel == 'LATEST' }}
shell: bash
run: |
echo "::group::Build Kernel Image"
source $GITHUB_ACTION_PATH/../../../ci/vmtest/helpers.sh
foldable start "Build Kernel Image"
cd .kernel
make -j $((4*$(nproc))) all > /dev/null
cp vmlinux ${{ github.workspace }}
cd -
echo "::endgroup::"
foldable end
# else, just download prebuilt kernel image
- name: Download prebuilt kernel
if: ${{ inputs.kernel != 'LATEST' }}
uses: libbpf/ci/download-vmlinux@master
uses: libbpf/ci/download-vmlinux@main
with:
kernel: ${{ inputs.kernel }}
arch: ${{ inputs.arch }}
@@ -74,16 +95,24 @@ runs:
kernel: ${{ inputs.kernel }}
# 4. prepare rootfs
- name: prepare rootfs
uses: libbpf/ci/prepare-rootfs@master
uses: libbpf/ci/prepare-rootfs@main
env:
KBUILD_OUTPUT: '.kernel'
with:
kernel: ${{ inputs.kernel }}
project-name: 'libbpf'
arch: ${{ inputs.arch }}
kernel: ${{ inputs.kernel }}
kernel-root: '.kernel'
kbuild-output: ${{ env.KBUILD_OUTPUT }}
image-output: '/tmp/root.img'
# 5. run selftest in QEMU
- name: Run selftests
uses: libbpf/ci/run-qemu@master
env:
KERNEL: ${{ inputs.kernel }}
REPO_ROOT: ${{ github.workspace }}
uses: libbpf/ci/run-qemu@main
with:
arch: ${{ inputs.arch }}
img: '/tmp/root.img'
vmlinuz: 'vmlinuz'
arch: ${{ inputs.arch }}
kernel-root: '.kernel'

View File

@@ -23,16 +23,26 @@ jobs:
target: RUN
- name: ASan+UBSan
target: RUN_ASAN
- name: clang
target: RUN_CLANG
- name: clang ASan+UBSan
target: RUN_CLANG_ASAN
- name: gcc-10
target: RUN_GCC10
- name: gcc-10 ASan+UBSan
target: RUN_GCC10_ASAN
- name: clang
target: RUN_CLANG
- name: clang-14
target: RUN_CLANG14
- name: clang-15
target: RUN_CLANG15
- name: clang-16
target: RUN_CLANG16
- name: gcc-10
target: RUN_GCC10
- name: gcc-11
target: RUN_GCC11
- name: gcc-12
target: RUN_GCC12
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
name: Checkout
- uses: ./.github/actions/setup
name: Setup
@@ -53,14 +63,14 @@ jobs:
- arch: s390x
- arch: x86
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
name: Checkout
- uses: ./.github/actions/setup
name: Pre-Setup
- run: source /tmp/ci_setup && sudo -E $CI_ROOT/managers/ubuntu.sh
if: matrix.arch == 'x86'
name: Setup
- uses: uraimo/run-on-arch-action@v2.0.5
- uses: uraimo/run-on-arch-action@v2.7.1
name: Build in docker
if: matrix.arch != 'x86'
with:

52
.github/workflows/codeql.yml vendored Normal file
View File

@@ -0,0 +1,52 @@
---
# vi: ts=2 sw=2 et:
name: "CodeQL"
on:
push:
branches:
- master
pull_request:
branches:
- master
permissions:
contents: read
jobs:
analyze:
name: Analyze
runs-on: ubuntu-22.04
concurrency:
group: ${{ github.workflow }}-${{ matrix.language }}-${{ github.ref }}
cancel-in-progress: true
permissions:
actions: read
security-events: write
strategy:
fail-fast: false
matrix:
language: ['cpp', 'python']
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
queries: +security-extended,security-and-quality
- name: Setup
uses: ./.github/actions/setup
- name: Build
run: |
source /tmp/ci_setup
make -C ./src
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2


@@ -11,16 +11,17 @@ jobs:
if: github.repository == 'libbpf/libbpf'
name: Coverity
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/actions/setup
- name: Run coverity
run: |
echo ::group::Setup CI env
source "${GITHUB_WORKSPACE}"/ci/vmtest/helpers.sh
foldable start "Setup CI env"
source /tmp/ci_setup
export COVERITY_SCAN_NOTIFICATION_EMAIL="${AUTHOR_EMAIL}"
export COVERITY_SCAN_BRANCH_PATTERN=${GITHUB_REF##refs/*/}
export TRAVIS_BRANCH=${COVERITY_SCAN_BRANCH_PATTERN}
echo ::endgroup::
foldable end
scripts/coverity.sh
env:
COVERITY_SCAN_TOKEN: ${{ secrets.COVERITY_SCAN_TOKEN }}

.github/workflows/lint.yml (new file, 19 lines)

@@ -0,0 +1,19 @@
name: "lint"
on:
pull_request:
push:
branches:
- master
jobs:
shellcheck:
name: ShellCheck
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Run ShellCheck
uses: ludeeus/action-shellcheck@master
env:
SHELLCHECK_OPTS: --severity=error


@@ -25,7 +25,7 @@ jobs:
runs-on: ubuntu-latest
name: vmtest with customized pahole/Kernel
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/actions/setup
- uses: ./.github/actions/vmtest
with:


@@ -7,12 +7,12 @@ on:
jobs:
vmtest:
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
name: Kernel LATEST + staging pahole
env:
STAGING: tmp.master
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/actions/setup
- uses: ./.github/actions/vmtest
with:


@@ -19,19 +19,19 @@ jobs:
matrix:
include:
- kernel: 'LATEST'
runs_on: ubuntu-latest
runs_on: ubuntu-20.04
arch: 'x86_64'
- kernel: '5.5.0'
runs_on: ubuntu-latest
runs_on: ubuntu-20.04
arch: 'x86_64'
- kernel: '4.9.0'
runs_on: ubuntu-latest
runs_on: ubuntu-20.04
arch: 'x86_64'
- kernel: 'LATEST'
runs_on: z15
runs_on: s390x
arch: 's390x'
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
name: Checkout
- uses: ./.github/actions/setup
name: Setup


@@ -1,14 +0,0 @@
# vi: set ts=2 sw=2:
extraction:
cpp:
prepare:
packages:
- libelf-dev
- pkg-config
after_prepare:
# As the buildsystem detection by LGTM is performed _only_ during the
# 'configure' phase, we need to trick LGTM we use a supported build
# system (configure, meson, cmake, etc.). This way LGTM correctly detects
# that our sources are in the src/ subfolder.
- touch src/configure
- chmod +x src/configure


@@ -5,6 +5,11 @@
# Required
version: 2
build:
os: "ubuntu-22.04"
tools:
python: "3.11"
# Build documentation in the docs/ directory with Sphinx
sphinx:
builder: html
@@ -17,6 +22,5 @@ formats:
# Optionally set the version of Python and requirements required to build your docs
python:
version: 3.7
install:
- requirements: docs/sphinx/requirements.txt
- requirements: docs/sphinx/requirements.txt


@@ -1 +1 @@
14b20b784f59bdd95f6f1cfb112c9818bcec4d84
443574b033876c85a35de4c65c14f7fe092222b2


@@ -1 +1 @@
e34cfee65ec891a319ce79797dda18083af33a76
14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5


@@ -1,10 +1,14 @@
<img src="https://user-images.githubusercontent.com/508075/185997470-2f427d3d-f040-4eef-afc5-ae4f766615b2.png" width="40%" >
<picture>
<source media="(prefers-color-scheme: dark)" srcset="assets/libbpf-logo-sideways-darkbg.png" width="40%">
<img src="assets/libbpf-logo-sideways.png" width="40%">
</picture>
libbpf
[![Github Actions Builds & Tests](https://github.com/libbpf/libbpf/actions/workflows/test.yml/badge.svg)](https://github.com/libbpf/libbpf/actions/workflows/test.yml)
[![Total alerts](https://img.shields.io/lgtm/alerts/g/libbpf/libbpf.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/libbpf/libbpf/alerts/)
[![Coverity](https://img.shields.io/coverity/scan/18195.svg)](https://scan.coverity.com/projects/libbpf)
[![CodeQL](https://github.com/libbpf/libbpf/workflows/CodeQL/badge.svg?branch=master)](https://github.com/libbpf/libbpf/actions?query=workflow%3ACodeQL+branch%3Amaster)
[![OSS-Fuzz Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/libbpf.svg)](https://oss-fuzz-build-logs.storage.googleapis.com/index.html#libbpf)
[![Read the Docs](https://readthedocs.org/projects/libbpf/badge/?version=latest)](https://libbpf.readthedocs.io/en/latest/)
======
**This is the official home of the libbpf library.**
@@ -141,8 +145,8 @@ Distributions packaging libbpf from this mirror:
- [Fedora](https://src.fedoraproject.org/rpms/libbpf)
- [Gentoo](https://packages.gentoo.org/packages/dev-libs/libbpf)
- [Debian](https://packages.debian.org/source/sid/libbpf)
- [Arch](https://www.archlinux.org/packages/extra/x86_64/libbpf/)
- [Ubuntu](https://packages.ubuntu.com/source/impish/libbpf)
- [Arch](https://archlinux.org/packages/core/x86_64/libbpf/)
- [Ubuntu](https://packages.ubuntu.com/source/jammy/libbpf)
- [Alpine](https://pkgs.alpinelinux.org/packages?name=libbpf)
Benefits of packaging from the mirror over packaging from kernel sources:
@@ -169,7 +173,7 @@ bpf-next to Github sync
=======================
All the gory details of syncing can be found in `scripts/sync-kernel.sh`
script.
script. See [SYNC.md](SYNC.md) for instructions.
Some header files in this repo (`include/linux/*.h`) are reduced versions of
their counterpart files at

SYNC.md (new file, 281 lines)

@@ -0,0 +1,281 @@
<picture>
<source media="(prefers-color-scheme: dark)" srcset="assets/libbpf-logo-sideways-darkbg.png" width="40%">
<img src="assets/libbpf-logo-sideways.png" width="40%">
</picture>
Libbpf sync
===========
Libbpf's *authoritative source code* is developed as part of the [bpf-next Linux source
tree](https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next) under
the `tools/lib/bpf` subdirectory and is periodically synced to Github.
Most of the mundane mechanical things like bpf and bpf-next tree merge, Git
history transformation, cherry-picking relevant commits, re-generating
auto-generated headers, etc. are taken care of by the
[sync-kernel.sh script](https://github.com/libbpf/libbpf/blob/master/scripts/sync-kernel.sh).
But occasionally a human needs to do a few extra things to make everything work
nicely.
This document goes over the process of syncing libbpf sources from the Linux repo
to this Github repository. Feel free to contribute fixes and additions if you
run into new problems not outlined here.
Setup expectations
------------------
The sync script has particular expectations of the upstream Linux repo setup. It
expects that the current HEAD of that repo points to bpf-next's master branch and
that there is a separate local branch pointing to the bpf tree's master branch.
This is important, as the script will automatically merge their histories for
the purpose of libbpf sync.
Below, we assume that the Linux repo is located at `~/linux`, its current HEAD is
at the latest `bpf-next/master`, and libbpf's Github repo is located at
`~/libbpf`, checked out to the latest commit on the `master` branch. It doesn't
matter where you run the `sync-kernel.sh` script from, but we'll be running it
from inside `~/libbpf`.
```
$ cd ~/linux && git remote -v | grep -E '^(bpf|bpf-next)'
bpf https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git (fetch)
bpf	ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git (push)
bpf-next	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git (fetch)
bpf-next	ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git (push)
$ git branch -vv | grep -E '^? (master|bpf-master)'
* bpf-master 2d311f480b52 [bpf/master] riscv, bpf: Fix patch_text implicit declaration
master c8ee37bde402 [bpf-next/master] libbpf: Fix bpf_xdp_query() in old kernels
$ git checkout bpf-master && git pull && git checkout master && git pull
...
$ git log --oneline -n1
c8ee37bde402 (HEAD -> master, bpf-next/master) libbpf: Fix bpf_xdp_query() in old kernels
$ cd ~/libbpf && git checkout master && git pull
Your branch is up to date with 'libbpf/master'.
Already up to date.
```
Running setup script
--------------------
The first step is always to run the `sync-kernel.sh` script. It expects three arguments:
```
$ scripts/sync-kernel.sh <libbpf-repo> <kernel-repo> <bpf-branch>
```
Note that we'll store the script's entire output in `/tmp/libbpf-sync.txt` and
put it into the PR summary later on. **Please store the script's output and include
it in the PR summary for others to check for anything unexpected or suspicious.**
```
$ scripts/sync-kernel.sh ~/libbpf ~/linux bpf-master | tee /tmp/libbpf-sync.txt
Dumping existing libbpf commit signatures...
WORKDIR: /home/andriin/libbpf
LINUX REPO: /home/andriin/linux
LIBBPF REPO: /home/andriin/libbpf
...
```
Most of the time this will go very uneventfully. One expected case where the sync
script might require user intervention is if the `bpf` tree has some libbpf fixes,
which is nowadays not a very frequent occurrence. If that happens, the script will
show you a diff between the expected state as of the latest bpf-next and the synced
Github repo state, and will ask whether these changes look good. Please use your
best judgement to verify that the differences are indeed from expected `bpf` tree
fixes. E.g., it might look like below:
```
Comparing list of files...
Comparing file contents...
--- /home/andriin/linux/include/uapi/linux/netdev.h 2023-02-27 16:54:42.270583372 -0800
+++ /home/andriin/libbpf/include/uapi/linux/netdev.h 2023-02-27 16:54:34.615530796 -0800
@@ -19,7 +19,7 @@
* @NETDEV_XDP_ACT_XSK_ZEROCOPY: This feature informs if netdev supports AF_XDP
* in zero copy mode.
* @NETDEV_XDP_ACT_HW_OFFLOAD: This feature informs if netdev supports XDP hw
- * oflloading.
+ * offloading.
* @NETDEV_XDP_ACT_RX_SG: This feature informs if netdev implements non-linear
* XDP buffer support in the driver napi callback.
* @NETDEV_XDP_ACT_NDO_XMIT_SG: This feature informs if netdev implements
/home/andriin/linux/include/uapi/linux/netdev.h and /home/andriin/libbpf/include/uapi/linux/netdev.h are different!
Unfortunately, there are some inconsistencies, please double check.
Does everything look good? [y/N]:
```
If it looks sensible and expected, type `y` and the script will proceed.
If the sync is successful, your `~/linux` repo will be left in its original state
on the original HEAD commit. The `~/libbpf` repo will now be on a new branch, named
`libbpf-sync-<timestamp>` (e.g., `libbpf-sync-2023-02-28T00-53-40.072Z`).
Push this branch into your fork of the `libbpf/libbpf` Github repo and create a PR:
```
$ git push --set-upstream origin libbpf-sync-2023-02-28T00-53-40.072Z
Enumerating objects: 130, done.
Counting objects: 100% (115/115), done.
Delta compression using up to 80 threads
Compressing objects: 100% (28/28), done.
Writing objects: 100% (32/32), 5.57 KiB | 1.86 MiB/s, done.
Total 32 (delta 21), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (21/21), completed with 9 local objects.
remote:
remote: Create a pull request for 'libbpf-sync-2023-02-28T00-53-40.072Z' on GitHub by visiting:
remote: https://github.com/anakryiko/libbpf/pull/new/libbpf-sync-2023-02-28T00-53-40.072Z
remote:
To github.com:anakryiko/libbpf.git
* [new branch] libbpf-sync-2023-02-28T00-53-40.072Z -> libbpf-sync-2023-02-28T00-53-40.072Z
Branch 'libbpf-sync-2023-02-28T00-53-40.072Z' set up to track remote branch 'libbpf-sync-2023-02-28T00-53-40.072Z' from 'origin'.
```
**Please adjust the PR name to have a properly formatted timestamp. Libbpf
maintainers will be very thankful for that!**
By default, Github will turn the above branch name into a PR with the subject
"Libbpf sync 2023 02 28 t00 53 40.072 z". Please fix this into a proper timestamp,
e.g.: "Libbpf sync 2023-02-28T00:53:40.072Z". Thank you!
**Please don't forget to paste the contents of /tmp/libbpf-sync.txt into the PR
summary!**
Once the PR is created, libbpf CI will run a bunch of tests to check that
everything is good. In simple cases that's all you'd need to do. In more
complicated cases some extra adjustments might be necessary.
**Please keep naming and style consistent.** Prefix CI-related fixes with `ci: `.
If you had to modify the sync script, use the `sync: ` prefix. Also make sure
that each such commit has `Signed-off-by: Your Full Name <your@email.com>`,
just like you would for a Linux upstream patch. Libbpf closely follows kernel
conventions and styling, so please help maintain that.
Including new sources
---------------------
If entirely new source files (typically `*.c`) were added to the library in the
kernel repository, it may be necessary to add these to the build system
manually (you may notice linker errors otherwise), because the script cannot
handle such changes automatically. To that end, edit `src/Makefile` as
necessary. Commit
[c2495832ced4](https://github.com/libbpf/libbpf/commit/c2495832ced4239bcd376b9954db38a6addd89ca)
is an example of how to go about doing that.
Similarly, if new public API header files were added, the `Makefile` will need
to be adjusted as well.
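As a rough sketch (assuming the object list in `src/Makefile` is still kept in a
single variable near the top of the file; double-check the current Makefile), the
workflow for a hypothetical `new_feature.c` would be:
```
$ cd ~/libbpf
$ grep -n '\.o' src/Makefile     # locate the list of object files
$ $EDITOR src/Makefile           # add new_feature.o alongside the existing objects
$ make -C src -j"$(nproc)"       # rebuild; the linker errors should now be gone
```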
Updating allow/deny lists
-------------------------
Libbpf CI intentionally runs a subset of the latest BPF selftests on old kernels
(4.9 and 5.5, currently). From time to time, tests that previously ran
successfully on old kernels stop doing so, typically due to reliance on some
freshly added kernel feature. It might look something like this in [CI logs](https://github.com/libbpf/libbpf/actions/runs/4206303272/jobs/7299609578#step:4:2733):
```
All error logs:
serial_test_xdp_info:FAIL:get_xdp_none errno=2
#283 xdp_info:FAIL
Summary: 49/166 PASSED, 5 SKIPPED, 1 FAILED
```
In such a case we can either work with upstream to fix the test to be compatible
with old kernels, or we'll have to add the test to a denylist (or remove it from
an allowlist, as was [done](https://github.com/libbpf/libbpf/commit/ea284299025bf85b85b4923191de6463cd43ccd6)
for the case above).
```
$ find . -name '*LIST*'
./ci/vmtest/configs/ALLOWLIST-4.9.0
./ci/vmtest/configs/DENYLIST-5.5.0
./ci/vmtest/configs/DENYLIST-latest.s390x
./ci/vmtest/configs/DENYLIST-latest
./ci/vmtest/configs/ALLOWLIST-5.5.0
```
Please determine which tests need to be added to or removed from which list, and
then add that as a separate commit, as sketched below. **Please keep using the same
branch name, so that the same PR can be updated.** There is no need to open new PRs
for each such fix.
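For example, dropping the failing `xdp_info` test from the 4.9 allowlist (mirroring
the commit linked above) could look roughly like this; the exact entry format may
include a trailing comment, so check the list file first:
```
$ cd ~/libbpf
$ sed -i '/^xdp_info$/d' ci/vmtest/configs/ALLOWLIST-4.9.0   # stop running it on 4.9
$ git add ci/vmtest/configs/ALLOWLIST-4.9.0
$ git commit -s -m 'ci: stop allowlisting xdp_info selftest on 4.9 kernel'
$ git push origin libbpf-sync-2023-02-28T00-53-40.072Z
```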
Regenerating vmlinux.h header
-----------------------------
To compile the latest BPF selftests against old kernels, we check in a pre-generated
[vmlinux.h](https://github.com/libbpf/libbpf/blob/master/.github/actions/build-selftests/vmlinux.h)
header file, located at `.github/actions/build-selftests/vmlinux.h`, which
contains type definitions from the latest upstream kernel. When, after a libbpf
sync, upstream BPF selftests require new kernel types, we need to regenerate
`vmlinux.h` and check it in as well.
This will look something like this in [CI logs](https://github.com/libbpf/libbpf/actions/runs/4198939244/jobs/7283214243#step:4:1903):
```
In file included from progs/test_spin_lock_fail.c:5:
/home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/bpf_experimental.h:73:53: error: declaration of 'struct bpf_rb_root' will not be visible outside of this function [-Werror,-Wvisibility]
extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
^
/home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/bpf_experimental.h:81:35: error: declaration of 'struct bpf_rb_root' will not be visible outside of this function [-Werror,-Wvisibility]
extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
^
/home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/bpf_experimental.h:90:52: error: declaration of 'struct bpf_rb_root' will not be visible outside of this function [-Werror,-Wvisibility]
extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
^
3 errors generated.
make: *** [Makefile:572: /home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/test_spin_lock_fail.bpf.o] Error 1
make: *** Waiting for unfinished jobs....
Error: Process completed with exit code 2.
```
You'll need to build the latest upstream kernel from the `bpf-next` tree, using BPF
selftest configs. Concatenate the arch-agnostic and arch-specific configs, build the
kernel, then use bpftool to dump `vmlinux.h`:
```
$ cd ~/linux
$ cat tools/testing/selftests/bpf/config \
tools/testing/selftests/bpf/config.x86_64 > .config
$ make -j$(nproc) olddefconfig all
...
$ bpftool btf dump file ~/linux/vmlinux format c > ~/libbpf/.github/actions/build-selftests/vmlinux.h
$ cd ~/libbpf && git add . && git commit -s
```
Check in the generated `vmlinux.h`, don't forget to use the `ci: ` commit prefix, and
add it on top of the sync commits. Push to Github and let libbpf CI do the checking
for you. See [this commit](https://github.com/libbpf/libbpf/commit/34212c94a64df8eeb1dd5d064630a65e1dfd4c20)
for reference.
Troubleshooting
---------------
If something goes wrong and the sync script exits early, or is terminated early by
the user, you might end up with the `~/linux` repo on a temporary sync-related
branch. Don't worry, though: the sync script never destroys repo state; it follows
a "copy-on-write" philosophy and creates new branches where necessary. So it's very
easy to restore the previous state and start fresh if anything goes wrong:
```
$ git branch | grep -E 'libbpf-.*Z'
libbpf-baseline-2023-02-28T00-43-35.146Z
libbpf-bpf-baseline-2023-02-28T00-43-35.146Z
libbpf-bpf-tip-2023-02-28T00-43-35.146Z
libbpf-squash-base-2023-02-28T00-43-35.146Z
* libbpf-squash-tip-2023-02-28T00-43-35.146Z
$ git cherry-pick --abort
$ git checkout master && git branch | grep -E 'libbpf-.*Z' | xargs git br -D
Switched to branch 'master'
Your branch is up to date with 'bpf-next/master'.
Deleted branch libbpf-baseline-2023-02-28T00-43-35.146Z (was 951bce29c898).
Deleted branch libbpf-bpf-baseline-2023-02-28T00-43-35.146Z (was 3a70e0d4c9d7).
Deleted branch libbpf-bpf-tip-2023-02-28T00-43-35.146Z (was 2d311f480b52).
Deleted branch libbpf-squash-base-2023-02-28T00-43-35.146Z (was 957f109ef883).
Deleted branch libbpf-squash-tip-2023-02-28T00-43-35.146Z (was be66130d2339).
Deleted branch libbpf-tip-2023-02-28T00-43-35.146Z (was 2d311f480b52).
```
You might sometimes need to do the same for your `~/libbpf` repo, depending on the
stage at which the sync script was terminated, as sketched below.
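A minimal sketch of the equivalent cleanup for `~/libbpf` (only needed if it was
left on a temporary branch):
```
$ cd ~/libbpf
$ git cherry-pick --abort || true     # harmless if no cherry-pick is in progress
$ git checkout master
$ git branch | grep -E 'libbpf-.*Z' | xargs -r git branch -D
```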

[Nine binary image files added (PNG assets, 116 to 352 KiB each); contents not shown.]

@@ -0,0 +1,56 @@
From f267f262815033452195f46c43b572159262f533 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 5 Mar 2024 10:08:28 +0100
Subject: [PATCH 2/2] xdp, bonding: Fix feature flags when there are no slave
devs anymore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
changed the driver from reporting everything as supported before a device
was bonded into having the driver report that no XDP feature is supported
until a real device is bonded as it seems to be more truthful given
eventually real underlying devices decide what XDP features are supported.
The change however did not take into account when all slave devices get
removed from the bond device. In this case after 9b0ed890ac2a, the driver
keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK &
~NETDEV_XDP_ACT_XSK_ZEROCOPY whereas it should have reported a feature
mask of 0.
Fix it by resetting XDP feature flags in the same way as if no XDP program
is attached to the bond device. This was uncovered by the XDP bond selftest
which let BPF CI fail. After adjusting the starting masks on the latter
to 0 instead of NETDEV_XDP_ACT_MASK the test passes again together with
this fix.
Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Prashant Batra <prbatra.mail@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240305090829.17131-1-daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
drivers/net/bonding/bond_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a11748b8d69b..cd0683bcca03 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1811,7 +1811,7 @@ void bond_xdp_set_features(struct net_device *bond_dev)
ASSERT_RTNL();
- if (!bond_xdp_check(bond)) {
+ if (!bond_xdp_check(bond) || !bond_has_slaves(bond)) {
xdp_clear_features_flag(bond_dev);
return;
}
--
2.43.0


@@ -1,35 +0,0 @@
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>
Subject: [PATCH bpf-next] selftests/bpf: Fix OOB write in test_verifier
Date: Tue, 14 Dec 2021 07:18:00 +0530 [thread overview]
Message-ID: <20211214014800.78762-1-memxor@gmail.com> (raw)
The commit referenced below added fixup_map_timer support (to create a
BPF map containing timers), but failed to increase the size of the
map_fds array, leading to out of bounds write. Fix this by changing
MAX_NR_MAPS to 22.
Fixes: e60e6962c503 ("selftests/bpf: Add tests for restricted helpers")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
tools/testing/selftests/bpf/test_verifier.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index ad5d30bafd93..33e2ecb3bef9 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -54,7 +54,7 @@
#define MAX_INSNS BPF_MAXINSNS
#define MAX_TEST_INSNS 1000000
#define MAX_FIXUPS 8
-#define MAX_NR_MAPS 21
+#define MAX_NR_MAPS 22
#define MAX_TEST_RUNS 8
#define POINTER_VALUE 0xcafe4all
#define TEST_DATA_LEN 64
--
2.34.1


@@ -6,7 +6,7 @@ CONT_NAME="${CONT_NAME:-libbpf-debian-$DEBIAN_RELEASE}"
ENV_VARS="${ENV_VARS:-}"
DOCKER_RUN="${DOCKER_RUN:-docker run}"
REPO_ROOT="${REPO_ROOT:-$PWD}"
ADDITIONAL_DEPS=(clang pkg-config gcc-10)
ADDITIONAL_DEPS=(pkgconf)
EXTRA_CFLAGS=""
EXTRA_LDFLAGS=""
@@ -43,30 +43,35 @@ for phase in "${PHASES[@]}"; do
docker_exec bash -c "echo deb-src http://deb.debian.org/debian $DEBIAN_RELEASE main >>/etc/apt/sources.list"
docker_exec apt-get -y update
docker_exec apt-get -y install aptitude
docker_exec aptitude -y build-dep libelf-dev
docker_exec aptitude -y install libelf-dev
docker_exec aptitude -y install make libz-dev libelf-dev
docker_exec aptitude -y install "${ADDITIONAL_DEPS[@]}"
echo -e "::endgroup::"
;;
RUN|RUN_CLANG|RUN_GCC10|RUN_ASAN|RUN_CLANG_ASAN|RUN_GCC10_ASAN)
RUN|RUN_CLANG|RUN_CLANG14|RUN_CLANG15|RUN_CLANG16|RUN_GCC10|RUN_GCC11|RUN_GCC12|RUN_ASAN|RUN_CLANG_ASAN|RUN_GCC10_ASAN)
CC="cc"
if [[ "$phase" = *"CLANG"* ]]; then
if [[ "$phase" =~ "RUN_CLANG(\d+)(_ASAN)?" ]]; then
ENV_VARS="-e CC=clang-${BASH_REMATCH[1]} -e CXX=clang++-${BASH_REMATCH[1]}"
CC="clang-${BASH_REMATCH[1]}"
elif [[ "$phase" = *"CLANG"* ]]; then
ENV_VARS="-e CC=clang -e CXX=clang++"
CC="clang"
elif [[ "$phase" = *"GCC10"* ]]; then
ENV_VARS="-e CC=gcc-10 -e CXX=g++-10"
CC="gcc-10"
else
EXTRA_CFLAGS="${EXTRA_CFLAGS} -Wno-stringop-truncation"
elif [[ "$phase" =~ "RUN_GCC(\d+)(_ASAN)?" ]]; then
ENV_VARS="-e CC=gcc-${BASH_REMATCH[1]} -e CXX=g++-${BASH_REMATCH[1]}"
CC="gcc-${BASH_REMATCH[1]}"
fi
if [[ "$phase" = *"ASAN"* ]]; then
EXTRA_CFLAGS="${EXTRA_CFLAGS} -fsanitize=address,undefined"
EXTRA_LDFLAGS="${EXTRA_LDFLAGS} -fsanitize=address,undefined"
fi
if [[ "$CC" != "cc" ]]; then
docker_exec aptitude -y install "$CC"
else
docker_exec aptitude -y install gcc
fi
docker_exec mkdir build install
docker_exec ${CC} --version
info "build"
docker_exec make -j$((4*$(nproc))) EXTRA_CFLAGS="${EXTRA_CFLAGS}" EXTRA_LDFLAGS="${EXTRA_LDFLAGS}" -C ./src -B OBJDIR=../build
docker_exec make -j$((4*$(nproc))) EXTRA_CFLAGS="${EXTRA_CFLAGS}" EXTRA_LDFLAGS="${EXTRA_LDFLAGS}" -C ./src -B OBJDIR=../build
info "ldd build/libbpf.so:"
docker_exec ldd build/libbpf.so
if ! docker_exec ldd build/libbpf.so | grep -q libelf; then


@@ -1,107 +0,0 @@
#!/bin/bash
# This script is based on drgn script for generating Arch Linux bootstrap
# images.
# https://github.com/osandov/drgn/blob/master/scripts/vmtest/mkrootfs.sh
set -euo pipefail
usage () {
USAGE_STRING="usage: $0 [NAME]
$0 -h
Build an Arch Linux root filesystem image for testing libbpf in a virtual
machine.
The image is generated as a zstd-compressed tarball.
This must be run as root, as most of the installation is done in a chroot.
Arguments:
NAME name of generated image file (default:
libbpf-vmtest-rootfs-\$DATE.tar.zst)
Options:
-h display this help message and exit"
case "$1" in
out)
echo "$USAGE_STRING"
exit 0
;;
err)
echo "$USAGE_STRING" >&2
exit 1
;;
esac
}
while getopts "h" OPT; do
case "$OPT" in
h)
usage out
;;
*)
usage err
;;
esac
done
if [[ $OPTIND -eq $# ]]; then
NAME="${!OPTIND}"
elif [[ $OPTIND -gt $# ]]; then
NAME="libbpf-vmtest-rootfs-$(date +%Y.%m.%d).tar.zst"
else
usage err
fi
pacman_conf=
root=
trap 'rm -rf "$pacman_conf" "$root"' EXIT
pacman_conf="$(mktemp -p "$PWD")"
cat > "$pacman_conf" << "EOF"
[options]
Architecture = x86_64
CheckSpace
SigLevel = Required DatabaseOptional
[core]
Include = /etc/pacman.d/mirrorlist
[extra]
Include = /etc/pacman.d/mirrorlist
[community]
Include = /etc/pacman.d/mirrorlist
EOF
root="$(mktemp -d -p "$PWD")"
packages=(
busybox
# libbpf dependencies.
libelf
zlib
# selftests test_progs dependencies.
binutils
elfutils
ethtool
glibc
iproute2
# selftests test_verifier dependencies.
libcap
)
pacstrap -C "$pacman_conf" -cGM "$root" "${packages[@]}"
# Remove unnecessary files from the chroot.
# We don't need the pacman databases anymore.
rm -rf "$root/var/lib/pacman/sync/"
# We don't need D, Fortran, or Go.
rm -f "$root/usr/lib/libgdruntime."* \
"$root/usr/lib/libgphobos."* \
"$root/usr/lib/libgfortran."* \
"$root/usr/lib/libgo."*
# We don't need any documentation.
rm -rf "$root/usr/share/{doc,help,man,texinfo}"
"$(dirname "$0")"/mkrootfs_tweak.sh "$root"
tar -C "$root" -c . | zstd -T0 -19 -o "$NAME"
chmod 644 "$NAME"


@@ -1,52 +0,0 @@
#!/bin/bash
# This script builds a Debian root filesystem image for testing libbpf in a
# virtual machine. Requires debootstrap >= 1.0.95 and zstd.
# Use e.g. ./mkrootfs_debian.sh --arch=s390x to generate a rootfs for a
# foreign architecture. Requires configured binfmt_misc, e.g. using
# Debian/Ubuntu's qemu-user-binfmt package or
# https://github.com/multiarch/qemu-user-static.
set -e -u -x -o pipefail
# Check whether we are root now in order to avoid confusing errors later.
if [ "$(id -u)" != 0 ]; then
echo "$0 must run as root" >&2
exit 1
fi
# Create a working directory and schedule its deletion.
root=$(mktemp -d -p "$PWD")
trap 'rm -r "$root"' EXIT
# Install packages.
packages=(
binutils
busybox
elfutils
ethtool
iproute2
iptables
libcap2
libelf1
strace
zlib1g
)
packages=$(IFS=, && echo "${packages[*]}")
debootstrap --include="$packages" --variant=minbase "$@" bookworm "$root"
# Remove the init scripts (tests use their own). Also remove various
# unnecessary files in order to save space.
rm -rf \
"$root"/etc/rcS.d \
"$root"/usr/share/{doc,info,locale,man,zoneinfo} \
"$root"/var/cache/apt/archives/* \
"$root"/var/lib/apt/lists/*
# Apply common tweaks.
"$(dirname "$0")"/mkrootfs_tweak.sh "$root"
# Save the result.
name="libbpf-vmtest-rootfs-$(date +%Y.%m.%d).tar.zst"
rm -f "$name"
tar -C "$root" -c . | zstd -T0 -19 -o "$name"


@@ -1,61 +0,0 @@
#!/bin/bash
# This script prepares a mounted root filesystem for testing libbpf in a virtual
# machine.
set -e -u -x -o pipefail
root=$1
shift
chroot "${root}" /bin/busybox --install
cat > "$root/etc/inittab" << "EOF"
::sysinit:/etc/init.d/rcS
::ctrlaltdel:/sbin/reboot
::shutdown:/sbin/swapoff -a
::shutdown:/bin/umount -a -r
::restart:/sbin/init
EOF
chmod 644 "$root/etc/inittab"
mkdir -m 755 -p "$root/etc/init.d" "$root/etc/rcS.d"
cat > "$root/etc/rcS.d/S10-mount" << "EOF"
#!/bin/sh
set -eux
/bin/mount proc /proc -t proc
# Mount devtmpfs if not mounted
if [[ -z $(/bin/mount -t devtmpfs) ]]; then
/bin/mount devtmpfs /dev -t devtmpfs
fi
/bin/mount sysfs /sys -t sysfs
/bin/mount bpffs /sys/fs/bpf -t bpf
/bin/mount debugfs /sys/kernel/debug -t debugfs
echo 'Listing currently mounted file systems'
/bin/mount
EOF
chmod 755 "$root/etc/rcS.d/S10-mount"
cat > "$root/etc/rcS.d/S40-network" << "EOF"
#!/bin/sh
set -eux
ip link set lo up
EOF
chmod 755 "$root/etc/rcS.d/S40-network"
cat > "$root/etc/init.d/rcS" << "EOF"
#!/bin/sh
set -eux
for path in /etc/rcS.d/S*; do
[ -x "$path" ] && "$path"
done
EOF
chmod 755 "$root/etc/init.d/rcS"
chmod 755 "$root"


@@ -1,107 +0,0 @@
# IBM Z self-hosted builder
libbpf CI uses an IBM-provided z15 self-hosted builder. There are no IBM Z
builds of GitHub (GH) Actions runner, and stable qemu-user has problems with .NET
apps, so the builder runs the x86_64 runner version with qemu-user built from
the master branch.
We are currently supporting runners for the following repositories:
* libbpf/libbpf
* kernel-patches/bpf
* kernel-patches/vmtest
Below instructions are directly applicable to libbpf, and require minor
modifications for kernel-patches repos. Currently, qemu-user-static Docker
image is shared between all GitHub runners, but separate actions-runner-\*
service / Docker image is created for each runner type.
## Configuring the builder.
### Install prerequisites.
```
$ sudo apt install -y docker.io # Ubuntu
```
### Add services.
```
$ sudo cp *.service /etc/systemd/system/
$ sudo systemctl daemon-reload
```
### Create a config file.
```
$ sudo tee /etc/actions-runner-libbpf
repo=<owner>/<name>
access_token=<ghp_***>
```
Access token should have the repo scope, consult
https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-a-repository
for details.
### Autostart the x86_64 emulation support.
This step is important, you would not be able to build docker container
without having this service running. If container build fails, make sure
service is running properly.
```
$ sudo systemctl enable --now qemu-user-static
```
### Autostart the runner.
```
$ sudo systemctl enable --now actions-runner-libbpf
```
## Rebuilding the image
In order to update the `iiilinuxibmcom/actions-runner-libbpf` image, e.g. to
get the latest OS security fixes, use the following commands:
```
$ sudo docker build \
--pull \
-f actions-runner-libbpf.Dockerfile \
-t iiilinuxibmcom/actions-runner-libbpf \
.
$ sudo systemctl restart actions-runner-libbpf
```
## Removing persistent data
The `actions-runner-libbpf` service stores various temporary data, such as
runner registration information, work directories and logs, in the
`actions-runner-libbpf` volume. In order to remove it and start from scratch,
e.g. when upgrading the runner or switching it to a different repository, use
the following commands:
```
$ sudo systemctl stop actions-runner-libbpf
$ sudo docker rm -f actions-runner-libbpf
$ sudo docker volume rm actions-runner-libbpf
```
## Troubleshooting
In order to check if service is running, use the following command:
```
$ sudo systemctl status <service name>
```
In order to get logs for service:
```
$ journalctl -u <service name>
```
In order to check which containers are currently active:
```
$ sudo docker ps
```


@@ -1,50 +0,0 @@
# Self-Hosted IBM Z Github Actions Runner.
# Temporary image: amd64 dependencies.
FROM amd64/ubuntu:20.04 as ld-prefix
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get -y install ca-certificates libicu66 libssl1.1
# Main image.
FROM s390x/ubuntu:20.04
# Packages for libbpf testing that are not installed by .github/actions/setup.
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get -y install \
bc \
bison \
cmake \
cpu-checker \
curl \
flex \
git \
jq \
linux-image-generic \
qemu-system-s390x \
rsync \
software-properties-common \
sudo \
tree
# amd64 dependencies.
COPY --from=ld-prefix / /usr/x86_64-linux-gnu/
RUN ln -fs ../lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 /usr/x86_64-linux-gnu/lib64/
RUN ln -fs /etc/resolv.conf /usr/x86_64-linux-gnu/etc/
ENV QEMU_LD_PREFIX=/usr/x86_64-linux-gnu
# amd64 Github Actions Runner.
ARG version=2.285.0
RUN useradd -m actions-runner
RUN echo "actions-runner ALL=(ALL) NOPASSWD: ALL" >>/etc/sudoers
RUN echo "Defaults env_keep += \"DEBIAN_FRONTEND\"" >>/etc/sudoers
RUN usermod -a -G kvm actions-runner
USER actions-runner
ENV USER=actions-runner
WORKDIR /home/actions-runner
RUN curl -L https://github.com/actions/runner/releases/download/v${version}/actions-runner-linux-x64-${version}.tar.gz | tar -xz
VOLUME /home/actions-runner
# Scripts.
COPY fs/ /
ENTRYPOINT ["/usr/bin/entrypoint"]
CMD ["/usr/bin/actions-runner"]


@@ -1,24 +0,0 @@
[Unit]
Description=Self-Hosted IBM Z Github Actions Runner
Wants=qemu-user-static
After=qemu-user-static
StartLimitIntervalSec=0
[Service]
Type=simple
Restart=always
ExecStart=/usr/bin/docker run \
--device=/dev/kvm \
--env-file=/etc/actions-runner-libbpf \
--init \
--interactive \
--name=actions-runner-libbpf \
--rm \
--volume=actions-runner-libbpf:/home/actions-runner \
iiilinuxibmcom/actions-runner-libbpf
ExecStop=/bin/sh -c "docker exec actions-runner-libbpf kill -INT -- -1"
ExecStop=/bin/sh -c "docker wait actions-runner-libbpf"
ExecStop=/bin/sh -c "docker rm actions-runner-libbpf"
[Install]
WantedBy=multi-user.target


@@ -1,40 +0,0 @@
#!/bin/bash
#
# Ephemeral runner startup script.
#
# Expects the following environment variables:
#
# - repo=<owner>/<name>
# - access_token=<ghp_***>
#
set -e -u
# Check the cached registration token.
token_file=registration-token.json
set +e
expires_at=$(jq --raw-output .expires_at "$token_file" 2>/dev/null)
status=$?
set -e
if [[ $status -ne 0 || $(date +%s) -ge $(date -d "$expires_at" +%s) ]]; then
# Refresh the cached registration token.
curl \
-X POST \
-H "Accept: application/vnd.github.v3+json" \
-H "Authorization: token $access_token" \
"https://api.github.com/repos/$repo/actions/runners/registration-token" \
-o "$token_file"
fi
# (Re-)register the runner.
registration_token=$(jq --raw-output .token "$token_file")
./config.sh remove --token "$registration_token" || true
./config.sh \
--url "https://github.com/$repo" \
--token "$registration_token" \
--labels z15 \
--ephemeral
# Run one job.
./run.sh


@@ -1,35 +0,0 @@
#!/bin/bash
#
# Container entrypoint that waits for all spawned processes.
#
set -e -u
# /dev/kvm has host permissions, fix it.
if [ -e /dev/kvm ]; then
sudo chown root:kvm /dev/kvm
fi
# Create a FIFO and start reading from its read end.
tempdir=$(mktemp -d "/tmp/done.XXXXXXXXXX")
trap 'rm -r "$tempdir"' EXIT
done="$tempdir/pipe"
mkfifo "$done"
cat "$done" & waiter=$!
# Start the workload. Its descendants will inherit the FIFO's write end.
status=0
if [ "$#" -eq 0 ]; then
bash 9>"$done" || status=$?
else
"$@" 9>"$done" || status=$?
fi
# When the workload and all of its descendants exit, the FIFO's write end will
# be closed and `cat "$done"` will exit. Wait until it happens. This is needed
# in order to handle SelfUpdater, which the workload may start in background
# before exiting.
wait "$waiter"
exit "$status"


@@ -1,11 +0,0 @@
[Unit]
Description=Support for transparent execution of non-native binaries with QEMU user emulation
[Service]
Type=oneshot
# The source code for iiilinuxibmcom/qemu-user-static is at https://github.com/iii-i/qemu-user-static/tree/v6.1.0-1
# TODO: replace it with multiarch/qemu-user-static once version >6.1 is available
ExecStart=/usr/bin/docker run --rm --interactive --privileged iiilinuxibmcom/qemu-user-static --reset -p yes
[Install]
WantedBy=multi-user.target


@@ -16,7 +16,6 @@ global_data
global_data_init
global_func_args
hashmap
l4lb_all
legacy_printk
linked_funcs
linked_maps
@@ -33,11 +32,7 @@ raw_tp_writable_test_run
rdonly_maps
section_names
signal_pending
skeleton
sockmap_ktls
sockopt
sockopt_inherit
sockopt_multi
spinlock
stacktrace_map
stacktrace_map_raw_tp
@@ -47,9 +42,9 @@ task_fd_query_tp
tc_bpf
tcp_estats
tcp_rtt
test_global_funcs/arg_tag_ctx*
tp_attach_query
usdt/urand_pid_attach
xdp
xdp_info
xdp_noinline
xdp_perf


@@ -0,0 +1,14 @@
# TEMPORARY
btf_dump/btf_dump: syntax
kprobe_multi_bench_attach
core_reloc/enum64val
core_reloc/size___diff_sz
core_reloc/type_based___diff_sz
test_ima # All of CI is broken on it following 6.3-rc1 merge
lwt_reroute # crashes kernel after netnext merge from 2ab1efad60ad "net/sched: cls_api: complement tcf_tfilter_dump_policy"
tc_links_ingress # started failing after net-next merge from 2ab1efad60ad "net/sched: cls_api: complement tcf_tfilter_dump_policy"
xdp_bonding/xdp_bonding_features # started failing after net merge from 359e54a93ab4 "l2tp: pass correct message length to ip6_append_data"
tc_redirect/tc_redirect_dtime # uapi breakage after net-next commit 885c36e59f46 ("net: Re-use and set mono_delivery_time bit for userspace tstamp packets")
migrate_reuseport/IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler # flaky, under investigation
migrate_reuseport/IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler # flaky, under investigation


@@ -1,118 +1,5 @@
# This file is not used and is there for historic purposes only.
# See WHITELIST-5.5.0 instead.
# This complements ALLOWLIST-5.5.0 but excludes subtest that can't work on 5.5
# PERMANENTLY DISABLED
align # verifier output format changed
atomics # new atomic operations (v5.12+)
atomic_bounds # new atomic operations (v5.12+)
bind_perm # changed semantics of return values (v5.12+)
bpf_cookie # 5.15+
bpf_iter # bpf_iter support is missing
bpf_obj_id # bpf_link support missing for GET_OBJ_INFO, GET_FD_BY_ID, etc
bpf_tcp_ca # STRUCT_OPS is missing
btf_map_in_map # inner map leak fixed in 5.8
btf_skc_cls_ingress # v5.10+ functionality
cg_storage_multi # v5.9+ functionality
cgroup_attach_multi # BPF_F_REPLACE_PROG missing
cgroup_link # LINK_CREATE is missing
cgroup_skb_sk_lookup # bpf_sk_lookup_tcp() helper is missing
check_mtu # missing BPF helper (v5.12+)
cls_redirect # bpf_csum_level() helper is missing
connect_force_port # cgroup/get{peer,sock}name{4,6} support is missing
d_path # v5.10+ feature
enable_stats # BPF_ENABLE_STATS support is missing
fentry_fexit # bpf_prog_test_tracing missing
fentry_test # bpf_prog_test_tracing missing
fexit_bpf2bpf # freplace is missing
fexit_sleep # relies on bpf_trampoline fix in 5.12+
fexit_test # bpf_prog_test_tracing missing
flow_dissector # bpf_link-based flow dissector is in 5.8+
flow_dissector_reattach
for_each # v5.12+
get_func_ip_test # v5.15+
get_stack_raw_tp # exercising BPF verifier bug causing infinite loop
hash_large_key # v5.11+
ima # v5.11+
kfree_skb # 32-bit pointer arith in test_pkt_access
ksyms # __start_BTF has different name
kfunc_call # v5.13+
link_pinning # bpf_link is missing
linked_vars # v5.13+
load_bytes_relative # new functionality in 5.8
lookup_and_delete # v5.14+
map_init # per-CPU LRU missing
map_ptr # test uses BPF_MAP_TYPE_RINGBUF, added in 5.8
metadata # v5.10+
migrate_reuseport # v5.14+
mmap # 5.5 kernel is too permissive with re-mmaping
modify_return # fmod_ret support is missing
module_attach # module BTF support missing (v5.11+)
netcnt
netns_cookie # v5.15+
ns_current_pid_tgid # bpf_get_ns_current_pid_tgid() helper is missing
pe_preserve_elems # v5.10+
perf_branches # bpf_read_branch_records() helper is missing
perf_link # v5.15+
pkt_access # 32-bit pointer arith in test_pkt_access
probe_read_user_str # kernel bug with garbage bytes at the end
prog_run_xattr # 32-bit pointer arith in test_pkt_access
raw_tp_test_run # v5.10+
recursion # v5.12+
ringbuf # BPF_MAP_TYPE_RINGBUF is supported in 5.8+
# bug in verifier w/ tracking references
#reference_tracking/classifier/sk_lookup_success
reference_tracking
select_reuseport # UDP support is missing
send_signal # bpf_send_signal_thread() helper is missing
sk_assign # bpf_sk_assign helper missing
sk_lookup # v5.9+
sk_storage_tracing # missing bpf_sk_storage_get() helper
skb_ctx # ctx_{size, }_{in, out} in BPF_PROG_TEST_RUN is missing
skb_helpers # helpers added in 5.8+
skeleton # creates too big ARRAY map
snprintf # v5.13+
snprintf_btf # v5.10+
sock_fields # v5.10+
socket_cookie # v5.12+
sockmap_basic # uses new socket fields, 5.8+
sockmap_listen # no listen socket support in SOCKMAP
sockopt_sk
sockopt_qos_to_cc # v5.15+
stacktrace_build_id # v5.9+
stack_var_off # v5.12+
syscall # v5.14+
task_local_storage # v5.12+
task_pt_regs # v5.15+
tcp_hdr_options # v5.10+, new TCP header options feature in BPF
tcpbpf_user # LINK_CREATE is missing
tc_redirect # v5.14+
test_bpffs # v5.10+, new CONFIG_BPF_PRELOAD=y and CONFIG_BPF_PRELOAD_UMG=y|m
test_bprm_opts # v5.11+
test_global_funcs # kernel doesn't support BTF linkage=global on FUNCs
test_local_storage # v5.10+ feature
test_lsm # no BPF_LSM support
test_overhead # no fmod_ret support
test_profiler # needs verifier logic improvements from v5.10+
test_skb_pkt_end # v5.11+
timer # v5.15+
timer_mim # v5.15+
trace_ext # v5.10+
trace_printk # v5.14+
trampoline_count # v5.12+ have lower allowed limits
udp_limit # no cgroup/sock_release BPF program type (5.9+)
varlen # verifier bug fixed in later kernels
vmlinux # hrtimer_nanosleep() signature changed incompatibly
xdp_adjust_tail # new XDP functionality added in 5.8
xdp_attach # IFLA_XDP_EXPECTED_FD support is missing
xdp_bonding # v5.15+
xdp_bpf2bpf # freplace is missing
xdp_context_test_run # v5.15+
xdp_cpumap_attach # v5.9+
xdp_devmap_attach # new feature in 5.8
xdp_link # v5.9+
# SUBTESTS FAILING (block entire test until blocking subtests works properly)
btf # "size check test", "func (Non zero vlen)"
tailcalls # tailcall_bpf2bpf_1, tailcall_bpf2bpf_2, tailcall_bpf2bpf_3
tc_bpf/tc_bpf_non_root


@@ -1,6 +1,13 @@
# TEMPORARY
get_stack_raw_tp # spams with kernel warnings until next bpf -> bpf-next merge
stacktrace_build_id_nmi
stacktrace_build_id
task_fd_query_rawtp
varlen
decap_sanity # weird failure with decap_sanity_ns netns already existing, TBD
empty_skb # waiting the fix in bpf tree to make it to bpf-next
bpf_nf/tc-bpf-ct # test consistently failing on x86: https://github.com/libbpf/libbpf/pull/698#issuecomment-1590341200
bpf_nf/xdp-ct # test consistently failing on x86: https://github.com/libbpf/libbpf/pull/698#issuecomment-1590341200
kprobe_multi_bench_attach # suspected to cause crashes in CI
find_vma # test consistently fails on latest kernel, see https://github.com/libbpf/libbpf/issues/754 for details
bpf_cookie/perf_event
send_signal/send_signal_nmi
send_signal/send_signal_nmi_thread
lwt_reroute # crashes kernel, fix pending upstream
tc_links_ingress # fails, same fix is pending upstream
tc_redirect # enough is enough, banned for life for flakiness


@@ -1,68 +1,17 @@
# TEMPORARY
atomics # attach(add): actual -524 <= expected 0 (trampoline)
bpf_iter_setsockopt # JIT does not support calling kernel function (kfunc)
bloom_filter_map # failed to find kernel BTF type ID of '__x64_sys_getpgid': -3 (?)
bpf_tcp_ca # JIT does not support calling kernel function (kfunc)
bpf_loop # attaches to __x64_sys_nanosleep
bpf_mod_race # BPF trampoline
bpf_nf # JIT does not support calling kernel function
core_read_macros # unknown func bpf_probe_read#4 (overlapping)
d_path # failed to auto-attach program 'prog_stat': -524 (trampoline)
dummy_st_ops # test_run unexpected error: -524 (errno 524) (trampoline)
fentry_fexit # fentry attach failed: -524 (trampoline)
fentry_test # fentry_first_attach unexpected error: -524 (trampoline)
fexit_bpf2bpf # freplace_attach_trace unexpected error: -524 (trampoline)
fexit_sleep # fexit_skel_load fexit skeleton failed (trampoline)
fexit_stress # fexit attach failed prog 0 failed: -524 (trampoline)
fexit_test # fexit_first_attach unexpected error: -524 (trampoline)
get_func_args_test # trampoline
get_func_ip_test # get_func_ip_test__attach unexpected error: -524 (trampoline)
get_stack_raw_tp # user_stack corrupted user stack (no backchain userspace)
kfree_skb # attach fentry unexpected error: -524 (trampoline)
kfunc_call # 'bpf_prog_active': not found in kernel BTF (?)
ksyms_module # test_ksyms_module__open_and_load unexpected error: -9 (?)
ksyms_module_libbpf # JIT does not support calling kernel function (kfunc)
ksyms_module_lskel # test_ksyms_module_lskel__open_and_load unexpected error: -9 (?)
modify_return # modify_return attach failed: -524 (trampoline)
module_attach # skel_attach skeleton attach failed: -524 (trampoline)
mptcp
kprobe_multi_test # relies on fentry
netcnt # failed to load BPF skeleton 'netcnt_prog': -7 (?)
probe_user # check_kprobe_res wrong kprobe res from probe read (?)
recursion # skel_attach unexpected error: -524 (trampoline)
ringbuf # skel_load skeleton load failed (?)
sk_assign # Can't read on server: Invalid argument (?)
sk_lookup # endianness problem
sk_storage_tracing # test_sk_storage_tracing__attach unexpected error: -524 (trampoline)
skc_to_unix_sock # could not attach BPF object unexpected error: -524 (trampoline)
socket_cookie # prog_attach unexpected error: -524 (trampoline)
stacktrace_build_id # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2 (?)
tailcalls # tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls (?)
task_local_storage # failed to auto-attach program 'trace_exit_creds': -524 (trampoline)
test_bpffs # bpffs test failed 255 (iterator)
test_bprm_opts # failed to auto-attach program 'secure_exec': -524 (trampoline)
test_ima # failed to auto-attach program 'ima': -524 (trampoline)
test_local_storage # failed to auto-attach program 'unlink_hook': -524 (trampoline)
test_lsm # failed to find kernel BTF type ID of '__x64_sys_setdomainname': -3 (?)
test_overhead # attach_fentry unexpected error: -524 (trampoline)
test_profiler # unknown func bpf_probe_read_str#45 (overlapping)
timer # failed to auto-attach program 'test1': -524 (trampoline)
timer_crash # trampoline
timer_mim # failed to auto-attach program 'test1': -524 (trampoline)
trace_ext # failed to auto-attach program 'test_pkt_md_access_new': -524 (trampoline)
trace_printk # trace_printk__load unexpected error: -2 (errno 2) (?)
trace_vprintk # trace_vprintk__open_and_load unexpected error: -9 (?)
trampoline_count # prog 'prog1': failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
verif_stats # trace_vprintk__open_and_load unexpected error: -9 (?)
vmlinux # failed to auto-attach program 'handle__fentry': -524 (trampoline)
xdp_adjust_tail # case-128 err 0 errno 28 retval 1 size 128 expect-size 3520 (?)
xdp_bonding # failed to auto-attach program 'trace_on_entry': -524 (trampoline)
xdp_bpf2bpf # failed to auto-attach program 'trace_on_entry': -524 (trampoline)
map_kptr # failed to open_and_load program: -524 (trampoline)
bpf_cookie # failed to open_and_load program: -524 (trampoline)
xdp_do_redirect # prog_run_max_size unexpected error: -22 (errno 22)
send_signal # intermittently fails to receive signal
select_reuseport # intermittently fails on new s390x setup
xdp_synproxy # JIT does not support calling kernel function (kfunc)
unpriv_bpf_disabled # fentry
lru_bug
sockmap_listen/sockhash VSOCK test_vsock_redir
usdt/basic # failing verifier due to bounds check after LLVM update
usdt/multispec # same as above
deny_namespace # not yet in bpf denylist
tc_redirect/tc_redirect_dtime # very flaky
lru_bug # not yet in bpf-next denylist
# Disabled temporarily for a crash.
# https://lore.kernel.org/bpf/c9923c1d-971d-4022-8dc8-1364e929d34c@gmail.com/
dummy_st_ops/dummy_init_ptr_arg
fexit_bpf2bpf
tailcalls
trace_ext
xdp_bpf2bpf
xdp_metadata


@@ -1,3 +1,5 @@
# shellcheck shell=bash
# $1 - start or end
# $2 - fold identifier, no spaces
# $3 - fold section description
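These comments document the `foldable` helper used throughout the CI scripts,
including the test-runner changes below; a minimal usage sketch from the repo root,
with an arbitrary section name:
```
source ci/vmtest/helpers.sh
foldable start build_selftests "Building BPF selftests"   # open a folded log section
make -C tools/testing/selftests/bpf                       # stand-in for the real work
foldable end build_selftests                              # close the section
```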


@@ -13,7 +13,7 @@ read_lists() {
if [[ -s "$path" ]]; then
cat "$path"
fi;
done) | cut -d'#' -f1 | tr -s ' \t\n' ','
done) | cut -d'#' -f1 | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | tr -s '\n' ','
}
test_progs() {
@@ -22,15 +22,15 @@ test_progs() {
# "&& true" does not change the return code (it is not executed
# if the Python script fails), but it prevents exiting on a
# failure due to the "set -e".
./test_progs ${DENYLIST:+-d$DENYLIST} ${ALLOWLIST:+-a$ALLOWLIST} && true
./test_progs ${DENYLIST:+-d"$DENYLIST"} ${ALLOWLIST:+-a"$ALLOWLIST"} && true
echo "test_progs:$?" >> "${STATUS_FILE}"
foldable end test_progs
fi
}
test_progs_noalu() {
test_progs_no_alu32() {
foldable start test_progs-no_alu32 "Testing test_progs-no_alu32"
./test_progs-no_alu32 ${DENYLIST:+-d$DENYLIST} ${ALLOWLIST:+-a$ALLOWLIST} && true
./test_progs-no_alu32 ${DENYLIST:+-d"$DENYLIST"} ${ALLOWLIST:+-a"$ALLOWLIST"} && true
echo "test_progs-no_alu32:$?" >> "${STATUS_FILE}"
foldable end test_progs-no_alu32
}
@@ -55,9 +55,27 @@ test_verifier() {
foldable end vm_init
configs_path=${PROJECT_NAME}/vmtest/configs
DENYLIST=$(read_lists "$configs_path/DENYLIST-${KERNEL}" "$configs_path/DENYLIST-${KERNEL}.${ARCH}")
ALLOWLIST=$(read_lists "$configs_path/ALLOWLIST-${KERNEL}" "$configs_path/ALLOWLIST-${KERNEL}.${ARCH}")
foldable start kernel_config "Kconfig"
zcat /proc/config.gz
foldable end kernel_config
configs_path=/${PROJECT_NAME}/selftests/bpf
local_configs_path=${PROJECT_NAME}/vmtest/configs
DENYLIST=$(read_lists \
"$configs_path/DENYLIST" \
"$configs_path/DENYLIST.${ARCH}" \
"$local_configs_path/DENYLIST-${KERNEL}" \
"$local_configs_path/DENYLIST-${KERNEL}.${ARCH}" \
)
ALLOWLIST=$(read_lists \
"$configs_path/ALLOWLIST" \
"$configs_path/ALLOWLIST.${ARCH}" \
"$local_configs_path/ALLOWLIST-${KERNEL}" \
"$local_configs_path/ALLOWLIST-${KERNEL}.${ARCH}" \
)
echo "DENYLIST: ${DENYLIST}"
echo "ALLOWLIST: ${ALLOWLIST}"
@@ -66,8 +84,8 @@ cd ${PROJECT_NAME}/selftests/bpf
if [ $# -eq 0 ]; then
test_progs
test_progs_noalu
test_maps
test_progs_no_alu32
# test_maps
test_verifier
else
for test_name in "$@"; do


@@ -18,6 +18,7 @@ extensions = [
'sphinx.ext.viewcode',
'sphinx.ext.imgmath',
'sphinx.ext.todo',
'sphinx_rtd_theme',
'breathe',
]


@@ -1,21 +1,33 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
.. _libbpf:
======
libbpf
======
If you are looking to develop BPF applications using the libbpf library, this
directory contains important documentation that you should read.
To get started, it is recommended to begin with the :doc:`libbpf Overview
<libbpf_overview>` document, which provides a high-level understanding of the
libbpf APIs and their usage. This will give you a solid foundation to start
exploring and utilizing the various features of libbpf to develop your BPF
applications.
.. toctree::
:maxdepth: 1
libbpf_overview
API Documentation <https://libbpf.readthedocs.io/en/latest/api.html>
program_types
libbpf_naming_convention
libbpf_build
This is documentation for libbpf, a userspace library for loading and
interacting with bpf programs.
All general BPF questions, including kernel functionality, libbpf APIs and
their application, should be sent to bpf@vger.kernel.org mailing list.
You can `subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the
mailing list search its `archive <https://lore.kernel.org/bpf/>`_.
Please search the archive before asking new questions. It very well might
be that this was already addressed or answered before.
All general BPF questions, including kernel functionality, libbpf APIs and their
application, should be sent to the bpf@vger.kernel.org mailing list. You can
`subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the mailing list and
search its `archive <https://lore.kernel.org/bpf/>`_. Please search the archive
before asking new questions. It may be that this was already addressed or
answered before.


@@ -83,8 +83,8 @@ This prevents from accidentally exporting a symbol, that is not supposed
to be a part of ABI what, in turn, improves both libbpf developer- and
user-experiences.
ABI versionning
---------------
ABI versioning
--------------
To make future ABI extensions possible libbpf ABI is versioned.
Versioning is implemented by ``libbpf.map`` version script that is
@@ -148,7 +148,7 @@ API documentation convention
The libbpf API is documented via comments above definitions in
header files. These comments can be rendered by doxygen and sphinx
for well organized html output. This section describes the
convention in which these comments should be formated.
convention in which these comments should be formatted.
Here is an example from btf.h:

docs/libbpf_overview.rst (new file, 228 lines)

@@ -0,0 +1,228 @@
.. SPDX-License-Identifier: GPL-2.0
===============
libbpf Overview
===============
libbpf is a C-based library containing a BPF loader that takes compiled BPF
object files and prepares and loads them into the Linux kernel. libbpf takes on the
heavy lifting of loading, verifying, and attaching BPF programs to various
kernel hooks, allowing BPF application developers to focus only on BPF program
correctness and performance.
The following are the high-level features supported by libbpf:
* Provides high-level and low-level APIs for user space programs to interact
with BPF programs. The low-level APIs wrap all the bpf system call
functionality, which is useful when users need more fine-grained control
over the interactions between user space and BPF programs.
* Provides overall support for the BPF object skeleton generated by bpftool.
The skeleton file simplifies the process for the user space programs to access
global variables and work with BPF programs.
* Provides BPF-side APIs, including BPF helper definitions, BPF maps support,
and tracing helpers, allowing developers to simplify BPF code writing.
* Supports BPF CO-RE mechanism, enabling BPF developers to write portable
BPF programs that can be compiled once and run across different kernel
versions.
This document will delve into the above concepts in detail, providing a deeper
understanding of the capabilities and advantages of libbpf and how it can help
you develop BPF applications efficiently.
BPF App Lifecycle and libbpf APIs
==================================
A BPF application consists of one or more BPF programs (either cooperating or
completely independent), BPF maps, and global variables. The global
variables are shared between all BPF programs, which allows them to cooperate on
a common set of data. libbpf provides APIs that user space programs can use to
manipulate the BPF programs by triggering different phases of a BPF application
lifecycle.
The following section provides a brief overview of each phase in the BPF life
cycle:
* **Open phase**: In this phase, libbpf parses the BPF
object file and discovers BPF maps, BPF programs, and global variables. After
a BPF app is opened, user space apps can make additional adjustments
(setting BPF program types, if necessary; pre-setting initial values for
global variables, etc.) before all the entities are created and loaded.
* **Load phase**: In the load phase, libbpf creates BPF
maps, resolves various relocations, and verifies and loads BPF programs into
the kernel. At this point, libbpf validates all the parts of a BPF application
and loads the BPF program into the kernel, but no BPF program has yet been
executed. After the load phase, it's possible to set up the initial BPF map
state without racing with the BPF program code execution.
* **Attachment phase**: In this phase, libbpf
attaches BPF programs to various BPF hook points (e.g., tracepoints, kprobes,
cgroup hooks, network packet processing pipeline, etc.). During this
phase, BPF programs perform useful work such as processing
packets, or updating BPF maps and global variables that can be read from user
space.
* **Tear down phase**: In the tear down phase,
libbpf detaches BPF programs and unloads them from the kernel. BPF maps are
destroyed, and all the resources used by the BPF app are freed.
BPF Object Skeleton File
========================
BPF skeleton is an alternative interface to libbpf APIs for working with BPF
objects. Skeleton code abstracts away generic libbpf APIs to significantly
simplify code for manipulating BPF programs from user space. Skeleton code
includes a bytecode representation of the BPF object file, simplifying the
process of distributing your BPF code. With BPF bytecode embedded, there are no
extra files to deploy along with your application binary.
You can generate the skeleton header file (``.skel.h``) for a specific object
file by passing the BPF object to bpftool. The generated BPF skeleton
provides the following custom functions that correspond to the BPF lifecycle,
each of them prefixed with the specific object name:
* ``<name>__open()`` creates and opens BPF application (``<name>`` stands for
the specific BPF object name)
* ``<name>__load()`` instantiates, loads, and verifies BPF application parts
* ``<name>__attach()`` attaches all auto-attachable BPF programs (it's
optional; you can gain more control by using libbpf APIs directly)
* ``<name>__destroy()`` detaches all BPF programs and
frees up all used resources
Using the skeleton code is the recommended way to work with BPF programs. Keep
in mind that the BPF skeleton provides access to the underlying BPF object, so
whatever was possible to do with generic libbpf APIs is still possible even
when the BPF skeleton is used. It's an additive convenience feature, with no
extra syscalls and no cumbersome code.
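As a minimal sketch, assuming a BPF object named ``minimal`` whose skeleton
header was generated with ``bpftool gen skeleton minimal.bpf.o``, the generated
functions map onto the lifecycle like this:

.. code-block:: C

    #include "minimal.skel.h"    /* hypothetical generated skeleton header */

    int main(void)
    {
        struct minimal_bpf *skel;
        int err;

        skel = minimal_bpf__open();         /* open phase */
        if (!skel)
            return 1;

        err = minimal_bpf__load(skel);      /* load phase */
        if (err)
            goto cleanup;

        err = minimal_bpf__attach(skel);    /* attachment phase */
        if (err)
            goto cleanup;

        /* ... application logic ... */

    cleanup:
        minimal_bpf__destroy(skel);         /* tear down phase */
        return err != 0;
    }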
Other Advantages of Using Skeleton File
---------------------------------------
* BPF skeleton provides an interface for user space programs to work with BPF
global variables. The skeleton code memory-maps global variables as a struct
into user space. The struct interface allows user space programs to initialize
BPF programs before the BPF load phase and to fetch and update data from user
space afterward (see the sketch after this list).
* The ``.skel.h`` file reflects the object file structure by listing out the
available maps, programs, etc. BPF skeleton provides direct access to all the
BPF maps and BPF programs as struct fields. This eliminates the need for
string-based lookups with ``bpf_object_find_map_by_name()`` and
``bpf_object_find_program_by_name()`` APIs, reducing errors due to BPF source
code and user-space code getting out of sync.
* The embedded bytecode representation of the object file ensures that the
skeleton and the BPF object file are always in sync.
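As the sketch below shows (continuing the hypothetical ``minimal`` skeleton
from above, and assuming the BPF code declares a global variable
``int my_pid = 0;``), globals are plain struct fields on the user space side:

.. code-block:: C

    /* Zero-initialized globals land in the .bss section and are
     * exposed by the skeleton as fields of skel->bss. */
    skel = minimal_bpf__open();
    skel->bss->my_pid = getpid();    /* pre-set before the load phase */
    err = minimal_bpf__load(skel);
    /* After load, skel->bss->my_pid reflects updates made by the BPF
     * program, and user space writes are visible to it, with no
     * syscalls involved thanks to the memory mapping. */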
BPF Helpers
===========
libbpf provides BPF-side APIs that BPF programs can use to interact with the
system. The BPF helper definitions allow developers to use them in BPF code like
any other plain C function. For example, there are helper functions to print
debugging messages, get the time since the system was booted, interact with BPF
maps, manipulate network packets, etc.
For a complete description of what the helpers do, the arguments they take, and
the return value, see the `bpf-helpers
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man page.
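For instance, a small BPF-side sketch (the tracepoint name is only an example)
combining the ``bpf_ktime_get_ns()`` helper with the ``bpf_printk()``
convenience macro from ``bpf_helpers.h``:

.. code-block:: C

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    SEC("tracepoint/syscalls/sys_enter_execve")
    int handle_execve(void *ctx)
    {
        __u64 ts = bpf_ktime_get_ns();      /* time since boot, in ns */

        /* printed to /sys/kernel/debug/tracing/trace_pipe */
        bpf_printk("execve at %llu ns", ts);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";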
BPF CO-RE (Compile Once Run Everywhere)
=========================================
BPF programs work in the kernel space and have access to kernel memory and data
structures. One limitation that BPF applications come across is the lack of
portability across different kernel versions and configurations. `BCC
<https://github.com/iovisor/bcc/>`_ is one of the solutions for BPF
portability. However, it comes with runtime overhead and a large binary size
from embedding the compiler with the application.
libbpf steps up the BPF program portability by supporting the BPF CO-RE concept.
BPF CO-RE brings together BTF type information, libbpf, and the compiler to
produce a single executable binary that you can run on multiple kernel versions
and configurations.
To make BPF programs portable, libbpf relies on the BTF type information of the
running kernel. The kernel also exposes this self-describing, authoritative BTF
information through ``sysfs`` at ``/sys/kernel/btf/vmlinux``.
You can generate the BTF information for the running kernel with the following
command:
::

    $ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
The command generates a ``vmlinux.h`` header file with all kernel types
(:doc:`BTF types <../btf>`) that the running kernel uses. Including
``vmlinux.h`` in your BPF program eliminates dependency on system-wide kernel
headers.
libbpf enables portability of BPF programs by looking at a BPF program's
recorded BTF type and relocation information and matching it to the BTF
information (``vmlinux``) provided by the running kernel. libbpf then resolves
and matches all the types and fields, and updates the necessary offsets and
other relocatable data to ensure that the BPF program's logic functions
correctly for the specific kernel on the host. The BPF CO-RE concept thus
eliminates the overhead associated with BPF development and allows developers
to write portable BPF applications that run without modification or runtime
source code compilation on the target machine.
The following code snippet shows how to read the parent field of a kernel
``task_struct`` using BPF CO-RE and libbpf. The basic helper to read a field in a
CO-RE relocatable manner is ``bpf_core_read(dst, sz, src)``, which will read
``sz`` bytes from the field referenced by ``src`` into the memory pointed to by
``dst``.
.. code-block:: C
    :emphasize-lines: 6

    //...
    struct task_struct *task = (void *)bpf_get_current_task();
    struct task_struct *parent_task;
    int err;

    err = bpf_core_read(&parent_task, sizeof(void *), &task->parent);
    if (err) {
        /* handle error */
    }

    /* parent_task contains the value of task->parent pointer */
In the code snippet, we first get a pointer to the current ``task_struct`` using
``bpf_get_current_task()``. We then use ``bpf_core_read()`` to read the
``parent`` field of the task struct into the ``parent_task`` variable.
``bpf_core_read()`` is just like the ``bpf_probe_read_kernel()`` BPF helper,
except that it records information about the field that should be relocated on
the target kernel, i.e., if the ``parent`` field gets shifted to a different
offset within ``struct task_struct`` due to some new field added in front of
it, libbpf will automatically adjust the actual offset to the proper value.
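For chains of pointers, libbpf also ships the higher-level ``BPF_CORE_READ()``
macro in ``bpf_core_read.h``, which expands into a sequence of
``bpf_core_read()`` calls. A short sketch reusing the ``task`` pointer from the
snippet above:

.. code-block:: C

    #include <bpf/bpf_core_read.h>

    /* CO-RE-relocatable equivalent of reading task->parent->tgid */
    pid_t parent_tgid = BPF_CORE_READ(task, parent, tgid);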
Getting Started with libbpf
===========================
Check out the `libbpf-bootstrap <https://github.com/libbpf/libbpf-bootstrap>`_
repository with simple examples of using libbpf to build various BPF
applications.
See also `libbpf API documentation
<https://libbpf.readthedocs.io/en/latest/api.html>`_.
libbpf and Rust
===============
If you are building BPF applications in Rust, it is recommended to use the
`Libbpf-rs <https://github.com/libbpf/libbpf-rs>`_ library instead of raw
bindgen bindings to libbpf. Libbpf-rs wraps libbpf functionality in
Rust-idiomatic interfaces and provides the libbpf-cargo plugin to handle BPF
code compilation and skeleton generation. Using Libbpf-rs will make building
the user space part of the BPF application easier. Note that the BPF programs
themselves must still be written in plain C.
Additional Documentation
========================
* `Program types and ELF Sections <https://libbpf.readthedocs.io/en/latest/program_types.html>`_
* `API naming convention <https://libbpf.readthedocs.io/en/latest/libbpf_naming_convention.html>`_
* `Building libbpf <https://libbpf.readthedocs.io/en/latest/libbpf_build.html>`_
* `API documentation Convention <https://libbpf.readthedocs.io/en/latest/libbpf_naming_convention.html#api-documentation-convention>`_

docs/program_types.rst Normal file

@@ -0,0 +1,213 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
.. _program_types_and_elf:
Program Types and ELF Sections
==============================
The table below lists the program types, their attach types where relevant and the ELF section
names supported by libbpf for them. The ELF section names follow these rules:
- ``type`` is an exact match, e.g. ``SEC("socket")``
- ``type+`` means it can be either exact ``SEC("type")`` or well-formed ``SEC("type/extras")``
with a '``/``' separator between ``type`` and ``extras``.
When ``extras`` are specified, they provide details of how to auto-attach the BPF program. The
format of ``extras`` depends on the program type, e.g. ``SEC("tracepoint/<category>/<name>")``
for tracepoints or ``SEC("usdt/<path>:<provider>:<name>")`` for USDT probes. The extras are
described in more detail in the footnotes.
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| Program Type | Attach Type | ELF Section Name | Sleepable |
+===========================================+========================================+==================================+===========+
| ``BPF_PROG_TYPE_CGROUP_DEVICE`` | ``BPF_CGROUP_DEVICE`` | ``cgroup/dev`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SKB`` | | ``cgroup/skb`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_EGRESS`` | ``cgroup_skb/egress`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_INGRESS`` | ``cgroup_skb/ingress`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` | ``BPF_CGROUP_GETSOCKOPT`` | ``cgroup/getsockopt`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_SETSOCKOPT`` | ``cgroup/setsockopt`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCK_ADDR`` | ``BPF_CGROUP_INET4_BIND`` | ``cgroup/bind4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_CONNECT`` | ``cgroup/connect4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_GETPEERNAME`` | ``cgroup/getpeername4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_GETSOCKNAME`` | ``cgroup/getsockname4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_BIND`` | ``cgroup/bind6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_CONNECT`` | ``cgroup/connect6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_GETPEERNAME`` | ``cgroup/getpeername6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_GETSOCKNAME`` | ``cgroup/getsockname6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP4_RECVMSG`` | ``cgroup/recvmsg4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP4_SENDMSG`` | ``cgroup/sendmsg4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP6_RECVMSG`` | ``cgroup/recvmsg6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP6_SENDMSG`` | ``cgroup/sendmsg6`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_CONNECT`` | ``cgroup/connect_unix`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_SENDMSG`` | ``cgroup/sendmsg_unix`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_RECVMSG`` | ``cgroup/recvmsg_unix`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_GETPEERNAME`` | ``cgroup/getpeername_unix`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_GETSOCKNAME`` | ``cgroup/getsockname_unix`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCK`` | ``BPF_CGROUP_INET4_POST_BIND`` | ``cgroup/post_bind4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_POST_BIND`` | ``cgroup/post_bind6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_SOCK_CREATE`` | ``cgroup/sock_create`` | |
+ + +----------------------------------+-----------+
| | | ``cgroup/sock`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_SOCK_RELEASE`` | ``cgroup/sock_release`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SYSCTL`` | ``BPF_CGROUP_SYSCTL`` | ``cgroup/sysctl`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_EXT`` | | ``freplace+`` [#fentry]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_FLOW_DISSECTOR`` | ``BPF_FLOW_DISSECTOR`` | ``flow_dissector`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_KPROBE`` | | ``kprobe+`` [#kprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``kretprobe+`` [#kprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``ksyscall+`` [#ksyscall]_ | |
+ + +----------------------------------+-----------+
| | | ``kretsyscall+`` [#ksyscall]_ | |
+ + +----------------------------------+-----------+
| | | ``uprobe+`` [#uprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``uprobe.s+`` [#uprobe]_ | Yes |
+ + +----------------------------------+-----------+
| | | ``uretprobe+`` [#uprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``uretprobe.s+`` [#uprobe]_ | Yes |
+ + +----------------------------------+-----------+
| | | ``usdt+`` [#usdt]_ | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_KPROBE_MULTI`` | ``kprobe.multi+`` [#kpmulti]_ | |
+ + +----------------------------------+-----------+
| | | ``kretprobe.multi+`` [#kpmulti]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LIRC_MODE2`` | ``BPF_LIRC_MODE2`` | ``lirc_mode2`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LSM`` | ``BPF_LSM_CGROUP`` | ``lsm_cgroup+`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_LSM_MAC`` | ``lsm+`` [#lsm]_ | |
+ + +----------------------------------+-----------+
| | | ``lsm.s+`` [#lsm]_ | Yes |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_IN`` | | ``lwt_in`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_OUT`` | | ``lwt_out`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` | | ``lwt_seg6local`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_XMIT`` | | ``lwt_xmit`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_PERF_EVENT`` | | ``perf_event`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE`` | | ``raw_tp.w+`` [#rawtp]_ | |
+ + +----------------------------------+-----------+
| | | ``raw_tracepoint.w+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_RAW_TRACEPOINT`` | | ``raw_tp+`` [#rawtp]_ | |
+ + +----------------------------------+-----------+
| | | ``raw_tracepoint+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SCHED_ACT`` | | ``action`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SCHED_CLS`` | | ``classifier`` | |
+ + +----------------------------------+-----------+
| | | ``tc`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_LOOKUP`` | ``BPF_SK_LOOKUP`` | ``sk_lookup`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_MSG`` | ``BPF_SK_MSG_VERDICT`` | ``sk_msg`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_REUSEPORT`` | ``BPF_SK_REUSEPORT_SELECT_OR_MIGRATE`` | ``sk_reuseport/migrate`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_REUSEPORT_SELECT`` | ``sk_reuseport`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_SKB`` | | ``sk_skb`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_SKB_STREAM_PARSER`` | ``sk_skb/stream_parser`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_SKB_STREAM_VERDICT`` | ``sk_skb/stream_verdict`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SOCKET_FILTER`` | | ``socket`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SOCK_OPS`` | ``BPF_CGROUP_SOCK_OPS`` | ``sockops`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_STRUCT_OPS`` | | ``struct_ops+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SYSCALL`` | | ``syscall`` | Yes |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_TRACEPOINT`` | | ``tp+`` [#tp]_ | |
+ + +----------------------------------+-----------+
| | | ``tracepoint+`` [#tp]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_TRACING`` | ``BPF_MODIFY_RETURN`` | ``fmod_ret+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fmod_ret.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_FENTRY`` | ``fentry+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fentry.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_FEXIT`` | ``fexit+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fexit.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_ITER`` | ``iter+`` [#iter]_ | |
+ + +----------------------------------+-----------+
| | | ``iter.s+`` [#iter]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_RAW_TP`` | ``tp_btf+`` [#fentry]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_XDP`` | ``BPF_XDP_CPUMAP`` | ``xdp.frags/cpumap`` | |
+ + +----------------------------------+-----------+
| | | ``xdp/cpumap`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_XDP_DEVMAP`` | ``xdp.frags/devmap`` | |
+ + +----------------------------------+-----------+
| | | ``xdp/devmap`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_XDP`` | ``xdp.frags`` | |
+ + +----------------------------------+-----------+
| | | ``xdp`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
.. rubric:: Footnotes
.. [#fentry] The ``fentry`` attach format is ``fentry[.s]/<function>``.
.. [#kprobe] The ``kprobe`` attach format is ``kprobe/<function>[+<offset>]``. Valid
characters for ``function`` are ``a-zA-Z0-9_.`` and ``offset`` must be a valid
non-negative integer.
.. [#ksyscall] The ``ksyscall`` attach format is ``ksyscall/<syscall>``.
.. [#uprobe] The ``uprobe`` attach format is ``uprobe[.s]/<path>:<function>[+<offset>]``.
.. [#usdt] The ``usdt`` attach format is ``usdt/<path>:<provider>:<name>``.
.. [#kpmulti] The ``kprobe.multi`` attach format is ``kprobe.multi/<pattern>`` where ``pattern``
supports ``*`` and ``?`` wildcards. Valid characters for pattern are
``a-zA-Z0-9_.*?``.
.. [#lsm] The ``lsm`` attachment format is ``lsm[.s]/<hook>``.
.. [#rawtp] The ``raw_tp`` attach format is ``raw_tracepoint[.w]/<tracepoint>``.
.. [#tp] The ``tracepoint`` attach format is ``tracepoint/<category>/<name>``.
.. [#iter] The ``iter`` attach format is ``iter[.s]/<struct-name>``.
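As a short sketch of these conventions (the section names come from the table
above; the function bodies are placeholders):

.. code-block:: C

    /* exact type match: no extras allowed */
    SEC("socket")
    int sock_example(struct __sk_buff *skb)
    {
        return 0;
    }

    /* "type+" form with auto-attach extras, following the
     * SEC("tracepoint/<category>/<name>") format from the footnotes */
    SEC("tracepoint/syscalls/sys_enter_openat")
    int handle_openat(void *ctx)
    {
        return 0;
    }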


@@ -1 +1,2 @@
breathe
breathe
sphinx_rtd_theme


@@ -37,6 +37,14 @@
.off = 0, \
.imm = IMM })
#define BPF_CALL_REL(DST) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_CALL, \
.dst_reg = 0, \
.src_reg = BPF_PSEUDO_CALL, \
.off = 0, \
.imm = DST })
#define BPF_EXIT_INSN() \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_EXIT, \


@@ -3,6 +3,8 @@
#ifndef __LINUX_KERNEL_H
#define __LINUX_KERNEL_H
#include <linux/compiler.h>
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif

File diff suppressed because it is too large.

include/uapi/linux/fcntl.h Normal file

@@ -0,0 +1,123 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_FCNTL_H
#define _UAPI_LINUX_FCNTL_H
#include <asm/fcntl.h>
#include <linux/openat2.h>
#define F_SETLEASE (F_LINUX_SPECIFIC_BASE + 0)
#define F_GETLEASE (F_LINUX_SPECIFIC_BASE + 1)
/*
* Cancel a blocking posix lock; internal use only until we expose an
* asynchronous lock api to userspace:
*/
#define F_CANCELLK (F_LINUX_SPECIFIC_BASE + 5)
/* Create a file descriptor with FD_CLOEXEC set. */
#define F_DUPFD_CLOEXEC (F_LINUX_SPECIFIC_BASE + 6)
/*
* Request notifications on a directory.
* See below for events that may be notified.
*/
#define F_NOTIFY (F_LINUX_SPECIFIC_BASE+2)
/*
* Set and get of pipe page size array
*/
#define F_SETPIPE_SZ (F_LINUX_SPECIFIC_BASE + 7)
#define F_GETPIPE_SZ (F_LINUX_SPECIFIC_BASE + 8)
/*
* Set/Get seals
*/
#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
/*
* Types of seals
*/
#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
#define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */
/* (1U << 31) is reserved for signed error codes */
/*
* Set/Get write life time hints. {GET,SET}_RW_HINT operate on the
* underlying inode, while {GET,SET}_FILE_RW_HINT operate only on
* the specific file.
*/
#define F_GET_RW_HINT (F_LINUX_SPECIFIC_BASE + 11)
#define F_SET_RW_HINT (F_LINUX_SPECIFIC_BASE + 12)
#define F_GET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 13)
#define F_SET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 14)
/*
* Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
* used to clear any hints previously set.
*/
#define RWH_WRITE_LIFE_NOT_SET 0
#define RWH_WRITE_LIFE_NONE 1
#define RWH_WRITE_LIFE_SHORT 2
#define RWH_WRITE_LIFE_MEDIUM 3
#define RWH_WRITE_LIFE_LONG 4
#define RWH_WRITE_LIFE_EXTREME 5
/*
* The originally introduced spelling is retained from the first
* versions of the patch set that introduced the feature, see commit
* v4.13-rc1~212^2~51.
*/
#define RWF_WRITE_LIFE_NOT_SET RWH_WRITE_LIFE_NOT_SET
/*
* Types of directory notifications that may be requested.
*/
#define DN_ACCESS 0x00000001 /* File accessed */
#define DN_MODIFY 0x00000002 /* File modified */
#define DN_CREATE 0x00000004 /* File created */
#define DN_DELETE 0x00000008 /* File removed */
#define DN_RENAME 0x00000010 /* File renamed */
#define DN_ATTRIB 0x00000020 /* File changed attributes */
#define DN_MULTISHOT 0x80000000 /* Don't remove notifier */
/*
* The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is
* meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
* unlinkat. The two functions do completely different things and therefore,
* the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to
* faccessat would be undefined behavior and thus treating it equivalent to
* AT_EACCESS is valid undefined behavior.
*/
#define AT_FDCWD -100 /* Special value used to indicate
openat should use the current
working directory. */
#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
#define AT_EACCESS 0x200 /* Test access permitted for
effective IDs, not real IDs. */
#define AT_REMOVEDIR 0x200 /* Remove directory instead of
unlinking file. */
#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to
compare object identity and may not
be usable to open_by_handle_at(2) */
#if defined(__KERNEL__)
#define AT_GETATTR_NOSEC 0x80000000
#endif
#endif /* _UAPI_LINUX_FCNTL_H */


@@ -211,6 +211,9 @@ struct rtnl_link_stats {
* @rx_nohandler: Number of packets received on the interface
* but dropped by the networking stack because the device is
* not designated to receive packets (e.g. backup link in a bond).
*
* @rx_otherhost_dropped: Number of packets dropped due to mismatch
* in destination MAC address.
*/
struct rtnl_link_stats64 {
__u64 rx_packets;
@@ -243,6 +246,23 @@ struct rtnl_link_stats64 {
__u64 rx_compressed;
__u64 tx_compressed;
__u64 rx_nohandler;
__u64 rx_otherhost_dropped;
};
/* Subset of link stats useful for in-HW collection. Meaning of the fields is as
* for struct rtnl_link_stats64.
*/
struct rtnl_hw_stats64 {
__u64 rx_packets;
__u64 tx_packets;
__u64 rx_bytes;
__u64 tx_bytes;
__u64 rx_errors;
__u64 tx_errors;
__u64 rx_dropped;
__u64 tx_dropped;
__u64 multicast;
};
/* The struct should be in sync with struct ifmap */
@@ -350,7 +370,13 @@ enum {
IFLA_GRO_MAX_SIZE,
IFLA_TSO_MAX_SIZE,
IFLA_TSO_MAX_SEGS,
IFLA_ALLMULTI, /* Allmulti count: > 0 means acts ALLMULTI */
IFLA_DEVLINK_PORT,
IFLA_GSO_IPV4_MAX_SIZE,
IFLA_GRO_IPV4_MAX_SIZE,
IFLA_DPLL_PIN,
__IFLA_MAX
};
@@ -539,6 +565,12 @@ enum {
IFLA_BRPORT_MRP_IN_OPEN,
IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT,
IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
IFLA_BRPORT_LOCKED,
IFLA_BRPORT_MAB,
IFLA_BRPORT_MCAST_N_GROUPS,
IFLA_BRPORT_MCAST_MAX_GROUPS,
IFLA_BRPORT_NEIGH_VLAN_SUPPRESS,
IFLA_BRPORT_BACKUP_NHID,
__IFLA_BRPORT_MAX
};
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
@@ -605,6 +637,7 @@ enum {
IFLA_MACVLAN_MACADDR_COUNT,
IFLA_MACVLAN_BC_QUEUE_LEN,
IFLA_MACVLAN_BC_QUEUE_LEN_USED,
IFLA_MACVLAN_BC_CUTOFF,
__IFLA_MACVLAN_MAX,
};
@@ -673,6 +706,7 @@ enum {
IFLA_XFRM_UNSPEC,
IFLA_XFRM_LINK,
IFLA_XFRM_IF_ID,
IFLA_XFRM_COLLECT_METADATA,
__IFLA_XFRM_MAX
};
@@ -714,7 +748,79 @@ enum ipvlan_mode {
#define IPVLAN_F_PRIVATE 0x01
#define IPVLAN_F_VEPA 0x02
/* Tunnel RTM header */
struct tunnel_msg {
__u8 family;
__u8 flags;
__u16 reserved2;
__u32 ifindex;
};
/* netkit section */
enum netkit_action {
NETKIT_NEXT = -1,
NETKIT_PASS = 0,
NETKIT_DROP = 2,
NETKIT_REDIRECT = 7,
};
enum netkit_mode {
NETKIT_L2,
NETKIT_L3,
};
enum {
IFLA_NETKIT_UNSPEC,
IFLA_NETKIT_PEER_INFO,
IFLA_NETKIT_PRIMARY,
IFLA_NETKIT_POLICY,
IFLA_NETKIT_PEER_POLICY,
IFLA_NETKIT_MODE,
__IFLA_NETKIT_MAX,
};
#define IFLA_NETKIT_MAX (__IFLA_NETKIT_MAX - 1)
/* VXLAN section */
/* include statistics in the dump */
#define TUNNEL_MSG_FLAG_STATS 0x01
#define TUNNEL_MSG_VALID_USER_FLAGS TUNNEL_MSG_FLAG_STATS
/* Embedded inside VXLAN_VNIFILTER_ENTRY_STATS */
enum {
VNIFILTER_ENTRY_STATS_UNSPEC,
VNIFILTER_ENTRY_STATS_RX_BYTES,
VNIFILTER_ENTRY_STATS_RX_PKTS,
VNIFILTER_ENTRY_STATS_RX_DROPS,
VNIFILTER_ENTRY_STATS_RX_ERRORS,
VNIFILTER_ENTRY_STATS_TX_BYTES,
VNIFILTER_ENTRY_STATS_TX_PKTS,
VNIFILTER_ENTRY_STATS_TX_DROPS,
VNIFILTER_ENTRY_STATS_TX_ERRORS,
VNIFILTER_ENTRY_STATS_PAD,
__VNIFILTER_ENTRY_STATS_MAX
};
#define VNIFILTER_ENTRY_STATS_MAX (__VNIFILTER_ENTRY_STATS_MAX - 1)
enum {
VXLAN_VNIFILTER_ENTRY_UNSPEC,
VXLAN_VNIFILTER_ENTRY_START,
VXLAN_VNIFILTER_ENTRY_END,
VXLAN_VNIFILTER_ENTRY_GROUP,
VXLAN_VNIFILTER_ENTRY_GROUP6,
VXLAN_VNIFILTER_ENTRY_STATS,
__VXLAN_VNIFILTER_ENTRY_MAX
};
#define VXLAN_VNIFILTER_ENTRY_MAX (__VXLAN_VNIFILTER_ENTRY_MAX - 1)
enum {
VXLAN_VNIFILTER_UNSPEC,
VXLAN_VNIFILTER_ENTRY,
__VXLAN_VNIFILTER_MAX
};
#define VXLAN_VNIFILTER_MAX (__VXLAN_VNIFILTER_MAX - 1)
enum {
IFLA_VXLAN_UNSPEC,
IFLA_VXLAN_ID,
@@ -746,6 +852,8 @@ enum {
IFLA_VXLAN_GPE,
IFLA_VXLAN_TTL_INHERIT,
IFLA_VXLAN_DF,
IFLA_VXLAN_VNIFILTER, /* only applicable with COLLECT_METADATA mode */
IFLA_VXLAN_LOCALBYPASS,
__IFLA_VXLAN_MAX
};
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
@@ -779,6 +887,7 @@ enum {
IFLA_GENEVE_LABEL,
IFLA_GENEVE_TTL_INHERIT,
IFLA_GENEVE_DF,
IFLA_GENEVE_INNER_PROTO_INHERIT,
__IFLA_GENEVE_MAX
};
#define IFLA_GENEVE_MAX (__IFLA_GENEVE_MAX - 1)
@@ -824,6 +933,8 @@ enum {
IFLA_GTP_FD1,
IFLA_GTP_PDP_HASHSIZE,
IFLA_GTP_ROLE,
IFLA_GTP_CREATE_SOCKETS,
IFLA_GTP_RESTART_COUNT,
__IFLA_GTP_MAX,
};
#define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
@@ -863,6 +974,7 @@ enum {
IFLA_BOND_AD_LACP_ACTIVE,
IFLA_BOND_MISSED_MAX,
IFLA_BOND_NS_IP6_TARGET,
IFLA_BOND_COUPLED_CONTROL,
__IFLA_BOND_MAX,
};
@@ -1160,6 +1272,17 @@ enum {
#define IFLA_STATS_FILTER_BIT(ATTR) (1 << (ATTR - 1))
enum {
IFLA_STATS_GETSET_UNSPEC,
IFLA_STATS_GET_FILTERS, /* Nest of IFLA_STATS_LINK_xxx, each a u32 with
* a filter mask for the corresponding group.
*/
IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS, /* 0 or 1 as u8 */
__IFLA_STATS_GETSET_MAX,
};
#define IFLA_STATS_GETSET_MAX (__IFLA_STATS_GETSET_MAX - 1)
/* These are embedded into IFLA_STATS_LINK_XSTATS:
* [IFLA_STATS_LINK_XSTATS]
* -> [LINK_XSTATS_TYPE_xxx]
@@ -1177,10 +1300,21 @@ enum {
enum {
IFLA_OFFLOAD_XSTATS_UNSPEC,
IFLA_OFFLOAD_XSTATS_CPU_HIT, /* struct rtnl_link_stats64 */
IFLA_OFFLOAD_XSTATS_HW_S_INFO, /* HW stats info. A nest */
IFLA_OFFLOAD_XSTATS_L3_STATS, /* struct rtnl_hw_stats64 */
__IFLA_OFFLOAD_XSTATS_MAX
};
#define IFLA_OFFLOAD_XSTATS_MAX (__IFLA_OFFLOAD_XSTATS_MAX - 1)
enum {
IFLA_OFFLOAD_XSTATS_HW_S_INFO_UNSPEC,
IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST, /* u8 */
IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED, /* u8 */
__IFLA_OFFLOAD_XSTATS_HW_S_INFO_MAX,
};
#define IFLA_OFFLOAD_XSTATS_HW_S_INFO_MAX \
(__IFLA_OFFLOAD_XSTATS_HW_S_INFO_MAX - 1)
/* XDP section */
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
@@ -1279,4 +1413,14 @@ enum {
#define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
/* DSA section */
enum {
IFLA_DSA_UNSPEC,
IFLA_DSA_MASTER,
__IFLA_DSA_MAX,
};
#define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1)
#endif /* _UAPI_LINUX_IF_LINK_H */


@@ -25,9 +25,21 @@
* application.
*/
#define XDP_USE_NEED_WAKEUP (1 << 3)
/* By setting this option, userspace application indicates that it can
* handle multiple descriptors per packet thus enabling AF_XDP to split
* multi-buffer XDP frames into multiple Rx descriptors. Without this set
* such frames will be dropped.
*/
#define XDP_USE_SG (1 << 4)
/* Flags for xsk_umem_config flags */
#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
/* Force checksum calculation in software. Can be used for testing or
* working around potential HW issues. This option causes performance
* degradation and only works in XDP_COPY mode.
*/
#define XDP_UMEM_TX_SW_CSUM (1 << 1)
struct sockaddr_xdp {
__u16 sxdp_family;
@@ -70,6 +82,7 @@ struct xdp_umem_reg {
__u32 chunk_size;
__u32 headroom;
__u32 flags;
__u32 tx_metadata_len;
};
struct xdp_statistics {
@@ -99,6 +112,41 @@ struct xdp_options {
#define XSK_UNALIGNED_BUF_ADDR_MASK \
((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
/* Request transmit timestamp. Upon completion, put it into tx_timestamp
* field of union xsk_tx_metadata.
*/
#define XDP_TXMD_FLAGS_TIMESTAMP (1 << 0)
/* Request transmit checksum offload. Checksum start position and offset
* are communicated via csum_start and csum_offset fields of union
* xsk_tx_metadata.
*/
#define XDP_TXMD_FLAGS_CHECKSUM (1 << 1)
/* AF_XDP offloads request. 'request' union member is consumed by the driver
* when the packet is being transmitted. 'completion' union member is
* filled by the driver when the transmit completion arrives.
*/
struct xsk_tx_metadata {
__u64 flags;
union {
struct {
/* XDP_TXMD_FLAGS_CHECKSUM */
/* Offset from desc->addr where checksumming should start. */
__u16 csum_start;
/* Offset from csum_start where checksum should be stored. */
__u16 csum_offset;
} request;
struct {
/* XDP_TXMD_FLAGS_TIMESTAMP */
__u64 tx_timestamp;
} completion;
};
};
/* Rx/Tx descriptor */
struct xdp_desc {
__u64 addr;
@@ -108,4 +156,14 @@ struct xdp_desc {
/* UMEM descriptor is __u64 */
/* Flag indicating that the packet continues with the buffer pointed out by the
* next frame in the ring. The end of the packet is signalled by setting this
* bit to zero. For single buffer packets, every descriptor has 'options' set
* to 0 and this maintains backward compatibility.
*/
#define XDP_PKT_CONTD (1 << 0)
/* TX packet carries valid metadata. */
#define XDP_TX_METADATA (1 << 1)
#endif /* _LINUX_IF_XDP_H */

include/uapi/linux/netdev.h Normal file

@@ -0,0 +1,175 @@
/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
/* Do not edit directly, auto-generated from: */
/* Documentation/netlink/specs/netdev.yaml */
/* YNL-GEN uapi header */
#ifndef _UAPI_LINUX_NETDEV_H
#define _UAPI_LINUX_NETDEV_H
#define NETDEV_FAMILY_NAME "netdev"
#define NETDEV_FAMILY_VERSION 1
/**
* enum netdev_xdp_act
* @NETDEV_XDP_ACT_BASIC: XDP features set supported by all drivers
* (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX)
* @NETDEV_XDP_ACT_REDIRECT: The netdev supports XDP_REDIRECT
* @NETDEV_XDP_ACT_NDO_XMIT: This feature informs if netdev implements
* ndo_xdp_xmit callback.
* @NETDEV_XDP_ACT_XSK_ZEROCOPY: This feature informs if netdev supports AF_XDP
* in zero copy mode.
* @NETDEV_XDP_ACT_HW_OFFLOAD: This feature informs if netdev supports XDP hw
* offloading.
* @NETDEV_XDP_ACT_RX_SG: This feature informs if netdev implements non-linear
* XDP buffer support in the driver napi callback.
* @NETDEV_XDP_ACT_NDO_XMIT_SG: This feature informs if netdev implements
* non-linear XDP buffer support in ndo_xdp_xmit callback.
*/
enum netdev_xdp_act {
NETDEV_XDP_ACT_BASIC = 1,
NETDEV_XDP_ACT_REDIRECT = 2,
NETDEV_XDP_ACT_NDO_XMIT = 4,
NETDEV_XDP_ACT_XSK_ZEROCOPY = 8,
NETDEV_XDP_ACT_HW_OFFLOAD = 16,
NETDEV_XDP_ACT_RX_SG = 32,
NETDEV_XDP_ACT_NDO_XMIT_SG = 64,
/* private: */
NETDEV_XDP_ACT_MASK = 127,
};
/**
* enum netdev_xdp_rx_metadata
* @NETDEV_XDP_RX_METADATA_TIMESTAMP: Device is capable of exposing receive HW
* timestamp via bpf_xdp_metadata_rx_timestamp().
* @NETDEV_XDP_RX_METADATA_HASH: Device is capable of exposing receive packet
* hash via bpf_xdp_metadata_rx_hash().
* @NETDEV_XDP_RX_METADATA_VLAN_TAG: Device is capable of exposing receive
* packet VLAN tag via bpf_xdp_metadata_rx_vlan_tag().
*/
enum netdev_xdp_rx_metadata {
NETDEV_XDP_RX_METADATA_TIMESTAMP = 1,
NETDEV_XDP_RX_METADATA_HASH = 2,
NETDEV_XDP_RX_METADATA_VLAN_TAG = 4,
};
/**
* enum netdev_xsk_flags
* @NETDEV_XSK_FLAGS_TX_TIMESTAMP: HW timestamping egress packets is supported
* by the driver.
* @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
* driver.
*/
enum netdev_xsk_flags {
NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
};
enum netdev_queue_type {
NETDEV_QUEUE_TYPE_RX,
NETDEV_QUEUE_TYPE_TX,
};
enum netdev_qstats_scope {
NETDEV_QSTATS_SCOPE_QUEUE = 1,
};
enum {
NETDEV_A_DEV_IFINDEX = 1,
NETDEV_A_DEV_PAD,
NETDEV_A_DEV_XDP_FEATURES,
NETDEV_A_DEV_XDP_ZC_MAX_SEGS,
NETDEV_A_DEV_XDP_RX_METADATA_FEATURES,
NETDEV_A_DEV_XSK_FEATURES,
__NETDEV_A_DEV_MAX,
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
NETDEV_A_PAGE_POOL_NAPI_ID,
NETDEV_A_PAGE_POOL_INFLIGHT,
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
NETDEV_A_PAGE_POOL_DETACH_TIME,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_STATS_INFO = 1,
NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST = 8,
NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
__NETDEV_A_PAGE_POOL_STATS_MAX,
NETDEV_A_PAGE_POOL_STATS_MAX = (__NETDEV_A_PAGE_POOL_STATS_MAX - 1)
};
enum {
NETDEV_A_NAPI_IFINDEX = 1,
NETDEV_A_NAPI_ID,
NETDEV_A_NAPI_IRQ,
NETDEV_A_NAPI_PID,
__NETDEV_A_NAPI_MAX,
NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
};
enum {
NETDEV_A_QUEUE_ID = 1,
NETDEV_A_QUEUE_IFINDEX,
NETDEV_A_QUEUE_TYPE,
NETDEV_A_QUEUE_NAPI_ID,
__NETDEV_A_QUEUE_MAX,
NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
};
enum {
NETDEV_A_QSTATS_IFINDEX = 1,
NETDEV_A_QSTATS_QUEUE_TYPE,
NETDEV_A_QSTATS_QUEUE_ID,
NETDEV_A_QSTATS_SCOPE,
NETDEV_A_QSTATS_RX_PACKETS = 8,
NETDEV_A_QSTATS_RX_BYTES,
NETDEV_A_QSTATS_TX_PACKETS,
NETDEV_A_QSTATS_TX_BYTES,
NETDEV_A_QSTATS_RX_ALLOC_FAIL,
__NETDEV_A_QSTATS_MAX,
NETDEV_A_QSTATS_MAX = (__NETDEV_A_QSTATS_MAX - 1)
};
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_GET,
NETDEV_CMD_PAGE_POOL_ADD_NTF,
NETDEV_CMD_PAGE_POOL_DEL_NTF,
NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_STATS_GET,
NETDEV_CMD_QUEUE_GET,
NETDEV_CMD_NAPI_GET,
NETDEV_CMD_QSTATS_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
};
#define NETDEV_MCGRP_MGMT "mgmt"
#define NETDEV_MCGRP_PAGE_POOL "page-pool"
#endif /* _UAPI_LINUX_NETDEV_H */


@@ -0,0 +1,43 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_OPENAT2_H
#define _UAPI_LINUX_OPENAT2_H
#include <linux/types.h>
/*
* Arguments for how openat2(2) should open the target path. If only @flags and
* @mode are non-zero, then openat2(2) operates very similarly to openat(2).
*
* However, unlike openat(2), unknown or invalid bits in @flags result in
* -EINVAL rather than being silently ignored. @mode must be zero unless one of
* {O_CREAT, O_TMPFILE} are set.
*
* @flags: O_* flags.
* @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
*/
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
/* how->resolve flags for openat2(2). */
#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
(includes bind-mounts). */
#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
"magic-links". */
#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
(implies RESOLVE_NO_MAGICLINKS) */
#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
"..", symlinks, and absolute
paths which escape the dirfd. */
#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
be scoped inside the dirfd
(similar to chroot(2)). */
#define RESOLVE_CACHED 0x20 /* Only complete if resolution can be
completed through cached lookup. May
return -EAGAIN if that's not
possible. */
#endif /* _UAPI_LINUX_OPENAT2_H */


@@ -164,8 +164,6 @@ enum perf_event_sample_format {
PERF_SAMPLE_WEIGHT_STRUCT = 1U << 24,
PERF_SAMPLE_MAX = 1U << 25, /* non-ABI */
__PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */
};
#define PERF_SAMPLE_WEIGHT_TYPE (PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)
@@ -204,6 +202,10 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT = 17, /* save low level index of raw branch records */
PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT = 18, /* save privilege mode */
PERF_SAMPLE_BRANCH_COUNTERS_SHIFT = 19, /* save occurrences of events on a branch */
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -233,6 +235,10 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_HW_INDEX = 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT,
PERF_SAMPLE_BRANCH_PRIV_SAVE = 1U << PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT,
PERF_SAMPLE_BRANCH_COUNTERS = 1U << PERF_SAMPLE_BRANCH_COUNTERS_SHIFT,
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
@@ -253,9 +259,48 @@ enum {
PERF_BR_COND_RET = 10, /* conditional function return */
PERF_BR_ERET = 11, /* exception return */
PERF_BR_IRQ = 12, /* irq */
PERF_BR_SERROR = 13, /* system error */
PERF_BR_NO_TX = 14, /* not in transaction */
PERF_BR_EXTEND_ABI = 15, /* extend ABI */
PERF_BR_MAX,
};
/*
* Common branch speculation outcome classification
*/
enum {
PERF_BR_SPEC_NA = 0, /* Not available */
PERF_BR_SPEC_WRONG_PATH = 1, /* Speculative but on wrong path */
PERF_BR_NON_SPEC_CORRECT_PATH = 2, /* Non-speculative but on correct path */
PERF_BR_SPEC_CORRECT_PATH = 3, /* Speculative and on correct path */
PERF_BR_SPEC_MAX,
};
enum {
PERF_BR_NEW_FAULT_ALGN = 0, /* Alignment fault */
PERF_BR_NEW_FAULT_DATA = 1, /* Data fault */
PERF_BR_NEW_FAULT_INST = 2, /* Inst fault */
PERF_BR_NEW_ARCH_1 = 3, /* Architecture specific */
PERF_BR_NEW_ARCH_2 = 4, /* Architecture specific */
PERF_BR_NEW_ARCH_3 = 5, /* Architecture specific */
PERF_BR_NEW_ARCH_4 = 6, /* Architecture specific */
PERF_BR_NEW_ARCH_5 = 7, /* Architecture specific */
PERF_BR_NEW_MAX,
};
enum {
PERF_BR_PRIV_UNKNOWN = 0,
PERF_BR_PRIV_USER = 1,
PERF_BR_PRIV_KERNEL = 2,
PERF_BR_PRIV_HV = 3,
};
#define PERF_BR_ARM64_FIQ PERF_BR_NEW_ARCH_1
#define PERF_BR_ARM64_DEBUG_HALT PERF_BR_NEW_ARCH_2
#define PERF_BR_ARM64_DEBUG_EXIT PERF_BR_NEW_ARCH_3
#define PERF_BR_ARM64_DEBUG_INST PERF_BR_NEW_ARCH_4
#define PERF_BR_ARM64_DEBUG_DATA PERF_BR_NEW_ARCH_5
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -333,6 +378,7 @@ enum perf_event_read_format {
#define PERF_ATTR_SIZE_VER5 112 /* add: aux_watermark */
#define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */
#define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */
#define PERF_ATTR_SIZE_VER8 136 /* add: config3 */
/*
* Hardware event_id to monitor via a performance monitoring event:
@@ -474,6 +520,8 @@ struct perf_event_attr {
* truncated accordingly on 32 bit architectures.
*/
__u64 sig_data;
__u64 config3; /* extension of config2 */
};
/*
@@ -938,6 +986,12 @@ enum perf_event_type {
* { u64 nr;
* { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
* { u64 from, to, flags } lbr[nr];
* #
* # The format of the counters is decided by the
* # "branch_counter_nr" and "branch_counter_width",
* # which are defined in the ABI.
* #
* { u64 counters; } cntr[nr] && PERF_SAMPLE_BRANCH_COUNTERS
* } && PERF_SAMPLE_BRANCH_STACK
*
* { u64 abi; # enum perf_sample_regs_abi
@@ -1295,7 +1349,10 @@ union perf_mem_data_src {
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
/* 5-0xa available */
/* 5-0x7 available */
#define PERF_MEM_LVLNUM_UNC 0x08 /* Uncached */
#define PERF_MEM_LVLNUM_CXL 0x09 /* CXL */
#define PERF_MEM_LVLNUM_IO 0x0a /* I/O */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
#define PERF_MEM_LVLNUM_LFB 0x0c /* LFB */
#define PERF_MEM_LVLNUM_RAM 0x0d /* RAM */
@@ -1313,7 +1370,7 @@ union perf_mem_data_src {
#define PERF_MEM_SNOOP_SHIFT 19
#define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
/* 1 free */
#define PERF_MEM_SNOOPX_PEER 0x02 /* xfer from peer */
#define PERF_MEM_SNOOPX_SHIFT 38
/* locked instruction */
@@ -1363,6 +1420,7 @@ union perf_mem_data_src {
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
* type: branch type
* spec: branch speculation info (or 0 if not supported)
*/
struct perf_branch_entry {
__u64 from;
@@ -1373,9 +1431,15 @@ struct perf_branch_entry {
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
type:4, /* branch type */
reserved:40;
spec:2, /* branch speculation info */
new_type:4, /* additional branch type */
priv:3, /* privilege level */
reserved:31;
};
/* Size of used info bits in struct perf_branch_entry */
#define PERF_BRANCH_ENTRY_INFO_BITS_MAX 33
union perf_sample_weight {
__u64 full;
#if defined(__LITTLE_ENDIAN_BITFIELD)


@@ -204,37 +204,6 @@ struct tc_u32_pcnt {
#define TC_U32_MAXDEPTH 8
/* RSVP filter */
enum {
TCA_RSVP_UNSPEC,
TCA_RSVP_CLASSID,
TCA_RSVP_DST,
TCA_RSVP_SRC,
TCA_RSVP_PINFO,
TCA_RSVP_POLICE,
TCA_RSVP_ACT,
__TCA_RSVP_MAX
};
#define TCA_RSVP_MAX (__TCA_RSVP_MAX - 1 )
struct tc_rsvp_gpi {
__u32 key;
__u32 mask;
int offset;
};
struct tc_rsvp_pinfo {
struct tc_rsvp_gpi dpi;
struct tc_rsvp_gpi spi;
__u8 protocol;
__u8 tunnelid;
__u8 tunnelhdr;
__u8 pad;
};
/* ROUTE filter */
enum {
@@ -265,22 +234,6 @@ enum {
#define TCA_FW_MAX (__TCA_FW_MAX - 1)
/* TC index filter */
enum {
TCA_TCINDEX_UNSPEC,
TCA_TCINDEX_HASH,
TCA_TCINDEX_MASK,
TCA_TCINDEX_SHIFT,
TCA_TCINDEX_FALL_THROUGH,
TCA_TCINDEX_CLASSID,
TCA_TCINDEX_POLICE,
TCA_TCINDEX_ACT,
__TCA_TCINDEX_MAX
};
#define TCA_TCINDEX_MAX (__TCA_TCINDEX_MAX - 1)
/* Flow filter */
enum {


@@ -457,115 +457,6 @@ enum {
#define TCA_HFSC_MAX (__TCA_HFSC_MAX - 1)
/* CBQ section */
#define TC_CBQ_MAXPRIO 8
#define TC_CBQ_MAXLEVEL 8
#define TC_CBQ_DEF_EWMA 5
struct tc_cbq_lssopt {
unsigned char change;
unsigned char flags;
#define TCF_CBQ_LSS_BOUNDED 1
#define TCF_CBQ_LSS_ISOLATED 2
unsigned char ewma_log;
unsigned char level;
#define TCF_CBQ_LSS_FLAGS 1
#define TCF_CBQ_LSS_EWMA 2
#define TCF_CBQ_LSS_MAXIDLE 4
#define TCF_CBQ_LSS_MINIDLE 8
#define TCF_CBQ_LSS_OFFTIME 0x10
#define TCF_CBQ_LSS_AVPKT 0x20
__u32 maxidle;
__u32 minidle;
__u32 offtime;
__u32 avpkt;
};
struct tc_cbq_wrropt {
unsigned char flags;
unsigned char priority;
unsigned char cpriority;
unsigned char __reserved;
__u32 allot;
__u32 weight;
};
struct tc_cbq_ovl {
unsigned char strategy;
#define TC_CBQ_OVL_CLASSIC 0
#define TC_CBQ_OVL_DELAY 1
#define TC_CBQ_OVL_LOWPRIO 2
#define TC_CBQ_OVL_DROP 3
#define TC_CBQ_OVL_RCLASSIC 4
unsigned char priority2;
__u16 pad;
__u32 penalty;
};
struct tc_cbq_police {
unsigned char police;
unsigned char __res1;
unsigned short __res2;
};
struct tc_cbq_fopt {
__u32 split;
__u32 defmap;
__u32 defchange;
};
struct tc_cbq_xstats {
__u32 borrows;
__u32 overactions;
__s32 avgidle;
__s32 undertime;
};
enum {
TCA_CBQ_UNSPEC,
TCA_CBQ_LSSOPT,
TCA_CBQ_WRROPT,
TCA_CBQ_FOPT,
TCA_CBQ_OVL_STRATEGY,
TCA_CBQ_RATE,
TCA_CBQ_RTAB,
TCA_CBQ_POLICE,
__TCA_CBQ_MAX,
};
#define TCA_CBQ_MAX (__TCA_CBQ_MAX - 1)
/* dsmark section */
enum {
TCA_DSMARK_UNSPEC,
TCA_DSMARK_INDICES,
TCA_DSMARK_DEFAULT_INDEX,
TCA_DSMARK_SET_TC_INDEX,
TCA_DSMARK_MASK,
TCA_DSMARK_VALUE,
__TCA_DSMARK_MAX,
};
#define TCA_DSMARK_MAX (__TCA_DSMARK_MAX - 1)
/* ATM section */
enum {
TCA_ATM_UNSPEC,
TCA_ATM_FD, /* file/socket descriptor */
TCA_ATM_PTR, /* pointer to descriptor - later */
TCA_ATM_HDR, /* LL header */
TCA_ATM_EXCESS, /* excess traffic class (0 for CLP) */
TCA_ATM_ADDR, /* PVC address (for output only) */
TCA_ATM_STATE, /* VC state (ATM_VS_*; for output only) */
__TCA_ATM_MAX,
};
#define TCA_ATM_MAX (__TCA_ATM_MAX - 1)
/* Network emulator */
enum {


@@ -41,14 +41,14 @@ fi
# due to https://bugs.gentoo.org/794601) so let's just point the script to
# commits referring to versions of libelf that actually can be built
rm -rf elfutils
git clone git://sourceware.org/git/elfutils.git
git clone https://sourceware.org/git/elfutils.git
(
cd elfutils
git checkout 83251d4091241acddbdcf16f814e3bc6ef3df49a
git checkout 67a187d4c1790058fc7fd218317851cb68bb087c
git log --oneline -1
# ASan isn't compatible with -Wl,--no-undefined: https://github.com/google/sanitizers/issues/380
find -name Makefile.am | xargs sed -i 's/,--no-undefined//'
sed -i 's/^\(NO_UNDEFINED=\).*/\1/' configure.ac
# ASan isn't compatible with -Wl,-z,defs either:
# https://clang.llvm.org/docs/AddressSanitizer.html#usage
@@ -62,6 +62,7 @@ fi
autoreconf -i -f
if ! ./configure --enable-maintainer-mode --disable-debuginfod --disable-libdebuginfod \
--disable-demangler --without-bzlib --without-lzma --without-zstd \
CC="$CC" CFLAGS="-Wno-error $CFLAGS" CXX="$CXX" CXXFLAGS="-Wno-error $CXXFLAGS" LDFLAGS="$CFLAGS"; then
cat config.log
exit 1


@@ -42,8 +42,11 @@ PATH_MAP=( \
[tools/include/uapi/linux/bpf_common.h]=include/uapi/linux/bpf_common.h \
[tools/include/uapi/linux/bpf.h]=include/uapi/linux/bpf.h \
[tools/include/uapi/linux/btf.h]=include/uapi/linux/btf.h \
[tools/include/uapi/linux/fcntl.h]=include/uapi/linux/fcntl.h \
[tools/include/uapi/linux/openat2.h]=include/uapi/linux/openat2.h \
[tools/include/uapi/linux/if_link.h]=include/uapi/linux/if_link.h \
[tools/include/uapi/linux/if_xdp.h]=include/uapi/linux/if_xdp.h \
[tools/include/uapi/linux/netdev.h]=include/uapi/linux/netdev.h \
[tools/include/uapi/linux/netlink.h]=include/uapi/linux/netlink.h \
[tools/include/uapi/linux/pkt_cls.h]=include/uapi/linux/pkt_cls.h \
[tools/include/uapi/linux/pkt_sched.h]=include/uapi/linux/pkt_sched.h \
@@ -51,8 +54,8 @@ PATH_MAP=( \
[Documentation/bpf/libbpf]=docs \
)
LIBBPF_PATHS="${!PATH_MAP[@]} :^tools/lib/bpf/Makefile :^tools/lib/bpf/Build :^tools/lib/bpf/.gitignore :^tools/include/tools/libc_compat.h"
LIBBPF_VIEW_PATHS="${PATH_MAP[@]}"
LIBBPF_PATHS=("${!PATH_MAP[@]}" ":^tools/lib/bpf/Makefile" ":^tools/lib/bpf/Build" ":^tools/lib/bpf/.gitignore" ":^tools/include/tools/libc_compat.h")
LIBBPF_VIEW_PATHS=("${PATH_MAP[@]}")
LIBBPF_VIEW_EXCLUDE_REGEX='^src/(Makefile|Build|test_libbpf\.c|bpf_helper_defs\.h|\.gitignore)$|^docs/(\.gitignore|api\.rst|conf\.py)$|^docs/sphinx/.*'
LINUX_VIEW_EXCLUDE_REGEX='^include/tools/libc_compat.h$'
@@ -85,7 +88,9 @@ commit_desc()
# $2 - paths filter
commit_signature()
{
git show --pretty='("%s")|%aI|%b' --shortstat $1 -- ${2-.} | tr '\n' '|'
local ref=$1
shift
git show --pretty='("%s")|%aI|%b' --shortstat $ref -- "${@-.}" | tr '\n' '|'
}
# Cherry-pick commits touching libbpf-related files
@@ -104,7 +109,7 @@ cherry_pick_commits()
local libbpf_conflict_cnt
local desc
new_commits=$(git rev-list --no-merges --topo-order --reverse ${baseline_tag}..${tip_tag} ${LIBBPF_PATHS[@]})
new_commits=$(git rev-list --no-merges --topo-order --reverse ${baseline_tag}..${tip_tag} -- "${LIBBPF_PATHS[@]}")
for new_commit in ${new_commits}; do
desc="$(commit_desc ${new_commit})"
signature="$(commit_signature ${new_commit} "${LIBBPF_PATHS[@]}")"
@@ -138,7 +143,7 @@ cherry_pick_commits()
echo "Picking '${desc}'..."
if ! git cherry-pick ${new_commit} &>/dev/null; then
echo "Warning! Cherry-picking '${desc} failed, checking if it's non-libbpf files causing problems..."
libbpf_conflict_cnt=$(git diff --name-only --diff-filter=U -- ${LIBBPF_PATHS[@]} | wc -l)
libbpf_conflict_cnt=$(git diff --name-only --diff-filter=U -- "${LIBBPF_PATHS[@]}" | wc -l)
conflict_cnt=$(git diff --name-only | wc -l)
prompt_resolution=1
@@ -257,7 +262,7 @@ if ((${COMMIT_CNT} <= 0)); then
fi
# Exclude baseline commit and generate nice cover letter with summary
git format-patch ${SQUASH_BASE_TAG}..${SQUASH_TIP_TAG} --cover-letter -o ${TMP_DIR}/patches
git format-patch --no-signature ${SQUASH_BASE_TAG}..${SQUASH_TIP_TAG} --cover-letter -o ${TMP_DIR}/patches
# Now is time to re-apply libbpf-related linux patches to libbpf repo
cd_to ${LIBBPF_REPO}
@@ -284,7 +289,7 @@ cd_to ${LIBBPF_REPO}
helpers_changes=$(git status --porcelain src/bpf_helper_defs.h | wc -l)
if ((${helpers_changes} == 1)); then
git add src/bpf_helper_defs.h
git commit -m "sync: auto-generate latest BPF helpers
git commit -s -m "sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
" -- src/bpf_helper_defs.h
@@ -306,7 +311,7 @@ Baseline bpf-next commit: ${BASELINE_COMMIT}\n\
Checkpoint bpf-next commit: ${TIP_COMMIT}\n\
Baseline bpf commit: ${BPF_BASELINE_COMMIT}\n\
Checkpoint bpf commit: ${BPF_TIP_COMMIT}/" | \
git commit --file=-
git commit -s --file=-
echo "SUCCESS! ${COMMIT_CNT} commits synced."
@@ -316,10 +321,10 @@ cd_to ${LINUX_REPO}
git checkout -b ${VIEW_TAG} ${TIP_COMMIT}
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --tree-filter "${LIBBPF_TREE_FILTER}" ${VIEW_TAG}^..${VIEW_TAG}
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --subdirectory-filter __libbpf ${VIEW_TAG}^..${VIEW_TAG}
git ls-files -- ${LIBBPF_VIEW_PATHS[@]} | grep -v -E "${LINUX_VIEW_EXCLUDE_REGEX}" > ${TMP_DIR}/linux-view.ls
git ls-files -- "${LIBBPF_VIEW_PATHS[@]}" | grep -v -E "${LINUX_VIEW_EXCLUDE_REGEX}" > ${TMP_DIR}/linux-view.ls
cd_to ${LIBBPF_REPO}
git ls-files -- ${LIBBPF_VIEW_PATHS[@]} | grep -v -E "${LIBBPF_VIEW_EXCLUDE_REGEX}" > ${TMP_DIR}/github-view.ls
git ls-files -- "${LIBBPF_VIEW_PATHS[@]}" | grep -v -E "${LIBBPF_VIEW_EXCLUDE_REGEX}" > ${TMP_DIR}/github-view.ls
echo "Comparing list of files..."
diff -u ${TMP_DIR}/linux-view.ls ${TMP_DIR}/github-view.ls


@@ -9,7 +9,7 @@ else
endif
LIBBPF_MAJOR_VERSION := 1
LIBBPF_MINOR_VERSION := 0
LIBBPF_MINOR_VERSION := 4
LIBBPF_PATCH_VERSION := 0
LIBBPF_VERSION := $(LIBBPF_MAJOR_VERSION).$(LIBBPF_MINOR_VERSION).$(LIBBPF_PATCH_VERSION)
LIBBPF_MAJMIN_VERSION := $(LIBBPF_MAJOR_VERSION).$(LIBBPF_MINOR_VERSION).0
@@ -35,7 +35,10 @@ ALL_CFLAGS := $(INCLUDES)
SHARED_CFLAGS += -fPIC -fvisibility=hidden -DSHARED
CFLAGS ?= -g -O2 -Werror -Wall -std=gnu89
ALL_CFLAGS += $(CFLAGS) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 $(EXTRA_CFLAGS)
ALL_CFLAGS += $(CFLAGS) \
-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 \
-Wno-unknown-warning-option -Wno-format-overflow \
$(EXTRA_CFLAGS)
ALL_LDFLAGS += $(LDFLAGS) $(EXTRA_LDFLAGS)
ifdef NO_PKG_CONFIG
@@ -52,7 +55,7 @@ STATIC_OBJDIR := $(OBJDIR)/staticobjs
OBJS := bpf.o btf.o libbpf.o libbpf_errno.o netlink.o \
nlattr.o str_error.o libbpf_probes.o bpf_prog_linfo.o \
btf_dump.o hashmap.o ringbuf.o strset.o linker.o gen_loader.o \
relo_core.o usdt.o
relo_core.o usdt.o zip.o elf.o features.o
SHARED_OBJS := $(addprefix $(SHARED_OBJDIR)/,$(OBJS))
STATIC_OBJS := $(addprefix $(STATIC_OBJDIR)/,$(OBJS))
@@ -77,7 +80,8 @@ INSTALL = install
DESTDIR ?=
ifeq ($(filter-out %64 %64be %64eb %64le %64el s390x, $(shell uname -m)),)
HOSTARCH = $(firstword $(subst -, ,$(shell $(CC) -dumpmachine)))
ifeq ($(filter-out %64 %64be %64eb %64le %64el s390x, $(HOSTARCH)),)
LIBSUBDIR := lib64
else
LIBSUBDIR := lib

src/bpf.c

@@ -103,7 +103,7 @@ int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int attempts)
* [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@fb.com/
* [1] d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper")
*/
int probe_memcg_account(void)
int probe_memcg_account(int token_fd)
{
const size_t attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
struct bpf_insn insns[] = {
@@ -120,6 +120,9 @@ int probe_memcg_account(void)
attr.insns = ptr_to_u64(insns);
attr.insn_cnt = insn_cnt;
attr.license = ptr_to_u64("GPL");
attr.prog_token_fd = token_fd;
if (token_fd)
attr.prog_flags |= BPF_F_TOKEN_FD;
prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, attr_sz);
if (prog_fd >= 0) {
@@ -146,7 +149,7 @@ int bump_rlimit_memlock(void)
struct rlimit rlim;
/* if kernel supports memcg-based accounting, skip bumping RLIMIT_MEMLOCK */
if (memlock_bumped || kernel_supports(NULL, FEAT_MEMCG_ACCOUNT))
if (memlock_bumped || feat_supported(NULL, FEAT_MEMCG_ACCOUNT))
return 0;
memlock_bumped = true;
@@ -169,7 +172,7 @@ int bpf_map_create(enum bpf_map_type map_type,
__u32 max_entries,
const struct bpf_map_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
const size_t attr_sz = offsetofend(union bpf_attr, map_token_fd);
union bpf_attr attr;
int fd;
@@ -181,7 +184,7 @@ int bpf_map_create(enum bpf_map_type map_type,
return libbpf_err(-EINVAL);
attr.map_type = map_type;
if (map_name && kernel_supports(NULL, FEAT_PROG_NAME))
if (map_name && feat_supported(NULL, FEAT_PROG_NAME))
libbpf_strlcpy(attr.map_name, map_name, sizeof(attr.map_name));
attr.key_size = key_size;
attr.value_size = value_size;
@@ -191,6 +194,7 @@ int bpf_map_create(enum bpf_map_type map_type,
attr.btf_key_type_id = OPTS_GET(opts, btf_key_type_id, 0);
attr.btf_value_type_id = OPTS_GET(opts, btf_value_type_id, 0);
attr.btf_vmlinux_value_type_id = OPTS_GET(opts, btf_vmlinux_value_type_id, 0);
attr.value_type_btf_obj_fd = OPTS_GET(opts, value_type_btf_obj_fd, 0);
attr.inner_map_fd = OPTS_GET(opts, inner_map_fd, 0);
attr.map_flags = OPTS_GET(opts, map_flags, 0);
@@ -198,6 +202,8 @@ int bpf_map_create(enum bpf_map_type map_type,
attr.numa_node = OPTS_GET(opts, numa_node, 0);
attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
attr.map_token_fd = OPTS_GET(opts, token_fd, 0);
fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
return libbpf_err_errno(fd);
}
@@ -230,9 +236,9 @@ alloc_zero_tailing_info(const void *orecord, __u32 cnt,
int bpf_prog_load(enum bpf_prog_type prog_type,
const char *prog_name, const char *license,
const struct bpf_insn *insns, size_t insn_cnt,
const struct bpf_prog_load_opts *opts)
struct bpf_prog_load_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, fd_array);
const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
void *finfo = NULL, *linfo = NULL;
const char *func_info, *line_info;
__u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd;
@@ -261,8 +267,9 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
attr.prog_flags = OPTS_GET(opts, prog_flags, 0);
attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0);
attr.kern_version = OPTS_GET(opts, kern_version, 0);
attr.prog_token_fd = OPTS_GET(opts, token_fd, 0);
if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME))
if (prog_name && feat_supported(NULL, FEAT_PROG_NAME))
libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
attr.license = ptr_to_u64(license);
@@ -290,10 +297,6 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
if (!!log_buf != !!log_size)
return libbpf_err(-EINVAL);
if (log_level > (4 | 2 | 1))
return libbpf_err(-EINVAL);
if (log_level && !log_buf)
return libbpf_err(-EINVAL);
func_info_rec_size = OPTS_GET(opts, func_info_rec_size, 0);
func_info = OPTS_GET(opts, func_info, NULL);
@@ -316,6 +319,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
}
fd = sys_bpf_prog_load(&attr, attr_sz, attempts);
OPTS_SET(opts, log_true_size, attr.log_true_size);
if (fd >= 0)
return fd;
@@ -356,6 +360,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
}
fd = sys_bpf_prog_load(&attr, attr_sz, attempts);
OPTS_SET(opts, log_true_size, attr.log_true_size);
if (fd >= 0)
goto done;
}
@@ -370,6 +375,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
attr.log_level = 1;
fd = sys_bpf_prog_load(&attr, attr_sz, attempts);
OPTS_SET(opts, log_true_size, attr.log_true_size);
}
done:
/* free() doesn't affect errno, so we don't need to restore it */
@@ -573,20 +579,30 @@ int bpf_map_update_batch(int fd, const void *keys, const void *values, __u32 *co
(void *)keys, (void *)values, count, opts);
}
int bpf_obj_pin(int fd, const char *pathname)
int bpf_obj_pin_opts(int fd, const char *pathname, const struct bpf_obj_pin_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, file_flags);
const size_t attr_sz = offsetofend(union bpf_attr, path_fd);
union bpf_attr attr;
int ret;
if (!OPTS_VALID(opts, bpf_obj_pin_opts))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.path_fd = OPTS_GET(opts, path_fd, 0);
attr.pathname = ptr_to_u64((void *)pathname);
attr.file_flags = OPTS_GET(opts, file_flags, 0);
attr.bpf_fd = fd;
ret = sys_bpf(BPF_OBJ_PIN, &attr, attr_sz);
return libbpf_err_errno(ret);
}
int bpf_obj_pin(int fd, const char *pathname)
{
return bpf_obj_pin_opts(fd, pathname, NULL);
}
int bpf_obj_get(const char *pathname)
{
return bpf_obj_get_opts(pathname, NULL);
@@ -594,7 +610,7 @@ int bpf_obj_get(const char *pathname)
int bpf_obj_get_opts(const char *pathname, const struct bpf_obj_get_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, file_flags);
const size_t attr_sz = offsetofend(union bpf_attr, path_fd);
union bpf_attr attr;
int fd;
@@ -602,6 +618,7 @@ int bpf_obj_get_opts(const char *pathname, const struct bpf_obj_get_opts *opts)
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.path_fd = OPTS_GET(opts, path_fd, 0);
attr.pathname = ptr_to_u64((void *)pathname);
attr.file_flags = OPTS_GET(opts, file_flags, 0);
@@ -619,55 +636,89 @@ int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
return bpf_prog_attach_opts(prog_fd, target_fd, type, &opts);
}
int bpf_prog_attach_opts(int prog_fd, int target_fd,
enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts)
int bpf_prog_attach_opts(int prog_fd, int target, enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
__u32 relative_id, flags;
int ret, relative_fd;
union bpf_attr attr;
int ret;
if (!OPTS_VALID(opts, bpf_prog_attach_opts))
return libbpf_err(-EINVAL);
relative_id = OPTS_GET(opts, relative_id, 0);
relative_fd = OPTS_GET(opts, relative_fd, 0);
flags = OPTS_GET(opts, flags, 0);
/* validate we don't have unexpected combinations of non-zero fields */
if (relative_fd && relative_id)
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.target_fd = target_fd;
attr.attach_bpf_fd = prog_fd;
attr.attach_type = type;
attr.attach_flags = OPTS_GET(opts, flags, 0);
attr.replace_bpf_fd = OPTS_GET(opts, replace_prog_fd, 0);
attr.target_fd = target;
attr.attach_bpf_fd = prog_fd;
attr.attach_type = type;
attr.replace_bpf_fd = OPTS_GET(opts, replace_fd, 0);
attr.expected_revision = OPTS_GET(opts, expected_revision, 0);
if (relative_id) {
attr.attach_flags = flags | BPF_F_ID;
attr.relative_id = relative_id;
} else {
attr.attach_flags = flags;
attr.relative_fd = relative_fd;
}
ret = sys_bpf(BPF_PROG_ATTACH, &attr, attr_sz);
return libbpf_err_errno(ret);
}
int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
int bpf_prog_detach_opts(int prog_fd, int target, enum bpf_attach_type type,
const struct bpf_prog_detach_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
__u32 relative_id, flags;
int ret, relative_fd;
union bpf_attr attr;
int ret;
if (!OPTS_VALID(opts, bpf_prog_detach_opts))
return libbpf_err(-EINVAL);
relative_id = OPTS_GET(opts, relative_id, 0);
relative_fd = OPTS_GET(opts, relative_fd, 0);
flags = OPTS_GET(opts, flags, 0);
/* validate we don't have unexpected combinations of non-zero fields */
if (relative_fd && relative_id)
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.target_fd = target_fd;
attr.attach_type = type;
attr.target_fd = target;
attr.attach_bpf_fd = prog_fd;
attr.attach_type = type;
attr.expected_revision = OPTS_GET(opts, expected_revision, 0);
if (relative_id) {
attr.attach_flags = flags | BPF_F_ID;
attr.relative_id = relative_id;
} else {
attr.attach_flags = flags;
attr.relative_fd = relative_fd;
}
ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
return libbpf_err_errno(ret);
}
int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
{
return bpf_prog_detach_opts(0, target_fd, type, NULL);
}
int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
{
const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
union bpf_attr attr;
int ret;
memset(&attr, 0, attr_sz);
attr.target_fd = target_fd;
attr.attach_bpf_fd = prog_fd;
attr.attach_type = type;
ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
return libbpf_err_errno(ret);
return bpf_prog_detach_opts(prog_fd, target_fd, type, NULL);
}
int bpf_link_create(int prog_fd, int target_fd,
@@ -675,9 +726,9 @@ int bpf_link_create(int prog_fd, int target_fd,
const struct bpf_link_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, link_create);
__u32 target_btf_id, iter_info_len;
__u32 target_btf_id, iter_info_len, relative_id;
int fd, err, relative_fd;
union bpf_attr attr;
int fd, err;
if (!OPTS_VALID(opts, bpf_link_create_opts))
return libbpf_err(-EINVAL);
@@ -723,6 +774,18 @@ int bpf_link_create(int prog_fd, int target_fd,
if (!OPTS_ZEROED(opts, kprobe_multi))
return libbpf_err(-EINVAL);
break;
case BPF_TRACE_UPROBE_MULTI:
attr.link_create.uprobe_multi.flags = OPTS_GET(opts, uprobe_multi.flags, 0);
attr.link_create.uprobe_multi.cnt = OPTS_GET(opts, uprobe_multi.cnt, 0);
attr.link_create.uprobe_multi.path = ptr_to_u64(OPTS_GET(opts, uprobe_multi.path, 0));
attr.link_create.uprobe_multi.offsets = ptr_to_u64(OPTS_GET(opts, uprobe_multi.offsets, 0));
attr.link_create.uprobe_multi.ref_ctr_offsets = ptr_to_u64(OPTS_GET(opts, uprobe_multi.ref_ctr_offsets, 0));
attr.link_create.uprobe_multi.cookies = ptr_to_u64(OPTS_GET(opts, uprobe_multi.cookies, 0));
attr.link_create.uprobe_multi.pid = OPTS_GET(opts, uprobe_multi.pid, 0);
if (!OPTS_ZEROED(opts, uprobe_multi))
return libbpf_err(-EINVAL);
break;
case BPF_TRACE_RAW_TP:
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
@@ -731,6 +794,46 @@ int bpf_link_create(int prog_fd, int target_fd,
if (!OPTS_ZEROED(opts, tracing))
return libbpf_err(-EINVAL);
break;
case BPF_NETFILTER:
attr.link_create.netfilter.pf = OPTS_GET(opts, netfilter.pf, 0);
attr.link_create.netfilter.hooknum = OPTS_GET(opts, netfilter.hooknum, 0);
attr.link_create.netfilter.priority = OPTS_GET(opts, netfilter.priority, 0);
attr.link_create.netfilter.flags = OPTS_GET(opts, netfilter.flags, 0);
if (!OPTS_ZEROED(opts, netfilter))
return libbpf_err(-EINVAL);
break;
case BPF_TCX_INGRESS:
case BPF_TCX_EGRESS:
relative_fd = OPTS_GET(opts, tcx.relative_fd, 0);
relative_id = OPTS_GET(opts, tcx.relative_id, 0);
if (relative_fd && relative_id)
return libbpf_err(-EINVAL);
if (relative_id) {
attr.link_create.tcx.relative_id = relative_id;
attr.link_create.flags |= BPF_F_ID;
} else {
attr.link_create.tcx.relative_fd = relative_fd;
}
attr.link_create.tcx.expected_revision = OPTS_GET(opts, tcx.expected_revision, 0);
if (!OPTS_ZEROED(opts, tcx))
return libbpf_err(-EINVAL);
break;
case BPF_NETKIT_PRIMARY:
case BPF_NETKIT_PEER:
relative_fd = OPTS_GET(opts, netkit.relative_fd, 0);
relative_id = OPTS_GET(opts, netkit.relative_id, 0);
if (relative_fd && relative_id)
return libbpf_err(-EINVAL);
if (relative_id) {
attr.link_create.netkit.relative_id = relative_id;
attr.link_create.flags |= BPF_F_ID;
} else {
attr.link_create.netkit.relative_fd = relative_fd;
}
attr.link_create.netkit.expected_revision = OPTS_GET(opts, netkit.expected_revision, 0);
if (!OPTS_ZEROED(opts, netkit))
return libbpf_err(-EINVAL);
break;
default:
if (!OPTS_ZEROED(opts, flags))
return libbpf_err(-EINVAL);
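
The new BPF_TCX_* and BPF_NETKIT_* cases above wire relative_fd/relative_id and expected_revision through to the kernel. A minimal sketch of how a caller might exercise the TCX path follows; the ifindex, flag choice, and function name are illustrative, not taken from this diff:

#include <bpf/bpf.h>

/* Sketch: create a TCX ingress link on a given ifindex, placing the program
 * before all currently attached ones; prog_fd/ifindex are caller-supplied.
 */
int attach_tcx_first(int prog_fd, int ifindex)
{
	LIBBPF_OPTS(bpf_link_create_opts, opts,
		.flags = BPF_F_BEFORE,		/* anchor-relative placement */
		.tcx.expected_revision = 0);	/* 0: don't enforce a revision */

	/* for TCX attach types, the target_fd argument carries the ifindex */
	return bpf_link_create(prog_fd, ifindex, BPF_TCX_INGRESS, &opts);
}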
@@ -794,11 +897,17 @@ int bpf_link_update(int link_fd, int new_prog_fd,
if (!OPTS_VALID(opts, bpf_link_update_opts))
return libbpf_err(-EINVAL);
if (OPTS_GET(opts, old_prog_fd, 0) && OPTS_GET(opts, old_map_fd, 0))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.link_update.link_fd = link_fd;
attr.link_update.new_prog_fd = new_prog_fd;
attr.link_update.flags = OPTS_GET(opts, flags, 0);
attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0);
if (OPTS_GET(opts, old_prog_fd, 0))
attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0);
else if (OPTS_GET(opts, old_map_fd, 0))
attr.link_update.old_map_fd = OPTS_GET(opts, old_map_fd, 0);
ret = sys_bpf(BPF_LINK_UPDATE, &attr, attr_sz);
return libbpf_err_errno(ret);
@@ -817,8 +926,7 @@ int bpf_iter_create(int link_fd)
return libbpf_err_errno(fd);
}
int bpf_prog_query_opts(int target_fd,
enum bpf_attach_type type,
int bpf_prog_query_opts(int target, enum bpf_attach_type type,
struct bpf_prog_query_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, query);
@@ -829,18 +937,20 @@ int bpf_prog_query_opts(int target_fd,
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.query.target_fd = target_fd;
attr.query.attach_type = type;
attr.query.query_flags = OPTS_GET(opts, query_flags, 0);
attr.query.prog_cnt = OPTS_GET(opts, prog_cnt, 0);
attr.query.prog_ids = ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
attr.query.target_fd = target;
attr.query.attach_type = type;
attr.query.query_flags = OPTS_GET(opts, query_flags, 0);
attr.query.count = OPTS_GET(opts, count, 0);
attr.query.prog_ids = ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
attr.query.link_ids = ptr_to_u64(OPTS_GET(opts, link_ids, NULL));
attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
attr.query.link_attach_flags = ptr_to_u64(OPTS_GET(opts, link_attach_flags, NULL));
ret = sys_bpf(BPF_PROG_QUERY, &attr, attr_sz);
OPTS_SET(opts, attach_flags, attr.query.attach_flags);
OPTS_SET(opts, prog_cnt, attr.query.prog_cnt);
OPTS_SET(opts, revision, attr.query.revision);
OPTS_SET(opts, count, attr.query.count);
return libbpf_err_errno(ret);
}
@@ -935,58 +1045,98 @@ int bpf_link_get_next_id(__u32 start_id, __u32 *next_id)
return bpf_obj_get_next_id(start_id, next_id, BPF_LINK_GET_NEXT_ID);
}
int bpf_prog_get_fd_by_id(__u32 id)
int bpf_prog_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, open_flags);
union bpf_attr attr;
int fd;
if (!OPTS_VALID(opts, bpf_get_fd_by_id_opts))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.prog_id = id;
attr.open_flags = OPTS_GET(opts, open_flags, 0);
fd = sys_bpf_fd(BPF_PROG_GET_FD_BY_ID, &attr, attr_sz);
return libbpf_err_errno(fd);
}
int bpf_map_get_fd_by_id(__u32 id)
int bpf_prog_get_fd_by_id(__u32 id)
{
return bpf_prog_get_fd_by_id_opts(id, NULL);
}
int bpf_map_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, open_flags);
union bpf_attr attr;
int fd;
if (!OPTS_VALID(opts, bpf_get_fd_by_id_opts))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.map_id = id;
attr.open_flags = OPTS_GET(opts, open_flags, 0);
fd = sys_bpf_fd(BPF_MAP_GET_FD_BY_ID, &attr, attr_sz);
return libbpf_err_errno(fd);
}
int bpf_btf_get_fd_by_id(__u32 id)
int bpf_map_get_fd_by_id(__u32 id)
{
return bpf_map_get_fd_by_id_opts(id, NULL);
}
int bpf_btf_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, open_flags);
union bpf_attr attr;
int fd;
if (!OPTS_VALID(opts, bpf_get_fd_by_id_opts))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.btf_id = id;
attr.open_flags = OPTS_GET(opts, open_flags, 0);
fd = sys_bpf_fd(BPF_BTF_GET_FD_BY_ID, &attr, attr_sz);
return libbpf_err_errno(fd);
}
int bpf_link_get_fd_by_id(__u32 id)
int bpf_btf_get_fd_by_id(__u32 id)
{
return bpf_btf_get_fd_by_id_opts(id, NULL);
}
int bpf_link_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, open_flags);
union bpf_attr attr;
int fd;
if (!OPTS_VALID(opts, bpf_get_fd_by_id_opts))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.link_id = id;
attr.open_flags = OPTS_GET(opts, open_flags, 0);
fd = sys_bpf_fd(BPF_LINK_GET_FD_BY_ID, &attr, attr_sz);
return libbpf_err_errno(fd);
}
int bpf_link_get_fd_by_id(__u32 id)
{
return bpf_link_get_fd_by_id_opts(id, NULL);
}
int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len)
{
const size_t attr_sz = offsetofend(union bpf_attr, info);
@@ -1004,23 +1154,54 @@ int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len)
return libbpf_err_errno(err);
}
int bpf_raw_tracepoint_open(const char *name, int prog_fd)
int bpf_prog_get_info_by_fd(int prog_fd, struct bpf_prog_info *info, __u32 *info_len)
{
return bpf_obj_get_info_by_fd(prog_fd, info, info_len);
}
int bpf_map_get_info_by_fd(int map_fd, struct bpf_map_info *info, __u32 *info_len)
{
return bpf_obj_get_info_by_fd(map_fd, info, info_len);
}
int bpf_btf_get_info_by_fd(int btf_fd, struct bpf_btf_info *info, __u32 *info_len)
{
return bpf_obj_get_info_by_fd(btf_fd, info, info_len);
}
int bpf_link_get_info_by_fd(int link_fd, struct bpf_link_info *info, __u32 *info_len)
{
return bpf_obj_get_info_by_fd(link_fd, info, info_len);
}
int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, raw_tracepoint);
union bpf_attr attr;
int fd;
if (!OPTS_VALID(opts, bpf_raw_tp_opts))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.raw_tracepoint.name = ptr_to_u64(name);
attr.raw_tracepoint.prog_fd = prog_fd;
attr.raw_tracepoint.name = ptr_to_u64(OPTS_GET(opts, tp_name, NULL));
attr.raw_tracepoint.cookie = OPTS_GET(opts, cookie, 0);
fd = sys_bpf_fd(BPF_RAW_TRACEPOINT_OPEN, &attr, attr_sz);
return libbpf_err_errno(fd);
}
int bpf_btf_load(const void *btf_data, size_t btf_size, const struct bpf_btf_load_opts *opts)
int bpf_raw_tracepoint_open(const char *name, int prog_fd)
{
const size_t attr_sz = offsetofend(union bpf_attr, btf_log_level);
LIBBPF_OPTS(bpf_raw_tp_opts, opts, .tp_name = name);
return bpf_raw_tracepoint_open_opts(prog_fd, &opts);
}
int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, btf_token_fd);
union bpf_attr attr;
char *log_buf;
size_t log_size;
@@ -1045,6 +1226,10 @@ int bpf_btf_load(const void *btf_data, size_t btf_size, const struct bpf_btf_loa
attr.btf = ptr_to_u64(btf_data);
attr.btf_size = btf_size;
attr.btf_flags = OPTS_GET(opts, btf_flags, 0);
attr.btf_token_fd = OPTS_GET(opts, token_fd, 0);
/* log_level == 0 and log_buf != NULL means "try loading without
* log_buf, but retry with log_buf and log_level=1 on error", which is
* consistent across low-level and high-level BTF and program loading
@@ -1063,6 +1248,8 @@ int bpf_btf_load(const void *btf_data, size_t btf_size, const struct bpf_btf_loa
attr.btf_log_level = 1;
fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, attr_sz);
}
OPTS_SET(opts, log_true_size, attr.btf_log_true_size);
return libbpf_err_errno(fd);
}
@@ -1123,3 +1310,20 @@ int bpf_prog_bind_map(int prog_fd, int map_fd,
ret = sys_bpf(BPF_PROG_BIND_MAP, &attr, attr_sz);
return libbpf_err_errno(ret);
}
int bpf_token_create(int bpffs_fd, struct bpf_token_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, token_create);
union bpf_attr attr;
int fd;
if (!OPTS_VALID(opts, bpf_token_create_opts))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.token_create.bpffs_fd = bpffs_fd;
attr.token_create.flags = OPTS_GET(opts, flags, 0);
fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
return libbpf_err_errno(fd);
}

src/bpf.h (329 changed lines)

@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/*
* common eBPF ELF operations.
* Common BPF ELF operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
@@ -35,7 +35,7 @@
extern "C" {
#endif
int libbpf_set_memlock_rlim(size_t memlock_bytes);
LIBBPF_API int libbpf_set_memlock_rlim(size_t memlock_bytes);
struct bpf_map_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@@ -51,8 +51,12 @@ struct bpf_map_create_opts {
__u32 numa_node;
__u32 map_ifindex;
__s32 value_type_btf_obj_fd;
__u32 token_fd;
size_t :0;
};
#define bpf_map_create_opts__last_field map_ifindex
#define bpf_map_create_opts__last_field token_fd
LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
const char *map_name,
@@ -96,13 +100,21 @@ struct bpf_prog_load_opts {
__u32 log_level;
__u32 log_size;
char *log_buf;
/* output: actual total log contents size (including terminating zero).
* It can be larger than the original log_size (if the log was truncated),
* or smaller (if the log buffer wasn't filled completely).
* If the kernel doesn't support this feature, log_size is left unchanged.
*/
__u32 log_true_size;
__u32 token_fd;
size_t :0;
};
#define bpf_prog_load_opts__last_field log_buf
#define bpf_prog_load_opts__last_field token_fd
LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type,
const char *prog_name, const char *license,
const struct bpf_insn *insns, size_t insn_cnt,
const struct bpf_prog_load_opts *opts);
struct bpf_prog_load_opts *opts);
/* Flags to direct loading requirements */
#define MAPS_RELAX_COMPAT 0x01
@@ -117,11 +129,21 @@ struct bpf_btf_load_opts {
char *log_buf;
__u32 log_level;
__u32 log_size;
/* output: actual total log contents size (including terminating zero).
* It can be larger than the original log_size (if the log was truncated),
* or smaller (if the log buffer wasn't filled completely).
* If the kernel doesn't support this feature, log_size is left unchanged.
*/
__u32 log_true_size;
__u32 btf_flags;
__u32 token_fd;
size_t :0;
};
#define bpf_btf_load_opts__last_field log_size
#define bpf_btf_load_opts__last_field token_fd
LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
const struct bpf_btf_load_opts *opts);
struct bpf_btf_load_opts *opts);
LIBBPF_API int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
@@ -168,10 +190,14 @@ LIBBPF_API int bpf_map_delete_batch(int fd, const void *keys,
/**
* @brief **bpf_map_lookup_batch()** allows for batch lookup of BPF map elements.
*
* The parameter *in_batch* is the address of the first element in the batch to read.
* *out_batch* is an output parameter that should be passed as *in_batch* to subsequent
* calls to **bpf_map_lookup_batch()**. NULL can be passed for *in_batch* to indicate
* that the batched lookup starts from the beginning of the map.
* The parameter *in_batch* is the address of the first element in the batch to
* read. *out_batch* is an output parameter that should be passed as *in_batch*
* to subsequent calls to **bpf_map_lookup_batch()**. NULL can be passed for
* *in_batch* to indicate that the batched lookup starts from the beginning of
* the map. Both *in_batch* and *out_batch* must point to memory large enough to
* hold a single key, except for maps of type **BPF_MAP_TYPE_{HASH, PERCPU_HASH,
* LRU_HASH, LRU_PERCPU_HASH}**, for which the memory size must be at
* least 4 bytes wide regardless of key size.
*
* The *keys* and *values* are output parameters which must point to memory large enough to
* hold *count* items based on the key and value size of the map *map_fd*. The *keys*
@@ -204,7 +230,10 @@ LIBBPF_API int bpf_map_lookup_batch(int fd, void *in_batch, void *out_batch,
*
* @param fd BPF map file descriptor
* @param in_batch address of the first element in batch to read, can pass NULL to
* get address of the first element in *out_batch*
* get address of the first element in *out_batch*. If not NULL, must be large
* enough to hold a key. For **BPF_MAP_TYPE_{HASH, PERCPU_HASH, LRU_HASH,
* LRU_PERCPU_HASH}**, the memory size must be at least 4 bytes wide regardless
* of key size.
* @param out_batch output parameter that should be passed to next call as *in_batch*
* @param keys pointer to an array of *count* keys
* @param values pointer to an array large enough for *count* values
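
To make the in_batch/out_batch contract described above concrete, here is a hedged sketch of draining a hash map in batches; the 4-byte key/value sizes, the batch size of 64, and the function name are assumptions for illustration:

#include <errno.h>
#include <bpf/bpf.h>

/* Sketch: iterate all entries of a 4-byte-key/4-byte-value hash map. */
int dump_map(int map_fd)
{
	__u32 keys[64], vals[64], out_batch, count;
	void *in = NULL;	/* NULL: start from the beginning of the map */
	int err;

	do {
		count = 64;
		err = bpf_map_lookup_batch(map_fd, in, &out_batch,
					   keys, vals, &count, NULL);
		if (err && err != -ENOENT)
			return err;	/* -ENOENT signals end of map */
		/* ... consume `count` key/value pairs here ... */
		in = &out_batch;	/* resume where the kernel stopped */
	} while (!err);
	return 0;
}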
@@ -270,36 +299,96 @@ LIBBPF_API int bpf_map_update_batch(int fd, const void *keys, const void *values
__u32 *count,
const struct bpf_map_batch_opts *opts);
struct bpf_obj_pin_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 file_flags;
int path_fd;
size_t :0;
};
#define bpf_obj_pin_opts__last_field path_fd
LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
LIBBPF_API int bpf_obj_pin_opts(int fd, const char *pathname,
const struct bpf_obj_pin_opts *opts);
struct bpf_obj_get_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 file_flags;
int path_fd;
size_t :0;
};
#define bpf_obj_get_opts__last_field file_flags
#define bpf_obj_get_opts__last_field path_fd
LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
LIBBPF_API int bpf_obj_get(const char *pathname);
LIBBPF_API int bpf_obj_get_opts(const char *pathname,
const struct bpf_obj_get_opts *opts);
struct bpf_prog_attach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
unsigned int flags;
int replace_prog_fd;
};
#define bpf_prog_attach_opts__last_field replace_prog_fd
LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
enum bpf_attach_type type, unsigned int flags);
LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int attachable_fd,
enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts);
LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
enum bpf_attach_type type);
struct bpf_prog_attach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
union {
int replace_prog_fd;
int replace_fd;
};
int relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_prog_attach_opts__last_field expected_revision
struct bpf_prog_detach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
int relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_prog_detach_opts__last_field expected_revision
/**
* @brief **bpf_prog_attach_opts()** attaches the BPF program corresponding to
* *prog_fd* to a *target* which can represent a file descriptor or netdevice
* ifindex.
*
* @param prog_fd BPF program file descriptor
* @param target attach location file descriptor or ifindex
* @param type attach type for the BPF program
* @param opts options for configuring the attachment
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int target,
enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts);
/**
* @brief **bpf_prog_detach_opts()** detaches the BPF program corresponding to
* *prog_fd* from a *target* which can represent a file descriptor or netdevice
* ifindex.
*
* @param prog_fd BPF program file descriptor
* @param target detach location file descriptor or ifindex
* @param type detach type for the BPF program
* @param opts options for configuring the detachment
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_detach_opts(int prog_fd, int target,
enum bpf_attach_type type,
const struct bpf_prog_detach_opts *opts);
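
A hedged usage sketch for the extended attach options, assuming a TCX-capable kernel; the ifindex and revision are caller-supplied placeholders:

#include <bpf/bpf.h>

/* Sketch: attach prog_fd at TCX ingress only if the attachment list still
 * has the revision we observed earlier; the kernel rejects a stale attach.
 */
int attach_if_unchanged(int prog_fd, int ifindex, __u64 rev)
{
	LIBBPF_OPTS(bpf_prog_attach_opts, opts,
		.expected_revision = rev);

	return bpf_prog_attach_opts(prog_fd, ifindex, BPF_TCX_INGRESS, &opts);
}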
union bpf_iter_link_info; /* defined in up-to-date linux/bpf.h */
struct bpf_link_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@@ -318,13 +407,38 @@ struct bpf_link_create_opts {
const unsigned long *addrs;
const __u64 *cookies;
} kprobe_multi;
struct {
__u32 flags;
__u32 cnt;
const char *path;
const unsigned long *offsets;
const unsigned long *ref_ctr_offsets;
const __u64 *cookies;
__u32 pid;
} uprobe_multi;
struct {
__u64 cookie;
} tracing;
struct {
__u32 pf;
__u32 hooknum;
__s32 priority;
__u32 flags;
} netfilter;
struct {
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
} tcx;
struct {
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
} netkit;
};
size_t :0;
};
#define bpf_link_create_opts__last_field kprobe_multi.cookies
#define bpf_link_create_opts__last_field uprobe_multi.pid
LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
enum bpf_attach_type attach_type,
@@ -336,8 +450,9 @@ struct bpf_link_update_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags; /* extra flags */
__u32 old_prog_fd; /* expected old program FD */
__u32 old_map_fd; /* expected old map FD */
};
#define bpf_link_update_opts__last_field old_prog_fd
#define bpf_link_update_opts__last_field old_map_fd
LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
const struct bpf_link_update_opts *opts);
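
The new old_map_fd field mirrors old_prog_fd for struct_ops links, where the updated object is a map rather than a program. A hedged sketch follows; note that the new map fd travels in the second argument, which aliases new_map_fd in the kernel's link_update ABI:

#include <bpf/bpf.h>

/* Sketch: atomically swap the struct_ops map behind an existing link,
 * failing unless the currently attached map is old_map_fd.
 */
int swap_struct_ops(int link_fd, int new_map_fd, int old_map_fd)
{
	LIBBPF_OPTS(bpf_link_update_opts, opts,
		.flags = BPF_F_REPLACE,
		.old_map_fd = old_map_fd);

	return bpf_link_update(link_fd, new_map_fd, &opts);
}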
@@ -365,36 +480,166 @@ LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id);
LIBBPF_API int bpf_link_get_next_id(__u32 start_id, __u32 *next_id);
struct bpf_get_fd_by_id_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 open_flags; /* permissions requested for the operation on fd */
size_t :0;
};
#define bpf_get_fd_by_id_opts__last_field open_flags
LIBBPF_API int bpf_prog_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_prog_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts);
LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_map_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts);
LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_btf_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts);
LIBBPF_API int bpf_link_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_link_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts);
LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len);
/**
* @brief **bpf_prog_get_info_by_fd()** obtains information about the BPF
* program corresponding to *prog_fd*.
*
* Populates up to *info_len* bytes of *info* and updates *info_len* with the
* actual number of bytes written to *info*. Note that *info* should be
* zero-initialized or initialized as expected by the requested *info*
* type. Failing to (zero-)initialize *info* under certain circumstances can
* result in this helper returning an error.
*
* @param prog_fd BPF program file descriptor
* @param info pointer to **struct bpf_prog_info** that will be populated with
* BPF program information
* @param info_len pointer to the size of *info*; on success updated with the
* number of bytes written to *info*
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_get_info_by_fd(int prog_fd, struct bpf_prog_info *info, __u32 *info_len);
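
A short sketch of the zero-initialization requirement called out in the comment above:

#include <string.h>
#include <bpf/bpf.h>

/* Sketch: fetch program info; `info` must be zeroed so the kernel doesn't
 * treat stale stack bytes as output-buffer pointers and sizes.
 */
int get_prog_id(int prog_fd, __u32 *id)
{
	struct bpf_prog_info info;
	__u32 len = sizeof(info);
	int err;

	memset(&info, 0, sizeof(info));
	err = bpf_prog_get_info_by_fd(prog_fd, &info, &len);
	if (err)
		return err;
	*id = info.id;
	return 0;
}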
/**
* @brief **bpf_map_get_info_by_fd()** obtains information about the BPF
* map corresponding to *map_fd*.
*
* Populates up to *info_len* bytes of *info* and updates *info_len* with the
* actual number of bytes written to *info*. Note that *info* should be
* zero-initialized or initialized as expected by the requested *info*
* type. Failing to (zero-)initialize *info* under certain circumstances can
* result in this helper returning an error.
*
* @param map_fd BPF map file descriptor
* @param info pointer to **struct bpf_map_info** that will be populated with
* BPF map information
* @param info_len pointer to the size of *info*; on success updated with the
* number of bytes written to *info*
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_map_get_info_by_fd(int map_fd, struct bpf_map_info *info, __u32 *info_len);
/**
* @brief **bpf_btf_get_info_by_fd()** obtains information about the
* BTF object corresponding to *btf_fd*.
*
* Populates up to *info_len* bytes of *info* and updates *info_len* with the
* actual number of bytes written to *info*. Note that *info* should be
* zero-initialized or initialized as expected by the requested *info*
* type. Failing to (zero-)initialize *info* under certain circumstances can
* result in this helper returning an error.
*
* @param btf_fd BTF object file descriptor
* @param info pointer to **struct bpf_btf_info** that will be populated with
* BTF object information
* @param info_len pointer to the size of *info*; on success updated with the
* number of bytes written to *info*
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_btf_get_info_by_fd(int btf_fd, struct bpf_btf_info *info, __u32 *info_len);
/**
* @brief **bpf_link_get_info_by_fd()** obtains information about the BPF
* link corresponding to *link_fd*.
*
* Populates up to *info_len* bytes of *info* and updates *info_len* with the
* actual number of bytes written to *info*. Note that *info* should be
* zero-initialized or initialized as expected by the requested *info*
* type. Failing to (zero-)initialize *info* under certain circumstances can
* result in this helper returning an error.
*
* @param link_fd BPF link file descriptor
* @param info pointer to **struct bpf_link_info** that will be populated with
* BPF link information
* @param info_len pointer to the size of *info*; on success updated with the
* number of bytes written to *info*
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_link_get_info_by_fd(int link_fd, struct bpf_link_info *info, __u32 *info_len);
struct bpf_prog_query_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 query_flags;
__u32 attach_flags; /* output argument */
__u32 *prog_ids;
__u32 prog_cnt; /* input+output argument */
union {
/* input+output argument */
__u32 prog_cnt;
__u32 count;
};
__u32 *prog_attach_flags;
__u32 *link_ids;
__u32 *link_attach_flags;
__u64 revision;
size_t :0;
};
#define bpf_prog_query_opts__last_field prog_attach_flags
#define bpf_prog_query_opts__last_field revision
LIBBPF_API int bpf_prog_query_opts(int target_fd,
enum bpf_attach_type type,
/**
* @brief **bpf_prog_query_opts()** queries the BPF programs and BPF links
* which are attached to *target* which can represent a file descriptor or
* netdevice ifindex.
*
* @param target query location file descriptor or ifindex
* @param type attach type for the BPF program
* @param opts options for configuring the query
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_query_opts(int target, enum bpf_attach_type type,
struct bpf_prog_query_opts *opts);
LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
__u32 query_flags, __u32 *attach_flags,
__u32 *prog_ids, __u32 *prog_cnt);
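
A hedged sketch of querying through the new fields; the batch size of 16 and the function name are illustrative assumptions:

#include <bpf/bpf.h>

/* Sketch: count TCX ingress programs on an ifindex and capture the
 * attachment-list revision for a later expected_revision check.
 */
int count_tcx_progs(int ifindex, __u64 *revision)
{
	__u32 ids[16];
	LIBBPF_OPTS(bpf_prog_query_opts, opts,
		.prog_ids = ids,
		.count = 16);
	int err;

	err = bpf_prog_query_opts(ifindex, BPF_TCX_INGRESS, &opts);
	if (err)
		return err;

	*revision = opts.revision;	/* output: filled by the kernel */
	return opts.count;		/* number of attached programs */
}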
struct bpf_raw_tp_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
const char *tp_name;
__u64 cookie;
size_t :0;
};
#define bpf_raw_tp_opts__last_field cookie
LIBBPF_API int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts);
LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
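
The cookie option pairs with the bpf_get_attach_cookie() helper on the BPF side. A hedged user-space sketch; the tracepoint name and cookie value are illustrative:

#include <bpf/bpf.h>

/* Sketch: open a raw tracepoint with a 64-bit cookie the program can read
 * back via bpf_get_attach_cookie().
 */
int attach_with_cookie(int prog_fd)
{
	LIBBPF_OPTS(bpf_raw_tp_opts, opts,
		.tp_name = "sched_switch",
		.cookie = 0xdeadbeefULL);

	return bpf_raw_tracepoint_open_opts(prog_fd, &opts);
}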
LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
__u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
__u64 *probe_offset, __u64 *probe_addr);
#ifdef __cplusplus
/* forward-declaring enums in C++ isn't compatible with pure C enums, so
* instead define bpf_enable_stats() as accepting int as an input
*/
LIBBPF_API int bpf_enable_stats(int type);
#else
enum bpf_stats_type; /* defined in up-to-date linux/bpf.h */
LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type);
#endif
struct bpf_prog_bind_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@@ -431,6 +676,30 @@ struct bpf_test_run_opts {
LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
struct bpf_test_run_opts *opts);
struct bpf_token_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
size_t :0;
};
#define bpf_token_create_opts__last_field flags
/**
* @brief **bpf_token_create()** creates a new instance of BPF token derived
* from specified BPF FS mount point.
*
* BPF token created with this API can be passed to bpf() syscall for
* commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc.
*
* @param bpffs_fd FD for BPF FS instance from which to derive a BPF token
* instance.
* @param opts optional BPF token creation options, can be NULL
*
* @return BPF token FD > 0, on success; negative error code, otherwise (errno
* is also set to the error code)
*/
LIBBPF_API int bpf_token_create(int bpffs_fd,
struct bpf_token_create_opts *opts);
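
A hedged end-to-end sketch tying the token to the new token_fd fields in the load/create opts above; it assumes /sys/fs/bpf is a bpffs instance mounted with the appropriate delegation options, and the map parameters are placeholders:

#include <fcntl.h>
#include <bpf/bpf.h>

/* Sketch: derive a BPF token from bpffs and use it to create a map. */
int create_map_with_token(void)
{
	int bpffs_fd, token_fd;

	bpffs_fd = open("/sys/fs/bpf", O_RDONLY);
	if (bpffs_fd < 0)
		return -1;

	token_fd = bpf_token_create(bpffs_fd, NULL);	/* NULL opts: defaults */
	if (token_fd < 0)
		return token_fd;

	LIBBPF_OPTS(bpf_map_create_opts, opts, .token_fd = token_fd);
	return bpf_map_create(BPF_MAP_TYPE_ARRAY, "demo_map",
			      sizeof(__u32), sizeof(__u64), 16, &opts);
}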
#ifdef __cplusplus
} /* extern "C" */
#endif

src/bpf_core_read.h

@@ -2,6 +2,8 @@
#ifndef __BPF_CORE_READ_H__
#define __BPF_CORE_READ_H__
#include <bpf/bpf_helpers.h>
/*
* enum bpf_field_info_kind is passed as a second argument into
* __builtin_preserve_field_info() built-in to get a specific aspect of
@@ -44,7 +46,7 @@ enum bpf_enum_value_kind {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \
bpf_probe_read_kernel( \
(void *)dst, \
(void *)dst, \
__CORE_RELO(src, fld, BYTE_SIZE), \
(const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET))
#else
@@ -111,8 +113,61 @@ enum bpf_enum_value_kind {
val; \
})
/*
* Write to a bitfield, identified by s->field.
* This is the inverse of BPF_CORE_READ_BITFIELD().
*/
#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \
void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \
unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE); \
unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64); \
unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64); \
unsigned long long mask, val, nval = new_val; \
unsigned int rpad = rshift - lshift; \
\
asm volatile("" : "+r"(p)); \
\
switch (byte_size) { \
case 1: val = *(unsigned char *)p; break; \
case 2: val = *(unsigned short *)p; break; \
case 4: val = *(unsigned int *)p; break; \
case 8: val = *(unsigned long long *)p; break; \
} \
\
mask = (~0ULL << rshift) >> lshift; \
val = (val & ~mask) | ((nval << rpad) & mask); \
\
switch (byte_size) { \
case 1: *(unsigned char *)p = val; break; \
case 2: *(unsigned short *)p = val; break; \
case 4: *(unsigned int *)p = val; break; \
case 8: *(unsigned long long *)p = val; break; \
} \
})
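
A hedged BPF-side sketch of the read/write pair; struct sk_buff and its pkt_type bitfield stand in for any struct/bitfield pair with BTF available, and direct writes additionally require a program type allowed to modify the object:

/* Sketch: rewrite a bitfield through CO-RE relocations. */
void reset_pkt_type(struct sk_buff *skb)
{
	__u64 old = BPF_CORE_READ_BITFIELD(skb, pkt_type);

	if (old != 0)
		BPF_CORE_WRITE_BITFIELD(skb, pkt_type, 0);
}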
/* Differentiator between compilers' builtin implementations. This is
* needed because of parsing differences: GCC optimizes pointer-to-type
* constructs early during parsing into the builtin-specific type, which
* makes it impossible to collect the required type information when the
* builtin is expanded.
*/
#ifdef __clang__
#define ___bpf_typeof(type) ((typeof(type) *) 0)
#else
#define ___bpf_typeof1(type, NR) ({ \
extern typeof(type) *___concat(bpf_type_tmp_, NR); \
___concat(bpf_type_tmp_, NR); \
})
#define ___bpf_typeof(type) ___bpf_typeof1(type, __COUNTER__)
#endif
#ifdef __clang__
#define ___bpf_field_ref1(field) (field)
#define ___bpf_field_ref2(type, field) (((typeof(type) *)0)->field)
#define ___bpf_field_ref2(type, field) (___bpf_typeof(type)->field)
#else
#define ___bpf_field_ref1(field) (&(field))
#define ___bpf_field_ref2(type, field) (&(___bpf_typeof(type)->field))
#endif
#define ___bpf_field_ref(args...) \
___bpf_apply(___bpf_field_ref, ___bpf_narg(args))(args)
@@ -162,7 +217,7 @@ enum bpf_enum_value_kind {
* BTF. Always succeeds.
*/
#define bpf_core_type_id_local(type) \
__builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_LOCAL)
__builtin_btf_type_id(*___bpf_typeof(type), BPF_TYPE_ID_LOCAL)
/*
* Convenience macro to get BTF type ID of a target kernel's type that matches
@@ -172,7 +227,7 @@ enum bpf_enum_value_kind {
* - 0, if no matching type was found in a target kernel BTF.
*/
#define bpf_core_type_id_kernel(type) \
__builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET)
__builtin_btf_type_id(*___bpf_typeof(type), BPF_TYPE_ID_TARGET)
/*
* Convenience macro to check that provided named type
@@ -182,7 +237,7 @@ enum bpf_enum_value_kind {
* 0, if no matching type is found.
*/
#define bpf_core_type_exists(type) \
__builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_EXISTS)
__builtin_preserve_type_info(*___bpf_typeof(type), BPF_TYPE_EXISTS)
/*
* Convenience macro to check that provided named type
@@ -192,7 +247,7 @@ enum bpf_enum_value_kind {
* 0, if the type does not match any in the target kernel
*/
#define bpf_core_type_matches(type) \
__builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_MATCHES)
__builtin_preserve_type_info(*___bpf_typeof(type), BPF_TYPE_MATCHES)
/*
* Convenience macro to get the byte size of a provided named type
@@ -202,7 +257,7 @@ enum bpf_enum_value_kind {
* 0, if no matching type is found.
*/
#define bpf_core_type_size(type) \
__builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE)
__builtin_preserve_type_info(*___bpf_typeof(type), BPF_TYPE_SIZE)
/*
* Convenience macro to check that provided enumerator value is defined in
@@ -212,8 +267,13 @@ enum bpf_enum_value_kind {
* kernel's BTF;
* 0, if no matching enum and/or enum value within that enum is found.
*/
#ifdef __clang__
#define bpf_core_enum_value_exists(enum_type, enum_value) \
__builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_EXISTS)
#else
#define bpf_core_enum_value_exists(enum_type, enum_value) \
__builtin_preserve_enum_value(___bpf_typeof(enum_type), enum_value, BPF_ENUMVAL_EXISTS)
#endif
/*
* Convenience macro to get the integer value of an enumerator value in
@@ -223,8 +283,13 @@ enum bpf_enum_value_kind {
* present in target kernel's BTF;
* 0, if no matching enum and/or enum value within that enum is found.
*/
#ifdef __clang__
#define bpf_core_enum_value(enum_type, enum_value) \
__builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_VALUE)
#else
#define bpf_core_enum_value(enum_type, enum_value) \
__builtin_preserve_enum_value(___bpf_typeof(enum_type), enum_value, BPF_ENUMVAL_VALUE)
#endif
/*
* bpf_core_read() abstracts away bpf_probe_read_kernel() call and captures
@@ -236,7 +301,7 @@ enum bpf_enum_value_kind {
* a relocation, which records BTF type ID describing root struct/union and an
* accessor string which describes exact embedded field that was used to take
* an address. See detailed description of this relocation format and
* semantics in comments to struct bpf_field_reloc in libbpf_internal.h.
* semantics in comments to struct bpf_core_relo in include/uapi/linux/bpf.h.
*
* This relocation allows libbpf to adjust BPF instruction to use correct
* actual field offset, based on target kernel BTF type that matches original
@@ -260,6 +325,17 @@ enum bpf_enum_value_kind {
#define bpf_core_read_user_str(dst, sz, src) \
bpf_probe_read_user_str(dst, sz, (const void *)__builtin_preserve_access_index(src))
extern void *bpf_rdonly_cast(const void *obj, __u32 btf_id) __ksym __weak;
/*
* Cast provided pointer *ptr* into a pointer to a specified *type* in such
* a way that BPF verifier will become aware of associated kernel-side BTF
* type. This allows to access members of kernel types directly without the
* need to use BPF_CORE_READ() macros.
*/
#define bpf_core_cast(ptr, type) \
((typeof(type) *)bpf_rdonly_cast((ptr), bpf_core_type_id_kernel(type)))
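
A hedged BPF-side sketch; it assumes vmlinux.h is included and that the opaque pointer really does refer to a task_struct:

/* Sketch: give the verifier a typed, read-only view of an opaque pointer. */
static int task_pid(void *opaque_task)
{
	struct task_struct *t = bpf_core_cast(opaque_task, struct task_struct);

	return t->pid;	/* direct member access, no BPF_CORE_READ() needed */
}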
#define ___concat(a, b) a ## b
#define ___apply(fn, n) ___concat(fn, n)
#define ___nth(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, __11, N, ...) N
@@ -364,7 +440,7 @@ enum bpf_enum_value_kind {
/* Non-CO-RE variant of BPF_CORE_READ_INTO() */
#define BPF_PROBE_READ_INTO(dst, src, a, ...) ({ \
___core_read(bpf_probe_read, bpf_probe_read, \
___core_read(bpf_probe_read_kernel, bpf_probe_read_kernel, \
dst, (src), a, ##__VA_ARGS__) \
})
@@ -400,7 +476,7 @@ enum bpf_enum_value_kind {
/* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */
#define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({ \
___core_read(bpf_probe_read_str, bpf_probe_read, \
___core_read(bpf_probe_read_kernel_str, bpf_probe_read_kernel, \
dst, (src), a, ##__VA_ARGS__) \
})

src/bpf_gen_internal.h

@@ -11,6 +11,7 @@ struct ksym_relo_desc {
int insn_idx;
bool is_weak;
bool is_typeless;
bool is_ld64;
};
struct ksym_desc {
@@ -24,6 +25,7 @@ struct ksym_desc {
bool typeless;
};
int insn;
bool is_ld64;
};
struct bpf_gen {
@@ -65,7 +67,7 @@ void bpf_gen__map_update_elem(struct bpf_gen *gen, int map_idx, void *value, __u
void bpf_gen__map_freeze(struct bpf_gen *gen, int map_idx);
void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *name, enum bpf_attach_type type);
void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
bool is_typeless, int kind, int insn_idx);
bool is_typeless, bool is_ld64, int kind, int insn_idx);
void bpf_gen__record_relo_core(struct bpf_gen *gen, const struct bpf_core_relo *core_relo);
void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int key, int inner_map_idx);

File diff suppressed because it is too large

src/bpf_helpers.h

@@ -13,6 +13,7 @@
#define __uint(name, val) int (*name)[val]
#define __type(name, val) typeof(val) *name
#define __array(name, val) typeof(val) *name[]
#define __ulong(name, val) enum { ___bpf_concat(__unique_value, __COUNTER__) = val } name
/*
* Helper macro to place programs, maps, license in
@@ -77,16 +78,21 @@
/*
* Helper macros to manipulate data structures
*/
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((unsigned long)&((TYPE *)0)->MEMBER)
#endif
#ifndef container_of
/* offsetof() definition that uses __builtin_offset() might not preserve field
* offset CO-RE relocation properly, so force-redefine offsetof() using
* old-school approach which works with CO-RE correctly
*/
#undef offsetof
#define offsetof(type, member) ((unsigned long)&((type *)0)->member)
/* redefined container_of() to ensure we use the above offsetof() macro */
#undef container_of
#define container_of(ptr, type, member) \
({ \
void *__mptr = (void *)(ptr); \
((type *)(__mptr - offsetof(type, member))); \
})
#endif
/*
* Compiler (optimization) barrier.
@@ -109,7 +115,7 @@
* This is a variable-specific variant of more global barrier().
*/
#ifndef barrier_var
#define barrier_var(var) asm volatile("" : "=r"(var) : "0"(var))
#define barrier_var(var) asm volatile("" : "+r"(var))
#endif
/*
@@ -160,18 +166,6 @@ bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
}
#endif
/*
* Helper structure used by eBPF C program
* to describe BPF map attributes to libbpf loader
*/
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
unsigned int map_flags;
} __attribute__((deprecated("use BTF-defined maps in .maps section")));
enum libbpf_pin_type {
LIBBPF_PIN_NONE,
/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
@@ -186,8 +180,20 @@ enum libbpf_tristate {
#define __kconfig __attribute__((section(".kconfig")))
#define __ksym __attribute__((section(".ksyms")))
#define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted")))
#define __kptr __attribute__((btf_type_tag("kptr")))
#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
#define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))
#define bpf_ksym_exists(sym) ({ \
_Static_assert(!__builtin_constant_p(!!sym), #sym " should be marked as __weak"); \
!!sym; \
})
#define __arg_ctx __attribute__((btf_decl_tag("arg:ctx")))
#define __arg_nonnull __attribute((btf_decl_tag("arg:nonnull")))
#define __arg_nullable __attribute((btf_decl_tag("arg:nullable")))
#define __arg_trusted __attribute((btf_decl_tag("arg:trusted")))
#define __arg_arena __attribute((btf_decl_tag("arg:arena")))
#ifndef ___bpf_concat
#define ___bpf_concat(a, b) a ## b
@@ -298,4 +304,107 @@ enum libbpf_tristate {
/* Helper macro to print out debug messages */
#define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)
struct bpf_iter_num;
extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __weak __ksym;
extern int *bpf_iter_num_next(struct bpf_iter_num *it) __weak __ksym;
extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
#ifndef bpf_for_each
/* bpf_for_each(iter_type, cur_elem, args...) provides a generic construct
* for using BPF open-coded iterators without having to write mundane,
* explicit low-level loop logic. It yields a for()-like construct
* that can be used pretty naturally. E.g., for some hypothetical cgroup
* iterator, you'd write:
*
* struct cgroup *cg, *parent_cg = <...>;
*
* bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) {
* bpf_printk("Child cgroup id = %d", cg->cgroup_id);
* if (cg->cgroup_id == 123)
* break;
* }
*
* I.e., it looks almost like a high-level for-each loop in other languages,
* supports continue/break, and is verifiable by the BPF verifier.
*
* For iterating over integers, the difference between bpf_for_each(num, i, N, M)
* and bpf_for(i, N, M) is that bpf_for() additionally proves to the verifier
* that i is in the [N, M) range, whereas in the bpf_for_each() case i is an
* `int *`, not just an `int`. So for integers bpf_for() is more convenient.
*
* Note: this macro relies on the C99 feature of declaring variables inside
* a for() loop, bound to the for() loop's lifetime. It also utilizes the
* __attribute__((cleanup(<func>))) extension, supported by both GCC and
* Clang.
*/
#define bpf_for_each(type, cur, args...) for ( \
/* initialize and define destructor */ \
struct bpf_iter_##type ___it __attribute__((aligned(8), /* enforce, just in case */, \
cleanup(bpf_iter_##type##_destroy))), \
/* ___p pointer is just to call bpf_iter_##type##_new() *once* to init ___it */ \
*___p __attribute__((unused)) = ( \
bpf_iter_##type##_new(&___it, ##args), \
/* this is a workaround for Clang bug: it currently doesn't emit BTF */ \
/* for bpf_iter_##type##_destroy() when used from cleanup() attribute */ \
(void)bpf_iter_##type##_destroy, (void *)0); \
/* iteration and termination check */ \
(((cur) = bpf_iter_##type##_next(&___it))); \
)
#endif /* bpf_for_each */
#ifndef bpf_for
/* bpf_for(i, start, end) implements a for()-like looping construct that sets
* provided integer variable *i* to values starting from *start* through,
* but not including, *end*. It also proves to BPF verifier that *i* belongs
* to range [start, end), so this can be used for accessing arrays without
* extra checks.
*
* Note: *start* and *end* are assumed to be expressions with no side effects
* and whose values do not change throughout bpf_for() loop execution. They do
* not have to be statically known or constant, though.
*
* Note: similarly to bpf_for_each(), it relies on the C99 feature of declaring
* for() loop-bound variables and the cleanup attribute, supported by GCC and Clang.
*/
#define bpf_for(i, start, end) for ( \
/* initialize and define destructor */ \
struct bpf_iter_num ___it __attribute__((aligned(8), /* enforce, just in case */ \
cleanup(bpf_iter_num_destroy))), \
/* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */ \
*___p __attribute__((unused)) = ( \
bpf_iter_num_new(&___it, (start), (end)), \
/* this is a workaround for Clang bug: it currently doesn't emit BTF */ \
/* for bpf_iter_num_destroy() when used from cleanup() attribute */ \
(void)bpf_iter_num_destroy, (void *)0); \
({ \
/* iteration step */ \
int *___t = bpf_iter_num_next(&___it); \
/* termination and bounds check */ \
(___t && ((i) = *___t, (i) >= (start) && (i) < (end))); \
}); \
)
#endif /* bpf_for */
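
A hedged BPF-side sketch of bpf_for(); it assumes `vals` points at an array the verifier already knows holds at least `n` elements:

/* Sketch: sum the first n elements; bpf_for() proves i is in [0, n). */
static __u64 sum_first_n(__u64 *vals, int n)
{
	__u64 sum = 0;
	int i;

	bpf_for(i, 0, n)
		sum += vals[i];
	return sum;
}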
#ifndef bpf_repeat
/* bpf_repeat(N) performs N iterations without exposing iteration number
*
* Note: similarly to bpf_for_each(), it relies on the C99 feature of declaring
* for() loop-bound variables and the cleanup attribute, supported by GCC and Clang.
*/
#define bpf_repeat(N) for ( \
/* initialize and define destructor */ \
struct bpf_iter_num ___it __attribute__((aligned(8), /* enforce, just in case */ \
cleanup(bpf_iter_num_destroy))), \
/* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */ \
*___p __attribute__((unused)) = ( \
bpf_iter_num_new(&___it, 0, (N)), \
/* this is a workaround for Clang bug: it currently doesn't emit BTF */ \
/* for bpf_iter_num_destroy() when used from cleanup() attribute */ \
(void)bpf_iter_num_destroy, (void *)0); \
bpf_iter_num_next(&___it); \
/* nothing here */ \
)
#endif /* bpf_repeat */
#endif

src/bpf_tracing.h

@@ -2,7 +2,7 @@
#ifndef __BPF_TRACING_H__
#define __BPF_TRACING_H__
#include <bpf/bpf_helpers.h>
#include "bpf_helpers.h"
/* Scan the ARCH passed in from ARCH env variable (see Makefile) */
#if defined(__TARGET_ARCH_x86)
@@ -32,6 +32,9 @@
#elif defined(__TARGET_ARCH_arc)
#define bpf_target_arc
#define bpf_target_defined
#elif defined(__TARGET_ARCH_loongarch)
#define bpf_target_loongarch
#define bpf_target_defined
#else
/* Fall back to what the compiler says */
@@ -62,6 +65,9 @@
#elif defined(__arc__)
#define bpf_target_arc
#define bpf_target_defined
#elif defined(__loongarch__)
#define bpf_target_loongarch
#define bpf_target_defined
#endif /* no compiler target */
#endif
@@ -72,6 +78,10 @@
#if defined(bpf_target_x86)
/*
* https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI
*/
#if defined(__KERNEL__) || defined(__VMLINUX_H__)
#define __PT_PARM1_REG di
@@ -79,25 +89,40 @@
#define __PT_PARM3_REG dx
#define __PT_PARM4_REG cx
#define __PT_PARM5_REG r8
#define __PT_PARM6_REG r9
/*
* Syscall uses r10 for PARM4. See arch/x86/entry/entry_64.S:entry_SYSCALL_64
* comments in Linux sources. And refer to syscall(2) manpage.
*/
#define __PT_PARM1_SYSCALL_REG __PT_PARM1_REG
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG r10
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
#define __PT_RET_REG sp
#define __PT_FP_REG bp
#define __PT_RC_REG ax
#define __PT_SP_REG sp
#define __PT_IP_REG ip
/* syscall uses r10 for PARM4 */
#define PT_REGS_PARM4_SYSCALL(x) ((x)->r10)
#define PT_REGS_PARM4_CORE_SYSCALL(x) BPF_CORE_READ(x, r10)
#else
#ifdef __i386__
/* i386 kernel is built with -mregparm=3 */
#define __PT_PARM1_REG eax
#define __PT_PARM2_REG edx
#define __PT_PARM3_REG ecx
/* i386 kernel is built with -mregparm=3 */
#define __PT_PARM4_REG __unsupported__
#define __PT_PARM5_REG __unsupported__
/* i386 syscall ABI is very different, refer to syscall(2) manpage */
#define __PT_PARM1_SYSCALL_REG ebx
#define __PT_PARM2_SYSCALL_REG ecx
#define __PT_PARM3_SYSCALL_REG edx
#define __PT_PARM4_SYSCALL_REG esi
#define __PT_PARM5_SYSCALL_REG edi
#define __PT_PARM6_SYSCALL_REG ebp
#define __PT_RET_REG esp
#define __PT_FP_REG ebp
#define __PT_RC_REG eax
@@ -111,14 +136,20 @@
#define __PT_PARM3_REG rdx
#define __PT_PARM4_REG rcx
#define __PT_PARM5_REG r8
#define __PT_PARM6_REG r9
#define __PT_PARM1_SYSCALL_REG __PT_PARM1_REG
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG r10
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
#define __PT_RET_REG rsp
#define __PT_FP_REG rbp
#define __PT_RC_REG rax
#define __PT_SP_REG rsp
#define __PT_IP_REG rip
/* syscall uses r10 for PARM4 */
#define PT_REGS_PARM4_SYSCALL(x) ((x)->r10)
#define PT_REGS_PARM4_CORE_SYSCALL(x) BPF_CORE_READ(x, r10)
#endif /* __i386__ */
@@ -126,6 +157,10 @@
#elif defined(bpf_target_s390)
/*
* https://github.com/IBM/s390x-abi/releases/download/v1.6/lzsabi_s390x.pdf
*/
struct pt_regs___s390 {
unsigned long orig_gpr2;
};
@@ -137,21 +172,42 @@ struct pt_regs___s390 {
#define __PT_PARM3_REG gprs[4]
#define __PT_PARM4_REG gprs[5]
#define __PT_PARM5_REG gprs[6]
#define __PT_RET_REG grps[14]
#define __PT_PARM1_SYSCALL_REG orig_gpr2
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG gprs[7]
#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
#define PT_REGS_PARM1_CORE_SYSCALL(x) \
BPF_CORE_READ((const struct pt_regs___s390 *)(x), __PT_PARM1_SYSCALL_REG)
#define __PT_RET_REG gprs[14]
#define __PT_FP_REG gprs[11] /* Works only with CONFIG_FRAME_POINTER */
#define __PT_RC_REG gprs[2]
#define __PT_SP_REG gprs[15]
#define __PT_IP_REG psw.addr
#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
#define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___s390 *)(x), orig_gpr2)
#elif defined(bpf_target_arm)
/*
* https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#machine-registers
*/
#define __PT_PARM1_REG uregs[0]
#define __PT_PARM2_REG uregs[1]
#define __PT_PARM3_REG uregs[2]
#define __PT_PARM4_REG uregs[3]
#define __PT_PARM5_REG uregs[4]
#define __PT_PARM1_SYSCALL_REG __PT_PARM1_REG
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG uregs[4]
#define __PT_PARM6_SYSCALL_REG uregs[5]
#define __PT_PARM7_SYSCALL_REG uregs[6]
#define __PT_RET_REG uregs[14]
#define __PT_FP_REG uregs[11] /* Works only with CONFIG_FRAME_POINTER */
#define __PT_RC_REG uregs[0]
@@ -160,6 +216,10 @@ struct pt_regs___s390 {
#elif defined(bpf_target_arm64)
/*
* https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#machine-registers
*/
struct pt_regs___arm64 {
unsigned long orig_x0;
};
@@ -171,21 +231,49 @@ struct pt_regs___arm64 {
#define __PT_PARM3_REG regs[2]
#define __PT_PARM4_REG regs[3]
#define __PT_PARM5_REG regs[4]
#define __PT_PARM6_REG regs[5]
#define __PT_PARM7_REG regs[6]
#define __PT_PARM8_REG regs[7]
#define __PT_PARM1_SYSCALL_REG orig_x0
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
#define PT_REGS_PARM1_CORE_SYSCALL(x) \
BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)
#define __PT_RET_REG regs[30]
#define __PT_FP_REG regs[29] /* Works only with CONFIG_FRAME_POINTER */
#define __PT_RC_REG regs[0]
#define __PT_SP_REG sp
#define __PT_IP_REG pc
#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
#define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___arm64 *)(x), orig_x0)
#elif defined(bpf_target_mips)
/*
* N64 ABI is assumed right now.
* https://en.wikipedia.org/wiki/MIPS_architecture#Calling_conventions
*/
#define __PT_PARM1_REG regs[4]
#define __PT_PARM2_REG regs[5]
#define __PT_PARM3_REG regs[6]
#define __PT_PARM4_REG regs[7]
#define __PT_PARM5_REG regs[8]
#define __PT_PARM6_REG regs[9]
#define __PT_PARM7_REG regs[10]
#define __PT_PARM8_REG regs[11]
#define __PT_PARM1_SYSCALL_REG __PT_PARM1_REG
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG /* only N32/N64 */
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG /* only N32/N64 */
#define __PT_RET_REG regs[31]
#define __PT_FP_REG regs[30] /* Works only with CONFIG_FRAME_POINTER */
#define __PT_RC_REG regs[2]
@@ -194,26 +282,58 @@ struct pt_regs___arm64 {
#elif defined(bpf_target_powerpc)
/*
* http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf (page 3-14,
* section "Function Calling Sequence")
*/
#define __PT_PARM1_REG gpr[3]
#define __PT_PARM2_REG gpr[4]
#define __PT_PARM3_REG gpr[5]
#define __PT_PARM4_REG gpr[6]
#define __PT_PARM5_REG gpr[7]
#define __PT_PARM6_REG gpr[8]
#define __PT_PARM7_REG gpr[9]
#define __PT_PARM8_REG gpr[10]
/* powerpc does not select ARCH_HAS_SYSCALL_WRAPPER. */
#define PT_REGS_SYSCALL_REGS(ctx) ctx
#define __PT_PARM1_SYSCALL_REG orig_gpr3
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
#if !defined(__arch64__)
#define __PT_PARM7_SYSCALL_REG __PT_PARM7_REG /* only powerpc (not powerpc64) */
#endif
#define __PT_RET_REG regs[31]
#define __PT_FP_REG __unsupported__
#define __PT_RC_REG gpr[3]
#define __PT_SP_REG sp
#define __PT_IP_REG nip
/* powerpc does not select ARCH_HAS_SYSCALL_WRAPPER. */
#define PT_REGS_SYSCALL_REGS(ctx) ctx
#elif defined(bpf_target_sparc)
/*
* https://en.wikipedia.org/wiki/Calling_convention#SPARC
*/
#define __PT_PARM1_REG u_regs[UREG_I0]
#define __PT_PARM2_REG u_regs[UREG_I1]
#define __PT_PARM3_REG u_regs[UREG_I2]
#define __PT_PARM4_REG u_regs[UREG_I3]
#define __PT_PARM5_REG u_regs[UREG_I4]
#define __PT_PARM6_REG u_regs[UREG_I5]
#define __PT_PARM1_SYSCALL_REG __PT_PARM1_REG
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
#define __PT_RET_REG u_regs[UREG_I7]
#define __PT_FP_REG __unsupported__
#define __PT_RC_REG u_regs[UREG_I0]
@@ -227,36 +347,99 @@ struct pt_regs___arm64 {
#elif defined(bpf_target_riscv)
/*
* https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#risc-v-calling-conventions
*/
/* riscv provides struct user_regs_struct instead of struct pt_regs to userspace */
#define __PT_REGS_CAST(x) ((const struct user_regs_struct *)(x))
#define __PT_PARM1_REG a0
#define __PT_PARM2_REG a1
#define __PT_PARM3_REG a2
#define __PT_PARM4_REG a3
#define __PT_PARM5_REG a4
#define __PT_PARM6_REG a5
#define __PT_PARM7_REG a6
#define __PT_PARM8_REG a7
#define __PT_PARM1_SYSCALL_REG __PT_PARM1_REG
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
#define __PT_RET_REG ra
#define __PT_FP_REG s0
#define __PT_RC_REG a0
#define __PT_SP_REG sp
#define __PT_IP_REG pc
/* riscv does not select ARCH_HAS_SYSCALL_WRAPPER. */
#define PT_REGS_SYSCALL_REGS(ctx) ctx
#elif defined(bpf_target_arc)
/* arc provides struct user_pt_regs instead of struct pt_regs to userspace */
/*
* Section "Function Calling Sequence" (page 24):
* https://raw.githubusercontent.com/wiki/foss-for-synopsys-dwc-arc-processors/toolchain/files/ARCv2_ABI.pdf
*/
/* arc provides struct user_regs_struct instead of struct pt_regs to userspace */
#define __PT_REGS_CAST(x) ((const struct user_regs_struct *)(x))
#define __PT_PARM1_REG scratch.r0
#define __PT_PARM2_REG scratch.r1
#define __PT_PARM3_REG scratch.r2
#define __PT_PARM4_REG scratch.r3
#define __PT_PARM5_REG scratch.r4
#define __PT_PARM6_REG scratch.r5
#define __PT_PARM7_REG scratch.r6
#define __PT_PARM8_REG scratch.r7
/* arc does not select ARCH_HAS_SYSCALL_WRAPPER. */
#define PT_REGS_SYSCALL_REGS(ctx) ctx
#define __PT_PARM1_SYSCALL_REG __PT_PARM1_REG
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
#define __PT_RET_REG scratch.blink
#define __PT_FP_REG __unsupported__
#define __PT_FP_REG scratch.fp
#define __PT_RC_REG scratch.r0
#define __PT_SP_REG scratch.sp
#define __PT_IP_REG scratch.ret
/* arc does not select ARCH_HAS_SYSCALL_WRAPPER. */
#elif defined(bpf_target_loongarch)
/*
* https://docs.kernel.org/loongarch/introduction.html
* https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
*/
/* loongarch provides struct user_pt_regs instead of struct pt_regs to userspace */
#define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
#define __PT_PARM1_REG regs[4]
#define __PT_PARM2_REG regs[5]
#define __PT_PARM3_REG regs[6]
#define __PT_PARM4_REG regs[7]
#define __PT_PARM5_REG regs[8]
#define __PT_PARM6_REG regs[9]
#define __PT_PARM7_REG regs[10]
#define __PT_PARM8_REG regs[11]
/* loongarch does not select ARCH_HAS_SYSCALL_WRAPPER. */
#define PT_REGS_SYSCALL_REGS(ctx) ctx
#define __PT_PARM1_SYSCALL_REG __PT_PARM1_REG
#define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG
#define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG
#define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
#define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
#define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
#define __PT_RET_REG regs[1]
#define __PT_FP_REG regs[22]
#define __PT_RC_REG regs[4]
#define __PT_SP_REG regs[3]
#define __PT_IP_REG csr_era
#endif
@@ -264,16 +447,49 @@ struct pt_regs___arm64 {
struct pt_regs;
/* allow some architecutres to override `struct pt_regs` */
/* allow some architectures to override `struct pt_regs` */
#ifndef __PT_REGS_CAST
#define __PT_REGS_CAST(x) (x)
#endif
/*
* Different architectures support different numbers of arguments passed
* through registers. i386 supports just 3, some arches support up to 8.
*/
#ifndef __PT_PARM4_REG
#define __PT_PARM4_REG __unsupported__
#endif
#ifndef __PT_PARM5_REG
#define __PT_PARM5_REG __unsupported__
#endif
#ifndef __PT_PARM6_REG
#define __PT_PARM6_REG __unsupported__
#endif
#ifndef __PT_PARM7_REG
#define __PT_PARM7_REG __unsupported__
#endif
#ifndef __PT_PARM8_REG
#define __PT_PARM8_REG __unsupported__
#endif
/*
* Similarly, syscall-specific conventions might differ between function call
* conventions within each architecture. All supported architectures pass
* either 6 or 7 syscall arguments in registers.
*
* See syscall(2) manpage for succinct table with information on each arch.
*/
#ifndef __PT_PARM7_SYSCALL_REG
#define __PT_PARM7_SYSCALL_REG __unsupported__
#endif
#define PT_REGS_PARM1(x) (__PT_REGS_CAST(x)->__PT_PARM1_REG)
#define PT_REGS_PARM2(x) (__PT_REGS_CAST(x)->__PT_PARM2_REG)
#define PT_REGS_PARM3(x) (__PT_REGS_CAST(x)->__PT_PARM3_REG)
#define PT_REGS_PARM4(x) (__PT_REGS_CAST(x)->__PT_PARM4_REG)
#define PT_REGS_PARM5(x) (__PT_REGS_CAST(x)->__PT_PARM5_REG)
#define PT_REGS_PARM6(x) (__PT_REGS_CAST(x)->__PT_PARM6_REG)
#define PT_REGS_PARM7(x) (__PT_REGS_CAST(x)->__PT_PARM7_REG)
#define PT_REGS_PARM8(x) (__PT_REGS_CAST(x)->__PT_PARM8_REG)
#define PT_REGS_RET(x) (__PT_REGS_CAST(x)->__PT_RET_REG)
#define PT_REGS_FP(x) (__PT_REGS_CAST(x)->__PT_FP_REG)
#define PT_REGS_RC(x) (__PT_REGS_CAST(x)->__PT_RC_REG)
@@ -285,6 +501,9 @@ struct pt_regs;
#define PT_REGS_PARM3_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM3_REG)
#define PT_REGS_PARM4_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM4_REG)
#define PT_REGS_PARM5_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM5_REG)
#define PT_REGS_PARM6_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM6_REG)
#define PT_REGS_PARM7_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM7_REG)
#define PT_REGS_PARM8_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM8_REG)
#define PT_REGS_RET_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_RET_REG)
#define PT_REGS_FP_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_FP_REG)
#define PT_REGS_RC_CORE(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_RC_REG)
@@ -311,24 +530,33 @@ struct pt_regs;
#endif
#ifndef PT_REGS_PARM1_SYSCALL
#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1(x)
#define PT_REGS_PARM1_SYSCALL(x) (__PT_REGS_CAST(x)->__PT_PARM1_SYSCALL_REG)
#define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM1_SYSCALL_REG)
#endif
#ifndef PT_REGS_PARM2_SYSCALL
#define PT_REGS_PARM2_SYSCALL(x) (__PT_REGS_CAST(x)->__PT_PARM2_SYSCALL_REG)
#define PT_REGS_PARM2_CORE_SYSCALL(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM2_SYSCALL_REG)
#endif
#ifndef PT_REGS_PARM3_SYSCALL
#define PT_REGS_PARM3_SYSCALL(x) (__PT_REGS_CAST(x)->__PT_PARM3_SYSCALL_REG)
#define PT_REGS_PARM3_CORE_SYSCALL(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM3_SYSCALL_REG)
#endif
#define PT_REGS_PARM2_SYSCALL(x) PT_REGS_PARM2(x)
#define PT_REGS_PARM3_SYSCALL(x) PT_REGS_PARM3(x)
#ifndef PT_REGS_PARM4_SYSCALL
#define PT_REGS_PARM4_SYSCALL(x) PT_REGS_PARM4(x)
#define PT_REGS_PARM4_SYSCALL(x) (__PT_REGS_CAST(x)->__PT_PARM4_SYSCALL_REG)
#define PT_REGS_PARM4_CORE_SYSCALL(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM4_SYSCALL_REG)
#endif
#define PT_REGS_PARM5_SYSCALL(x) PT_REGS_PARM5(x)
#ifndef PT_REGS_PARM1_CORE_SYSCALL
#define PT_REGS_PARM1_CORE_SYSCALL(x) PT_REGS_PARM1_CORE(x)
#ifndef PT_REGS_PARM5_SYSCALL
#define PT_REGS_PARM5_SYSCALL(x) (__PT_REGS_CAST(x)->__PT_PARM5_SYSCALL_REG)
#define PT_REGS_PARM5_CORE_SYSCALL(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM5_SYSCALL_REG)
#endif
#define PT_REGS_PARM2_CORE_SYSCALL(x) PT_REGS_PARM2_CORE(x)
#define PT_REGS_PARM3_CORE_SYSCALL(x) PT_REGS_PARM3_CORE(x)
#ifndef PT_REGS_PARM4_CORE_SYSCALL
#define PT_REGS_PARM4_CORE_SYSCALL(x) PT_REGS_PARM4_CORE(x)
#ifndef PT_REGS_PARM6_SYSCALL
#define PT_REGS_PARM6_SYSCALL(x) (__PT_REGS_CAST(x)->__PT_PARM6_SYSCALL_REG)
#define PT_REGS_PARM6_CORE_SYSCALL(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM6_SYSCALL_REG)
#endif
#ifndef PT_REGS_PARM7_SYSCALL
#define PT_REGS_PARM7_SYSCALL(x) (__PT_REGS_CAST(x)->__PT_PARM7_SYSCALL_REG)
#define PT_REGS_PARM7_CORE_SYSCALL(x) BPF_CORE_READ(__PT_REGS_CAST(x), __PT_PARM7_SYSCALL_REG)
#endif
#define PT_REGS_PARM5_CORE_SYSCALL(x) PT_REGS_PARM5_CORE(x)
#else /* defined(bpf_target_defined) */
@@ -337,6 +565,9 @@ struct pt_regs;
#define PT_REGS_PARM3(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM4(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM5(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM6(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM7(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM8(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_RET(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_FP(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_RC(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
@@ -348,6 +579,9 @@ struct pt_regs;
#define PT_REGS_PARM3_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM4_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM5_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM6_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM7_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM8_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_RET_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_FP_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_RC_CORE(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
@@ -362,12 +596,16 @@ struct pt_regs;
#define PT_REGS_PARM3_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM4_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM5_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM6_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM7_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM1_CORE_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM2_CORE_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM3_CORE_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM4_CORE_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM5_CORE_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM6_CORE_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#define PT_REGS_PARM7_CORE_SYSCALL(x) ({ _Pragma(__BPF_TARGET_MISSING); 0l; })
#endif /* defined(bpf_target_defined) */
@@ -438,6 +676,113 @@ typeof(name(0)) name(unsigned long long *ctx) \
static __always_inline typeof(name(0)) \
____##name(unsigned long long *ctx, ##args)
#ifndef ___bpf_nth2
#define ___bpf_nth2(_, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, \
_14, _15, _16, _17, _18, _19, _20, _21, _22, _23, _24, N, ...) N
#endif
#ifndef ___bpf_narg2
#define ___bpf_narg2(...) \
___bpf_nth2(_, ##__VA_ARGS__, 12, 12, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, \
6, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0)
#endif
#define ___bpf_treg_cnt(t) \
__builtin_choose_expr(sizeof(t) == 1, 1, \
__builtin_choose_expr(sizeof(t) == 2, 1, \
__builtin_choose_expr(sizeof(t) == 4, 1, \
__builtin_choose_expr(sizeof(t) == 8, 1, \
__builtin_choose_expr(sizeof(t) == 16, 2, \
(void)0)))))
#define ___bpf_reg_cnt0() (0)
#define ___bpf_reg_cnt1(t, x) (___bpf_reg_cnt0() + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt2(t, x, args...) (___bpf_reg_cnt1(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt3(t, x, args...) (___bpf_reg_cnt2(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt4(t, x, args...) (___bpf_reg_cnt3(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt5(t, x, args...) (___bpf_reg_cnt4(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt6(t, x, args...) (___bpf_reg_cnt5(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt7(t, x, args...) (___bpf_reg_cnt6(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt8(t, x, args...) (___bpf_reg_cnt7(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt9(t, x, args...) (___bpf_reg_cnt8(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt10(t, x, args...) (___bpf_reg_cnt9(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt11(t, x, args...) (___bpf_reg_cnt10(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt12(t, x, args...) (___bpf_reg_cnt11(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt(args...) ___bpf_apply(___bpf_reg_cnt, ___bpf_narg2(args))(args)
#define ___bpf_union_arg(t, x, n) \
__builtin_choose_expr(sizeof(t) == 1, ({ union { __u8 z[1]; t x; } ___t = { .z = {ctx[n]}}; ___t.x; }), \
__builtin_choose_expr(sizeof(t) == 2, ({ union { __u16 z[1]; t x; } ___t = { .z = {ctx[n]} }; ___t.x; }), \
__builtin_choose_expr(sizeof(t) == 4, ({ union { __u32 z[1]; t x; } ___t = { .z = {ctx[n]} }; ___t.x; }), \
__builtin_choose_expr(sizeof(t) == 8, ({ union { __u64 z[1]; t x; } ___t = {.z = {ctx[n]} }; ___t.x; }), \
__builtin_choose_expr(sizeof(t) == 16, ({ union { __u64 z[2]; t x; } ___t = {.z = {ctx[n], ctx[n + 1]} }; ___t.x; }), \
(void)0)))))
#define ___bpf_ctx_arg0(n, args...)
#define ___bpf_ctx_arg1(n, t, x) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt1(t, x))
#define ___bpf_ctx_arg2(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt2(t, x, args)) ___bpf_ctx_arg1(n, args)
#define ___bpf_ctx_arg3(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt3(t, x, args)) ___bpf_ctx_arg2(n, args)
#define ___bpf_ctx_arg4(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt4(t, x, args)) ___bpf_ctx_arg3(n, args)
#define ___bpf_ctx_arg5(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt5(t, x, args)) ___bpf_ctx_arg4(n, args)
#define ___bpf_ctx_arg6(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt6(t, x, args)) ___bpf_ctx_arg5(n, args)
#define ___bpf_ctx_arg7(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt7(t, x, args)) ___bpf_ctx_arg6(n, args)
#define ___bpf_ctx_arg8(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt8(t, x, args)) ___bpf_ctx_arg7(n, args)
#define ___bpf_ctx_arg9(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt9(t, x, args)) ___bpf_ctx_arg8(n, args)
#define ___bpf_ctx_arg10(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt10(t, x, args)) ___bpf_ctx_arg9(n, args)
#define ___bpf_ctx_arg11(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt11(t, x, args)) ___bpf_ctx_arg10(n, args)
#define ___bpf_ctx_arg12(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt12(t, x, args)) ___bpf_ctx_arg11(n, args)
#define ___bpf_ctx_arg(args...) ___bpf_apply(___bpf_ctx_arg, ___bpf_narg2(args))(___bpf_reg_cnt(args), args)
#define ___bpf_ctx_decl0()
#define ___bpf_ctx_decl1(t, x) , t x
#define ___bpf_ctx_decl2(t, x, args...) , t x ___bpf_ctx_decl1(args)
#define ___bpf_ctx_decl3(t, x, args...) , t x ___bpf_ctx_decl2(args)
#define ___bpf_ctx_decl4(t, x, args...) , t x ___bpf_ctx_decl3(args)
#define ___bpf_ctx_decl5(t, x, args...) , t x ___bpf_ctx_decl4(args)
#define ___bpf_ctx_decl6(t, x, args...) , t x ___bpf_ctx_decl5(args)
#define ___bpf_ctx_decl7(t, x, args...) , t x ___bpf_ctx_decl6(args)
#define ___bpf_ctx_decl8(t, x, args...) , t x ___bpf_ctx_decl7(args)
#define ___bpf_ctx_decl9(t, x, args...) , t x ___bpf_ctx_decl8(args)
#define ___bpf_ctx_decl10(t, x, args...) , t x ___bpf_ctx_decl9(args)
#define ___bpf_ctx_decl11(t, x, args...) , t x ___bpf_ctx_decl10(args)
#define ___bpf_ctx_decl12(t, x, args...) , t x ___bpf_ctx_decl11(args)
#define ___bpf_ctx_decl(args...) ___bpf_apply(___bpf_ctx_decl, ___bpf_narg2(args))(args)
/*
* BPF_PROG2 is an enhanced version of BPF_PROG in order to handle struct
* arguments. Since each struct argument might take one or two u64 values
* in the trampoline stack, argument type size is needed to place proper number
* of u64 values for each argument. Therefore, BPF_PROG2 has different
* syntax from BPF_PROG. For example, for the following BPF_PROG syntax:
*
* int BPF_PROG(test2, int a, int b) { ... }
*
* the corresponding BPF_PROG2 syntax is:
*
* int BPF_PROG2(test2, int, a, int, b) { ... }
*
* where type and the corresponding argument name are separated by comma.
*
* Use BPF_PROG2 macro if one of the arguments might be a struct/union larger
* than 8 bytes:
*
* int BPF_PROG2(test_struct_arg, struct bpf_testmod_struct_arg_1, a, int, b,
* int, c, int, d, struct bpf_testmod_struct_arg_2, e, int, ret)
* {
* // access a, b, c, d, e, and ret directly
* ...
* }
*/
#define BPF_PROG2(name, args...) \
name(unsigned long long *ctx); \
static __always_inline typeof(name(0)) \
____##name(unsigned long long *ctx ___bpf_ctx_decl(args)); \
typeof(name(0)) name(unsigned long long *ctx) \
{ \
return ____##name(ctx ___bpf_ctx_arg(args)); \
} \
static __always_inline typeof(name(0)) \
____##name(unsigned long long *ctx ___bpf_ctx_decl(args))
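For context, a minimal sketch of the resulting user-facing pattern; the fentry target my_func and struct pair below are hypothetical stand-ins for a traced function that takes a 16-byte struct by value:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* hypothetical 16-byte struct passed by value to the traced function */
struct pair {
	long first;
	long second;
};

SEC("fentry/my_func") /* hypothetical attach target */
int BPF_PROG2(trace_my_func, struct pair, p, int, x)
{
	/* p spans two u64 slots in ctx; the ___bpf_union_arg machinery
	 * above reassembles it into a proper struct value */
	bpf_printk("first=%ld x=%d", p.first, x);
	return 0;
}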
struct pt_regs;
#define ___bpf_kprobe_args0() ctx
@@ -446,6 +791,9 @@ struct pt_regs;
#define ___bpf_kprobe_args3(x, args...) ___bpf_kprobe_args2(args), (void *)PT_REGS_PARM3(ctx)
#define ___bpf_kprobe_args4(x, args...) ___bpf_kprobe_args3(args), (void *)PT_REGS_PARM4(ctx)
#define ___bpf_kprobe_args5(x, args...) ___bpf_kprobe_args4(args), (void *)PT_REGS_PARM5(ctx)
#define ___bpf_kprobe_args6(x, args...) ___bpf_kprobe_args5(args), (void *)PT_REGS_PARM6(ctx)
#define ___bpf_kprobe_args7(x, args...) ___bpf_kprobe_args6(args), (void *)PT_REGS_PARM7(ctx)
#define ___bpf_kprobe_args8(x, args...) ___bpf_kprobe_args7(args), (void *)PT_REGS_PARM8(ctx)
#define ___bpf_kprobe_args(args...) ___bpf_apply(___bpf_kprobe_args, ___bpf_narg(args))(args)
/*
@@ -502,6 +850,8 @@ static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
#define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_SYSCALL(regs)
#define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_SYSCALL(regs)
#define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_SYSCALL(regs)
#define ___bpf_syscall_args6(x, args...) ___bpf_syscall_args5(args), (void *)PT_REGS_PARM6_SYSCALL(regs)
#define ___bpf_syscall_args7(x, args...) ___bpf_syscall_args6(args), (void *)PT_REGS_PARM7_SYSCALL(regs)
#define ___bpf_syscall_args(args...) ___bpf_apply(___bpf_syscall_args, ___bpf_narg(args))(args)
/* If kernel doesn't have CONFIG_ARCH_HAS_SYSCALL_WRAPPER, we have to BPF_CORE_READ from pt_regs */
@@ -511,6 +861,8 @@ static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
#define ___bpf_syswrap_args3(x, args...) ___bpf_syswrap_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs)
#define ___bpf_syswrap_args4(x, args...) ___bpf_syswrap_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs)
#define ___bpf_syswrap_args5(x, args...) ___bpf_syswrap_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs)
#define ___bpf_syswrap_args6(x, args...) ___bpf_syswrap_args5(args), (void *)PT_REGS_PARM6_CORE_SYSCALL(regs)
#define ___bpf_syswrap_args7(x, args...) ___bpf_syswrap_args6(args), (void *)PT_REGS_PARM7_CORE_SYSCALL(regs)
#define ___bpf_syswrap_args(args...) ___bpf_apply(___bpf_syswrap_args, ___bpf_narg(args))(args)
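The ___bpf_syscall_args/___bpf_syswrap_args expansions above are what BPF_KSYSCALL feeds its arguments through. As a usage sketch (real syscall, standard libbpf auto-attach section):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("ksyscall/unlinkat")
int BPF_KSYSCALL(probe_unlinkat, int dfd, const char *pathname)
{
	char buf[64];

	/* pathname is a user-space pointer at syscall entry */
	bpf_probe_read_user_str(buf, sizeof(buf), pathname);
	bpf_printk("unlinkat(%d, %s)", dfd, buf);
	return 0;
}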
/*
@@ -560,4 +912,11 @@ ____##name(struct pt_regs *ctx, ##args)
#define BPF_KPROBE_SYSCALL BPF_KSYSCALL
/* BPF_UPROBE and BPF_URETPROBE are identical to BPF_KPROBE and BPF_KRETPROBE,
* but are named way less confusingly for SEC("uprobe") and SEC("uretprobe")
* use cases.
*/
#define BPF_UPROBE(name, args...) BPF_KPROBE(name, ##args)
#define BPF_URETPROBE(name, args...) BPF_KRETPROBE(name, ##args)
#endif
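And a matching sketch for the uprobe aliases; the target function is hypothetical and would be attached from user space, e.g. via bpf_program__attach_uprobe():

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("uprobe")
int BPF_UPROBE(trace_malloc, unsigned long size)
{
	/* expands to the same PT_REGS_PARM1(ctx) plumbing as BPF_KPROBE */
	bpf_printk("malloc(%lu)", size);
	return 0;
}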

src/btf.c

@@ -448,6 +448,165 @@ static int btf_parse_type_sec(struct btf *btf)
return 0;
}
static int btf_validate_str(const struct btf *btf, __u32 str_off, const char *what, __u32 type_id)
{
const char *s;
s = btf__str_by_offset(btf, str_off);
if (!s) {
pr_warn("btf: type [%u]: invalid %s (string offset %u)\n", type_id, what, str_off);
return -EINVAL;
}
return 0;
}
static int btf_validate_id(const struct btf *btf, __u32 id, __u32 ctx_id)
{
const struct btf_type *t;
t = btf__type_by_id(btf, id);
if (!t) {
pr_warn("btf: type [%u]: invalid referenced type ID %u\n", ctx_id, id);
return -EINVAL;
}
return 0;
}
static int btf_validate_type(const struct btf *btf, const struct btf_type *t, __u32 id)
{
__u32 kind = btf_kind(t);
int err, i, n;
err = btf_validate_str(btf, t->name_off, "type name", id);
if (err)
return err;
switch (kind) {
case BTF_KIND_UNKN:
case BTF_KIND_INT:
case BTF_KIND_FWD:
case BTF_KIND_FLOAT:
break;
case BTF_KIND_PTR:
case BTF_KIND_TYPEDEF:
case BTF_KIND_VOLATILE:
case BTF_KIND_CONST:
case BTF_KIND_RESTRICT:
case BTF_KIND_VAR:
case BTF_KIND_DECL_TAG:
case BTF_KIND_TYPE_TAG:
err = btf_validate_id(btf, t->type, id);
if (err)
return err;
break;
case BTF_KIND_ARRAY: {
const struct btf_array *a = btf_array(t);
err = btf_validate_id(btf, a->type, id);
err = err ?: btf_validate_id(btf, a->index_type, id);
if (err)
return err;
break;
}
case BTF_KIND_STRUCT:
case BTF_KIND_UNION: {
const struct btf_member *m = btf_members(t);
n = btf_vlen(t);
for (i = 0; i < n; i++, m++) {
err = btf_validate_str(btf, m->name_off, "field name", id);
err = err ?: btf_validate_id(btf, m->type, id);
if (err)
return err;
}
break;
}
case BTF_KIND_ENUM: {
const struct btf_enum *m = btf_enum(t);
n = btf_vlen(t);
for (i = 0; i < n; i++, m++) {
err = btf_validate_str(btf, m->name_off, "enum name", id);
if (err)
return err;
}
break;
}
case BTF_KIND_ENUM64: {
const struct btf_enum64 *m = btf_enum64(t);
n = btf_vlen(t);
for (i = 0; i < n; i++, m++) {
err = btf_validate_str(btf, m->name_off, "enum name", id);
if (err)
return err;
}
break;
}
case BTF_KIND_FUNC: {
const struct btf_type *ft;
err = btf_validate_id(btf, t->type, id);
if (err)
return err;
ft = btf__type_by_id(btf, t->type);
if (btf_kind(ft) != BTF_KIND_FUNC_PROTO) {
pr_warn("btf: type [%u]: referenced type [%u] is not FUNC_PROTO\n", id, t->type);
return -EINVAL;
}
break;
}
case BTF_KIND_FUNC_PROTO: {
const struct btf_param *m = btf_params(t);
n = btf_vlen(t);
for (i = 0; i < n; i++, m++) {
err = btf_validate_str(btf, m->name_off, "param name", id);
err = err ?: btf_validate_id(btf, m->type, id);
if (err)
return err;
}
break;
}
case BTF_KIND_DATASEC: {
const struct btf_var_secinfo *m = btf_var_secinfos(t);
n = btf_vlen(t);
for (i = 0; i < n; i++, m++) {
err = btf_validate_id(btf, m->type, id);
if (err)
return err;
}
break;
}
default:
pr_warn("btf: type [%u]: unrecognized kind %u\n", id, kind);
return -EINVAL;
}
return 0;
}
/* Validate basic sanity of BTF. It's intentionally less thorough than
* kernel's validation and validates only properties of BTF that libbpf relies
* on to be correct (e.g., valid type IDs, valid string offsets, etc.)
*/
static int btf_sanity_check(const struct btf *btf)
{
const struct btf_type *t;
__u32 i, n = btf__type_cnt(btf);
int err;
for (i = 1; i < n; i++) {
t = btf_type_by_id(btf, i);
err = btf_validate_type(btf, t, i);
if (err)
return err;
}
return 0;
}
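From the caller's side, the net effect is that malformed BTF now fails fast at parse time. A hedged sketch, assuming libbpf 1.0 error conventions and a hypothetical input file:

#include <errno.h>
#include <stdio.h>
#include <bpf/btf.h>

int main(void)
{
	/* hypothetical file containing BTF with a dangling type ID */
	struct btf *btf = btf__parse_raw("/tmp/corrupt.btf");

	if (!btf) {
		/* btf_sanity_check() failures surface as EINVAL in errno */
		fprintf(stderr, "BTF rejected: %d\n", errno);
		return 1;
	}
	btf__free(btf);
	return 0;
}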
__u32 btf__type_cnt(const struct btf *btf)
{
return btf->start_id + btf->nr_types;
@@ -688,8 +847,21 @@ int btf__align_of(const struct btf *btf, __u32 id)
if (align <= 0)
return libbpf_err(align);
max_align = max(max_align, align);
/* if field offset isn't aligned according to field
* type's alignment, then struct must be packed
*/
if (btf_member_bitfield_size(t, i) == 0 &&
(m->offset % (8 * align)) != 0)
return 1;
}
/* if struct/union size isn't a multiple of its alignment,
* then struct must be packed
*/
if ((t->size % max_align) != 0)
return 1;
return max_align;
}
default:
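To illustrate the two checks, a pair of hedged examples; either rule alone is enough for btf__align_of() to report an alignment of 1 (packed):

/* trailing-size rule: sizeof == 6 is not a multiple of the maximum
 * member alignment (4), so the struct must be packed */
struct trailing {
	int a;
	short b;
} __attribute__((packed));

/* field-offset rule: b sits at offset 1, which is not a multiple of
 * its 4-byte natural alignment */
struct misaligned {
	char a;
	int b;
} __attribute__((packed));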
@@ -889,6 +1061,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
err = btf_parse_str_sec(btf);
err = err ?: btf_parse_type_sec(btf);
err = err ?: btf_sanity_check(btf);
if (err)
goto done;
@@ -906,6 +1079,11 @@ struct btf *btf__new(const void *data, __u32 size)
return libbpf_ptr(btf_new(data, size, NULL));
}
struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf)
{
return libbpf_ptr(btf_new(data, size, base_btf));
}
static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
struct btf_ext **btf_ext)
{
@@ -987,10 +1165,9 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
}
}
err = 0;
if (!btf_data) {
err = -ENOENT;
pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path);
err = -ENODATA;
goto done;
}
btf = btf_new(btf_data->d_buf, btf_data->d_size, base_btf);
@@ -1052,7 +1229,7 @@ static struct btf *btf_parse_raw(const char *path, struct btf *base_btf)
int err = 0;
long sz;
f = fopen(path, "rb");
f = fopen(path, "rbe");
if (!f) {
err = -errno;
goto err_out;
@@ -1145,7 +1322,9 @@ struct btf *btf__parse_split(const char *path, struct btf *base_btf)
static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian);
int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level)
int btf_load_into_kernel(struct btf *btf,
char *log_buf, size_t log_sz, __u32 log_level,
int token_fd)
{
LIBBPF_OPTS(bpf_btf_load_opts, opts);
__u32 buf_sz = 0, raw_size;
@@ -1195,6 +1374,10 @@ retry_load:
opts.log_level = log_level;
}
opts.token_fd = token_fd;
if (token_fd)
opts.btf_flags |= BPF_F_TOKEN_FD;
btf->fd = bpf_btf_load(raw_data, raw_size, &opts);
if (btf->fd < 0) {
/* time to turn on verbose mode and try again */
@@ -1222,7 +1405,7 @@ done:
int btf__load_into_kernel(struct btf *btf)
{
return btf_load_into_kernel(btf, NULL, 0, 0);
return btf_load_into_kernel(btf, NULL, 0, 0, 0);
}
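For the low-level path, a hedged sketch of passing a BPF token when loading raw BTF bytes directly; obtaining token_fd (e.g. from a token created against a bpffs instance) is out of scope here:

#include <bpf/bpf.h>

static int load_btf_with_token(const void *raw_data, size_t raw_size, int token_fd)
{
	LIBBPF_OPTS(bpf_btf_load_opts, opts,
		.token_fd = token_fd,
		.btf_flags = token_fd ? BPF_F_TOKEN_FD : 0,
	);

	return bpf_btf_load(raw_data, raw_size, &opts);
}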
int btf__fd(const struct btf *btf)
@@ -1336,9 +1519,9 @@ struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf)
void *ptr;
int err;
/* we won't know btf_size until we call bpf_obj_get_info_by_fd(). so
/* we won't know btf_size until we call bpf_btf_get_info_by_fd(). so
* let's start with a sane default - 4KiB here - and resize it only if
* bpf_obj_get_info_by_fd() needs a bigger buffer.
* bpf_btf_get_info_by_fd() needs a bigger buffer.
*/
last_size = 4096;
ptr = malloc(last_size);
@@ -1348,7 +1531,7 @@ struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf)
memset(&btf_info, 0, sizeof(btf_info));
btf_info.btf = ptr_to_u64(ptr);
btf_info.btf_size = last_size;
err = bpf_obj_get_info_by_fd(btf_fd, &btf_info, &len);
err = bpf_btf_get_info_by_fd(btf_fd, &btf_info, &len);
if (!err && btf_info.btf_size > last_size) {
void *temp_ptr;
@@ -1366,7 +1549,7 @@ struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf)
btf_info.btf = ptr_to_u64(ptr);
btf_info.btf_size = last_size;
err = bpf_obj_get_info_by_fd(btf_fd, &btf_info, &len);
err = bpf_btf_get_info_by_fd(btf_fd, &btf_info, &len);
}
if (err || btf_info.btf_size > last_size) {
@@ -1559,15 +1742,15 @@ struct btf_pipe {
static int btf_rewrite_str(__u32 *str_off, void *ctx)
{
struct btf_pipe *p = ctx;
void *mapped_off;
long mapped_off;
int off, err;
if (!*str_off) /* nothing to do for empty strings */
return 0;
if (p->str_off_map &&
hashmap__find(p->str_off_map, (void *)(long)*str_off, &mapped_off)) {
*str_off = (__u32)(long)mapped_off;
hashmap__find(p->str_off_map, *str_off, &mapped_off)) {
*str_off = mapped_off;
return 0;
}
@@ -1579,7 +1762,7 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx)
* performing expensive string comparisons.
*/
if (p->str_off_map) {
err = hashmap__append(p->str_off_map, (void *)(long)*str_off, (void *)(long)off);
err = hashmap__append(p->str_off_map, *str_off, off);
if (err)
return err;
}
@@ -1630,8 +1813,8 @@ static int btf_rewrite_type_ids(__u32 *type_id, void *ctx)
return 0;
}
static size_t btf_dedup_identity_hash_fn(const void *key, void *ctx);
static bool btf_dedup_equal_fn(const void *k1, const void *k2, void *ctx);
static size_t btf_dedup_identity_hash_fn(long key, void *ctx);
static bool btf_dedup_equal_fn(long k1, long k2, void *ctx);
int btf__add_btf(struct btf *btf, const struct btf *src_btf)
{
@@ -1724,7 +1907,8 @@ err_out:
memset(btf->strs_data + old_strs_len, 0, btf->hdr->str_len - old_strs_len);
/* and now restore original strings section size; types data size
* wasn't modified, so doesn't need restoring, see big comment above */
* wasn't modified, so doesn't need restoring, see big comment above
*/
btf->hdr->str_len = old_strs_len;
hashmap__free(p.str_off_map);
@@ -2329,7 +2513,7 @@ int btf__add_restrict(struct btf *btf, int ref_type_id)
*/
int btf__add_type_tag(struct btf *btf, const char *value, int ref_type_id)
{
if (!value|| !value[0])
if (!value || !value[0])
return libbpf_err(-EINVAL);
return btf_add_ref_kind(btf, BTF_KIND_TYPE_TAG, value, ref_type_id);
@@ -2866,12 +3050,16 @@ done:
return btf_ext;
}
const void *btf_ext__get_raw_data(const struct btf_ext *btf_ext, __u32 *size)
const void *btf_ext__raw_data(const struct btf_ext *btf_ext, __u32 *size)
{
*size = btf_ext->data_size;
return btf_ext->data;
}
__attribute__((alias("btf_ext__raw_data")))
const void *btf_ext__get_raw_data(const struct btf_ext *btf_ext, __u32 *size);
struct btf_dedup;
static struct btf_dedup *btf_dedup_new(struct btf *btf, const struct btf_dedup_opts *opts);
@@ -2881,6 +3069,7 @@ static int btf_dedup_strings(struct btf_dedup *d);
static int btf_dedup_prim_types(struct btf_dedup *d);
static int btf_dedup_struct_types(struct btf_dedup *d);
static int btf_dedup_ref_types(struct btf_dedup *d);
static int btf_dedup_resolve_fwds(struct btf_dedup *d);
static int btf_dedup_compact_types(struct btf_dedup *d);
static int btf_dedup_remap_types(struct btf_dedup *d);
@@ -2988,15 +3177,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d);
* Algorithm summary
* =================
*
* Algorithm completes its work in 6 separate passes:
* Algorithm completes its work in 7 separate passes:
*
* 1. Strings deduplication.
* 2. Primitive types deduplication (int, enum, fwd).
* 3. Struct/union types deduplication.
* 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
* 4. Resolve unambiguous forward declarations.
* 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func
* protos, and const/volatile/restrict modifiers).
* 5. Types compaction.
* 6. Types remapping.
* 6. Types compaction.
* 7. Types remapping.
*
* Algorithm determines canonical type descriptor, which is a single
* representative type for each truly unique type. This canonical type is the
@@ -3060,6 +3250,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
pr_debug("btf_dedup_struct_types failed:%d\n", err);
goto done;
}
err = btf_dedup_resolve_fwds(d);
if (err < 0) {
pr_debug("btf_dedup_resolve_fwds failed:%d\n", err);
goto done;
}
err = btf_dedup_ref_types(d);
if (err < 0) {
pr_debug("btf_dedup_ref_types failed:%d\n", err);
@@ -3126,12 +3321,11 @@ static long hash_combine(long h, long value)
}
#define for_each_dedup_cand(d, node, hash) \
hashmap__for_each_key_entry(d->dedup_table, node, (void *)hash)
hashmap__for_each_key_entry(d->dedup_table, node, hash)
static int btf_dedup_table_add(struct btf_dedup *d, long hash, __u32 type_id)
{
return hashmap__append(d->dedup_table,
(void *)hash, (void *)(long)type_id);
return hashmap__append(d->dedup_table, hash, type_id);
}
static int btf_dedup_hypot_map_add(struct btf_dedup *d,
@@ -3178,17 +3372,17 @@ static void btf_dedup_free(struct btf_dedup *d)
free(d);
}
static size_t btf_dedup_identity_hash_fn(const void *key, void *ctx)
static size_t btf_dedup_identity_hash_fn(long key, void *ctx)
{
return (size_t)key;
return key;
}
static size_t btf_dedup_collision_hash_fn(const void *key, void *ctx)
static size_t btf_dedup_collision_hash_fn(long key, void *ctx)
{
return 0;
}
static bool btf_dedup_equal_fn(const void *k1, const void *k2, void *ctx)
static bool btf_dedup_equal_fn(long k1, long k2, void *ctx)
{
return k1 == k2;
}
@@ -3404,23 +3598,17 @@ static long btf_hash_enum(struct btf_type *t)
{
long h;
/* don't hash vlen and enum members to support enum fwd resolving */
/* don't hash vlen, enum members and size to support enum fwd resolving */
h = hash_combine(0, t->name_off);
h = hash_combine(h, t->info & ~0xffff);
h = hash_combine(h, t->size);
return h;
}
/* Check structural equality of two ENUMs. */
static bool btf_equal_enum(struct btf_type *t1, struct btf_type *t2)
static bool btf_equal_enum_members(struct btf_type *t1, struct btf_type *t2)
{
const struct btf_enum *m1, *m2;
__u16 vlen;
int i;
if (!btf_equal_common(t1, t2))
return false;
vlen = btf_vlen(t1);
m1 = btf_enum(t1);
m2 = btf_enum(t2);
@@ -3433,15 +3621,12 @@ static bool btf_equal_enum(struct btf_type *t1, struct btf_type *t2)
return true;
}
static bool btf_equal_enum64(struct btf_type *t1, struct btf_type *t2)
static bool btf_equal_enum64_members(struct btf_type *t1, struct btf_type *t2)
{
const struct btf_enum64 *m1, *m2;
__u16 vlen;
int i;
if (!btf_equal_common(t1, t2))
return false;
vlen = btf_vlen(t1);
m1 = btf_enum64(t1);
m2 = btf_enum64(t2);
@@ -3455,6 +3640,19 @@ static bool btf_equal_enum64(struct btf_type *t1, struct btf_type *t2)
return true;
}
/* Check structural equality of two ENUMs or ENUM64s. */
static bool btf_equal_enum(struct btf_type *t1, struct btf_type *t2)
{
if (!btf_equal_common(t1, t2))
return false;
/* t1 & t2 kinds are identical because of btf_equal_common */
if (btf_kind(t1) == BTF_KIND_ENUM)
return btf_equal_enum_members(t1, t2);
else
return btf_equal_enum64_members(t1, t2);
}
static inline bool btf_is_enum_fwd(struct btf_type *t)
{
return btf_is_any_enum(t) && btf_vlen(t) == 0;
@@ -3464,21 +3662,14 @@ static bool btf_compat_enum(struct btf_type *t1, struct btf_type *t2)
{
if (!btf_is_enum_fwd(t1) && !btf_is_enum_fwd(t2))
return btf_equal_enum(t1, t2);
/* ignore vlen when comparing */
/* At this point either t1 or t2 or both are forward declarations, thus:
* - skip comparing vlen because it is zero for forward declarations;
* - skip comparing size to allow enum forward declarations
* to be compatible with enum64 full declarations;
* - skip comparing kind for the same reason.
*/
return t1->name_off == t2->name_off &&
(t1->info & ~0xffff) == (t2->info & ~0xffff) &&
t1->size == t2->size;
}
static bool btf_compat_enum64(struct btf_type *t1, struct btf_type *t2)
{
if (!btf_is_enum_fwd(t1) && !btf_is_enum_fwd(t2))
return btf_equal_enum64(t1, t2);
/* ignore vlen when comparing */
return t1->name_off == t2->name_off &&
(t1->info & ~0xffff) == (t2->info & ~0xffff) &&
t1->size == t2->size;
btf_is_any_enum(t1) && btf_is_any_enum(t2);
}
/*
@@ -3753,7 +3944,7 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
case BTF_KIND_INT:
h = btf_hash_int_decl_tag(t);
for_each_dedup_cand(d, hash_entry, h) {
cand_id = (__u32)(long)hash_entry->value;
cand_id = hash_entry->value;
cand = btf_type_by_id(d->btf, cand_id);
if (btf_equal_int_tag(t, cand)) {
new_id = cand_id;
@@ -3763,9 +3954,10 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
break;
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
h = btf_hash_enum(t);
for_each_dedup_cand(d, hash_entry, h) {
cand_id = (__u32)(long)hash_entry->value;
cand_id = hash_entry->value;
cand = btf_type_by_id(d->btf, cand_id);
if (btf_equal_enum(t, cand)) {
new_id = cand_id;
@@ -3783,32 +3975,11 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
}
break;
case BTF_KIND_ENUM64:
h = btf_hash_enum(t);
for_each_dedup_cand(d, hash_entry, h) {
cand_id = (__u32)(long)hash_entry->value;
cand = btf_type_by_id(d->btf, cand_id);
if (btf_equal_enum64(t, cand)) {
new_id = cand_id;
break;
}
if (btf_compat_enum64(t, cand)) {
if (btf_is_enum_fwd(t)) {
/* resolve fwd to full enum */
new_id = cand_id;
break;
}
/* resolve canonical enum fwd to full enum */
d->map[cand_id] = type_id;
}
}
break;
case BTF_KIND_FWD:
case BTF_KIND_FLOAT:
h = btf_hash_common(t);
for_each_dedup_cand(d, hash_entry, h) {
cand_id = (__u32)(long)hash_entry->value;
cand_id = hash_entry->value;
cand = btf_type_by_id(d->btf, cand_id);
if (btf_equal_common(t, cand)) {
new_id = cand_id;
@@ -3887,14 +4058,14 @@ static inline __u16 btf_fwd_kind(struct btf_type *t)
}
/* Check if given two types are identical ARRAY definitions */
static int btf_dedup_identical_arrays(struct btf_dedup *d, __u32 id1, __u32 id2)
static bool btf_dedup_identical_arrays(struct btf_dedup *d, __u32 id1, __u32 id2)
{
struct btf_type *t1, *t2;
t1 = btf_type_by_id(d->btf, id1);
t2 = btf_type_by_id(d->btf, id2);
if (!btf_is_array(t1) || !btf_is_array(t2))
return 0;
return false;
return btf_equal_array(t1, t2);
}
@@ -3918,7 +4089,9 @@ static bool btf_dedup_identical_structs(struct btf_dedup *d, __u32 id1, __u32 id
m1 = btf_members(t1);
m2 = btf_members(t2);
for (i = 0, n = btf_vlen(t1); i < n; i++, m1++, m2++) {
if (m1->type != m2->type)
if (m1->type != m2->type &&
!btf_dedup_identical_arrays(d, m1->type, m2->type) &&
!btf_dedup_identical_structs(d, m1->type, m2->type))
return false;
}
return true;
@@ -4097,10 +4270,8 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id,
return btf_equal_int_tag(cand_type, canon_type);
case BTF_KIND_ENUM:
return btf_compat_enum(cand_type, canon_type);
case BTF_KIND_ENUM64:
return btf_compat_enum64(cand_type, canon_type);
return btf_compat_enum(cand_type, canon_type);
case BTF_KIND_FWD:
case BTF_KIND_FLOAT:
@@ -4311,7 +4482,7 @@ static int btf_dedup_struct_type(struct btf_dedup *d, __u32 type_id)
h = btf_hash_struct(t);
for_each_dedup_cand(d, hash_entry, h) {
__u32 cand_id = (__u32)(long)hash_entry->value;
__u32 cand_id = hash_entry->value;
int eq;
/*
@@ -4416,7 +4587,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
h = btf_hash_common(t);
for_each_dedup_cand(d, hash_entry, h) {
cand_id = (__u32)(long)hash_entry->value;
cand_id = hash_entry->value;
cand = btf_type_by_id(d->btf, cand_id);
if (btf_equal_common(t, cand)) {
new_id = cand_id;
@@ -4433,7 +4604,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
h = btf_hash_int_decl_tag(t);
for_each_dedup_cand(d, hash_entry, h) {
cand_id = (__u32)(long)hash_entry->value;
cand_id = hash_entry->value;
cand = btf_type_by_id(d->btf, cand_id);
if (btf_equal_int_tag(t, cand)) {
new_id = cand_id;
@@ -4457,7 +4628,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
h = btf_hash_array(t);
for_each_dedup_cand(d, hash_entry, h) {
cand_id = (__u32)(long)hash_entry->value;
cand_id = hash_entry->value;
cand = btf_type_by_id(d->btf, cand_id);
if (btf_equal_array(t, cand)) {
new_id = cand_id;
@@ -4489,7 +4660,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
h = btf_hash_fnproto(t);
for_each_dedup_cand(d, hash_entry, h) {
cand_id = (__u32)(long)hash_entry->value;
cand_id = hash_entry->value;
cand = btf_type_by_id(d->btf, cand_id);
if (btf_equal_fnproto(t, cand)) {
new_id = cand_id;
@@ -4525,6 +4696,134 @@ static int btf_dedup_ref_types(struct btf_dedup *d)
return 0;
}
/*
* Collect a map from type names to type ids for all canonical structs
* and unions. If the same name is shared by several canonical types
* use a special value 0 to indicate this fact.
*/
static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
{
__u32 nr_types = btf__type_cnt(d->btf);
struct btf_type *t;
__u32 type_id;
__u16 kind;
int err;
/*
* Iterate over base and split module ids in order to get all
* available structs in the map.
*/
for (type_id = 1; type_id < nr_types; ++type_id) {
t = btf_type_by_id(d->btf, type_id);
kind = btf_kind(t);
if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
continue;
/* Skip non-canonical types */
if (type_id != d->map[type_id])
continue;
err = hashmap__add(names_map, t->name_off, type_id);
if (err == -EEXIST)
err = hashmap__set(names_map, t->name_off, 0, NULL, NULL);
if (err)
return err;
}
return 0;
}
static int btf_dedup_resolve_fwd(struct btf_dedup *d, struct hashmap *names_map, __u32 type_id)
{
struct btf_type *t = btf_type_by_id(d->btf, type_id);
enum btf_fwd_kind fwd_kind = btf_kflag(t);
__u16 cand_kind, kind = btf_kind(t);
struct btf_type *cand_t;
uintptr_t cand_id;
if (kind != BTF_KIND_FWD)
return 0;
/* Skip if this FWD already has a mapping */
if (type_id != d->map[type_id])
return 0;
if (!hashmap__find(names_map, t->name_off, &cand_id))
return 0;
/* Zero is a special value indicating that name is not unique */
if (!cand_id)
return 0;
cand_t = btf_type_by_id(d->btf, cand_id);
cand_kind = btf_kind(cand_t);
if ((cand_kind == BTF_KIND_STRUCT && fwd_kind != BTF_FWD_STRUCT) ||
(cand_kind == BTF_KIND_UNION && fwd_kind != BTF_FWD_UNION))
return 0;
d->map[type_id] = cand_id;
return 0;
}
/*
* Resolve unambiguous forward declarations.
*
* The lion's share of all FWD declarations is resolved during
* `btf_dedup_struct_types` phase when different type graphs are
* compared against each other. However, if in some compilation unit a
* FWD declaration is not a part of a type graph compared against
* another type graph, that declaration's canonical type would not be
* changed. Example:
*
* CU #1:
*
* struct foo;
* struct foo *some_global;
*
* CU #2:
*
* struct foo { int u; };
* struct foo *another_global;
*
* After `btf_dedup_struct_types` the BTF looks as follows:
*
* [1] STRUCT 'foo' size=4 vlen=1 ...
* [2] INT 'int' size=4 ...
* [3] PTR '(anon)' type_id=1
* [4] FWD 'foo' fwd_kind=struct
* [5] PTR '(anon)' type_id=4
*
* This pass assumes that such FWD declarations should be mapped to
* structs or unions with an identical name, provided that the name is
* not ambiguous.
*/
static int btf_dedup_resolve_fwds(struct btf_dedup *d)
{
int i, err;
struct hashmap *names_map;
names_map = hashmap__new(btf_dedup_identity_hash_fn, btf_dedup_equal_fn, NULL);
if (IS_ERR(names_map))
return PTR_ERR(names_map);
err = btf_dedup_fill_unique_names_map(d, names_map);
if (err < 0)
goto exit;
for (i = 0; i < d->btf->nr_types; i++) {
err = btf_dedup_resolve_fwd(d, names_map, d->btf->start_id + i);
if (err < 0)
break;
}
exit:
hashmap__free(names_map);
return err;
}
/*
* Compact types.
*
@@ -4642,38 +4941,46 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
*/
struct btf *btf__load_vmlinux_btf(void)
{
struct {
const char *path_fmt;
bool raw_btf;
} locations[] = {
/* try canonical vmlinux BTF through sysfs first */
{ "/sys/kernel/btf/vmlinux", true /* raw BTF */ },
/* fall back to trying to find vmlinux ELF on disk otherwise */
{ "/boot/vmlinux-%1$s" },
{ "/lib/modules/%1$s/vmlinux-%1$s" },
{ "/lib/modules/%1$s/build/vmlinux" },
{ "/usr/lib/modules/%1$s/kernel/vmlinux" },
{ "/usr/lib/debug/boot/vmlinux-%1$s" },
{ "/usr/lib/debug/boot/vmlinux-%1$s.debug" },
{ "/usr/lib/debug/lib/modules/%1$s/vmlinux" },
const char *sysfs_btf_path = "/sys/kernel/btf/vmlinux";
/* fall back locations, trying to find vmlinux on disk */
const char *locations[] = {
"/boot/vmlinux-%1$s",
"/lib/modules/%1$s/vmlinux-%1$s",
"/lib/modules/%1$s/build/vmlinux",
"/usr/lib/modules/%1$s/kernel/vmlinux",
"/usr/lib/debug/boot/vmlinux-%1$s",
"/usr/lib/debug/boot/vmlinux-%1$s.debug",
"/usr/lib/debug/lib/modules/%1$s/vmlinux",
};
char path[PATH_MAX + 1];
struct utsname buf;
struct btf *btf;
int i, err;
/* is canonical sysfs location accessible? */
if (faccessat(AT_FDCWD, sysfs_btf_path, F_OK, AT_EACCESS) < 0) {
pr_warn("kernel BTF is missing at '%s', was CONFIG_DEBUG_INFO_BTF enabled?\n",
sysfs_btf_path);
} else {
btf = btf__parse(sysfs_btf_path, NULL);
if (!btf) {
err = -errno;
pr_warn("failed to read kernel BTF from '%s': %d\n", sysfs_btf_path, err);
return libbpf_err_ptr(err);
}
pr_debug("loaded kernel BTF from '%s'\n", sysfs_btf_path);
return btf;
}
/* try fallback locations */
uname(&buf);
for (i = 0; i < ARRAY_SIZE(locations); i++) {
snprintf(path, PATH_MAX, locations[i].path_fmt, buf.release);
snprintf(path, PATH_MAX, locations[i], buf.release);
if (access(path, R_OK))
if (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS))
continue;
if (locations[i].raw_btf)
btf = btf__parse_raw(path);
else
btf = btf__parse_elf(path, NULL);
btf = btf__parse(path, NULL);
err = libbpf_get_error(btf);
pr_debug("loading kernel BTF '%s': %d\n", path, err);
if (err)
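From an application's point of view the lookup order change is transparent. A minimal consumption sketch, following libbpf's error-pointer conventions:

#include <stdio.h>
#include <bpf/btf.h>
#include <bpf/libbpf.h>

int main(void)
{
	struct btf *vmlinux_btf = btf__load_vmlinux_btf();
	long err = libbpf_get_error(vmlinux_btf);

	if (err) {
		fprintf(stderr, "no vmlinux BTF: %ld\n", err);
		return 1;
	}
	printf("kernel BTF has %u types\n", btf__type_cnt(vmlinux_btf));
	btf__free(vmlinux_btf);
	return 0;
}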

src/btf.h

@@ -486,6 +486,8 @@ static inline struct btf_enum *btf_enum(const struct btf_type *t)
return (struct btf_enum *)(t + 1);
}
struct btf_enum64;
static inline struct btf_enum64 *btf_enum64(const struct btf_type *t)
{
return (struct btf_enum64 *)(t + 1);
@@ -493,7 +495,28 @@ static inline struct btf_enum64 *btf_enum64(const struct btf_type *t)
static inline __u64 btf_enum64_value(const struct btf_enum64 *e)
{
return ((__u64)e->val_hi32 << 32) | e->val_lo32;
/* struct btf_enum64 is introduced in Linux 6.0, which is very
* bleeding-edge. Here we are avoiding relying on struct btf_enum64
* definition coming from kernel UAPI headers to support wider range
* of system-wide kernel headers.
*
* Given this header can be also included from C++ applications, that
* further restricts C tricks we can use (like using compatible
* anonymous struct). So just treat struct btf_enum64 as
* a three-element array of u32 and access second (lo32) and third
* (hi32) elements directly.
*
* For reference, here is a struct btf_enum64 definition:
*
* const struct btf_enum64 {
* __u32 name_off;
* __u32 val_lo32;
* __u32 val_hi32;
* };
*/
const __u32 *e64 = (const __u32 *)e;
return ((__u64)e64[2] << 32) | e64[1];
}
static inline struct btf_member *btf_members(const struct btf_type *t)
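A small worked example of the aliasing trick above (values hypothetical); the cast mirrors exactly what the helper does internally:

#include <assert.h>
#include <bpf/btf.h>

int main(void)
{
	/* raw enum64 encoding: { name_off, val_lo32, val_hi32 } */
	__u32 raw[3] = { 0, 0x89abcdef, 0x01234567 };

	assert(btf_enum64_value((const struct btf_enum64 *)raw) ==
	       0x0123456789abcdefULL);
	return 0;
}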

src/btf_dump.c

@@ -13,6 +13,7 @@
#include <ctype.h>
#include <endian.h>
#include <errno.h>
#include <limits.h>
#include <linux/err.h>
#include <linux/btf.h>
#include <linux/kernel.h>
@@ -117,14 +118,14 @@ struct btf_dump {
struct btf_dump_data *typed_dump;
};
static size_t str_hash_fn(const void *key, void *ctx)
static size_t str_hash_fn(long key, void *ctx)
{
return str_hash(key);
return str_hash((void *)key);
}
static bool str_equal_fn(const void *a, const void *b, void *ctx)
static bool str_equal_fn(long a, long b, void *ctx)
{
return strcmp(a, b) == 0;
return strcmp((void *)a, (void *)b) == 0;
}
static const char *btf_name_of(const struct btf_dump *d, __u32 name_off)
@@ -219,6 +220,17 @@ static int btf_dump_resize(struct btf_dump *d)
return 0;
}
static void btf_dump_free_names(struct hashmap *map)
{
size_t bkt;
struct hashmap_entry *cur;
hashmap__for_each_entry(map, cur, bkt)
free((void *)cur->pkey);
hashmap__free(map);
}
void btf_dump__free(struct btf_dump *d)
{
int i;
@@ -237,8 +249,8 @@ void btf_dump__free(struct btf_dump *d)
free(d->cached_names);
free(d->emit_queue);
free(d->decl_stack);
hashmap__free(d->type_names);
hashmap__free(d->ident_names);
btf_dump_free_names(d->type_names);
btf_dump_free_names(d->ident_names);
free(d);
}
@@ -822,14 +834,9 @@ static bool btf_is_struct_packed(const struct btf *btf, __u32 id,
const struct btf_type *t)
{
const struct btf_member *m;
int align, i, bit_sz;
int max_align = 1, align, i, bit_sz;
__u16 vlen;
align = btf__align_of(btf, id);
/* size of a non-packed struct has to be a multiple of its alignment*/
if (align && t->size % align)
return true;
m = btf_members(t);
vlen = btf_vlen(t);
/* all non-bitfield fields have to be naturally aligned */
@@ -838,8 +845,11 @@ static bool btf_is_struct_packed(const struct btf *btf, __u32 id,
bit_sz = btf_member_bitfield_size(t, i);
if (align && bit_sz == 0 && m->offset % (8 * align) != 0)
return true;
max_align = max(align, max_align);
}
/* size of a non-packed struct has to be a multiple of its alignment */
if (t->size % max_align != 0)
return true;
/*
* if original struct was marked as packed, but its layout is
* naturally aligned, we'll detect that it's not packed
@@ -847,44 +857,97 @@ static bool btf_is_struct_packed(const struct btf *btf, __u32 id,
return false;
}
static int chip_away_bits(int total, int at_most)
{
return total % at_most ? : at_most;
}
static void btf_dump_emit_bit_padding(const struct btf_dump *d,
int cur_off, int m_off, int m_bit_sz,
int align, int lvl)
int cur_off, int next_off, int next_align,
bool in_bitfield, int lvl)
{
int off_diff = m_off - cur_off;
int ptr_bits = d->ptr_sz * 8;
const struct {
const char *name;
int bits;
} pads[] = {
{"long", d->ptr_sz * 8}, {"int", 32}, {"short", 16}, {"char", 8}
};
int new_off, pad_bits, bits, i;
const char *pad_type;
if (off_diff <= 0)
/* no gap */
return;
if (m_bit_sz == 0 && off_diff < align * 8)
/* natural padding will take care of a gap */
return;
if (cur_off >= next_off)
return; /* no gap */
while (off_diff > 0) {
const char *pad_type;
int pad_bits;
/* For filling out padding we want to take advantage of
* natural alignment rules to minimize unnecessary explicit
* padding. First, we find the largest type (among long, int,
* short, or char) that can be used to force naturally aligned
* boundary. Once determined, we'll use such type to fill in
* the remaining padding gap. In some cases we can rely on
* compiler filling some gaps, but sometimes we need to force
* alignment to close natural alignment with markers like
* `long: 0` (this is always the case for bitfields). Note
* that even if struct itself has, let's say 4-byte alignment
* (i.e., it only uses up to int-aligned types), using `long:
* X;` explicit padding doesn't actually change struct's
* overall alignment requirements, but compiler does take into
* account that type's (long, in this example) natural
* alignment requirements when adding implicit padding. We use
* this fact heavily and don't worry about ruining correct
* struct alignment requirement.
*/
for (i = 0; i < ARRAY_SIZE(pads); i++) {
pad_bits = pads[i].bits;
pad_type = pads[i].name;
if (ptr_bits > 32 && off_diff > 32) {
pad_type = "long";
pad_bits = chip_away_bits(off_diff, ptr_bits);
} else if (off_diff > 16) {
pad_type = "int";
pad_bits = chip_away_bits(off_diff, 32);
} else if (off_diff > 8) {
pad_type = "short";
pad_bits = chip_away_bits(off_diff, 16);
} else {
pad_type = "char";
pad_bits = chip_away_bits(off_diff, 8);
new_off = roundup(cur_off, pad_bits);
if (new_off <= next_off)
break;
}
if (new_off > cur_off && new_off <= next_off) {
/* We need explicit `<type>: 0` aligning mark if next
* field is right on alignment offset and its
* alignment requirement is less strict than <type>'s
* alignment (so compiler won't naturally align to the
* offset we expect), or if subsequent `<type>: X`,
* will actually completely fit in the remaining hole,
* making compiler basically ignore `<type>: X`
* completely.
*/
if (in_bitfield ||
(new_off == next_off && roundup(cur_off, next_align * 8) != new_off) ||
(new_off != next_off && next_off - new_off <= new_off - cur_off))
/* but for bitfields we'll emit explicit bit count */
btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type,
in_bitfield ? new_off - cur_off : 0);
cur_off = new_off;
}
/* Now we know we start at naturally aligned offset for a chosen
* padding type (long, int, short, or char), and so the rest is just
* a straightforward filling of remaining padding gap with full
* `<type>: sizeof(<type>);` markers, except for the last one, which
* might need smaller than sizeof(<type>) padding.
*/
while (cur_off != next_off) {
bits = min(next_off - cur_off, pad_bits);
if (bits == pad_bits) {
btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type, pad_bits);
cur_off += bits;
continue;
}
/* For the remainder padding that doesn't cover entire
* pad_type bit length, we pick the smallest necessary type.
* This is pure aesthetics, we could have just used `long`,
* but having smallest necessary one communicates better the
* scale of the padding gap.
*/
for (i = ARRAY_SIZE(pads) - 1; i >= 0; i--) {
pad_type = pads[i].name;
pad_bits = pads[i].bits;
if (pad_bits < bits)
continue;
btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type, bits);
cur_off += bits;
break;
}
btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type, pad_bits);
off_diff -= pad_bits;
}
}
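As a rough illustration (hypothetical, not verbatim dumper output), the padding markers discussed above take this shape in a reconstructed definition:

struct example {
	int a: 3;
	int: 13;	/* filler up to the next 16-bit boundary */
	short b;
	long: 0;	/* alignment marker forcing the next natural boundary */
	void *p;
};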
@@ -904,9 +967,11 @@ static void btf_dump_emit_struct_def(struct btf_dump *d,
{
const struct btf_member *m = btf_members(t);
bool is_struct = btf_is_struct(t);
int align, i, packed, off = 0;
bool packed, prev_bitfield = false;
int align, i, off = 0;
__u16 vlen = btf_vlen(t);
align = btf__align_of(d->btf, id);
packed = is_struct ? btf_is_struct_packed(d->btf, id, t) : 0;
btf_dump_printf(d, "%s%s%s {",
@@ -916,37 +981,47 @@ static void btf_dump_emit_struct_def(struct btf_dump *d,
for (i = 0; i < vlen; i++, m++) {
const char *fname;
int m_off, m_sz;
int m_off, m_sz, m_align;
bool in_bitfield;
fname = btf_name_of(d, m->name_off);
m_sz = btf_member_bitfield_size(t, i);
m_off = btf_member_bit_offset(t, i);
align = packed ? 1 : btf__align_of(d->btf, m->type);
m_align = packed ? 1 : btf__align_of(d->btf, m->type);
btf_dump_emit_bit_padding(d, off, m_off, m_sz, align, lvl + 1);
in_bitfield = prev_bitfield && m_sz != 0;
btf_dump_emit_bit_padding(d, off, m_off, m_align, in_bitfield, lvl + 1);
btf_dump_printf(d, "\n%s", pfx(lvl + 1));
btf_dump_emit_type_decl(d, m->type, fname, lvl + 1);
if (m_sz) {
btf_dump_printf(d, ": %d", m_sz);
off = m_off + m_sz;
prev_bitfield = true;
} else {
m_sz = max((__s64)0, btf__resolve_size(d->btf, m->type));
off = m_off + m_sz * 8;
prev_bitfield = false;
}
btf_dump_printf(d, ";");
}
/* pad at the end, if necessary */
if (is_struct) {
align = packed ? 1 : btf__align_of(d->btf, id);
btf_dump_emit_bit_padding(d, off, t->size * 8, 0, align,
lvl + 1);
}
if (is_struct)
btf_dump_emit_bit_padding(d, off, t->size * 8, align, false, lvl + 1);
if (vlen)
/*
* Keep `struct empty {}` on a single line,
* only print newline when there are regular or padding fields.
*/
if (vlen || t->size) {
btf_dump_printf(d, "\n");
btf_dump_printf(d, "%s}", pfx(lvl));
btf_dump_printf(d, "%s}", pfx(lvl));
} else {
btf_dump_printf(d, "}");
}
if (packed)
btf_dump_printf(d, " __attribute__((packed))");
}
@@ -1058,6 +1133,43 @@ static void btf_dump_emit_enum_def(struct btf_dump *d, __u32 id,
else
btf_dump_emit_enum64_val(d, t, lvl, vlen);
btf_dump_printf(d, "\n%s}", pfx(lvl));
/* special case enums with special sizes */
if (t->size == 1) {
/* one-byte enums can be forced with mode(byte) attribute */
btf_dump_printf(d, " __attribute__((mode(byte)))");
} else if (t->size == 8 && d->ptr_sz == 8) {
/* enum can be 8-byte sized if one of the enumerator values
* doesn't fit in 32-bit integer, or by adding mode(word)
* attribute (but probably only on 64-bit architectures); do
* our best here to try to satisfy the contract without adding
* unnecessary attributes
*/
bool needs_word_mode;
if (btf_is_enum(t)) {
/* enum can't represent 64-bit values, so we need word mode */
needs_word_mode = true;
} else {
/* enum64 needs mode(word) if none of its values has
 * non-zero upper 32 bits (which means that all values
 * fit in 32-bit integers and won't cause the compiler
 * to bump the enum to be 64-bit naturally)
 */
int i;
needs_word_mode = true;
for (i = 0; i < vlen; i++) {
if (btf_enum64(t)[i].val_hi32 != 0) {
needs_word_mode = false;
break;
}
}
}
if (needs_word_mode)
btf_dump_printf(d, " __attribute__((mode(word)))");
}
}
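/* Illustrative sketch of what the special-size handling above produces
 * (hypothetical enums, 64-bit target assumed):
 *
 *	enum tiny {			// t->size == 1
 *		TINY_A = 1,
 *	} __attribute__((mode(byte)));
 *
 *	enum wide {			// t->size == 8, but every value
 *		WIDE_A = 1,		// fits in 32 bits, so mode(word)
 *	} __attribute__((mode(word)));	// is needed to keep it 8 bytes
 */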
static void btf_dump_emit_fwd_def(struct btf_dump *d, __u32 id,
@@ -1520,11 +1632,22 @@ static void btf_dump_emit_type_cast(struct btf_dump *d, __u32 id,
static size_t btf_dump_name_dups(struct btf_dump *d, struct hashmap *name_map,
const char *orig_name)
{
char *old_name, *new_name;
size_t dup_cnt = 0;
int err;
hashmap__find(name_map, orig_name, (void **)&dup_cnt);
new_name = strdup(orig_name);
if (!new_name)
return 1;
(void)hashmap__find(name_map, orig_name, &dup_cnt);
dup_cnt++;
hashmap__set(name_map, orig_name, (void *)dup_cnt, NULL, NULL);
err = hashmap__set(name_map, new_name, dup_cnt, &old_name, NULL);
if (err)
free(new_name);
free(old_name);
return dup_cnt;
}
@@ -1963,7 +2086,7 @@ static int btf_dump_struct_data(struct btf_dump *d,
{
const struct btf_member *m = btf_members(t);
__u16 n = btf_vlen(t);
int i, err;
int i, err = 0;
/* note that we increment depth before calling btf_dump_print() below;
* this is intentional. btf_dump_data_newline() will not print a
@@ -2127,9 +2250,25 @@ static int btf_dump_type_data_check_overflow(struct btf_dump *d,
const struct btf_type *t,
__u32 id,
const void *data,
__u8 bits_offset)
__u8 bits_offset,
__u8 bit_sz)
{
__s64 size = btf__resolve_size(d->btf, id);
__s64 size;
if (bit_sz) {
/* bits_offset is at most 7. bit_sz is at most 128. */
__u8 nr_bytes = (bits_offset + bit_sz + 7) / 8;
/* When bit_sz is non-zero, this is called from
 * btf_dump_struct_data(), which only cares about
 * a negative error value.
 * Return nr_bytes in the success case to keep it
 * consistent with the regular integer case below.
 */
return data + nr_bytes > d->typed_dump->data_end ? -E2BIG : nr_bytes;
}
size = btf__resolve_size(d->btf, id);
if (size < 0 || size >= INT_MAX) {
pr_warn("unexpected size [%zu] for id [%u]\n",
@@ -2284,7 +2423,7 @@ static int btf_dump_dump_type_data(struct btf_dump *d,
{
int size, err = 0;
size = btf_dump_type_data_check_overflow(d, t, id, data, bits_offset);
size = btf_dump_type_data_check_overflow(d, t, id, data, bits_offset, bit_sz);
if (size < 0)
return size;
err = btf_dump_type_data_check_zero(d, t, id, data, bits_offset, bit_sz);
@@ -2385,7 +2524,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
d->typed_dump->indent_lvl = OPTS_GET(opts, indent_level, 0);
/* default indent string is a tab */
if (!opts->indent_str)
if (!OPTS_GET(opts, indent_str, NULL))
d->typed_dump->indent_str[0] = '\t';
else
libbpf_strlcpy(d->typed_dump->indent_str, opts->indent_str,

src/elf.c (new file, 558 lines)

@@ -0,0 +1,558 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <libelf.h>
#include <gelf.h>
#include <fcntl.h>
#include <linux/kernel.h>
#include "libbpf_internal.h"
#include "str_error.h"
/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
* the symbol is hidden and can only be seen when referenced using an
* explicit version number. This is a GNU extension.
*/
#define VERSYM_HIDDEN 0x8000
/* This is the mask for the rest of the data in a word read from a
* SHT_GNU_versym section.
*/
#define VERSYM_VERSION 0x7fff
int elf_open(const char *binary_path, struct elf_fd *elf_fd)
{
char errmsg[STRERR_BUFSIZE];
int fd, ret;
Elf *elf;
if (elf_version(EV_CURRENT) == EV_NONE) {
pr_warn("elf: failed to init libelf for %s\n", binary_path);
return -LIBBPF_ERRNO__LIBELF;
}
fd = open(binary_path, O_RDONLY | O_CLOEXEC);
if (fd < 0) {
ret = -errno;
pr_warn("elf: failed to open %s: %s\n", binary_path,
libbpf_strerror_r(ret, errmsg, sizeof(errmsg)));
return ret;
}
elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
if (!elf) {
pr_warn("elf: could not read elf from %s: %s\n", binary_path, elf_errmsg(-1));
close(fd);
return -LIBBPF_ERRNO__FORMAT;
}
elf_fd->fd = fd;
elf_fd->elf = elf;
return 0;
}
void elf_close(struct elf_fd *elf_fd)
{
if (!elf_fd)
return;
elf_end(elf_fd->elf);
close(elf_fd->fd);
}
/* Return next ELF section of sh_type after scn, or first of that type if scn is NULL. */
static Elf_Scn *elf_find_next_scn_by_type(Elf *elf, int sh_type, Elf_Scn *scn)
{
while ((scn = elf_nextscn(elf, scn)) != NULL) {
GElf_Shdr sh;
if (!gelf_getshdr(scn, &sh))
continue;
if (sh.sh_type == sh_type)
return scn;
}
return NULL;
}
struct elf_sym {
const char *name;
GElf_Sym sym;
GElf_Shdr sh;
int ver;
bool hidden;
};
struct elf_sym_iter {
Elf *elf;
Elf_Data *syms;
Elf_Data *versyms;
Elf_Data *verdefs;
size_t nr_syms;
size_t strtabidx;
size_t verdef_strtabidx;
size_t next_sym_idx;
struct elf_sym sym;
int st_type;
};
static int elf_sym_iter_new(struct elf_sym_iter *iter,
Elf *elf, const char *binary_path,
int sh_type, int st_type)
{
Elf_Scn *scn = NULL;
GElf_Ehdr ehdr;
GElf_Shdr sh;
memset(iter, 0, sizeof(*iter));
if (!gelf_getehdr(elf, &ehdr)) {
pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1));
return -EINVAL;
}
scn = elf_find_next_scn_by_type(elf, sh_type, NULL);
if (!scn) {
pr_debug("elf: failed to find symbol table ELF sections in '%s'\n",
binary_path);
return -ENOENT;
}
if (!gelf_getshdr(scn, &sh))
return -EINVAL;
iter->strtabidx = sh.sh_link;
iter->syms = elf_getdata(scn, 0);
if (!iter->syms) {
pr_warn("elf: failed to get symbols for symtab section in '%s': %s\n",
binary_path, elf_errmsg(-1));
return -EINVAL;
}
iter->nr_syms = iter->syms->d_size / sh.sh_entsize;
iter->elf = elf;
iter->st_type = st_type;
/* Version symbol table is meaningful to dynsym only */
if (sh_type != SHT_DYNSYM)
return 0;
scn = elf_find_next_scn_by_type(elf, SHT_GNU_versym, NULL);
if (!scn)
return 0;
iter->versyms = elf_getdata(scn, 0);
scn = elf_find_next_scn_by_type(elf, SHT_GNU_verdef, NULL);
if (!scn)
return 0;
iter->verdefs = elf_getdata(scn, 0);
if (!iter->verdefs || !gelf_getshdr(scn, &sh)) {
pr_warn("elf: failed to get verdef ELF section in '%s'\n", binary_path);
return -EINVAL;
}
iter->verdef_strtabidx = sh.sh_link;
return 0;
}
static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
{
struct elf_sym *ret = &iter->sym;
GElf_Sym *sym = &ret->sym;
const char *name = NULL;
GElf_Versym versym;
Elf_Scn *sym_scn;
size_t idx;
for (idx = iter->next_sym_idx; idx < iter->nr_syms; idx++) {
if (!gelf_getsym(iter->syms, idx, sym))
continue;
if (GELF_ST_TYPE(sym->st_info) != iter->st_type)
continue;
name = elf_strptr(iter->elf, iter->strtabidx, sym->st_name);
if (!name)
continue;
sym_scn = elf_getscn(iter->elf, sym->st_shndx);
if (!sym_scn)
continue;
if (!gelf_getshdr(sym_scn, &ret->sh))
continue;
iter->next_sym_idx = idx + 1;
ret->name = name;
ret->ver = 0;
ret->hidden = false;
if (iter->versyms) {
if (!gelf_getversym(iter->versyms, idx, &versym))
continue;
ret->ver = versym & VERSYM_VERSION;
ret->hidden = versym & VERSYM_HIDDEN;
}
return ret;
}
return NULL;
}
static const char *elf_get_vername(struct elf_sym_iter *iter, int ver)
{
GElf_Verdaux verdaux;
GElf_Verdef verdef;
int offset;
if (!iter->verdefs)
return NULL;
offset = 0;
while (gelf_getverdef(iter->verdefs, offset, &verdef)) {
if (verdef.vd_ndx != ver) {
if (!verdef.vd_next)
break;
offset += verdef.vd_next;
continue;
}
if (!gelf_getverdaux(iter->verdefs, offset + verdef.vd_aux, &verdaux))
break;
return elf_strptr(iter->elf, iter->verdef_strtabidx, verdaux.vda_name);
}
return NULL;
}
static bool symbol_match(struct elf_sym_iter *iter, int sh_type, struct elf_sym *sym,
const char *name, size_t name_len, const char *lib_ver)
{
const char *ver_name;
/* Symbols are in forms of func, func@LIB_VER or func@@LIB_VER
* make sure the func part matches the user specified name
*/
if (strncmp(sym->name, name, name_len) != 0)
return false;
/* ...but we don't want a search for "foo" to also match "foo2", so any
 * additional characters in the symbol name should be of the form "@@LIB".
 */
if (sym->name[name_len] != '\0' && sym->name[name_len] != '@')
return false;
/* If user does not specify symbol version, then we got a match */
if (!lib_ver)
return true;
/* If user specifies symbol version, for dynamic symbols,
* get version name from ELF verdef section for comparison.
*/
if (sh_type == SHT_DYNSYM) {
ver_name = elf_get_vername(iter, sym->ver);
if (!ver_name)
return false;
return strcmp(ver_name, lib_ver) == 0;
}
/* For normal symbols, it is already in form of func@LIB_VER */
return strcmp(sym->name, name) == 0;
}
/* Transform symbol's virtual address (absolute for binaries and relative
* for shared libs) into file offset, which is what kernel is expecting
* for uprobe/uretprobe attachment.
* See Documentation/trace/uprobetracer.rst for more details. This is done
* by looking up symbol's containing section's header and using iter's virtual
* address (sh_addr) and corresponding file offset (sh_offset) to transform
* sym.st_value (virtual address) into desired final file offset.
*/
static unsigned long elf_sym_offset(struct elf_sym *sym)
{
return sym->sym.st_value - sym->sh.sh_addr + sym->sh.sh_offset;
}
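/* Worked example with hypothetical values for a symbol in .text:
 *	st_value  = 0x401130	(symbol's virtual address)
 *	sh_addr   = 0x401000	(section's virtual address)
 *	sh_offset = 0x1000	(section's file offset)
 * => file offset = 0x401130 - 0x401000 + 0x1000 = 0x1130,
 * which is the value uprobe/uretprobe attachment expects.
 */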
/* Find offset of function name in the provided ELF object. "binary_path" is
* the path to the ELF binary represented by "elf", and only used for error
* reporting matters. "name" matches symbol name or name@@LIB for library
* functions.
*/
long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
{
int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
const char *at_symbol, *lib_ver;
bool is_shared_lib;
long ret = -ENOENT;
size_t name_len;
GElf_Ehdr ehdr;
if (!gelf_getehdr(elf, &ehdr)) {
pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1));
ret = -LIBBPF_ERRNO__FORMAT;
goto out;
}
/* for shared lib case, we do not need to calculate relative offset */
is_shared_lib = ehdr.e_type == ET_DYN;
/* Does name specify "@@LIB_VER" or "@LIB_VER"? */
at_symbol = strchr(name, '@');
if (at_symbol) {
name_len = at_symbol - name;
/* skip second @ if it's @@LIB_VER case */
if (at_symbol[1] == '@')
at_symbol++;
lib_ver = at_symbol + 1;
} else {
name_len = strlen(name);
lib_ver = NULL;
}
/* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if
* a binary is stripped, it may only have SHT_DYNSYM, and a fully-statically
* linked binary may not have SHT_DYNSYM, so absence of a section should not be
* reported as a warning/error.
*/
for (i = 0; i < ARRAY_SIZE(sh_types); i++) {
struct elf_sym_iter iter;
struct elf_sym *sym;
int last_bind = -1;
int cur_bind;
ret = elf_sym_iter_new(&iter, elf, binary_path, sh_types[i], STT_FUNC);
if (ret == -ENOENT)
continue;
if (ret)
goto out;
while ((sym = elf_sym_iter_next(&iter))) {
if (!symbol_match(&iter, sh_types[i], sym, name, name_len, lib_ver))
continue;
cur_bind = GELF_ST_BIND(sym->sym.st_info);
if (ret > 0) {
/* handle multiple matches */
if (elf_sym_offset(sym) == ret) {
/* same offset, no problem */
continue;
} else if (last_bind != STB_WEAK && cur_bind != STB_WEAK) {
/* Only accept one non-weak bind. */
pr_warn("elf: ambiguous match for '%s', '%s' in '%s'\n",
sym->name, name, binary_path);
ret = -LIBBPF_ERRNO__FORMAT;
goto out;
} else if (cur_bind == STB_WEAK) {
/* already have a non-weak bind, and
* this is a weak bind, so ignore.
*/
continue;
}
}
ret = elf_sym_offset(sym);
last_bind = cur_bind;
}
if (ret > 0)
break;
}
if (ret > 0) {
pr_debug("elf: symbol address match for '%s' in '%s': 0x%lx\n", name, binary_path,
ret);
} else {
if (ret == 0) {
pr_warn("elf: '%s' is 0 in symtab for '%s': %s\n", name, binary_path,
is_shared_lib ? "should not be 0 in a shared library" :
"try using shared library path instead");
ret = -ENOENT;
} else {
pr_warn("elf: failed to find symbol '%s' in '%s'\n", name, binary_path);
}
}
out:
return ret;
}
/* Find offset of function name in ELF object specified by path. "name" matches
* symbol name or name@@LIB for library functions.
*/
long elf_find_func_offset_from_file(const char *binary_path, const char *name)
{
struct elf_fd elf_fd;
long ret = -ENOENT;
ret = elf_open(binary_path, &elf_fd);
if (ret)
return ret;
ret = elf_find_func_offset(elf_fd.elf, binary_path, name);
elf_close(&elf_fd);
return ret;
}
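/* Usage sketch, not part of the original file: resolve a versioned libc
 * symbol to a file offset suitable for uprobe attachment. The library path
 * and symbol name below are illustrative assumptions.
 */
static long example_resolve_uprobe_offset(void)
{
	return elf_find_func_offset_from_file("/lib/x86_64-linux-gnu/libc.so.6",
					      "pthread_create@@GLIBC_2.34");
}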
struct symbol {
const char *name;
int bind;
int idx;
};
static int symbol_cmp(const void *a, const void *b)
{
const struct symbol *sym_a = a;
const struct symbol *sym_b = b;
return strcmp(sym_a->name, sym_b->name);
}
/*
 * Return offsets in @poffsets for the symbols specified in the @syms array.
 * On success returns 0; offsets are returned in an allocated array of @cnt
 * size, which must be freed by the caller.
 */
int elf_resolve_syms_offsets(const char *binary_path, int cnt,
const char **syms, unsigned long **poffsets,
int st_type)
{
int sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
int err = 0, i, cnt_done = 0;
unsigned long *offsets;
struct symbol *symbols;
struct elf_fd elf_fd;
err = elf_open(binary_path, &elf_fd);
if (err)
return err;
offsets = calloc(cnt, sizeof(*offsets));
symbols = calloc(cnt, sizeof(*symbols));
if (!offsets || !symbols) {
err = -ENOMEM;
goto out;
}
for (i = 0; i < cnt; i++) {
symbols[i].name = syms[i];
symbols[i].idx = i;
}
qsort(symbols, cnt, sizeof(*symbols), symbol_cmp);
for (i = 0; i < ARRAY_SIZE(sh_types); i++) {
struct elf_sym_iter iter;
struct elf_sym *sym;
err = elf_sym_iter_new(&iter, elf_fd.elf, binary_path, sh_types[i], st_type);
if (err == -ENOENT)
continue;
if (err)
goto out;
while ((sym = elf_sym_iter_next(&iter))) {
unsigned long sym_offset = elf_sym_offset(sym);
int bind = GELF_ST_BIND(sym->sym.st_info);
struct symbol *found, tmp = {
.name = sym->name,
};
unsigned long *offset;
found = bsearch(&tmp, symbols, cnt, sizeof(*symbols), symbol_cmp);
if (!found)
continue;
offset = &offsets[found->idx];
if (*offset > 0) {
/* same offset, no problem */
if (*offset == sym_offset)
continue;
/* handle multiple matches */
if (found->bind != STB_WEAK && bind != STB_WEAK) {
/* Only accept one non-weak bind. */
pr_warn("elf: ambiguous match found '%s@%lu' in '%s' previous offset %lu\n",
sym->name, sym_offset, binary_path, *offset);
err = -ESRCH;
goto out;
} else if (bind == STB_WEAK) {
/* already have a non-weak bind, and
* this is a weak bind, so ignore.
*/
continue;
}
} else {
cnt_done++;
}
*offset = sym_offset;
found->bind = bind;
}
}
if (cnt != cnt_done) {
err = -ENOENT;
goto out;
}
*poffsets = offsets;
out:
free(symbols);
if (err)
free(offsets);
elf_close(&elf_fd);
return err;
}
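/* Usage sketch, not part of the original file: batch-resolve two symbols;
 * offsets come back in syms[] order and must be freed by the caller. The
 * binary path and symbol names are illustrative assumptions.
 */
static int example_resolve_syms(void)
{
	const char *syms[] = { "malloc", "free" };
	unsigned long *offsets = NULL;
	int err;

	err = elf_resolve_syms_offsets("/lib/x86_64-linux-gnu/libc.so.6", 2,
				       syms, &offsets, STT_FUNC);
	if (err)
		return err;
	/* offsets[0] is for "malloc", offsets[1] for "free" */
	free(offsets);
	return 0;
}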
/*
 * Return offsets in @poffsets for symbols matching the @pattern argument.
 * On success returns 0; offsets are returned in the allocated @poffsets
 * array of @pcnt size, which must be freed by the caller.
 */
int elf_resolve_pattern_offsets(const char *binary_path, const char *pattern,
unsigned long **poffsets, size_t *pcnt)
{
int sh_types[2] = { SHT_SYMTAB, SHT_DYNSYM };
unsigned long *offsets = NULL;
size_t cap = 0, cnt = 0;
struct elf_fd elf_fd;
int err = 0, i;
err = elf_open(binary_path, &elf_fd);
if (err)
return err;
for (i = 0; i < ARRAY_SIZE(sh_types); i++) {
struct elf_sym_iter iter;
struct elf_sym *sym;
err = elf_sym_iter_new(&iter, elf_fd.elf, binary_path, sh_types[i], STT_FUNC);
if (err == -ENOENT)
continue;
if (err)
goto out;
while ((sym = elf_sym_iter_next(&iter))) {
if (!glob_match(sym->name, pattern))
continue;
err = libbpf_ensure_mem((void **) &offsets, &cap, sizeof(*offsets),
cnt + 1);
if (err)
goto out;
offsets[cnt++] = elf_sym_offset(sym);
}
/* If we found anything in the first symbol section,
* do not search others to avoid duplicates.
*/
if (cnt)
break;
}
if (cnt) {
*poffsets = offsets;
*pcnt = cnt;
} else {
err = -ENOENT;
}
out:
if (err)
free(offsets);
elf_close(&elf_fd);
return err;
}

src/features.c (new file, 583 lines)

@@ -0,0 +1,583 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
#include <linux/kernel.h>
#include <linux/filter.h>
#include "bpf.h"
#include "libbpf.h"
#include "libbpf_common.h"
#include "libbpf_internal.h"
#include "str_error.h"
static inline __u64 ptr_to_u64(const void *ptr)
{
return (__u64)(unsigned long)ptr;
}
int probe_fd(int fd)
{
if (fd >= 0)
close(fd);
return fd >= 0;
}
static int probe_kern_prog_name(int token_fd)
{
const size_t attr_sz = offsetofend(union bpf_attr, prog_name);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
union bpf_attr attr;
int ret;
memset(&attr, 0, attr_sz);
attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
attr.license = ptr_to_u64("GPL");
attr.insns = ptr_to_u64(insns);
attr.insn_cnt = (__u32)ARRAY_SIZE(insns);
attr.prog_token_fd = token_fd;
if (token_fd)
attr.prog_flags |= BPF_F_TOKEN_FD;
libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name));
/* make sure loading with name works */
ret = sys_bpf_prog_load(&attr, attr_sz, PROG_LOAD_ATTEMPTS);
return probe_fd(ret);
}
static int probe_kern_global_data(int token_fd)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_map_create_opts, map_opts,
.token_fd = token_fd,
.map_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_prog_load_opts, prog_opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, map, insn_cnt = ARRAY_SIZE(insns);
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, &map_opts);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
return ret;
}
insns[0].imm = map;
ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts);
close(map);
return probe_fd(ret);
}
static int probe_kern_btf(int token_fd)
{
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_func(int token_fd)
{
static const char strs[] = "\0int\0x\0a";
/* void x(int a) {} */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* FUNC_PROTO */ /* [2] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
BTF_PARAM_ENC(7, 1),
/* FUNC x */ /* [3] */
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_func_global(int token_fd)
{
static const char strs[] = "\0int\0x\0a";
/* static void x(int a) {} */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* FUNC_PROTO */ /* [2] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
BTF_PARAM_ENC(7, 1),
/* FUNC x BTF_FUNC_GLOBAL */ /* [3] */
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_datasec(int token_fd)
{
static const char strs[] = "\0x\0.data";
/* static int a; */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* DATASEC val */ /* [3] */
BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
BTF_VAR_SECINFO_ENC(2, 0, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_qmark_datasec(int token_fd)
{
static const char strs[] = "\0x\0?.data";
/* static int a; */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* DATASEC ?.data */ /* [3] */
BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
BTF_VAR_SECINFO_ENC(2, 0, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_float(int token_fd)
{
static const char strs[] = "\0float";
__u32 types[] = {
/* float */
BTF_TYPE_FLOAT_ENC(1, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_decl_tag(int token_fd)
{
static const char strs[] = "\0tag";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* attr */
BTF_TYPE_DECL_TAG_ENC(1, 2, -1),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_type_tag(int token_fd)
{
static const char strs[] = "\0tag";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* attr */
BTF_TYPE_TYPE_TAG_ENC(1, 1), /* [2] */
/* ptr */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2), /* [3] */
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_array_mmap(int token_fd)
{
LIBBPF_OPTS(bpf_map_create_opts, opts,
.map_flags = BPF_F_MMAPABLE | (token_fd ? BPF_F_TOKEN_FD : 0),
.token_fd = token_fd,
);
int fd;
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts);
return probe_fd(fd);
}
static int probe_kern_exp_attach_type(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
/* use any valid combination of program type and (optional)
* non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS)
* to see if kernel supports expected_attach_type field for
* BPF_PROG_LOAD command
*/
fd = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(fd);
}
static int probe_kern_probe_read_kernel(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
struct bpf_insn insns[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */
BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */
BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(fd);
}
static int probe_prog_bind_map(int token_fd)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_map_create_opts, map_opts,
.token_fd = token_fd,
.map_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_prog_load_opts, prog_opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, map, prog, insn_cnt = ARRAY_SIZE(insns);
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, &map_opts);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
return ret;
}
prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts);
if (prog < 0) {
close(map);
return 0;
}
ret = bpf_prog_bind_map(prog, map, NULL);
close(map);
close(prog);
return ret >= 0;
}
static int probe_module_btf(int token_fd)
{
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
struct bpf_btf_info info;
__u32 len = sizeof(info);
char name[16];
int fd, err;
fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), token_fd);
if (fd < 0)
return 0; /* BTF not supported at all */
memset(&info, 0, sizeof(info));
info.name = ptr_to_u64(name);
info.name_len = sizeof(name);
/* check that BPF_OBJ_GET_INFO_BY_FD supports specifying name pointer;
* kernel's module BTF support coincides with support for
* name/name_len fields in struct bpf_btf_info.
*/
err = bpf_btf_get_info_by_fd(fd, &info, &len);
close(fd);
return !err;
}
static int probe_perf_link(int token_fd)
{
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int prog_fd, link_fd, err;
prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL",
insns, ARRAY_SIZE(insns), &opts);
if (prog_fd < 0)
return -errno;
/* use invalid perf_event FD to get EBADF, if link is supported;
* otherwise EINVAL should be returned
*/
link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL);
err = -errno; /* close() can clobber errno */
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
return link_fd < 0 && err == -EBADF;
}
static int probe_uprobe_multi_link(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, load_opts,
.expected_attach_type = BPF_TRACE_UPROBE_MULTI,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_link_create_opts, link_opts);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int prog_fd, link_fd, err;
unsigned long offset = 0;
prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL",
insns, ARRAY_SIZE(insns), &load_opts);
if (prog_fd < 0)
return -errno;
/* Creating uprobe in '/' binary should fail with -EBADF. */
link_opts.uprobe_multi.path = "/";
link_opts.uprobe_multi.offsets = &offset;
link_opts.uprobe_multi.cnt = 1;
link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts);
err = -errno; /* close() can clobber errno */
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
return link_fd < 0 && err == -EBADF;
}
static int probe_kern_bpf_cookie(int token_fd)
{
struct bpf_insn insns[] = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, insn_cnt = ARRAY_SIZE(insns);
ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(ret);
}
static int probe_kern_btf_enum64(int token_fd)
{
static const char strs[] = "\0enum64";
__u32 types[] = {
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_ENUM64, 0, 0), 8),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_arg_ctx_tag(int token_fd)
{
static const char strs[] = "\0a\0b\0arg:ctx\0";
const __u32 types[] = {
/* [1] INT */
BTF_TYPE_INT_ENC(1 /* "a" */, BTF_INT_SIGNED, 0, 32, 4),
/* [2] PTR -> VOID */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 0),
/* [3] FUNC_PROTO `int(void *a)` */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 1),
BTF_PARAM_ENC(1 /* "a" */, 2),
/* [4] FUNC 'a' -> FUNC_PROTO (main prog) */
BTF_TYPE_ENC(1 /* "a" */, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 3),
/* [5] FUNC_PROTO `int(void *b __arg_ctx)` */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 1),
BTF_PARAM_ENC(3 /* "b" */, 2),
/* [6] FUNC 'b' -> FUNC_PROTO (subprog) */
BTF_TYPE_ENC(3 /* "b" */, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 5),
/* [7] DECL_TAG 'arg:ctx' -> func 'b' arg 'b' */
BTF_TYPE_DECL_TAG_ENC(5 /* "arg:ctx" */, 6, 0),
};
const struct bpf_insn insns[] = {
/* main prog */
BPF_CALL_REL(+1),
BPF_EXIT_INSN(),
/* global subprog */
BPF_EMIT_CALL(BPF_FUNC_get_func_ip), /* needs PTR_TO_CTX */
BPF_EXIT_INSN(),
};
const struct bpf_func_info_min func_infos[] = {
{ 0, 4 }, /* main prog -> FUNC 'a' */
{ 2, 6 }, /* subprog -> FUNC 'b' */
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int prog_fd, btf_fd, insn_cnt = ARRAY_SIZE(insns);
btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), token_fd);
if (btf_fd < 0)
return 0;
opts.prog_btf_fd = btf_fd;
opts.func_info = &func_infos;
opts.func_info_cnt = ARRAY_SIZE(func_infos);
opts.func_info_rec_size = sizeof(func_infos[0]);
prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, "det_arg_ctx",
"GPL", insns, insn_cnt, &opts);
close(btf_fd);
return probe_fd(prog_fd);
}
typedef int (*feature_probe_fn)(int /* token_fd */);
static struct kern_feature_cache feature_cache;
static struct kern_feature_desc {
const char *desc;
feature_probe_fn probe;
} feature_probes[__FEAT_CNT] = {
[FEAT_PROG_NAME] = {
"BPF program name", probe_kern_prog_name,
},
[FEAT_GLOBAL_DATA] = {
"global variables", probe_kern_global_data,
},
[FEAT_BTF] = {
"minimal BTF", probe_kern_btf,
},
[FEAT_BTF_FUNC] = {
"BTF functions", probe_kern_btf_func,
},
[FEAT_BTF_GLOBAL_FUNC] = {
"BTF global function", probe_kern_btf_func_global,
},
[FEAT_BTF_DATASEC] = {
"BTF data section and variable", probe_kern_btf_datasec,
},
[FEAT_ARRAY_MMAP] = {
"ARRAY map mmap()", probe_kern_array_mmap,
},
[FEAT_EXP_ATTACH_TYPE] = {
"BPF_PROG_LOAD expected_attach_type attribute",
probe_kern_exp_attach_type,
},
[FEAT_PROBE_READ_KERN] = {
"bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel,
},
[FEAT_PROG_BIND_MAP] = {
"BPF_PROG_BIND_MAP support", probe_prog_bind_map,
},
[FEAT_MODULE_BTF] = {
"module BTF support", probe_module_btf,
},
[FEAT_BTF_FLOAT] = {
"BTF_KIND_FLOAT support", probe_kern_btf_float,
},
[FEAT_PERF_LINK] = {
"BPF perf link support", probe_perf_link,
},
[FEAT_BTF_DECL_TAG] = {
"BTF_KIND_DECL_TAG support", probe_kern_btf_decl_tag,
},
[FEAT_BTF_TYPE_TAG] = {
"BTF_KIND_TYPE_TAG support", probe_kern_btf_type_tag,
},
[FEAT_MEMCG_ACCOUNT] = {
"memcg-based memory accounting", probe_memcg_account,
},
[FEAT_BPF_COOKIE] = {
"BPF cookie support", probe_kern_bpf_cookie,
},
[FEAT_BTF_ENUM64] = {
"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
},
[FEAT_SYSCALL_WRAPPER] = {
"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
},
[FEAT_UPROBE_MULTI_LINK] = {
"BPF multi-uprobe link support", probe_uprobe_multi_link,
},
[FEAT_ARG_CTX_TAG] = {
"kernel-side __arg_ctx tag", probe_kern_arg_ctx_tag,
},
[FEAT_BTF_QMARK_DATASEC] = {
"BTF DATASEC names starting from '?'", probe_kern_btf_qmark_datasec,
},
};
bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
{
struct kern_feature_desc *feat = &feature_probes[feat_id];
int ret;
/* assume global feature cache, unless custom one is provided */
if (!cache)
cache = &feature_cache;
if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) {
ret = feat->probe(cache->token_fd);
if (ret > 0) {
WRITE_ONCE(cache->res[feat_id], FEAT_SUPPORTED);
} else if (ret == 0) {
WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
} else {
pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
}
}
return READ_ONCE(cache->res[feat_id]) == FEAT_SUPPORTED;
}

src/gen_loader.c

@@ -560,7 +560,7 @@ static void emit_find_attach_target(struct bpf_gen *gen)
}
void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
bool is_typeless, int kind, int insn_idx)
bool is_typeless, bool is_ld64, int kind, int insn_idx)
{
struct ksym_relo_desc *relo;
@@ -574,6 +574,7 @@ void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
relo->name = name;
relo->is_weak = is_weak;
relo->is_typeless = is_typeless;
relo->is_ld64 = is_ld64;
relo->kind = kind;
relo->insn_idx = insn_idx;
gen->relo_cnt++;
@@ -586,9 +587,11 @@ static struct ksym_desc *get_ksym_desc(struct bpf_gen *gen, struct ksym_relo_des
int i;
for (i = 0; i < gen->nr_ksyms; i++) {
if (!strcmp(gen->ksyms[i].name, relo->name)) {
gen->ksyms[i].ref++;
return &gen->ksyms[i];
kdesc = &gen->ksyms[i];
if (kdesc->kind == relo->kind && kdesc->is_ld64 == relo->is_ld64 &&
!strcmp(kdesc->name, relo->name)) {
kdesc->ref++;
return kdesc;
}
}
kdesc = libbpf_reallocarray(gen->ksyms, gen->nr_ksyms + 1, sizeof(*kdesc));
@@ -603,6 +606,7 @@ static struct ksym_desc *get_ksym_desc(struct bpf_gen *gen, struct ksym_relo_des
kdesc->ref = 1;
kdesc->off = 0;
kdesc->insn = 0;
kdesc->is_ld64 = relo->is_ld64;
return kdesc;
}
@@ -699,17 +703,17 @@ static void emit_relo_kfunc_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo
/* obtain fd in BPF_REG_9 */
emit(gen, BPF_MOV64_REG(BPF_REG_9, BPF_REG_7));
emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_9, 32));
/* jump to fd_array store if fd denotes module BTF */
/* load fd_array slot pointer */
emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_0, BPF_PSEUDO_MAP_IDX_VALUE,
0, 0, 0, blob_fd_array_off(gen, btf_fd_idx)));
/* store BTF fd in slot, 0 for vmlinux */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_9, 0));
/* jump to insn[insn_idx].off store if fd denotes module BTF */
emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0, 2));
/* set the default value for off */
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
/* skip BTF fd store for vmlinux BTF */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 4));
/* load fd_array slot pointer */
emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_0, BPF_PSEUDO_MAP_IDX_VALUE,
0, 0, 0, blob_fd_array_off(gen, btf_fd_idx)));
/* store BTF fd in slot */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_9, 0));
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 1));
/* store index into insn[insn_idx].off */
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), btf_fd_idx));
log:
@@ -804,11 +808,13 @@ static void emit_relo_ksym_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo,
return;
/* try to copy from existing ldimm64 insn */
if (kdesc->ref > 1) {
move_blob2blob(gen, insn + offsetof(struct bpf_insn, imm), 4,
kdesc->insn + offsetof(struct bpf_insn, imm));
move_blob2blob(gen, insn + sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm), 4,
kdesc->insn + sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm));
/* jump over src_reg adjustment if imm is not 0, reuse BPF_REG_0 from move_blob2blob */
move_blob2blob(gen, insn + offsetof(struct bpf_insn, imm), 4,
kdesc->insn + offsetof(struct bpf_insn, imm));
/* jump over src_reg adjustment if imm (btf_id) is not 0, reuse BPF_REG_0 from move_blob2blob
* If btf_id is zero, clear BPF_PSEUDO_BTF_ID flag in src_reg of ld_imm64 insn
*/
emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3));
goto clear_src_reg;
}
@@ -831,7 +837,7 @@ static void emit_relo_ksym_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo,
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7,
sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm)));
/* skip src_reg adjustment */
emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 3));
clear_src_reg:
/* clear bpf_object__relocate_data's src_reg assignment, otherwise we get a verifier failure */
reg_mask = src_reg_mask();
@@ -862,23 +868,17 @@ static void emit_relo(struct bpf_gen *gen, struct ksym_relo_desc *relo, int insn
{
int insn;
pr_debug("gen: emit_relo (%d): %s at %d\n", relo->kind, relo->name, relo->insn_idx);
pr_debug("gen: emit_relo (%d): %s at %d %s\n",
relo->kind, relo->name, relo->insn_idx, relo->is_ld64 ? "ld64" : "call");
insn = insns + sizeof(struct bpf_insn) * relo->insn_idx;
emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_8, BPF_PSEUDO_MAP_IDX_VALUE, 0, 0, 0, insn));
switch (relo->kind) {
case BTF_KIND_VAR:
if (relo->is_ld64) {
if (relo->is_typeless)
emit_relo_ksym_typeless(gen, relo, insn);
else
emit_relo_ksym_btf(gen, relo, insn);
break;
case BTF_KIND_FUNC:
} else {
emit_relo_kfunc_btf(gen, relo, insn);
break;
default:
pr_warn("Unknown relocation kind '%d'\n", relo->kind);
gen->error = -EDOM;
return;
}
}
@@ -901,18 +901,20 @@ static void cleanup_core_relo(struct bpf_gen *gen)
static void cleanup_relos(struct bpf_gen *gen, int insns)
{
struct ksym_desc *kdesc;
int i, insn;
for (i = 0; i < gen->nr_ksyms; i++) {
kdesc = &gen->ksyms[i];
/* only close fds for typed ksyms and kfuncs */
if (gen->ksyms[i].kind == BTF_KIND_VAR && !gen->ksyms[i].typeless) {
if (kdesc->is_ld64 && !kdesc->typeless) {
/* close fd recorded in insn[insn_idx + 1].imm */
insn = gen->ksyms[i].insn;
insn = kdesc->insn;
insn += sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm);
emit_sys_close_blob(gen, insn);
} else if (gen->ksyms[i].kind == BTF_KIND_FUNC) {
emit_sys_close_blob(gen, blob_fd_array_off(gen, gen->ksyms[i].off));
if (gen->ksyms[i].off < MAX_FD_ARRAY_SZ)
} else if (!kdesc->is_ld64) {
emit_sys_close_blob(gen, blob_fd_array_off(gen, kdesc->off));
if (kdesc->off < MAX_FD_ARRAY_SZ)
gen->nr_fd_array--;
}
}

src/hashmap.c

@@ -128,7 +128,7 @@ static int hashmap_grow(struct hashmap *map)
}
static bool hashmap_find_entry(const struct hashmap *map,
const void *key, size_t hash,
const long key, size_t hash,
struct hashmap_entry ***pprev,
struct hashmap_entry **entry)
{
@@ -151,18 +151,18 @@ static bool hashmap_find_entry(const struct hashmap *map,
return false;
}
int hashmap__insert(struct hashmap *map, const void *key, void *value,
enum hashmap_insert_strategy strategy,
const void **old_key, void **old_value)
int hashmap_insert(struct hashmap *map, long key, long value,
enum hashmap_insert_strategy strategy,
long *old_key, long *old_value)
{
struct hashmap_entry *entry;
size_t h;
int err;
if (old_key)
*old_key = NULL;
*old_key = 0;
if (old_value)
*old_value = NULL;
*old_value = 0;
h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
if (strategy != HASHMAP_APPEND &&
@@ -203,7 +203,7 @@ int hashmap__insert(struct hashmap *map, const void *key, void *value,
return 0;
}
bool hashmap__find(const struct hashmap *map, const void *key, void **value)
bool hashmap_find(const struct hashmap *map, long key, long *value)
{
struct hashmap_entry *entry;
size_t h;
@@ -217,8 +217,8 @@ bool hashmap__find(const struct hashmap *map, const void *key, void **value)
return true;
}
bool hashmap__delete(struct hashmap *map, const void *key,
const void **old_key, void **old_value)
bool hashmap_delete(struct hashmap *map, long key,
long *old_key, long *old_value)
{
struct hashmap_entry **pprev, *entry;
size_t h;

src/hashmap.h

@@ -40,12 +40,32 @@ static inline size_t str_hash(const char *s)
return h;
}
typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx);
typedef bool (*hashmap_equal_fn)(const void *key1, const void *key2, void *ctx);
typedef size_t (*hashmap_hash_fn)(long key, void *ctx);
typedef bool (*hashmap_equal_fn)(long key1, long key2, void *ctx);
/*
* Hashmap interface is polymorphic, keys and values could be either
* long-sized integers or pointers, this is achieved as follows:
* - interface functions that operate on keys and values are hidden
* behind auxiliary macros, e.g. hashmap_insert <-> hashmap__insert;
* - these auxiliary macros cast the key and value parameters as
* long or long *, so the user does not have to specify the casts explicitly;
* - for pointer parameters (e.g. old_key) the size of the pointed
* type is verified by hashmap_cast_ptr using _Static_assert;
* - when iterating using hashmap__for_each_* forms
hashmap_entry->key should be used for integer keys and
hashmap_entry->pkey should be used for pointer keys,
* same goes for values.
*/
struct hashmap_entry {
const void *key;
void *value;
union {
long key;
const void *pkey;
};
union {
long value;
void *pvalue;
};
struct hashmap_entry *next;
};
@@ -60,16 +80,6 @@ struct hashmap {
size_t sz;
};
#define HASHMAP_INIT(hash_fn, equal_fn, ctx) { \
.hash_fn = (hash_fn), \
.equal_fn = (equal_fn), \
.ctx = (ctx), \
.buckets = NULL, \
.cap = 0, \
.cap_bits = 0, \
.sz = 0, \
}
void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
hashmap_equal_fn equal_fn, void *ctx);
struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
@@ -102,6 +112,13 @@ enum hashmap_insert_strategy {
HASHMAP_APPEND,
};
#define hashmap_cast_ptr(p) ({ \
_Static_assert((__builtin_constant_p((p)) ? (p) == NULL : 0) || \
sizeof(*(p)) == sizeof(long), \
#p " pointee should be a long-sized integer or a pointer"); \
(long *)(p); \
})
/*
* hashmap__insert() adds key/value entry w/ various semantics, depending on
* provided strategy value. If a given key/value pair replaced already
@@ -109,42 +126,38 @@ enum hashmap_insert_strategy {
* through old_key and old_value to allow the calling code to do proper memory
* management.
*/
int hashmap__insert(struct hashmap *map, const void *key, void *value,
enum hashmap_insert_strategy strategy,
const void **old_key, void **old_value);
int hashmap_insert(struct hashmap *map, long key, long value,
enum hashmap_insert_strategy strategy,
long *old_key, long *old_value);
static inline int hashmap__add(struct hashmap *map,
const void *key, void *value)
{
return hashmap__insert(map, key, value, HASHMAP_ADD, NULL, NULL);
}
#define hashmap__insert(map, key, value, strategy, old_key, old_value) \
hashmap_insert((map), (long)(key), (long)(value), (strategy), \
hashmap_cast_ptr(old_key), \
hashmap_cast_ptr(old_value))
static inline int hashmap__set(struct hashmap *map,
const void *key, void *value,
const void **old_key, void **old_value)
{
return hashmap__insert(map, key, value, HASHMAP_SET,
old_key, old_value);
}
#define hashmap__add(map, key, value) \
hashmap__insert((map), (key), (value), HASHMAP_ADD, NULL, NULL)
static inline int hashmap__update(struct hashmap *map,
const void *key, void *value,
const void **old_key, void **old_value)
{
return hashmap__insert(map, key, value, HASHMAP_UPDATE,
old_key, old_value);
}
#define hashmap__set(map, key, value, old_key, old_value) \
hashmap__insert((map), (key), (value), HASHMAP_SET, (old_key), (old_value))
static inline int hashmap__append(struct hashmap *map,
const void *key, void *value)
{
return hashmap__insert(map, key, value, HASHMAP_APPEND, NULL, NULL);
}
#define hashmap__update(map, key, value, old_key, old_value) \
hashmap__insert((map), (key), (value), HASHMAP_UPDATE, (old_key), (old_value))
bool hashmap__delete(struct hashmap *map, const void *key,
const void **old_key, void **old_value);
#define hashmap__append(map, key, value) \
hashmap__insert((map), (key), (value), HASHMAP_APPEND, NULL, NULL)
bool hashmap__find(const struct hashmap *map, const void *key, void **value);
bool hashmap_delete(struct hashmap *map, long key, long *old_key, long *old_value);
#define hashmap__delete(map, key, old_key, old_value) \
hashmap_delete((map), (long)(key), \
hashmap_cast_ptr(old_key), \
hashmap_cast_ptr(old_value))
bool hashmap_find(const struct hashmap *map, long key, long *value);
#define hashmap__find(map, key, value) \
hashmap_find((map), (long)(key), hashmap_cast_ptr(value))
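/* Usage sketch, not part of the original header: the macro layer lets the
 * same API carry a pointer key and a long-sized integer value without
 * explicit casts. Assumes <stdlib.h>/<string.h> and an existing name_map
 * with string hash/equal callbacks; on Linux, sizeof(size_t) == sizeof(long),
 * so hashmap_cast_ptr() accepts &cnt.
 */
static void example_count_name(struct hashmap *name_map, const char *name)
{
	size_t cnt = 0;
	char *old_name = NULL;

	(void)hashmap__find(name_map, name, &cnt);	/* cnt stays 0 if absent */
	if (!hashmap__set(name_map, strdup(name), cnt + 1, &old_name, NULL))
		free(old_name);				/* free replaced key, if any */
}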
/*
* hashmap__for_each_entry - iterate over all entries in hashmap

(file diff suppressed because it is too large)

src/libbpf.h

@@ -96,6 +96,14 @@ enum libbpf_print_level {
typedef int (*libbpf_print_fn_t)(enum libbpf_print_level level,
const char *, va_list ap);
/**
* @brief **libbpf_set_print()** sets user-provided log callback function to
* be used for libbpf warnings and informational messages.
* @param fn The log print function. If NULL, libbpf won't print anything.
* @return Pointer to old print function.
*
* This function is thread-safe.
*/
LIBBPF_API libbpf_print_fn_t libbpf_set_print(libbpf_print_fn_t fn);
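/* Usage sketch, not part of the original header: install a callback that
 * drops debug-level output and routes the rest to stderr (assumes <stdio.h>).
 * libbpf_set_print(example_print) returns the previously installed callback.
 */
static int example_print(enum libbpf_print_level level, const char *fmt, va_list args)
{
	if (level == LIBBPF_DEBUG)
		return 0;
	return vfprintf(stderr, fmt, args);
}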
/* Hide internal to user */
@@ -118,7 +126,9 @@ struct bpf_object_open_opts {
* auto-pinned to that path on load; defaults to "/sys/fs/bpf".
*/
const char *pin_root_path;
long :0;
__u32 :32; /* stub out now removed attach_prog_fd */
/* Additional kernel config content that augments and overrides
* system Kconfig for CONFIG_xxx externs.
*/
@@ -167,11 +177,38 @@ struct bpf_object_open_opts {
* logs through its print callback.
*/
__u32 kernel_log_level;
/* Path to BPF FS mount point to derive BPF token from.
*
* Created BPF token will be used for all bpf() syscall operations
* that accept BPF token (e.g., map creation, BTF and program loads,
* etc) automatically within instantiated BPF object.
*
* If bpf_token_path is not specified, libbpf will consult
* LIBBPF_BPF_TOKEN_PATH environment variable. If set, it will be
* taken as a value of bpf_token_path option and will force libbpf to
* either create BPF token from provided custom BPF FS path, or will
* disable implicit BPF token creation, if envvar value is an empty
* string. bpf_token_path overrides LIBBPF_BPF_TOKEN_PATH, if both are
* set at the same time.
*
* Setting bpf_token_path option to empty string disables libbpf's
* automatic attempt to create BPF token from default BPF FS mount
* point (/sys/fs/bpf), in case this default behavior is undesirable.
*/
const char *bpf_token_path;
size_t :0;
};
#define bpf_object_open_opts__last_field kernel_log_level
#define bpf_object_open_opts__last_field bpf_token_path
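/* Usage sketch, not part of the original header: derive a BPF token from a
 * custom BPF FS mount for all bpf() syscalls made on behalf of this object.
 * The object path and mount point are illustrative assumptions.
 */
static struct bpf_object *example_open_with_token(void)
{
	LIBBPF_OPTS(bpf_object_open_opts, opts,
		.bpf_token_path = "/sys/fs/bpf/delegated",
	);

	return bpf_object__open_file("prog.bpf.o", &opts); /* NULL + errno on error */
}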
/**
* @brief **bpf_object__open()** creates a bpf_object by opening
* the BPF ELF object file pointed to by the passed path and loading it
* into memory.
* @param path BPF object file path.
* @return pointer to the new bpf_object; NULL is returned on error and
* the error code is stored in errno
*/
LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
/**
@@ -201,16 +238,46 @@ LIBBPF_API struct bpf_object *
bpf_object__open_mem(const void *obj_buf, size_t obj_buf_sz,
const struct bpf_object_open_opts *opts);
/* Load/unload object into/from kernel */
/**
* @brief **bpf_object__load()** loads BPF object into kernel.
* @param obj Pointer to a valid BPF object instance returned by
* **bpf_object__open*()** APIs
* @return 0 on success; negative error code otherwise; the error code is
* also stored in errno
*/
LIBBPF_API int bpf_object__load(struct bpf_object *obj);
LIBBPF_API void bpf_object__close(struct bpf_object *object);
/**
* @brief **bpf_object__close()** closes a BPF object and releases all
* resources.
* @param obj Pointer to a valid BPF object
*/
LIBBPF_API void bpf_object__close(struct bpf_object *obj);
/* pin_maps and unpin_maps can both be called with a NULL path, in which case
* they will use the pin_path attribute of each map (and ignore all maps that
* don't have a pin_path set).
/**
* @brief **bpf_object__pin_maps()** pins each map contained within
* the BPF object at the passed directory.
* @param obj Pointer to a valid BPF object
* @param path A directory where maps should be pinned.
* @return 0, on success; negative error code, otherwise
*
* If `path` is NULL, `bpf_map__pin` (which is used on each map)
* will use the pin_path attribute of each map. In this case, maps that
* don't have a pin_path set will be ignored.
*/
LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char *path);
/**
* @brief **bpf_object__unpin_maps()** unpins each map contained within
* the BPF object found in the passed directory.
* @param obj Pointer to a valid BPF object
* @param path A directory where pinned maps should be searched for.
* @return 0, on success; negative error code, otherwise
*
* If `path` is NULL, `bpf_map__unpin` (which is used on each map)
* will use the pin_path attribute of each map. In this case, maps that
* don't have a pin_path set will be ignored.
*/
LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj,
const char *path);
LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj,
@@ -218,6 +285,7 @@ LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj,
LIBBPF_API int bpf_object__unpin_programs(struct bpf_object *obj,
const char *path);
LIBBPF_API int bpf_object__pin(struct bpf_object *object, const char *path);
LIBBPF_API int bpf_object__unpin(struct bpf_object *object, const char *path);
LIBBPF_API const char *bpf_object__name(const struct bpf_object *obj);
LIBBPF_API unsigned int bpf_object__kversion(const struct bpf_object *obj);
@@ -401,12 +469,15 @@ LIBBPF_API struct bpf_link *
bpf_program__attach(const struct bpf_program *prog);
struct bpf_perf_event_opts {
/* size of this struct, for forward/backward compatiblity */
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
/* don't use BPF link when attach BPF program */
bool force_ioctl_attach;
size_t :0;
};
#define bpf_perf_event_opts__last_field bpf_cookie
#define bpf_perf_event_opts__last_field force_ioctl_attach
LIBBPF_API struct bpf_link *
bpf_program__attach_perf_event(const struct bpf_program *prog, int pfd);
@@ -415,8 +486,25 @@ LIBBPF_API struct bpf_link *
bpf_program__attach_perf_event_opts(const struct bpf_program *prog, int pfd,
const struct bpf_perf_event_opts *opts);
/**
* enum probe_attach_mode - the mode to attach kprobe/uprobe
*
* force libbpf to attach a kprobe/uprobe in a specific mode; -ENOTSUP will
* be returned if it is not supported by the kernel.
*/
enum probe_attach_mode {
/* attach probe in latest supported mode by kernel */
PROBE_ATTACH_MODE_DEFAULT = 0,
/* attach probe in legacy mode, using debugfs/tracefs */
PROBE_ATTACH_MODE_LEGACY,
/* create perf event with perf_event_open() syscall */
PROBE_ATTACH_MODE_PERF,
/* attach probe with BPF link */
PROBE_ATTACH_MODE_LINK,
};
struct bpf_kprobe_opts {
/* size of this struct, for forward/backward compatiblity */
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
@@ -424,9 +512,11 @@ struct bpf_kprobe_opts {
size_t offset;
/* kprobe is return probe */
bool retprobe;
/* kprobe attach mode */
enum probe_attach_mode attach_mode;
size_t :0;
};
#define bpf_kprobe_opts__last_field retprobe
#define bpf_kprobe_opts__last_field attach_mode
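/* Usage sketch, not part of the original header: force legacy tracefs-based
 * kprobe attachment via the companion bpf_program__attach_kprobe_opts() API.
 * The kernel function name is an illustrative assumption.
 */
static struct bpf_link *example_attach_legacy_kprobe(struct bpf_program *prog)
{
	LIBBPF_OPTS(bpf_kprobe_opts, opts,
		.attach_mode = PROBE_ATTACH_MODE_LEGACY,
	);

	return bpf_program__attach_kprobe_opts(prog, "do_unlinkat", &opts);
}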
LIBBPF_API struct bpf_link *
bpf_program__attach_kprobe(const struct bpf_program *prog, bool retprobe,
@@ -459,8 +549,59 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
const char *pattern,
const struct bpf_kprobe_multi_opts *opts);
struct bpf_uprobe_multi_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* array of function symbols to attach to */
const char **syms;
/* array of function addresses to attach to */
const unsigned long *offsets;
/* optional, array of associated ref counter offsets */
const unsigned long *ref_ctr_offsets;
/* optional, array of associated BPF cookies */
const __u64 *cookies;
/* number of elements in syms/addrs/cookies arrays */
size_t cnt;
/* create return uprobes */
bool retprobe;
size_t :0;
};
#define bpf_uprobe_multi_opts__last_field retprobe
/**
* @brief **bpf_program__attach_uprobe_multi()** attaches a BPF program
* to multiple uprobes with uprobe_multi link.
*
* User can specify 2 mutually exclusive sets of inputs:
*
* 1) use only path/func_pattern/pid arguments
*
* 2) use path/pid with allowed combinations of
* syms/offsets/ref_ctr_offsets/cookies/cnt
*
* - syms and offsets are mutually exclusive
* - ref_ctr_offsets and cookies are optional
*
*
* @param prog BPF program to attach
* @param pid Process ID to attach the uprobe to, 0 for self (own process),
* -1 for all processes
* @param binary_path Path to binary
* @param func_pattern Regular expression to specify functions to attach
* BPF program to
* @param opts Additional options (see **struct bpf_uprobe_multi_opts**)
* @return 0, on success; negative error code, otherwise
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe_multi(const struct bpf_program *prog,
pid_t pid,
const char *binary_path,
const char *func_pattern,
const struct bpf_uprobe_multi_opts *opts);
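/* Usage sketch, not part of the original header: attach one program to all
 * libc functions matching a glob pattern, across all processes. The binary
 * path and pattern are illustrative assumptions.
 */
static struct bpf_link *example_attach_uprobe_multi(struct bpf_program *prog)
{
	return bpf_program__attach_uprobe_multi(prog, -1 /* all processes */,
						"/lib/x86_64-linux-gnu/libc.so.6",
						"pthread_*", NULL);
}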
struct bpf_ksyscall_opts {
/* size of this struct, for forward/backward compatiblity */
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
@@ -506,7 +647,7 @@ bpf_program__attach_ksyscall(const struct bpf_program *prog,
const struct bpf_ksyscall_opts *opts);
struct bpf_uprobe_opts {
/* size of this struct, for forward/backward compatiblity */
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* offset of kernel reference counted USDT semaphore, added in
* a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
@@ -524,9 +665,11 @@ struct bpf_uprobe_opts {
* binary_path.
*/
const char *func_name;
/* uprobe attach mode */
enum probe_attach_mode attach_mode;
size_t :0;
};
#define bpf_uprobe_opts__last_field func_name
#define bpf_uprobe_opts__last_field attach_mode
/**
* @brief **bpf_program__attach_uprobe()** attaches a BPF program
@@ -600,7 +743,7 @@ bpf_program__attach_usdt(const struct bpf_program *prog,
const struct bpf_usdt_opts *opts);
struct bpf_tracepoint_opts {
/* size of this struct, for forward/backward compatiblity */
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
@@ -617,9 +760,20 @@ bpf_program__attach_tracepoint_opts(const struct bpf_program *prog,
const char *tp_name,
const struct bpf_tracepoint_opts *opts);
struct bpf_raw_tracepoint_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u64 cookie;
size_t :0;
};
#define bpf_raw_tracepoint_opts__last_field cookie
LIBBPF_API struct bpf_link *
bpf_program__attach_raw_tracepoint(const struct bpf_program *prog,
const char *tp_name);
LIBBPF_API struct bpf_link *
bpf_program__attach_raw_tracepoint_opts(const struct bpf_program *prog,
const char *tp_name,
struct bpf_raw_tracepoint_opts *opts);
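/* Usage sketch, not part of the original header: attach a raw tracepoint
 * with a cookie that the BPF program can read back via
 * bpf_get_attach_cookie(). The tracepoint name is an illustrative assumption.
 */
static struct bpf_link *example_attach_raw_tp(struct bpf_program *prog)
{
	LIBBPF_OPTS(bpf_raw_tracepoint_opts, opts,
		.cookie = 0x1234,
	);

	return bpf_program__attach_raw_tracepoint_opts(prog, "sys_enter", &opts);
}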
struct bpf_trace_opts {
/* size of this struct, for forward/backward compatibility */
@@ -646,9 +800,55 @@ LIBBPF_API struct bpf_link *
bpf_program__attach_freplace(const struct bpf_program *prog,
int target_fd, const char *attach_func_name);
struct bpf_netfilter_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
__u32 pf;
__u32 hooknum;
__s32 priority;
__u32 flags;
};
#define bpf_netfilter_opts__last_field flags
LIBBPF_API struct bpf_link *
bpf_program__attach_netfilter(const struct bpf_program *prog,
const struct bpf_netfilter_opts *opts);
struct bpf_tcx_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
__u32 flags;
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_tcx_opts__last_field expected_revision
LIBBPF_API struct bpf_link *
bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
const struct bpf_tcx_opts *opts);
struct bpf_netkit_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
__u32 flags;
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_netkit_opts__last_field expected_revision
LIBBPF_API struct bpf_link *
bpf_program__attach_netkit(const struct bpf_program *prog, int ifindex,
const struct bpf_netkit_opts *opts);
struct bpf_map;
LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
LIBBPF_API int bpf_link__update_map(struct bpf_link *link, const struct bpf_map *map);
struct bpf_iter_attach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@@ -796,8 +996,22 @@ LIBBPF_API int bpf_map__set_numa_node(struct bpf_map *map, __u32 numa_node);
/* get/set map key size */
LIBBPF_API __u32 bpf_map__key_size(const struct bpf_map *map);
LIBBPF_API int bpf_map__set_key_size(struct bpf_map *map, __u32 size);
/* get/set map value size */
/* get map value size */
LIBBPF_API __u32 bpf_map__value_size(const struct bpf_map *map);
/**
* @brief **bpf_map__set_value_size()** sets map value size.
* @param map the BPF map instance
* @return 0, on success; negative error, otherwise
*
* There is a special case for maps with associated memory-mapped regions, like
* the global data section maps (bss, data, rodata). When this function is used
* on such a map, the mapped region is resized. Afterward, an attempt is made to
* adjust the corresponding BTF info. This attempt is best-effort and can only
* succeed if the last variable of the data section map is an array. The array
* BTF type is replaced by a new BTF array type with a different length.
* Any previously existing pointers returned from bpf_map__initial_value() or
* corresponding data section skeleton pointer must be reinitialized.
*/
LIBBPF_API int bpf_map__set_value_size(struct bpf_map *map, __u32 size);
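To make the resize caveat concrete, a sketch that grows the .bss datasec map before bpf_object__load() and refetches the mapped pointer, which the resize invalidates:

static void *grow_bss(struct bpf_object *obj, __u32 new_sz)
{
	/* internal datasec maps can be looked up by their section name */
	struct bpf_map *map = bpf_object__find_map_by_name(obj, ".bss");
	size_t sz;

	if (!map || bpf_map__set_value_size(map, new_sz))
		return NULL;
	/* any pointer obtained before the resize is stale now */
	return bpf_map__initial_value(map, &sz);
}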
/* get map key/value BTF type IDs */
LIBBPF_API __u32 bpf_map__btf_key_type_id(const struct bpf_map *map);
@@ -811,7 +1025,7 @@ LIBBPF_API int bpf_map__set_map_extra(struct bpf_map *map, __u64 map_extra);
LIBBPF_API int bpf_map__set_initial_value(struct bpf_map *map,
const void *data, size_t size);
LIBBPF_API const void *bpf_map__initial_value(struct bpf_map *map, size_t *psize);
LIBBPF_API void *bpf_map__initial_value(const struct bpf_map *map, size_t *psize);
/**
* @brief **bpf_map__is_internal()** tells the caller whether or not the
@@ -821,10 +1035,57 @@ LIBBPF_API const void *bpf_map__initial_value(struct bpf_map *map, size_t *psize
* @return true, if the map is an internal map; false, otherwise
*/
LIBBPF_API bool bpf_map__is_internal(const struct bpf_map *map);
/**
* @brief **bpf_map__set_pin_path()** sets the path attribute that tells where the
* BPF map should be pinned. This does not actually create the 'pin'.
* @param map The bpf_map
* @param path The path
* @return 0, on success; negative error, otherwise
*/
LIBBPF_API int bpf_map__set_pin_path(struct bpf_map *map, const char *path);
/**
* @brief **bpf_map__pin_path()** gets the path attribute that tells where the
* BPF map should be pinned.
* @param map The bpf_map
* @return The path string, which can be NULL
*/
LIBBPF_API const char *bpf_map__pin_path(const struct bpf_map *map);
/**
* @brief **bpf_map__is_pinned()** tells the caller whether or not the
* passed map has been pinned via a 'pin' file.
* @param map The bpf_map
* @return true, if the map is pinned; false, otherwise
*/
LIBBPF_API bool bpf_map__is_pinned(const struct bpf_map *map);
/**
* @brief **bpf_map__pin()** creates a file that serves as a 'pin'
* for the BPF map. This increments the reference count on the
* BPF map which will keep the BPF map loaded even after the
* userspace process which loaded it has exited.
* @param map The bpf_map to pin
* @param path A file path for the 'pin'
* @return 0, on success; negative error, otherwise
*
* If `path` is NULL, the map's `pin_path` attribute will be used. If this is
* also NULL, an error will be returned and the map will not be pinned.
*/
LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
/**
* @brief **bpf_map__unpin()** removes the file that serves as a
* 'pin' for the BPF map.
* @param map The bpf_map to unpin
* @param path A file path for the 'pin'
* @return 0, on success; negative error, otherwise
*
* The `path` parameter can be NULL, in which case the pin at the map's
* `pin_path` attribute is removed. If both the `path` parameter and the
* `pin_path` attribute are set, they must be equal.
*/
LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
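Taken together, the pin API composes as below; "/sys/fs/bpf/my_map" is a placeholder bpffs path:

static int pin_map_example(struct bpf_map *map)
{
	int err;

	err = bpf_map__set_pin_path(map, "/sys/fs/bpf/my_map");
	if (err)
		return err;
	err = bpf_map__pin(map, NULL);    /* NULL falls back to pin_path */
	if (err)
		return err;
	/* ... the map now survives process exit ... */
	return bpf_map__unpin(map, NULL); /* removes the pin_path pin */
}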
LIBBPF_API int bpf_map__set_inner_map_fd(struct bpf_map *map, int fd);
@@ -955,9 +1216,11 @@ struct bpf_xdp_query_opts {
__u32 hw_prog_id; /* output */
__u32 skb_prog_id; /* output */
__u8 attach_mode; /* output */
__u64 feature_flags; /* output */
__u32 xdp_zc_max_segs; /* output */
size_t :0;
};
#define bpf_xdp_query_opts__last_field attach_mode
#define bpf_xdp_query_opts__last_field xdp_zc_max_segs
LIBBPF_API int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags,
const struct bpf_xdp_attach_opts *opts);
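A sketch of consuming the two new output fields; it assumes the kernel exposes the netdev genetlink family (older kernels simply report zero feature flags):

#include <stdio.h>
#include <linux/netdev.h>

static void print_xdp_caps(int ifindex)
{
	LIBBPF_OPTS(bpf_xdp_query_opts, opts);

	if (bpf_xdp_query(ifindex, 0, &opts))
		return;
	if (opts.feature_flags & NETDEV_XDP_ACT_XSK_ZEROCOPY)
		printf("zero-copy AF_XDP, up to %u frags\n",
		       opts.xdp_zc_max_segs);
}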
@@ -1011,11 +1274,13 @@ LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
/* Ring buffer APIs */
struct ring_buffer;
struct ring;
struct user_ring_buffer;
typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
struct ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatiblity */
size_t sz; /* size of this struct, for forward/backward compatibility */
};
#define ring_buffer_opts__last_field sz
@@ -1030,6 +1295,190 @@ LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
/**
* @brief **ring_buffer__ring()** returns the ringbuffer object inside a given
* ringbuffer manager representing a single BPF_MAP_TYPE_RINGBUF map instance.
*
* @param rb A ringbuffer manager object.
* @param idx An index into the ringbuffers contained within the ringbuffer
* manager object. The index is 0-based and corresponds to the order in which
* ring_buffer__add was called.
* @return A ringbuffer object on success; NULL and errno set if the index is
* invalid.
*/
LIBBPF_API struct ring *ring_buffer__ring(struct ring_buffer *rb,
unsigned int idx);
/**
* @brief **ring__consumer_pos()** returns the current consumer position in the
* given ringbuffer.
*
* @param r A ringbuffer object.
* @return The current consumer position.
*/
LIBBPF_API unsigned long ring__consumer_pos(const struct ring *r);
/**
* @brief **ring__producer_pos()** returns the current producer position in the
* given ringbuffer.
*
* @param r A ringbuffer object.
* @return The current producer position.
*/
LIBBPF_API unsigned long ring__producer_pos(const struct ring *r);
/**
* @brief **ring__avail_data_size()** returns the number of bytes in the
* ringbuffer not yet consumed. This has no locking associated with it, so it
* can be inaccurate if operations are ongoing while this is called. However, it
* should still show the correct trend over the long-term.
*
* @param r A ringbuffer object.
* @return The number of bytes not yet consumed.
*/
LIBBPF_API size_t ring__avail_data_size(const struct ring *r);
/**
* @brief **ring__size()** returns the total size of the ringbuffer's map data
* area (excluding special producer/consumer pages). Effectively this gives the
* amount of usable bytes of data inside the ringbuffer.
*
* @param r A ringbuffer object.
* @return The total size of the ringbuffer map data area.
*/
LIBBPF_API size_t ring__size(const struct ring *r);
/**
* @brief **ring__map_fd()** returns the file descriptor underlying the given
* ringbuffer.
*
* @param r A ringbuffer object.
* @return The underlying ringbuffer file descriptor
*/
LIBBPF_API int ring__map_fd(const struct ring *r);
/**
* @brief **ring__consume()** consumes available ringbuffer data without event
* polling.
*
* @param r A ringbuffer object.
* @return The number of records consumed (or INT_MAX, whichever is less), or
* a negative number if any of the callbacks return an error.
*/
LIBBPF_API int ring__consume(struct ring *r);
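The ring__*() getters combine naturally with ring_buffer__ring(); a sketch that dumps the state of the first registered ring:

#include <stdio.h>

static void dump_ring_state(struct ring_buffer *rb)
{
	struct ring *r = ring_buffer__ring(rb, 0); /* first added ringbuf */

	if (!r)
		return;
	printf("fd=%d size=%zu unconsumed=%zu prod=%lu cons=%lu\n",
	       ring__map_fd(r), ring__size(r), ring__avail_data_size(r),
	       ring__producer_pos(r), ring__consumer_pos(r));
}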
struct user_ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
};
#define user_ring_buffer_opts__last_field sz
/**
* @brief **user_ring_buffer__new()** creates a new instance of a user ring
* buffer.
*
* @param map_fd A file descriptor to a BPF_MAP_TYPE_USER_RINGBUF map.
* @param opts Options for how the ring buffer should be created.
* @return A user ring buffer on success; NULL and errno being set on a
* failure.
*/
LIBBPF_API struct user_ring_buffer *
user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts);
/**
* @brief **user_ring_buffer__reserve()** reserves a pointer to a sample in the
* user ring buffer.
* @param rb A pointer to a user ring buffer.
* @param size The size of the sample, in bytes.
* @return A pointer to an 8-byte aligned reserved region of the user ring
* buffer; NULL, and errno being set if a sample could not be reserved.
*
* This function is *not* thread safe, and callers must synchronize accessing
* this function if there are multiple producers. If a size is requested that
* is larger than the size of the entire ring buffer, errno will be set to
* E2BIG and NULL is returned. If the ring buffer could accommodate the size,
* but currently does not have enough space, errno is set to ENOSPC and NULL is
* returned.
*
* After initializing the sample, callers must invoke
* **user_ring_buffer__submit()** to post the sample to the kernel. Otherwise,
* the sample must be freed with **user_ring_buffer__discard()**.
*/
LIBBPF_API void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
/**
* @brief **user_ring_buffer__reserve_blocking()** reserves a record in the
* ring buffer, possibly blocking for up to @timeout_ms until a sample becomes
* available.
* @param rb The user ring buffer.
* @param size The size of the sample, in bytes.
* @param timeout_ms The amount of time, in milliseconds, for which the caller
* should block when waiting for a sample. -1 causes the caller to block
* indefinitely.
* @return A pointer to an 8-byte aligned reserved region of the user ring
* buffer; NULL, and errno being set if a sample could not be reserved.
*
* This function is *not* thread safe, and callers must synchronize
* accessing this function if there are multiple producers.
*
* If **timeout_ms** is -1, the function will block indefinitely until a sample
* becomes available. Otherwise, **timeout_ms** must be non-negative, or errno
* is set to EINVAL, and NULL is returned. If **timeout_ms** is 0, no blocking
* will occur and the function will return immediately after attempting to
* reserve a sample.
*
* If **size** is larger than the size of the entire ring buffer, errno is set
* to E2BIG and NULL is returned. If the ring buffer could accommodate
* **size**, but currently does not have enough space, the caller will block
* until at most **timeout_ms** has elapsed. If insufficient space is available
* at that time, errno is set to ENOSPC, and NULL is returned.
*
* The kernel guarantees that it will wake up this thread to check if
* sufficient space is available in the ring buffer at least once per
* invocation of the **bpf_ringbuf_drain()** helper function, provided that at
* least one sample is consumed, and the BPF program did not invoke the
* function with BPF_RB_NO_WAKEUP. A wakeup may occur sooner than that, but the
* kernel does not guarantee this. If the helper function is invoked with
* BPF_RB_FORCE_WAKEUP, a wakeup event will be sent even if no sample is
* consumed.
*
* When a sample of size **size** is found within **timeout_ms**, a pointer to
* the sample is returned. After initializing the sample, callers must invoke
* **user_ring_buffer__submit()** to post the sample to the ring buffer.
* Otherwise, the sample must be freed with **user_ring_buffer__discard()**.
*/
LIBBPF_API void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
__u32 size,
int timeout_ms);
/**
* @brief **user_ring_buffer__submit()** submits a previously reserved sample
* into the ring buffer.
* @param rb The user ring buffer.
* @param sample A reserved sample.
*
* It is not necessary to synchronize amongst multiple producers when invoking
* this function.
*/
LIBBPF_API void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
/**
* @brief **user_ring_buffer__discard()** discards a previously reserved sample.
* @param rb The user ring buffer.
* @param sample A reserved sample.
*
* It is not necessary to synchronize amongst multiple producers when invoking
* this function.
*/
LIBBPF_API void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample);
/**
* @brief **user_ring_buffer__free()** frees a ring buffer that was previously
* created with **user_ring_buffer__new()**.
* @param rb The user ring buffer being freed.
*/
LIBBPF_API void user_ring_buffer__free(struct user_ring_buffer *rb);
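End to end, the producer side looks roughly like this; struct my_sample is a hypothetical layout shared with the consuming BPF program, and the 100ms timeout is arbitrary:

#include <errno.h>

struct my_sample { __u64 ts; __u32 kind; };

static int produce_one(struct user_ring_buffer *rb, __u32 kind)
{
	struct my_sample *s;

	s = user_ring_buffer__reserve_blocking(rb, sizeof(*s), 100);
	if (!s)
		return -errno; /* E2BIG, ENOSPC, EINVAL, ... */
	s->ts = 0; /* fill in real data here */
	s->kind = kind;
	user_ring_buffer__submit(rb, s); /* or user_ring_buffer__discard() */
	return 0;
}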
/* Perf buffer APIs */
struct perf_buffer;
@@ -1040,8 +1489,10 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
/* common use perf buffer options */
struct perf_buffer_opts {
size_t sz;
__u32 sample_period;
size_t :0;
};
#define perf_buffer_opts__last_field sz
#define perf_buffer_opts__last_field sample_period
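A sketch of the new knob: wake userspace only once per eight samples instead of per event. map_fd is a BPF_MAP_TYPE_PERF_EVENT_ARRAY fd and the callbacks here are stubs:

static void handle_sample(void *ctx, int cpu, void *data, __u32 size) {}
static void handle_lost(void *ctx, int cpu, __u64 cnt) {}

static struct perf_buffer *open_pb(int map_fd)
{
	LIBBPF_OPTS(perf_buffer_opts, pb_opts, .sample_period = 8);

	return perf_buffer__new(map_fd, 64 /* pages per CPU */,
				handle_sample, handle_lost, NULL, &pb_opts);
}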
/**
* @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
@@ -1266,7 +1717,7 @@ LIBBPF_API void
bpf_object__destroy_subskeleton(struct bpf_object_subskeleton *s);
struct gen_loader_opts {
size_t sz; /* size of this struct, for forward/backward compatiblity */
size_t sz; /* size of this struct, for forward/backward compatibility */
const char *data;
const char *insns;
__u32 data_sz;
@@ -1284,13 +1735,13 @@ enum libbpf_tristate {
};
struct bpf_linker_opts {
/* size of this struct, for forward/backward compatiblity */
/* size of this struct, for forward/backward compatibility */
size_t sz;
};
#define bpf_linker_opts__last_field sz
struct bpf_linker_file_opts {
/* size of this struct, for forward/backward compatiblity */
/* size of this struct, for forward/backward compatibility */
size_t sz;
};
#define bpf_linker_file_opts__last_field sz
@@ -1333,7 +1784,7 @@ typedef int (*libbpf_prog_attach_fn_t)(const struct bpf_program *prog, long cook
struct bpf_link **link);
struct libbpf_prog_handler_opts {
/* size of this struct, for forward/backward compatiblity */
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* User-provided value that is passed to prog_setup_fn,
* prog_prepare_load_fn, and prog_attach_fn callbacks. Allows user to


@@ -245,7 +245,6 @@ LIBBPF_0.3.0 {
btf__parse_raw_split;
btf__parse_split;
btf__new_empty_split;
btf__new_split;
ring_buffer__epoll_fd;
} LIBBPF_0.2.0;
@@ -326,7 +325,6 @@ LIBBPF_0.7.0 {
bpf_xdp_detach;
bpf_xdp_query;
bpf_xdp_query_id;
btf_ext__raw_data;
libbpf_probe_bpf_helper;
libbpf_probe_bpf_map_type;
libbpf_probe_bpf_prog_type;
@@ -367,4 +365,54 @@ LIBBPF_1.0.0 {
libbpf_bpf_map_type_str;
libbpf_bpf_prog_type_str;
perf_buffer__buffer;
};
} LIBBPF_0.8.0;
LIBBPF_1.1.0 {
global:
bpf_btf_get_fd_by_id_opts;
bpf_link_get_fd_by_id_opts;
bpf_map_get_fd_by_id_opts;
bpf_prog_get_fd_by_id_opts;
user_ring_buffer__discard;
user_ring_buffer__free;
user_ring_buffer__new;
user_ring_buffer__reserve;
user_ring_buffer__reserve_blocking;
user_ring_buffer__submit;
} LIBBPF_1.0.0;
LIBBPF_1.2.0 {
global:
bpf_btf_get_info_by_fd;
bpf_link__update_map;
bpf_link_get_info_by_fd;
bpf_map_get_info_by_fd;
bpf_prog_get_info_by_fd;
} LIBBPF_1.1.0;
LIBBPF_1.3.0 {
global:
bpf_obj_pin_opts;
bpf_object__unpin;
bpf_prog_detach_opts;
bpf_program__attach_netfilter;
bpf_program__attach_netkit;
bpf_program__attach_tcx;
bpf_program__attach_uprobe_multi;
ring__avail_data_size;
ring__consume;
ring__consumer_pos;
ring__map_fd;
ring__producer_pos;
ring__size;
ring_buffer__ring;
} LIBBPF_1.2.0;
LIBBPF_1.4.0 {
global:
bpf_program__attach_raw_tracepoint_opts;
bpf_raw_tracepoint_open_opts;
bpf_token_create;
btf__new_split;
btf_ext__raw_data;
} LIBBPF_1.3.0;


@@ -70,4 +70,23 @@
}; \
})
/* Helper macro to clear and optionally reinitialize libbpf options struct
*
* Small helper macro to reset all fields and to reinitialize the common
* structure size member. Values provided by users in struct initializer-
* syntax as varargs can be provided as well to reinitialize options struct
* specific members.
*/
#define LIBBPF_OPTS_RESET(NAME, ...) \
do { \
typeof(NAME) ___##NAME = ({ \
memset(&___##NAME, 0, sizeof(NAME)); \
(typeof(NAME)) { \
.sz = sizeof(NAME), \
__VA_ARGS__ \
}; \
}); \
memcpy(&NAME, &___##NAME, sizeof(NAME)); \
} while (0)
#endif /* __LIBBPF_LIBBPF_COMMON_H */
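Usage-wise, the macro pairs with LIBBPF_OPTS when one opts variable is reused across calls; a sketch with bpf_uprobe_opts:

LIBBPF_OPTS(bpf_uprobe_opts, opts, .retprobe = true);
/* ... attach a uretprobe using opts ... */
LIBBPF_OPTS_RESET(opts, .func_name = "malloc");
/* all fields zeroed again, .sz restored, .retprobe back to false */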


@@ -39,14 +39,14 @@ static const char *libbpf_strerror_table[NR_ERRNO] = {
int libbpf_strerror(int err, char *buf, size_t size)
{
int ret;
if (!buf || !size)
return libbpf_err(-EINVAL);
err = err > 0 ? err : -err;
if (err < __LIBBPF_ERRNO__START) {
int ret;
ret = strerror_r(err, buf, size);
buf[size - 1] = '\0';
return libbpf_err_errno(ret);
@@ -56,12 +56,20 @@ int libbpf_strerror(int err, char *buf, size_t size)
const char *msg;
msg = libbpf_strerror_table[ERRNO_OFFSET(err)];
snprintf(buf, size, "%s", msg);
ret = snprintf(buf, size, "%s", msg);
buf[size - 1] = '\0';
/* The lengths of buf and msg are positive.
* A negative number may be returned only when the
* size exceeds INT_MAX, which is unlikely in practice.
*/
if (ret >= size)
return libbpf_err(-ERANGE);
return 0;
}
snprintf(buf, size, "Unknown libbpf error %d", err);
ret = snprintf(buf, size, "Unknown libbpf error %d", err);
buf[size - 1] = '\0';
if (ret >= size)
return libbpf_err(-ERANGE);
return libbpf_err(-ENOENT);
}
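With this change a caller can detect truncation for libbpf's own error range; a sketch with a deliberately tiny buffer, where err is some negative libbpf error code:

#include <stdio.h>

static void report(int err)
{
	char buf[8]; /* deliberately undersized to force truncation */

	if (libbpf_strerror(err, buf, sizeof(buf)) == -ERANGE)
		fprintf(stderr, "truncated: %s\n", buf); /* still NUL-terminated */
}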


@@ -15,8 +15,24 @@
#include <linux/err.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <libelf.h>
#include "relo_core.h"
/* Android's libc doesn't support AT_EACCESS in faccessat() implementation
* ([0]), and just returns -EINVAL even if the file exists and is accessible.
* See [1] for issues caused by this.
*
* So just redefine it to 0 on Android.
*
* [0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50
* [1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250
*/
#ifdef __ANDROID__
#undef AT_EACCESS
#define AT_EACCESS 0
#endif
/* make sure libbpf doesn't use kernel-only integer typedefs */
#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
@@ -354,18 +370,41 @@ enum kern_feature_id {
FEAT_BTF_ENUM64,
/* Kernel uses syscall wrapper (CONFIG_ARCH_HAS_SYSCALL_WRAPPER) */
FEAT_SYSCALL_WRAPPER,
/* BPF multi-uprobe link support */
FEAT_UPROBE_MULTI_LINK,
/* Kernel supports arg:ctx tag (__arg_ctx) for global subprogs natively */
FEAT_ARG_CTX_TAG,
/* Kernel supports '?' at the front of datasec names */
FEAT_BTF_QMARK_DATASEC,
__FEAT_CNT,
};
int probe_memcg_account(void);
enum kern_feature_result {
FEAT_UNKNOWN = 0,
FEAT_SUPPORTED = 1,
FEAT_MISSING = 2,
};
struct kern_feature_cache {
enum kern_feature_result res[__FEAT_CNT];
int token_fd;
};
bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id);
bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id);
int probe_kern_syscall_wrapper(int token_fd);
int probe_memcg_account(int token_fd);
int bump_rlimit_memlock(void);
int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len);
int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);
const char *str_sec, size_t str_len,
int token_fd);
int btf_load_into_kernel(struct btf *btf,
char *log_buf, size_t log_sz, __u32 log_level,
int token_fd);
struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
@@ -529,6 +568,17 @@ static inline bool is_ldimm64_insn(struct bpf_insn *insn)
return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
}
/* Unconditionally dup FD, ensuring it doesn't use [0, 2] range.
* Original FD is not closed or altered in any other way.
* Preserves original FD value, if it's invalid (negative).
*/
static inline int dup_good_fd(int fd)
{
if (fd < 0)
return fd;
return fcntl(fd, F_DUPFD_CLOEXEC, 3);
}
/* if fd is stdin, stdout, or stderr, dup to a fd greater than 2
* Takes ownership of the fd passed in, and closes it if calling
* fcntl(fd, F_DUPFD_CLOEXEC, 3).
@@ -540,9 +590,10 @@ static inline int ensure_good_fd(int fd)
if (fd < 0)
return fd;
if (fd < 3) {
fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
fd = dup_good_fd(fd);
saved_errno = errno;
close(old_fd);
errno = saved_errno;
if (fd < 0) {
pr_warn("failed to dup FD %d to FD > 2: %d\n", old_fd, -saved_errno);
errno = saved_errno;
@@ -551,6 +602,29 @@ static inline int ensure_good_fd(int fd)
return fd;
}
static inline int sys_dup2(int oldfd, int newfd)
{
#ifdef __NR_dup2
return syscall(__NR_dup2, oldfd, newfd);
#else
return syscall(__NR_dup3, oldfd, newfd, 0);
#endif
}
/* Point *fixed_fd* to the same file that *tmp_fd* points to.
* Regardless of success, *tmp_fd* is closed.
* Whatever *fixed_fd* pointed to is closed silently.
*/
static inline int reuse_fd(int fixed_fd, int tmp_fd)
{
int err;
err = sys_dup2(tmp_fd, fixed_fd);
err = err < 0 ? -errno : 0;
close(tmp_fd); /* clean up temporary FD */
return err;
}
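These are internal helpers, but the intended pattern is roughly: create an object on a temporary FD, then splat it over a stable FD number other code already holds. map_fd_slot is hypothetical:

static int swap_in_array_map(int map_fd_slot)
{
	int tmp_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 4, 4, 1, NULL);

	if (tmp_fd < 0)
		return tmp_fd;
	return reuse_fd(map_fd_slot, tmp_fd); /* tmp_fd is closed either way */
}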
/* The following two functions are exposed to bpftool */
int bpf_core_add_cands(struct bpf_core_cand *local_cand,
size_t local_essent_len,
@@ -576,4 +650,25 @@ static inline bool is_pow_of_2(size_t x)
#define PROG_LOAD_ATTEMPTS 5
int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int attempts);
bool glob_match(const char *str, const char *pat);
long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name);
long elf_find_func_offset_from_file(const char *binary_path, const char *name);
struct elf_fd {
Elf *elf;
int fd;
};
int elf_open(const char *binary_path, struct elf_fd *elf_fd);
void elf_close(struct elf_fd *elf_fd);
int elf_resolve_syms_offsets(const char *binary_path, int cnt,
const char **syms, unsigned long **poffsets,
int st_type);
int elf_resolve_pattern_offsets(const char *binary_path, const char *pattern,
unsigned long **poffsets, size_t *pcnt);
int probe_fd(int fd);
#endif /* __LIBBPF_LIBBPF_INTERNAL_H */


@@ -12,11 +12,94 @@
#include <linux/btf.h>
#include <linux/filter.h>
#include <linux/kernel.h>
#include <linux/version.h>
#include "bpf.h"
#include "libbpf.h"
#include "libbpf_internal.h"
/* On Ubuntu LINUX_VERSION_CODE doesn't correspond to info.release,
* but Ubuntu provides /proc/version_signature file, as described at
* https://ubuntu.com/kernel, with an example contents below, which we
* can use to get a proper LINUX_VERSION_CODE.
*
* Ubuntu 5.4.0-12.15-generic 5.4.8
*
* In the above, 5.4.8 is what kernel is actually expecting, while
* uname() call will return 5.4.0 in info.release.
*/
static __u32 get_ubuntu_kernel_version(void)
{
const char *ubuntu_kver_file = "/proc/version_signature";
__u32 major, minor, patch;
int ret;
FILE *f;
if (faccessat(AT_FDCWD, ubuntu_kver_file, R_OK, AT_EACCESS) != 0)
return 0;
f = fopen(ubuntu_kver_file, "re");
if (!f)
return 0;
ret = fscanf(f, "%*s %*s %u.%u.%u\n", &major, &minor, &patch);
fclose(f);
if (ret != 3)
return 0;
return KERNEL_VERSION(major, minor, patch);
}
/* On Debian LINUX_VERSION_CODE doesn't correspond to info.release.
* Instead, it is provided in info.version. An example content of
* Debian 10 looks like the below.
*
* utsname::release 4.19.0-22-amd64
* utsname::version #1 SMP Debian 4.19.260-1 (2022-09-29)
*
* In the above, 4.19.260 is what kernel is actually expecting, while
* uname() call will return 4.19.0 in info.release.
*/
static __u32 get_debian_kernel_version(struct utsname *info)
{
__u32 major, minor, patch;
char *p;
p = strstr(info->version, "Debian ");
if (!p) {
/* This is not a Debian kernel. */
return 0;
}
if (sscanf(p, "Debian %u.%u.%u", &major, &minor, &patch) != 3)
return 0;
return KERNEL_VERSION(major, minor, patch);
}
__u32 get_kernel_version(void)
{
__u32 major, minor, patch, version;
struct utsname info;
/* Check if this is an Ubuntu kernel. */
version = get_ubuntu_kernel_version();
if (version != 0)
return version;
uname(&info);
/* Check if this is a Debian kernel. */
version = get_debian_kernel_version(&info);
if (version != 0)
return version;
if (sscanf(info.release, "%u.%u.%u", &major, &minor, &patch) != 3)
return 0;
return KERNEL_VERSION(major, minor, patch);
}
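Since KERNEL_VERSION() packs major/minor/patch as (a << 16) + (b << 8) + c, decoding the returned value is a matter of shifts; a sketch using this internal helper:

__u32 ver = get_kernel_version();

printf("detected kernel %u.%u.%u\n",
       ver >> 16, (ver >> 8) & 0xff, ver & 0xff);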
static int probe_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, size_t insns_cnt,
char *log_buf, size_t log_buf_sz)
@@ -98,6 +181,9 @@ static int probe_prog_load(enum bpf_prog_type prog_type,
case BPF_PROG_TYPE_FLOW_DISSECTOR:
case BPF_PROG_TYPE_CGROUP_SYSCTL:
break;
case BPF_PROG_TYPE_NETFILTER:
opts.expected_attach_type = BPF_NETFILTER;
break;
default:
return -EOPNOTSUPP;
}
@@ -133,7 +219,8 @@ int libbpf_probe_bpf_prog_type(enum bpf_prog_type prog_type, const void *opts)
}
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len)
const char *str_sec, size_t str_len,
int token_fd)
{
struct btf_header hdr = {
.magic = BTF_MAGIC,
@@ -143,6 +230,10 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
.str_off = types_len,
.str_len = str_len,
};
LIBBPF_OPTS(bpf_btf_load_opts, opts,
.token_fd = token_fd,
.btf_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int btf_fd, btf_len;
__u8 *raw_btf;
@@ -155,7 +246,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
memcpy(raw_btf + hdr.hdr_len, raw_types, hdr.type_len);
memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len);
btf_fd = bpf_btf_load(raw_btf, btf_len, NULL);
btf_fd = bpf_btf_load(raw_btf, btf_len, &opts);
free(raw_btf);
return btf_fd;
@@ -185,7 +276,7 @@ static int load_local_storage_btf(void)
};
return libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs));
strs, sizeof(strs), 0);
}
static int probe_map_create(enum bpf_map_type map_type)
@@ -221,6 +312,7 @@ static int probe_map_create(enum bpf_map_type map_type)
case BPF_MAP_TYPE_SK_STORAGE:
case BPF_MAP_TYPE_INODE_STORAGE:
case BPF_MAP_TYPE_TASK_STORAGE:
case BPF_MAP_TYPE_CGRP_STORAGE:
btf_key_type_id = 1;
btf_value_type_id = 3;
value_size = 8;
@@ -231,19 +323,28 @@ static int probe_map_create(enum bpf_map_type map_type)
return btf_fd;
break;
case BPF_MAP_TYPE_RINGBUF:
case BPF_MAP_TYPE_USER_RINGBUF:
key_size = 0;
value_size = 0;
max_entries = 4096;
max_entries = sysconf(_SC_PAGE_SIZE);
break;
case BPF_MAP_TYPE_STRUCT_OPS:
/* we'll get -ENOTSUPP for invalid BTF type ID for struct_ops */
opts.btf_vmlinux_value_type_id = 1;
opts.value_type_btf_obj_fd = -1;
exp_err = -524; /* -ENOTSUPP */
break;
case BPF_MAP_TYPE_BLOOM_FILTER:
key_size = 0;
max_entries = 1;
break;
case BPF_MAP_TYPE_ARENA:
key_size = 0;
value_size = 0;
max_entries = 1; /* one page */
opts.map_extra = 0; /* can mmap() at any address */
opts.map_flags = BPF_F_MMAPABLE;
break;
case BPF_MAP_TYPE_HASH:
case BPF_MAP_TYPE_ARRAY:
case BPF_MAP_TYPE_PROG_ARRAY:
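Callers can feature-probe the new types exercised above before depending on them; both probes return 1 for supported, 0 for unsupported, and a negative error otherwise:

if (libbpf_probe_bpf_map_type(BPF_MAP_TYPE_ARENA, NULL) == 1 &&
    libbpf_probe_bpf_prog_type(BPF_PROG_TYPE_NETFILTER, NULL) == 1)
	printf("arena maps and netfilter programs are supported\n");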


@@ -4,6 +4,6 @@
#define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 0
#define LIBBPF_MINOR_VERSION 4
#endif /* __LIBBPF_VERSION_H */
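When linking dynamically, a runtime check mirrors this bump; a sketch gating the 1.4 additions, where use_raw_tp_cookies() is a hypothetical caller of the new APIs:

if (libbpf_major_version() > 1 ||
    (libbpf_major_version() == 1 && libbpf_minor_version() >= 4))
	use_raw_tp_cookies();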


@@ -719,13 +719,28 @@ static int linker_sanity_check_elf(struct src_obj *obj)
return -EINVAL;
}
if (sec->shdr->sh_addralign && !is_pow_of_2(sec->shdr->sh_addralign))
return -EINVAL;
if (sec->shdr->sh_addralign != sec->data->d_align)
return -EINVAL;
if (is_dwarf_sec_name(sec->sec_name))
continue;
if (sec->shdr->sh_size != sec->data->d_size)
if (sec->shdr->sh_addralign && !is_pow_of_2(sec->shdr->sh_addralign)) {
pr_warn("ELF section #%zu alignment %llu is non pow-of-2 alignment in %s\n",
sec->sec_idx, (long long unsigned)sec->shdr->sh_addralign,
obj->filename);
return -EINVAL;
}
if (sec->shdr->sh_addralign != sec->data->d_align) {
pr_warn("ELF section #%zu has inconsistent alignment addr=%llu != d=%llu in %s\n",
sec->sec_idx, (long long unsigned)sec->shdr->sh_addralign,
(long long unsigned)sec->data->d_align, obj->filename);
return -EINVAL;
}
if (sec->shdr->sh_size != sec->data->d_size) {
pr_warn("ELF section #%zu has inconsistent section size sh=%llu != d=%llu in %s\n",
sec->sec_idx, (long long unsigned)sec->shdr->sh_size,
(long long unsigned)sec->data->d_size, obj->filename);
return -EINVAL;
}
switch (sec->shdr->sh_type) {
case SHT_SYMTAB:
@@ -737,8 +752,12 @@ static int linker_sanity_check_elf(struct src_obj *obj)
break;
case SHT_PROGBITS:
if (sec->shdr->sh_flags & SHF_EXECINSTR) {
if (sec->shdr->sh_size % sizeof(struct bpf_insn) != 0)
if (sec->shdr->sh_size % sizeof(struct bpf_insn) != 0) {
pr_warn("ELF section #%zu has unexpected size alignment %llu in %s\n",
sec->sec_idx, (long long unsigned)sec->shdr->sh_size,
obj->filename);
return -EINVAL;
}
}
break;
case SHT_NOBITS:
@@ -1115,7 +1134,19 @@ static int extend_sec(struct bpf_linker *linker, struct dst_sec *dst, struct src
if (src->shdr->sh_type != SHT_NOBITS) {
tmp = realloc(dst->raw_data, dst_final_sz);
if (!tmp)
/* If dst_align_sz == 0, realloc() behaves in a special way:
* 1. When dst->raw_data is NULL it returns:
* "either NULL or a pointer suitable to be passed to free()" [1].
* 2. When dst->raw_data is not-NULL it frees dst->raw_data and returns NULL,
* thus invalidating any "pointer suitable to be passed to free()" obtained
* at step (1).
*
* The dst_align_sz > 0 check avoids error exit after (2), otherwise
* dst->raw_data would be freed again in bpf_linker__free().
*
* [1] man 3 realloc
*/
if (!tmp && dst_align_sz > 0)
return -ENOMEM;
dst->raw_data = tmp;
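The same pitfall in isolation, outside the linker context; buf and new_sz stand in for dst->raw_data and dst_final_sz:

#include <errno.h>
#include <stdlib.h>

static int grow_buf(void **pbuf, size_t new_sz)
{
	void *tmp = realloc(*pbuf, new_sz);

	if (!tmp && new_sz > 0)
		return -ENOMEM; /* genuine allocation failure */
	*pbuf = tmp; /* may legitimately be NULL when new_sz == 0 */
	return 0;
}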
@@ -1997,7 +2028,6 @@ add_sym:
static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *obj)
{
struct src_sec *src_symtab = &obj->secs[obj->symtab_sec_idx];
struct dst_sec *dst_symtab;
int i, err;
for (i = 1; i < obj->sec_cnt; i++) {
@@ -2030,9 +2060,6 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
return -1;
}
/* add_dst_sec() above could have invalidated linker->secs */
dst_symtab = &linker->secs[linker->symtab_sec_idx];
/* shdr->sh_link points to SYMTAB */
dst_sec->shdr->sh_link = linker->symtab_sec_idx;
@@ -2049,16 +2076,13 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
dst_rel = dst_sec->raw_data + src_sec->dst_off;
n = src_sec->shdr->sh_size / src_sec->shdr->sh_entsize;
for (j = 0; j < n; j++, src_rel++, dst_rel++) {
size_t src_sym_idx = ELF64_R_SYM(src_rel->r_info);
size_t sym_type = ELF64_R_TYPE(src_rel->r_info);
Elf64_Sym *src_sym, *dst_sym;
size_t dst_sym_idx;
size_t src_sym_idx, dst_sym_idx, sym_type;
Elf64_Sym *src_sym;
src_sym_idx = ELF64_R_SYM(src_rel->r_info);
src_sym = src_symtab->data->d_buf + sizeof(*src_sym) * src_sym_idx;
dst_sym_idx = obj->sym_map[src_sym_idx];
dst_sym = dst_symtab->raw_data + sizeof(*dst_sym) * dst_sym_idx;
dst_rel->r_offset += src_linked_sec->dst_off;
sym_type = ELF64_R_TYPE(src_rel->r_info);
dst_rel->r_info = ELF64_R_INFO(dst_sym_idx, sym_type);
@@ -2708,7 +2732,7 @@ static int finalize_btf(struct bpf_linker *linker)
/* Emit .BTF.ext section */
if (linker->btf_ext) {
raw_data = btf_ext__get_raw_data(linker->btf_ext, &raw_sz);
raw_data = btf_ext__raw_data(linker->btf_ext, &raw_sz);
if (!raw_data)
return -ENOMEM;


@@ -9,6 +9,7 @@
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <linux/rtnetlink.h>
#include <linux/netdev.h>
#include <sys/socket.h>
#include <errno.h>
#include <time.h>
@@ -39,9 +40,16 @@ struct xdp_id_md {
int ifindex;
__u32 flags;
struct xdp_link_info info;
__u64 feature_flags;
};
static int libbpf_netlink_open(__u32 *nl_pid)
struct xdp_features_md {
int ifindex;
__u32 xdp_zc_max_segs;
__u64 flags;
};
static int libbpf_netlink_open(__u32 *nl_pid, int proto)
{
struct sockaddr_nl sa;
socklen_t addrlen;
@@ -51,7 +59,7 @@ static int libbpf_netlink_open(__u32 *nl_pid)
memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, proto);
if (sock < 0)
return -errno;
@@ -212,14 +220,14 @@ done:
}
static int libbpf_netlink_send_recv(struct libbpf_nla_req *req,
__dump_nlmsg_t parse_msg,
int proto, __dump_nlmsg_t parse_msg,
libbpf_dump_nlmsg_t parse_attr,
void *cookie)
{
__u32 nl_pid = 0;
int sock, ret;
sock = libbpf_netlink_open(&nl_pid);
sock = libbpf_netlink_open(&nl_pid, proto);
if (sock < 0)
return sock;
@@ -238,6 +246,43 @@ out:
return ret;
}
static int parse_genl_family_id(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
void *cookie)
{
struct genlmsghdr *gnl = NLMSG_DATA(nh);
struct nlattr *na = (struct nlattr *)((void *)gnl + GENL_HDRLEN);
struct nlattr *tb[CTRL_ATTR_FAMILY_ID + 1];
__u16 *id = cookie;
libbpf_nla_parse(tb, CTRL_ATTR_FAMILY_ID, na,
NLMSG_PAYLOAD(nh, sizeof(*gnl)), NULL);
if (!tb[CTRL_ATTR_FAMILY_ID])
return NL_CONT;
*id = libbpf_nla_getattr_u16(tb[CTRL_ATTR_FAMILY_ID]);
return NL_DONE;
}
static int libbpf_netlink_resolve_genl_family_id(const char *name,
__u16 len, __u16 *id)
{
struct libbpf_nla_req req = {
.nh.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN),
.nh.nlmsg_type = GENL_ID_CTRL,
.nh.nlmsg_flags = NLM_F_REQUEST,
.gnl.cmd = CTRL_CMD_GETFAMILY,
.gnl.version = 2,
};
int err;
err = nlattr_add(&req, CTRL_ATTR_FAMILY_NAME, name, len);
if (err < 0)
return err;
return libbpf_netlink_send_recv(&req, NETLINK_GENERIC,
parse_genl_family_id, NULL, id);
}
static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
__u32 flags)
{
@@ -271,7 +316,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
}
nlattr_end_nested(&req, nla);
return libbpf_netlink_send_recv(&req, NULL, NULL, NULL);
return libbpf_netlink_send_recv(&req, NETLINK_ROUTE, NULL, NULL, NULL);
}
int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags, const struct bpf_xdp_attach_opts *opts)
@@ -357,6 +402,32 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
return 0;
}
static int parse_xdp_features(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
void *cookie)
{
struct genlmsghdr *gnl = NLMSG_DATA(nh);
struct nlattr *na = (struct nlattr *)((void *)gnl + GENL_HDRLEN);
struct nlattr *tb[NETDEV_CMD_MAX + 1];
struct xdp_features_md *md = cookie;
__u32 ifindex;
libbpf_nla_parse(tb, NETDEV_CMD_MAX, na,
NLMSG_PAYLOAD(nh, sizeof(*gnl)), NULL);
if (!tb[NETDEV_A_DEV_IFINDEX] || !tb[NETDEV_A_DEV_XDP_FEATURES])
return NL_CONT;
ifindex = libbpf_nla_getattr_u32(tb[NETDEV_A_DEV_IFINDEX]);
if (ifindex != md->ifindex)
return NL_CONT;
md->flags = libbpf_nla_getattr_u64(tb[NETDEV_A_DEV_XDP_FEATURES]);
if (tb[NETDEV_A_DEV_XDP_ZC_MAX_SEGS])
md->xdp_zc_max_segs =
libbpf_nla_getattr_u32(tb[NETDEV_A_DEV_XDP_ZC_MAX_SEGS]);
return NL_DONE;
}
int bpf_xdp_query(int ifindex, int xdp_flags, struct bpf_xdp_query_opts *opts)
{
struct libbpf_nla_req req = {
@@ -366,6 +437,10 @@ int bpf_xdp_query(int ifindex, int xdp_flags, struct bpf_xdp_query_opts *opts)
.ifinfo.ifi_family = AF_PACKET,
};
struct xdp_id_md xdp_id = {};
struct xdp_features_md md = {
.ifindex = ifindex,
};
__u16 id;
int err;
if (!OPTS_VALID(opts, bpf_xdp_query_opts))
@@ -382,7 +457,7 @@ int bpf_xdp_query(int ifindex, int xdp_flags, struct bpf_xdp_query_opts *opts)
xdp_id.ifindex = ifindex;
xdp_id.flags = xdp_flags;
err = libbpf_netlink_send_recv(&req, __dump_link_nlmsg,
err = libbpf_netlink_send_recv(&req, NETLINK_ROUTE, __dump_link_nlmsg,
get_xdp_info, &xdp_id);
if (err)
return libbpf_err(err);
@@ -393,6 +468,38 @@ int bpf_xdp_query(int ifindex, int xdp_flags, struct bpf_xdp_query_opts *opts)
OPTS_SET(opts, skb_prog_id, xdp_id.info.skb_prog_id);
OPTS_SET(opts, attach_mode, xdp_id.info.attach_mode);
if (!OPTS_HAS(opts, feature_flags))
return 0;
err = libbpf_netlink_resolve_genl_family_id("netdev", sizeof("netdev"), &id);
if (err < 0) {
if (err == -ENOENT) {
opts->feature_flags = 0;
goto skip_feature_flags;
}
return libbpf_err(err);
}
memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
req.nh.nlmsg_flags = NLM_F_REQUEST;
req.nh.nlmsg_type = id;
req.gnl.cmd = NETDEV_CMD_DEV_GET;
req.gnl.version = 2;
err = nlattr_add(&req, NETDEV_A_DEV_IFINDEX, &ifindex, sizeof(ifindex));
if (err < 0)
return libbpf_err(err);
err = libbpf_netlink_send_recv(&req, NETLINK_GENERIC,
parse_xdp_features, NULL, &md);
if (err)
return libbpf_err(err);
OPTS_SET(opts, feature_flags, md.flags);
OPTS_SET(opts, xdp_zc_max_segs, md.xdp_zc_max_segs);
skip_feature_flags:
return 0;
}
@@ -493,7 +600,7 @@ static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags)
if (ret < 0)
return ret;
return libbpf_netlink_send_recv(&req, NULL, NULL, NULL);
return libbpf_netlink_send_recv(&req, NETLINK_ROUTE, NULL, NULL, NULL);
}
static int tc_qdisc_create_excl(struct bpf_tc_hook *hook)
@@ -593,7 +700,7 @@ static int tc_add_fd_and_name(struct libbpf_nla_req *req, int fd)
int len, ret;
memset(&info, 0, info_len);
ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
ret = bpf_prog_get_info_by_fd(fd, &info, &info_len);
if (ret < 0)
return ret;
@@ -673,7 +780,8 @@ int bpf_tc_attach(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts)
info.opts = opts;
ret = libbpf_netlink_send_recv(&req, get_tc_info, NULL, &info);
ret = libbpf_netlink_send_recv(&req, NETLINK_ROUTE, get_tc_info, NULL,
&info);
if (ret < 0)
return libbpf_err(ret);
if (!info.processed)
@@ -739,7 +847,7 @@ static int __bpf_tc_detach(const struct bpf_tc_hook *hook,
return ret;
}
return libbpf_netlink_send_recv(&req, NULL, NULL, NULL);
return libbpf_netlink_send_recv(&req, NETLINK_ROUTE, NULL, NULL, NULL);
}
int bpf_tc_detach(const struct bpf_tc_hook *hook,
@@ -804,7 +912,8 @@ int bpf_tc_query(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts)
info.opts = opts;
ret = libbpf_netlink_send_recv(&req, get_tc_info, NULL, &info);
ret = libbpf_netlink_send_recv(&req, NETLINK_ROUTE, get_tc_info, NULL,
&info);
if (ret < 0)
return libbpf_err(ret);
if (!info.processed)


@@ -32,7 +32,7 @@ static struct nlattr *nla_next(const struct nlattr *nla, int *remaining)
static int nla_ok(const struct nlattr *nla, int remaining)
{
return remaining >= sizeof(*nla) &&
return remaining >= (int)sizeof(*nla) &&
nla->nla_len >= sizeof(*nla) &&
nla->nla_len <= remaining;
}
@@ -178,7 +178,7 @@ int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh)
hlen += nlmsg_len(&err->msg);
attr = (struct nlattr *) ((void *) err + hlen);
alen = nlh->nlmsg_len - hlen;
alen = (void *)nlh + nlh->nlmsg_len - (void *)attr;
if (libbpf_nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen,
extack_policy) != 0) {


@@ -14,6 +14,7 @@
#include <errno.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/genetlink.h>
/* avoid multiple definition of netlink features */
#define __LINUX_NETLINK_H
@@ -58,6 +59,7 @@ struct libbpf_nla_req {
union {
struct ifinfomsg ifinfo;
struct tcmsg tc;
struct genlmsghdr gnl;
};
char buf[128];
};
@@ -89,11 +91,21 @@ static inline uint8_t libbpf_nla_getattr_u8(const struct nlattr *nla)
return *(uint8_t *)libbpf_nla_data(nla);
}
static inline uint16_t libbpf_nla_getattr_u16(const struct nlattr *nla)
{
return *(uint16_t *)libbpf_nla_data(nla);
}
static inline uint32_t libbpf_nla_getattr_u32(const struct nlattr *nla)
{
return *(uint32_t *)libbpf_nla_data(nla);
}
static inline uint64_t libbpf_nla_getattr_u64(const struct nlattr *nla)
{
return *(uint64_t *)libbpf_nla_data(nla);
}
static inline const char *libbpf_nla_getattr_str(const struct nlattr *nla)
{
return (const char *)libbpf_nla_data(nla);


@@ -776,7 +776,7 @@ static int bpf_core_calc_field_relo(const char *prog_name,
break;
case BPF_CORE_FIELD_SIGNED:
*val = (btf_is_any_enum(mt) && BTF_INFO_KFLAG(mt->info)) ||
(btf_int_encoding(mt) & BTF_INT_SIGNED);
(btf_is_int(mt) && (btf_int_encoding(mt) & BTF_INT_SIGNED));
if (validate)
*validate = true; /* signedness is never ambiguous */
break;
@@ -1551,9 +1551,6 @@ int __bpf_core_types_match(const struct btf *local_btf, __u32 local_id, const st
if (level <= 0)
return -EINVAL;
local_t = btf_type_by_id(local_btf, local_id);
targ_t = btf_type_by_id(targ_btf, targ_id);
recur:
depth--;
if (depth < 0)
