This avoids a compilation warning if `struct bpf_redir_neigh` is not provided by
other kernel headers.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Yaniv reported a compilation error after pulling latest libbpf:
[...]
../libbpf/src/root/usr/include/bpf/bpf_helpers.h:99:10: error:
unknown register name 'r0' in asm
: "r0", "r1", "r2", "r3", "r4", "r5");
[...]
The issue got triggered given Yaniv was compiling tracing programs with the
native target (e.g. x86) instead of the BPF target, hence neither a
BTF-generated vmlinux.h nor CO-RE was used, and llc with -march=bpf was only
invoked later to compile from LLVM IR to a BPF object file. Given that clang
was expecting x86 inline asm and not BPF, the error complained that these
regs don't exist on the former.
Guard bpf_tail_call_static() with defined(__bpf__), where BPF inline asm is
valid to use. BPF tracing programs on more modern kernels use the BPF target
anyway and thus the bpf_tail_call_static() function will be available for them.
BPF inline asm is supported since clang 7 (clang <= 6 otherwise throws the same
error as above), and __bpf_unreachable() since clang 8, therefore include the
latter condition in order to prevent compilation errors for older clang
versions. Given even an old Ubuntu 18.04 LTS has official LLVM packages all the
way up to llvm-10, I did not bother to special-case the __bpf_unreachable()
inside bpf_tail_call_static() further.
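The resulting guard then looks roughly as follows (a sketch of the shape; see
bpf_helpers.h in the tree for the authoritative version):

#if __clang_major__ >= 8 && defined(__bpf__)
static __always_inline void
bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
{
	if (!__builtin_constant_p(slot))
		__bpf_unreachable();

	asm volatile("r1 = %[ctx]\n\t"
		     "r2 = %[map]\n\t"
		     "r3 = %[slot]\n\t"
		     "call 12"
		     :: [ctx]"r"(ctx), [map]"r"(map), [slot]"i"(slot)
		     : "r0", "r1", "r2", "r3", "r4", "r5");
}
#endif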
Also, undo the sockex3_kern sample's use of bpf_tail_call_static() given the
samples still have the old hacky way to compile networking progs with the
native instead of the BPF target, so bpf_tail_call_static() won't be defined
there anymore.
Fixes: 0e9f6841f664 ("bpf, libbpf: Add bpf_tail_call_static helper for bpf programs")
Reported-by: Yaniv Agman <yanivagman@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Tested-by: Yaniv Agman <yanivagman@gmail.com>
Link: https://lore.kernel.org/bpf/CAMy7=ZUk08w5Gc2Z-EKi4JFtuUCaZYmE4yzhJjrExXpYKR4L8w@mail.gmail.com
Link: https://lore.kernel.org/bpf/20201021203257.26223-1-daniel@iogearbox.net
Recent work in f4d05259213f ("bpf: Add map_meta_equal map ops") and 134fede4eecf
("bpf: Relax max_entries check for most of the inner map types") added support
for dynamic inner max elements for most map-in-map types. Exceptions were maps
like array or prog array where the map_gen_lookup() callback uses the maps'
max_entries field as a constant when emitting instructions.
We recently implemented Maglev consistent hashing into Cilium's load balancer
which uses map-in-map with an outer map being hash and inner being array holding
the Maglev backend table for each service. It has been designed this way in
order to reduce overall memory consumption, given the outer hash map allows
avoiding preallocation of a large, flat memory area for all services. Also,
the number of service mappings is not always known a priori.
The use case for dynamic inner array map entries is to further reduce memory
overhead; for example, some services might just have a small number of
backends while others could have a large number. Right now the Maglev backend
tables for a small and for a large number of backends would need to have the
same number of inner array map entries, which adds a lot of unneeded overhead.
Dynamic inner array map entries can be realized by avoiding the inlined code
generation for their lookup. The lookup will still be efficient since it will
be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
The patch adds a BPF_F_INNER_MAP flag to map creation which skips the inline
code generation and relaxes the array_map_meta_equal() check to ignore both
maps' max_entries. This still allows having faster lookups for map-in-map
when BPF_F_INNER_MAP is not specified and hence dynamic max_entries are not
needed.
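A sketch of how such maps could be defined on the BPF program side
(BTF-defined map syntax; map names and sizes here are illustrative):

struct inner_map {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(map_flags, BPF_F_INNER_MAP);
	__uint(max_entries, 1);	/* may now differ between inner maps */
	__type(key, int);
	__type(value, int);
} backends SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
	__uint(max_entries, 256);
	__type(key, int);
	__array(values, struct inner_map);
} services SEC(".maps");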
Example code generation where inner map is dynamic sized array:
# bpftool p d x i 125
int handle__sys_enter(void * ctx):
; int handle__sys_enter(void *ctx)
0: (b4) w1 = 0
; int key = 0;
1: (63) *(u32 *)(r10 -4) = r1
2: (bf) r2 = r10
;
3: (07) r2 += -4
; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
4: (18) r1 = map[id:468]
6: (07) r1 += 272
7: (61) r0 = *(u32 *)(r2 +0)
8: (35) if r0 >= 0x3 goto pc+5
9: (67) r0 <<= 3
10: (0f) r0 += r1
11: (79) r0 = *(u64 *)(r0 +0)
12: (15) if r0 == 0x0 goto pc+1
13: (05) goto pc+1
14: (b7) r0 = 0
15: (b4) w6 = -1
; if (!inner_map)
16: (15) if r0 == 0x0 goto pc+6
17: (bf) r2 = r10
;
18: (07) r2 += -4
; val = bpf_map_lookup_elem(inner_map, &key);
19: (bf) r1 = r0 | No inlining but instead
20: (85) call array_map_lookup_elem#149280 | call to array_map_lookup_elem()
; return val ? *val : -1; | for inner array lookup.
21: (15) if r0 == 0x0 goto pc+1
; return val ? *val : -1;
22: (61) r6 = *(u32 *)(r0 +0)
; }
23: (bc) w0 = w6
24: (95) exit
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
Add an efficient ingress-to-ingress netns switch that can be used out of tc
BPF programs in order to redirect traffic from the host ns ingress into a
container veth device ingress without having to go via the CPU backlog
queue [0]. For local containers this can also be utilized, and the path via
the CPU backlog queue then only needs to be taken once, not twice. On a high
level this borrows from ipvlan, which does a similar switch in
__netif_receive_skb_core() and then iterates via another_round. This helps
to reduce latency for the mentioned use cases.
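A sketch of the intended usage from a tc BPF program (the ifindex is
hardcoded purely for illustration):

SEC("classifier")
int tc_host_ingress(struct __sk_buff *skb)
{
	/* 42 stands for the host-side veth ifindex; the skb is switched
	 * into the netns of the veth peer and reenters the stack on that
	 * device's ingress, skipping the CPU backlog queue.
	 */
	return bpf_redirect_peer(42, 0);
}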
Pod to remote pod with redirect(), TCP_RR [1]:
# percpu_netperf 10.217.1.33
RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 )
MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 )
STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 )
MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 )
P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 )
P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 )
P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 )
TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 )
Pod to remote pod with redirect_peer(), TCP_RR:
# percpu_netperf 10.217.1.33
RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 )
MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 )
STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 )
MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 )
P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 )
P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 )
P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 )
TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 )
[0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf
[1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
Add support for patching instructions of the following form:
- rX = *(T *)(rY + <off>);
- *(T *)(rX + <off>) = rY;
- *(T *)(rX + <off>) = <imm>, where T is one of {u8, u16, u32, u64}.
For such instructions, if the actual kernel field recorded in the CO-RE
relocation has a different size than the one recorded locally (e.g., from
vmlinux.h), then libbpf will adjust T to an appropriate 1-, 2-, 4-, or 8-byte
load or store.
In general, such a transformation is not always correct and could lead to an
invalid final value being loaded or stored. But two classes of cases are
always safe:
- if both local and target (kernel) types are unsigned integers, but of
different sizes, then it's OK to adjust load/store instruction according to
the necessary memory size. Zero-extending nature of such instructions and
unsignedness make sure that the final value is always correct;
- pointer size mismatch between BPF target architecture (which is always
64-bit) and 32-bit host kernel architecture can be similarly resolved
automatically, because pointer is essentially an unsigned integer. Loading
32-bit pointer into 64-bit BPF register with zero extension will leave
correct pointer in the register.
Both cases are necessary to support CO-RE on 32-bit kernels, as `unsigned
long` in vmlinux.h generated from a 32-bit kernel is 32-bit, but when compiled
as part of a BPF program for the BPF target it will be treated by the compiler
as a 64-bit integer. Similarly, pointers in vmlinux.h are 32-bit for the
kernel, but treated as 64-bit values by the compiler for the BPF target. Both
problems are now resolved by libbpf for direct memory reads.
But similar transformations are useful in general when kernel fields are
"resized" from, e.g., unsigned int to unsigned long (or vice versa).
Now, similar transformations for signed integers are not safe to perform as
they will result in incorrect sign extension of the value. If such a situation
is detected, libbpf will emit a helpful message and will poison the instruction.
Not failing immediately means that it's possible to guard the instruction
based on kernel version (or other conditions) and make sure it's not
reachable.
If there is a need to read signed integers that change sizes between different
kernels, it's possible to use BPF_CORE_READ_BITFIELD() macro, which works both
with bitfields and non-bitfield integers of any signedness and handles
sign-extension properly. Also, bpf_core_read() with a proper size and/or use
of the bpf_core_field_size() relocation could allow dealing with such
complicated situations explicitly, if not as conveniently as direct memory
reads.
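For illustration, a sketch of the explicit approach for a hypothetical signed
kernel field whose size differs across versions (assumes vmlinux.h types and
bpf_core_read.h; some_signed_field is made up):

static __always_inline __s64 get_field(struct task_struct *task)
{
	/* sizes the probed read and sign-extends according to the
	 * field's actual definition in kernel BTF */
	return BPF_CORE_READ_BITFIELD_PROBED(task, some_signed_field);
}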
Selftests added in a separate patch in progs/test_core_autosize.c demonstrate
both direct memory and probed use cases.
BPF_CORE_READ() is not changed and it won't deal with such situations as
automatically as direct memory reads, due to the signed integer limitations,
which are much harder to detect and control with compiler macro magic. So
it's encouraged to utilize direct memory reads as much as possible.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201008001025.292064-3-andrii@kernel.org
Bypass the CO-RE relocations step for BPF programs that are not going to be
loaded. This allows BPF programs to be compiled in and disabled dynamically
if the kernel is not expected to provide enough relocation information. In
such a case, there won't be unnecessary warnings about failed relocations.
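For example (sketch; the feature check is illustrative), a program known to
be unsupported on the running kernel can be excluded before load, and its
relocations are then skipped:

if (!kernel_has_feature_x())
	bpf_program__set_autoload(prog, false);
bpf_object__load(obj);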
Fixes: d929758101fc ("libbpf: Support disabling auto-loading BPF programs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201008001025.292064-2-andrii@kernel.org
Fix a compatibility problem when the old XDP_SHARED_UMEM mode is used
together with the xsk_socket__create() call. In the old XDP_SHARED_UMEM
mode, only sharing of the same device and queue id was allowed, and
in this mode, the fill ring and completion ring were shared between
the AF_XDP sockets.
Therefore, it was perfectly fine to call the xsk_socket__create() API
for each socket and not use the new xsk_socket__create_shared() API.
This behavior was broken by the commit introducing XDP_SHARED_UMEM support
between different devices and/or queue ids. This patch restores the ability
to use xsk_socket__create() in these circumstances so that backward
compatibility is not broken.
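A sketch of the restored usage (same umem on the same device and queue id,
with fill and completion rings shared; variable names are illustrative):

struct xsk_socket *xsk1, *xsk2;

xsk_socket__create(&xsk1, "eth0", 0, umem, &rx1, &tx1, &cfg);
/* second socket on the very same device/queue via the old-style API */
xsk_socket__create(&xsk2, "eth0", 0, umem, &rx2, &tx2, &cfg);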
Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1602070946-11154-1-git-send-email-magnus.karlsson@gmail.com
bpf_program__set_attach_target(prog, fd, ...) will always fail when
fd = 0 (attach to a kernel symbol) because obj->btf_vmlinux is NULL
and there is no way to set it (at the moment btf_vmlinux is meant
to be temporary storage for use in bpf_object__load_xattr()).
Fix this by using libbpf_find_vmlinux_btf_id().
At some point we may want to opportunistically cache btf_vmlinux
so it can be reused with multiple programs.
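Example usage (sketch; the kernel function name is illustrative):

/* attach_prog_fd == 0 selects a kernel symbol as the attach target */
err = bpf_program__set_attach_target(prog, 0, "tcp_v4_connect");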
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Petar Penkov <ppenkov@google.com>
Link: https://lore.kernel.org/bpf/20201005224528.389097-1-lrizzo@google.com
Say a user reuses a map fd after creating a map manually and sets the
pin_path, then loads the object via libbpf.
In libbpf's bpf_object__create_maps(), bpf_object__reuse_map() will
return 0 if there is no pinned map at map->pin_path. Then, after
checking whether the map fd exists, we should also check whether pin_path
was set and do bpf_map__pin() instead of continuing the loop.
Fix it by creating the map only if the fd does not exist and continuing to
check pin_path after that.
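A sketch of the usage pattern this fixes (pin path and map parameters are
illustrative):

int fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
			sizeof(int), 1, 0);	/* manual map creation */

bpf_map__set_pin_path(map, "/sys/fs/bpf/mymap");	/* nothing pinned yet */
bpf_map__reuse_fd(map, fd);
bpf_object__load(obj);	/* must still pin the reused map */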
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201006021345.3817033-3-liuhangbin@gmail.com
Add bpf_this_cpu_ptr() to help access a percpu var on this cpu. This
helper always returns a valid pointer, therefore there is no need to check
the returned value for NULL. Also note that all programs run with
preemption disabled, which means that the returned pointer is stable
during the entire execution of the program.
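Example usage (sketch; assumes the kernel's runqueues percpu variable is
available via vmlinux.h and kernel BTF):

extern const struct rq runqueues __ksym;

struct rq *rq = bpf_this_cpu_ptr(&runqueues);
/* rq is never NULL and stays valid for the whole program run */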
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com
Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
bpf_per_cpu_ptr() has the same semantics as per_cpu_ptr() in the kernel
except that it may return NULL. This happens when the cpu parameter is
out of range, so the caller must check the returned value.
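Example usage (sketch; same runqueues assumption as for bpf_this_cpu_ptr()):

extern const struct rq runqueues __ksym;

struct rq *rq = bpf_per_cpu_ptr(&runqueues, cpu);
if (!rq)	/* cpu was out of range */
	return 0;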
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com
If a ksym is defined with a type, libbpf will try to find the ksym's btf
information from kernel btf. If a valid btf entry for the ksym is found,
libbpf can pass in the found btf id to the verifier, which validates the
ksym's type and value.
Typeless ksyms (i.e. those defined as 'void') will not have such a btf_id,
but libbpf still has the symbol's address (read from kallsyms) and its value
is treated as a raw pointer.
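A sketch of both declaration styles on the BPF program side (symbol names
are illustrative):

extern const struct rq runqueues __ksym;  /* typed: validated via BTF */
extern const void bpf_prog_active __ksym; /* typeless: raw address only */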
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-3-haoluo@google.com
Pseudo_btf_id is a type of ld_imm insn that associates a btf_id with a
ksym so that further dereferences of the ksym can use the BTF info
to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
the verifier reads the btf_id stored in insn[0]'s imm field and
marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
which is encoded in btf_vmlinux by pahole. If the VAR is not of a struct
type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
and the mem_size is resolved to the size of the VAR's type.
From the VAR btf_id, the verifier can also read the address of the
ksym's corresponding kernel var from kallsyms and use that to fill
dst_reg.
Therefore, the proper functionality of pseudo_btf_id depends on (1)
kallsyms and (2) the encoding of kernel global VARs in pahole, which
should be available since pahole v1.18.
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com
Currently, a perf event in a perf event array is removed from the array when
the map fd used to add the event is closed. This behavior makes it
difficult to share perf events with a perf event array.
Introduce a new flag, BPF_F_PRESERVE_ELEMS, that keeps the perf events
open. With this flag set, perf events in the array are not removed when
the original map fd is closed. Instead, a perf event will stay in the map
until 1) it is explicitly removed from the array; or 2) the array is freed.
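A sketch of declaring such an array with the new flag (BTF-defined map
syntax; sizes are illustrative):

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(map_flags, BPF_F_PRESERVE_ELEMS);
	__uint(max_entries, 64);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} events SEC(".maps");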
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com
Libbpf doesn't rely on libc_compat.h anymore, so ignore it for the purposes of
syncing libbpf sources into GitHub.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Ensure that btf_dump can accommodate new BTF types being appended to a BTF
instance after struct btf_dump was created. This came up during an attempt to
use btf_dump for raw type dumping in selftests, but given the changes are not
excessive, it's good to not have any gotchas in API usage, so I decided to
support such a use case in general.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com
Port of tail_call_static() helper function from Cilium's BPF code base [0]
to libbpf, so others can easily consume it as well. We've been using this
in production code for some time now. The main idea is that we guarantee
that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
JITed BPF insns with direct jumps instead of having to fall back to using
expensive retpolines. By using inline asm, we guarantee that the compiler
won't merge the call from different paths with potentially different
content of r2/r3.
We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable())
in different places as a neat trick to trigger compilation errors when the
compiler does not remove code at compilation time. This works for the BPF
back end as it does not implement __builtin_trap().
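Example usage (sketch; the slot must be a compile-time constant, otherwise
__bpf_unreachable() turns it into a build error):

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 8);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

/* in program code: */
bpf_tail_call_static(ctx, &jmp_table, 0);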
[0] f5537c2602
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net
Add a redirect_neigh() helper as a redirect() drop-in replacement
for the xmit side. The main idea for the helper is to be very similar
in semantics to the latter, just that the skb gets injected into
the neighboring subsystem in order to let the stack do the work
it knows best anyway: populate the L2 addresses of the packet
and then hand over to dev_queue_xmit() as redirect() does.
This solves two bigger items: i) skbs don't need to go up to the stack on the
host-facing veth ingress side for traffic egressing the container to achieve
the same L2 population, which also has the huge advantage that ii) the
skb->sk won't get orphaned in ip_rcv_core() when entering the IP routing
layer on the host stack. Given that skb->sk also does not get orphaned when
crossing the netns, as per 9c4c325252c5 ("skbuff: preserve sock reference
when scrubbing the skb."), the helper can then push the skbs directly to the
phys device where the FQ scheduler can do its work and the TCP stack gets
proper backpressure, given we hold on to skb->sk as long as the skb is still
residing in queues.
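Example usage out of tc (sketch; the phys device ifindex is illustrative,
and the helper here takes just (ifindex, flags)):

SEC("classifier")
int tc_egress_redir(struct __sk_buff *skb)
{
	/* 2 stands for the phys device; the neighboring subsystem fills
	 * in the L2 addresses before handing over to dev_queue_xmit() */
	return bpf_redirect_neigh(2, 0);
}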
With the helper used in BPF data path to then push the skb to the
phys device, I observed a stable/consistent TCP_STREAM improvement
on veth devices for traffic going container -> host -> host ->
container from ~10Gbps to ~15Gbps for a single stream in my test
environment.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid
from v2 hooks"), add a helper to retrieve the cgroup v1 classid solely
based on the skb->sk, so it can be used as a key as part of BPF map
lookups out of tc from the host ns, in particular given the skb->sk is
retained these days when crossing the net ns thanks to 9c4c325252c5
("skbuff: preserve sock reference when scrubbing the skb."). This
is similar to bpf_skb_cgroup_id(), which implements the same for v2.
The Kubernetes ecosystem is still operating on v1, however, hence net_cls
needs to be used there until this can be dropped in favor of the v2
helper bpf_skb_cgroup_id().
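Example usage (sketch):

SEC("classifier")
int cls_main(struct __sk_buff *skb)
{
	/* net_cls classid of the v1 cgroup that skb->sk belongs to,
	 * 0 if unavailable; usable directly as a BPF map lookup key */
	__u64 classid = bpf_skb_cgroup_classid(skb);

	return classid ? TC_ACT_OK : TC_ACT_UNSPEC;
}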
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
Blacklist new tests that depend on features in the latest kernel. Also
temporarily blacklist the raw_tp_test_run test, until it is fixed upstream.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit: 2f7de9865ba3cbfcf8b504f07154fdb6124176a4
Checkpoint bpf-next commit: b0efc216f577997bf563d76d51673ed79c3d5f71
Baseline bpf commit: 87f92ac4c12758c4da3bbe4393f1d884b610b8a6
Checkpoint bpf commit: 9cf51446e68607136e42a4e531a30c888c472463
Alan Maguire (2):
bpf: Add bpf_snprintf_btf helper
bpf: Add bpf_seq_printf_btf helper
Andrii Nakryiko (11):
libbpf: Refactor internals of BTF type index
libbpf: Remove assumption of single contiguous memory for BTF data
libbpf: Generalize common logic for managing dynamically-sized arrays
libbpf: Extract generic string hashing function for reuse
libbpf: Allow modification of BTF and add btf__add_str API
libbpf: Add btf__new_empty() to create an empty BTF object
libbpf: Add BTF writing APIs
libbpf: Add btf__str_by_offset() as a more generic variant of
name_by_offset
selftests/bpf: Test BTF writing APIs
libbpf: Support BTF loading and raw data output in both endianness
libbpf: Fix uninitialized variable in btf_parse_type_sec
Martin KaFai Lau (4):
bpf: Change bpf_sk_release and bpf_sk_*cgroup_id to accept
ARG_PTR_TO_BTF_ID_SOCK_COMMON
bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
bpf: Change bpf_tcp_*_syncookie to accept
ARG_PTR_TO_BTF_ID_SOCK_COMMON
bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
Song Liu (3):
bpf: Fix comment for helper bpf_current_task_under_cgroup()
bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint
libbpf: Support test run of raw tracepoint programs
Toke Høiland-Jørgensen (2):
bpf: Support attaching freplace programs to multiple attach points
libbpf: Add support for freplace attachment in bpf_link_create
YiFei Zhu (2):
bpf: Add BPF_PROG_BIND_MAP syscall
libbpf: Add BPF_PROG_BIND_MAP syscall and use it on .rodata section
Yonghong Song (1):
libbpf: Fix a compilation error with xsk.c for ubuntu 16.04
include/uapi/linux/bpf.h | 118 ++-
src/bpf.c | 67 +-
src/bpf.h | 39 +-
src/btf.c | 1851 ++++++++++++++++++++++++++++++++------
src/btf.h | 51 ++
src/btf_dump.c | 9 +-
src/hashmap.h | 12 +
src/libbpf.c | 113 ++-
src/libbpf.h | 3 +
src/libbpf.map | 28 +
src/libbpf_internal.h | 8 +
src/xsk.c | 1 +
12 files changed, 1997 insertions(+), 303 deletions(-)
Fix an obvious uninitialized variable use that wasn't reported by the compiler.
libbpf Makefile changes to catch such errors are added separately.
Fixes: 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-1-andriin@fb.com
This enables support for attaching freplace programs to multiple attach
points. It does this by amending the UAPI for bpf_link_create with a target
btf ID that can be used to supply the new attachment point along with the
target program fd. The target must be compatible with the target that was
supplied at program load time.
The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.
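From the libbpf side, supplying the new attach point could look roughly like
this (sketch; relies on the companion libbpf change that adds target_btf_id
to bpf_link_create's opts, and attach_type 0 is assumed for freplace here):

DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
		    .target_btf_id = btf_id);	/* new attach function */

link_fd = bpf_link_create(prog_fd, target_prog_fd, 0, &opts);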
The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, there is no API support for doing so.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk
Teach BTF to recognize wrong endianness and transparently convert it
internally to host endianness. The original endianness of the BTF will be
preserved and used during btf__get_raw_data() to convert the resulting raw
data to the same endianness as the source raw_data. This means that a
little-endian host can parse big-endian BTF with no issues: all the type data
will be presented to the client application in native endianness, but when
it's time to emit BTF to persist it in a file (e.g., after BTF deduplication),
the original non-native endianness will be preserved and stored.
It's possible to query the original endianness of BTF data with the new
btf__endianness() API. It's also possible to override the desired output
endianness with btf__set_endianness(), so that if an application needs to
load, say, big-endian BTF and store it as little-endian BTF, it's possible
to manually override this. If btf__set_endianness() was used to change the
endianness, btf__endianness() will reflect the overridden endianness.
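Example (sketch):

enum btf_endianness e = btf__endianness(btf);	/* original endianness */

/* emit little-endian raw data regardless of the input's endianness */
btf__set_endianness(btf, BTF_LITTLE_ENDIAN);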
Given there are no known use cases for supporting cross-endianness for
.BTF.ext, loading .BTF.ext in non-native endianness is not supported.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929043046.1324350-3-andriin@fb.com
BTF strings are used not just for names; they can be arbitrary strings used
for CO-RE relocations, line/func infos, etc. Thus the "name_by_offset"
terminology is too specific and might be misleading. Instead, introduce the
btf__str_by_offset() API, which uses generic string terminology.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200929020533.711288-3-andriin@fb.com
Add APIs for appending new BTF types at the end of a BTF object.
Each BTF kind has one API of the form btf__add_<kind>(). For types that have
a variable amount of additional items (struct/union, enum, func_proto,
datasec), an additional API is provided to emit each such item. E.g., for
emitting a struct, one would use the following sequence of API calls:
btf__add_struct(...);
btf__add_field(...);
...
btf__add_field(...);
Each btf__add_field() will ensure that the last BTF type is of STRUCT or
UNION kind and will automatically increment that type's vlen field.
All the strings are provided as C strings (const char *), not as string
offsets. This significantly improves the usability of the BTF writer APIs.
All such strings will be automatically appended to the string section, or an
existing string will be reused if such a string was already added previously.
Each API attempts to do all the reasonable validations, like enforcing
non-empty names for entities with required names, proper value bounds, various
bit offset restrictions, etc.
Type ID validation is minimal because it's possible to emit a type that
refers to a type that will be emitted later, so libbpf has no way to enforce
such cases. The user must be careful to properly emit all the necessary types
and specify type IDs that will be valid in the finally generated BTF.
Each of the btf__add_<kind>() APIs returns the new type ID on success or a
negative value on error. APIs like btf__add_field() that emit additional
items return zero on success and a negative value on error.
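For example, a sketch of emitting a small struct (assumes an int type was
added earlier and its type ID is stored in int_id):

int st = btf__add_struct(btf, "point", 8); /* returns the new type ID */

btf__add_field(btf, "x", int_id, 0, 0);	/* bit offset 0, not a bitfield */
btf__add_field(btf, "y", int_id, 32, 0);	/* bit offset 32 */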
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200929020533.711288-2-andriin@fb.com
A helper is added to allow seq file writing of kernel data
structures using vmlinux BTF. Its signature is
long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
u32 btf_ptr_size, u64 flags);
Flags and struct btf_ptr definitions/use are identical to the
bpf_snprintf_btf helper, and the helper returns 0 on success
or a negative error value.
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com
A helper is added to support tracing kernel type information in BPF
using the BPF Type Format (BTF). Its signature is
long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
u32 btf_ptr_size, u64 flags);
struct btf_ptr * specifies
- a pointer to the data to be traced;
- the BTF id of the type of data pointed to;
- a flags field provided for future use. These flags are not to be
  confused with the BTF_F_* flags below that control how the btf_ptr
  is displayed; the flags member of struct btf_ptr may be used to
  disambiguate types in kernel versus module BTF, etc. The main
  distinction is that these flags relate to the type and the
  information needed to identify it, not how it is displayed.
For example, a BPF program with a struct sk_buff *skb
could do the following:
static struct btf_ptr b = { };
b.ptr = skb;
b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0);
Default output looks like this:
(struct sk_buff){
.transport_header = (__u16)65535,
.mac_header = (__u16)65535,
.end = (sk_buff_data_t)192,
.head = (unsigned char *)0x000000007524fd8b,
.data = (unsigned char *)0x000000007524fd8b,
.truesize = (unsigned int)768,
.users = (refcount_t){
.refs = (atomic_t){
.counter = (int)1,
},
},
}
Flags modifying display are as follows:
- BTF_F_COMPACT: no formatting around type information
- BTF_F_NONAME: no struct/union member names/types
- BTF_F_PTR_RAW: show raw (unobfuscated) pointer values;
equivalent to %px.
- BTF_F_ZERO: show zero-valued struct/union members;
they are not displayed by default
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
Allow the internal BTF representation to switch from the default read-only
mode, in which raw BTF data is a single non-modifiable block of memory with
the BTF header, types, and strings laid out sequentially and contiguously in
memory, into a writable representation with types and strings data split out
into separate memory regions that can be dynamically expanded.
Such a writable internal representation is transparent to users of libbpf
APIs, but allows appending new types and strings at the end of the BTF, which
is a typical use case when generating BTF programmatically. All the basic
guarantees of the BTF type and string layout are preserved, i.e., the user can
get a `struct btf_type *` pointer and read it directly. Such btf_type pointers
might be invalidated if the BTF is modified, so some care is required in such
mixed read/write scenarios.
The switch from read-only to writable configuration happens automatically the
first time the user attempts to modify the BTF by either adding a new type or
a new string. It is still possible to get raw BTF data, which is a single
piece of memory that can be persisted in an ELF section or into a file as raw
BTF. Such raw data memory is also still owned by BTF and will be freed either
when the BTF object is freed or when another modification to the BTF happens,
as any modification invalidates the BTF raw representation.
This patch adds the first two BTF manipulation APIs: btf__add_str(), which
allows adding arbitrary strings to the BTF string section, and
btf__find_str(), which allows finding an existing string's offset, but does
not add it if it's missing.
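Example (sketch):

int off = btf__add_str(btf, "foo");	/* appends or reuses, returns offset */
int same = btf__find_str(btf, "foo");	/* == off */
int miss = btf__find_str(btf, "bar");	/* negative error; "bar" not added */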
All the added strings are automatically deduplicated. This is achieved by
maintaining an additional string lookup index for all unique strings. Such an
index is built when BTF is switched to modifiable mode. If at that time the
BTF strings section contained duplicate strings, they are not deduplicated.
This is done specifically to not modify the existing content of BTF (types,
their string offsets, etc.), which can cause confusion and is an especially
important property if there is a struct btf_ext associated with the struct
btf. By following this "imperfect deduplication" process, btf_ext is kept
consistent and correct. If deduplication of strings is necessary, it can be
forced by doing BTF deduplication, at which point all the strings will be
eagerly deduplicated and all string offsets, both in struct btf and struct
btf_ext, will be updated.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-6-andriin@fb.com