libbpf

mirror of https://github.com/netdata/libbpf.git synced 2026-06-24 23:49:08 +08:00

Author	SHA1	Message	Date
Andrii Nakryiko	e6725d2467	libbpf: Accomodate DWARF/compiler bug with duplicated identical arrays In some cases compiler seems to generate distinct DWARF types for identical arrays within the same CU. That seems like a bug, but it's already out there and breaks type graph equivalence checks, so accommodate it anyway by checking for identical arrays, regardless of their type ID. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201105043402.2530976-10-andrii@kernel.org	2020-11-05 21:20:45 -08:00
Andrii Nakryiko	658ac1ec19	libbpf: Support BTF dedup of split BTFs Add support for deduplicating split BTFs. When deduplicating split BTF, base BTF is considered to be immutable and can't be modified or adjusted. 99% of BTF deduplication logic is left intact (modulo some type numbering adjustments). There are only two differences. First, each type in base BTF gets hashed (except VAR and DATASEC, of course, those are always considered to be self-canonical instances) and added into a table of canonical candidates. Hashing is a shallow, fast operation, so it mostly eliminates the overhead of having entire base BTF to be a part of BTF dedup. Second difference is very critical and subtle. While deduplicating split BTF types, it is possible to discover that one of immutable base BTF BTF_KIND_FWD types can and should be resolved to a full STRUCT/UNION type from the split BTF part. This, obviously, can't happen because we can't modify the base BTF types anymore. So because of that, any type in split BTF that directly or indirectly references that newly-to-be-resolved FWD type can't be considered to be equivalent to the corresponding canonical types in base BTF, because that would result in a loss of type resolution information. So in such case, split BTF types will be deduplicated separately and will cause some duplication of type information, which is unavoidable. With those two changes, the rest of the algorithm manages to deduplicate split BTF correctly, pointing all the duplicates to their canonical counter-parts in base BTF, but also is deduplicating whatever unique types are present in split BTF on their own. Also, theoretically, split BTF after deduplication could end up with either empty type section or empty string section. This is handled by libbpf correctly in one of the previous patches in the series. 
Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201105043402.2530976-9-andrii@kernel.org	2020-11-05 21:20:45 -08:00
Andrii Nakryiko	dd36215834	libbpf: Fix BTF data layout checks and allow empty BTF Make data section layout checks stricter, disallowing overlap of types and strings data. Additionally, allow BTFs with no type data. There is nothing inherently wrong with having BTF with no types (but potentially with some strings). This could be a situation with kernel module BTFs, if module doesn't introduce any new type information. Also fix invalid offset alignment check for btf->hdr->type_off. Fixes: 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201105043402.2530976-8-andrii@kernel.org	2020-11-05 21:20:45 -08:00
Andrii Nakryiko	2811d54f8b	libbpf: Implement basic split BTF support Support split BTF operation, in which one BTF (base BTF) provides basic set of types and strings, while another one (split BTF) builds on top of base's types and strings and adds its own new types and strings. From API standpoint, the fact that the split BTF is built on top of the base BTF is transparent. Type numeration is transparent. If the base BTF had last type ID #N, then all types in the split BTF start at type ID N+1. Any type in split BTF can reference base BTF types, but not vice versa. Programmatic construction of a split BTF on top of a base BTF is supported: one can create an empty split BTF with btf__new_empty_split() and pass base BTF as an input, or pass raw binary data to btf__new_split(), or use btf__parse_xxx_split() variants to get initial set of split types/strings from the ELF file with .BTF section. String offsets are similarly transparent and are a logical continuation of base BTF's strings. When building BTF programmatically and adding a new string (explicitly with btf__add_str() or implicitly through appending new types/members), string-to-be-added would first be looked up from the base BTF's string section and re-used if it's there. If not, it will be looked up and/or added to the split BTF string section. Similarly to type IDs, types in split BTF can refer to strings from base BTF absolutely transparently (but not vice versa, of course, because base BTF doesn't "know" about existence of split BTF). Internal type index is slightly adjusted to be zero-indexed, ignoring a fake [0] VOID type. This allows to handle split/base BTF type lookups transparently by using btf->start_id type ID offset, which is always 1 for base/non-split BTF and equals btf__get_nr_types(base_btf) + 1 for the split BTF. BTF deduplication is not yet supported for split BTF and support for it will be added in a separate patch. 
Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201105043402.2530976-5-andrii@kernel.org	2020-11-05 21:20:45 -08:00
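The type ID numbering described in 2811d54f8b can be sketched with a toy model. Everything named `toy_*` below is hypothetical and is not libbpf's `struct btf`; the model only tracks how many types each object owns and reproduces the rule that `start_id` is 1 for a base BTF and "last base ID + 1" for a split BTF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a BTF object. */
struct toy_btf {
	const struct toy_btf *base;	/* NULL for a base (non-split) BTF */
	unsigned int nr_types;		/* types owned here, VOID excluded */
	unsigned int start_id;		/* ID of the first owned type */
};

/* Highest valid type ID visible through this object. */
static unsigned int toy_last_id(const struct toy_btf *btf)
{
	return btf->start_id + btf->nr_types - 1;
}

static void toy_btf_init(struct toy_btf *btf, const struct toy_btf *base,
			 unsigned int nr_types)
{
	assert(btf != NULL);
	btf->base = base;
	btf->nr_types = nr_types;
	btf->start_id = base ? toy_last_id(base) + 1 : 1;
}

/* Resolve a type ID to the owning object and a zero-based index into
 * that object's own types. Returns NULL for ID 0 (VOID) and for IDs
 * past the last type. */
static const struct toy_btf *toy_lookup(const struct toy_btf *btf,
					unsigned int id, unsigned int *idx)
{
	if (id == 0 || id > toy_last_id(btf))
		return NULL;
	while (id < btf->start_id)
		btf = btf->base;	/* ID belongs to the base BTF */
	*idx = id - btf->start_id;
	return btf;
}
```

For a base with three types and a split with two, the split's first type gets ID 4, and looking up ID 2 through the split object lands in the base at index 1.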
Andrii Nakryiko	be2dc73ee2	libbpf: Unify and speed up BTF string deduplication Revamp BTF dedup's string deduplication to match the approach of writable BTF string management. This allows to transfer deduplicated strings index back to BTF object after deduplication without expensive extra memory copying and hash map re-construction. It also simplifies the code and speeds it up, because hashmap-based string deduplication is faster than sort + unique approach. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201105043402.2530976-4-andrii@kernel.org	2020-11-05 21:20:45 -08:00
Andrii Nakryiko	4953827790	libbpf: Factor out common operations in BTF writing APIs Factor out committing of appended type data. Also extract fetching the very last type in the BTF (to append members to). These two operations are common across many APIs and will be easier to refactor with split BTF, if they are extracted into a single place. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201105043402.2530976-2-andrii@kernel.org	2020-11-05 21:20:45 -08:00
Andrii Nakryiko	d1fd50d475	helpers: add `struct bpf_redir_neigh` forward declaration This avoids compilation warning if `struct bpf_redir_neigh` is not provided by other kernel headers. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> v0.2	2020-10-28 09:59:37 -07:00
Andrii Nakryiko	f0c6b6bdfb	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: 376dcfe3a4e5a5475a84e6b5f926066a8614f887 Checkpoint bpf-next commit: 3cb12d27ff655e57e8efe3486dca2a22f4e30578 Baseline bpf commit: 28802e7c0c9954218d1830f7507edc9d49b03a00 Checkpoint bpf commit: c66dca98a24cb5f3493dd08d40bcfa94a220fa92 Daniel Borkmann (1): bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static Toke Høiland-Jørgensen (1): bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop include/uapi/linux/bpf.h \| 22 ++++++++++++++++++---- src/bpf_helpers.h \| 2 ++ 2 files changed, 20 insertions(+), 4 deletions(-) -- 2.24.1	2020-10-28 09:08:35 -07:00
Andrii Nakryiko	475ee87969	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions.	2020-10-28 09:08:35 -07:00
Daniel Borkmann	f754860e35	bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static Yaniv reported a compilation error after pulling latest libbpf: [...] ../libbpf/src/root/usr/include/bpf/bpf_helpers.h:99:10: error: unknown register name 'r0' in asm : "r0", "r1", "r2", "r3", "r4", "r5"); [...] The issue got triggered given Yaniv was compiling tracing programs with native target (e.g. x86) instead of BPF target, hence no BTF generated vmlinux.h nor CO-RE used, and later llc with -march=bpf was invoked to compile from LLVM IR to BPF object file. Given that clang was expecting x86 inline asm and not BPF one the error complained that these regs don't exist on the former. Guard bpf_tail_call_static() with defined(__bpf__) where BPF inline asm is valid to use. BPF tracing programs on more modern kernels use BPF target anyway and thus the bpf_tail_call_static() function will be available for them. BPF inline asm is supported since clang 7 (clang <= 6 otherwise throws same above error), and __bpf_unreachable() since clang 8, therefore include the latter condition in order to prevent compilation errors for older clang versions. Given even an old Ubuntu 18.04 LTS has official LLVM packages all the way up to llvm-10, I did not bother to special case the __bpf_unreachable() inside bpf_tail_call_static() further. Also, undo the sockex3_kern's use of bpf_tail_call_static() sample given they still have the old hacky way to even compile networking progs with native instead of BPF target so bpf_tail_call_static() won't be defined there anymore. 
Fixes: 0e9f6841f664 ("bpf, libbpf: Add bpf_tail_call_static helper for bpf programs") Reported-by: Yaniv Agman <yanivagman@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Tested-by: Yaniv Agman <yanivagman@gmail.com> Link: https://lore.kernel.org/bpf/CAMy7=ZUk08w5Gc2Z-EKi4JFtuUCaZYmE4yzhJjrExXpYKR4L8w@mail.gmail.com Link: https://lore.kernel.org/bpf/20201021203257.26223-1-daniel@iogearbox.net	2020-10-28 09:08:35 -07:00
Toke Høiland-Jørgensen	78d61150e9	bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop Based on the discussion in [0], update the bpf_redirect_neigh() helper to accept an optional parameter specifying the nexthop information. This makes it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without incurring a duplicate FIB lookup - since the FIB lookup helper will return the nexthop information even if no neighbour is present, this can simply be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before helper API is frozen. [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/ Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/160322915615.32199.1187570224032024535.stgit@toke.dk	2020-10-28 09:08:35 -07:00
Andrii Nakryiko	49280406a2	readme: add Ubuntu mentions Ubuntu 20.10 is now a good version to do BPF + CO-RE development.	2020-10-26 21:16:14 -07:00
Andrii Nakryiko	de58d0cccf	sync: update 5.5.0 blacklist Blacklist 2 new selftests, which depend on 5.10 kernel. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2020-10-12 14:27:04 -07:00
Andrii Nakryiko	6fa81d4dbe	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: f4d385e4d51d035c7f0d68a3e9564c9453c13aa4 Checkpoint bpf-next commit: 376dcfe3a4e5a5475a84e6b5f926066a8614f887 Baseline bpf commit: 9cf51446e68607136e42a4e531a30c888c472463 Checkpoint bpf commit: 28802e7c0c9954218d1830f7507edc9d49b03a00 Andrii Nakryiko (3): libbpf: Skip CO-RE relocations for not loaded BPF programs libbpf: Support safe subset of load/store instruction resizing with CO-RE libbpf: Allow specifying both ELF and raw BTF for CO-RE BTF override Daniel Borkmann (3): bpf: Improve bpf_redirect_neigh helper description bpf: Add redirect_peer helper bpf: Allow for map-in-map with dynamic inner array map entries Hangbin Liu (2): libbpf: Close map fd if init map slots failed libbpf: Check if pin_path was set even map fd exist Hao Luo (4): bpf: Introduce pseudo_btf_id bpf/libbpf: BTF support for typed ksyms bpf: Introduce bpf_per_cpu_ptr() bpf: Introducte bpf_this_cpu_ptr() Jakub Wilk (1): bpf: Fix typo in uapi/linux/bpf.h Luigi Rizzo (1): bpf, libbpf: Use valid btf in bpf_program__set_attach_target Magnus Karlsson (1): libbpf: Fix compatibility problem in xsk_socket__create Nikita V. Shirokov (1): bpf: Add tcp_notsent_lowat bpf setsockopt Song Liu (1): bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array include/uapi/linux/bpf.h \| 104 ++++++++++-- src/libbpf.c \| 348 ++++++++++++++++++++++++++++++++------- src/xsk.c \| 7 +- 3 files changed, 385 insertions(+), 74 deletions(-) -- 2.24.1	2020-10-12 14:27:04 -07:00
Andrii Nakryiko	bc94c2b82f	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions.	2020-10-12 14:27:04 -07:00
Daniel Borkmann	d47094a2ce	bpf: Allow for map-in-map with dynamic inner array map entries Recent work in f4d05259213f ("bpf: Add map_meta_equal map ops") and 134fede4eecf ("bpf: Relax max_entries check for most of the inner map types") added support for dynamic inner max elements for most map-in-map types. Exceptions were maps like array or prog array where the map_gen_lookup() callback uses the maps' max_entries field as a constant when emitting instructions. We recently implemented Maglev consistent hashing into Cilium's load balancer which uses map-in-map with an outer map being hash and inner being array holding the Maglev backend table for each service. This has been designed this way in order to reduce overall memory consumption given the outer hash map allows to avoid preallocating a large, flat memory area for all services. Also, the number of service mappings is not always known a-priori. The use case for dynamic inner array map entries is to further reduce memory overhead, for example, some services might just have a small number of back ends while others could have a large number. Right now the Maglev backend table for small and large number of backends would need to have the same inner array map entries which adds a lot of unneeded overhead. Dynamic inner array map entries can be realized by avoiding the inlined code generation for their lookup. The lookup will still be efficient since it will be calling into array_map_lookup_elem() directly and thus avoiding retpoline. The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips inline code generation and relaxes array_map_meta_equal() check to ignore both maps' max_entries. This also still allows to have faster lookups for map-in-map when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed. 
Example code generation where inner map is dynamic sized array: # bpftool p d x i 125 int handle__sys_enter(void * ctx): ; int handle__sys_enter(void *ctx) 0: (b4) w1 = 0 ; int key = 0; 1: (63) *(u32 *)(r10 -4) = r1 2: (bf) r2 = r10 ; 3: (07) r2 += -4 ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key); 4: (18) r1 = map[id:468] 6: (07) r1 += 272 7: (61) r0 = *(u32 *)(r2 +0) 8: (35) if r0 >= 0x3 goto pc+5 9: (67) r0 <<= 3 10: (0f) r0 += r1 11: (79) r0 = *(u64 *)(r0 +0) 12: (15) if r0 == 0x0 goto pc+1 13: (05) goto pc+1 14: (b7) r0 = 0 15: (b4) w6 = -1 ; if (!inner_map) 16: (15) if r0 == 0x0 goto pc+6 17: (bf) r2 = r10 ; 18: (07) r2 += -4 ; val = bpf_map_lookup_elem(inner_map, &key); 19: (bf) r1 = r0 \| No inlining but instead 20: (85) call array_map_lookup_elem#149280 \| call to array_map_lookup_elem() ; return val ? *val : -1; \| for inner array lookup. 21: (15) if r0 == 0x0 goto pc+1 ; return val ? *val : -1; 22: (61) r6 = *(u32 *)(r0 +0) ; } 23: (bc) w0 = w6 24: (95) exit Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net	2020-10-12 14:27:04 -07:00
Daniel Borkmann	4672fb6790	bpf: Add redirect_peer helper Add an efficient ingress to ingress netns switch that can be used out of tc BPF programs in order to redirect traffic from host ns ingress into a container veth device ingress without having to go via CPU backlog queue [0]. For local containers this can also be utilized and path via CPU backlog queue only needs to be taken once, not twice. On a high level this borrows from ipvlan which does similar switch in __netif_receive_skb_core() and then iterates via another_round. This helps to reduce latency for mentioned use cases. Pod to remote pod with redirect(), TCP_RR [1]: # percpu_netperf 10.217.1.33 RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 ) MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 ) STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 ) MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 ) P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 ) P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 ) P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 ) TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 ) Pod to remote pod with redirect_peer(), TCP_RR: # percpu_netperf 10.217.1.33 RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 ) MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 ) STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 ) MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 ) P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 ) P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 ) P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 ) TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 ) [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf Signed-off-by: Daniel Borkmann 
<daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net	2020-10-12 14:27:04 -07:00
Daniel Borkmann	a8a505a36f	bpf: Improve bpf_redirect_neigh helper description Follow-up to address David's feedback that we should better describe internals of the bpf_redirect_neigh() helper. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20201010234006.7075-2-daniel@iogearbox.net	2020-10-12 14:27:04 -07:00
Nikita V. Shirokov	e3b9cf7aaa	bpf: Add tcp_notsent_lowat bpf setsockopt Adding support for TCP_NOTSENT_LOWAT sockoption (https://lwn.net/Articles/560082/) in tcp bpf programs. Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201009070325.226855-1-tehnerd@tehnerd.com	2020-10-12 14:27:04 -07:00
Andrii Nakryiko	76764b891b	libbpf: Allow specifying both ELF and raw BTF for CO-RE BTF override Use generalized BTF parsing logic, making it possible to parse BTF both from ELF file, as well as a raw BTF dump. This makes it easier to write custom tests with manually generated BTFs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201008001025.292064-4-andrii@kernel.org	2020-10-12 14:27:04 -07:00
Andrii Nakryiko	8ef6a6e709	libbpf: Support safe subset of load/store instruction resizing with CO-RE Add support for patching instructions of the following form: - rX = *(T *)(rY + <off>); - *(T *)(rX + <off>) = rY; - *(T *)(rX + <off>) = <imm>, where T is one of {u8, u16, u32, u64}. For such instructions, if the actual kernel field recorded in CO-RE relocation has a different size than the one recorded locally (e.g., from vmlinux.h), then libbpf will adjust T to an appropriate 1-, 2-, 4-, or 8-byte loads. In general, such transformation is not always correct and could lead to invalid final value being loaded or stored. But two classes of cases are always safe: - if both local and target (kernel) types are unsigned integers, but of different sizes, then it's OK to adjust load/store instruction according to the necessary memory size. Zero-extending nature of such instructions and unsignedness make sure that the final value is always correct; - pointer size mismatch between BPF target architecture (which is always 64-bit) and 32-bit host kernel architecture can be similarly resolved automatically, because pointer is essentially an unsigned integer. Loading 32-bit pointer into 64-bit BPF register with zero extension will leave correct pointer in the register. Both cases are necessary to support CO-RE on 32-bit kernels, as `unsigned long` in vmlinux.h generated from 32-bit kernel is 32-bit, but when compiled with BPF program for BPF target it will be treated by compiler as 64-bit integer. Similarly, pointers in vmlinux.h are 32-bit for kernel, but treated as 64-bit values by compiler for BPF target. Both problems are now resolved by libbpf for direct memory reads. But similar transformations are useful in general when kernel fields are "resized" from, e.g., unsigned int to unsigned long (or vice versa). Now, similar transformations for signed integers are not safe to perform as they will result in incorrect sign extension of the value. 
If such situation is detected, libbpf will emit helpful message and will poison the instruction. Not failing immediately means that it's possible to guard the instruction based on kernel version (or other conditions) and make sure it's not reachable. If there is a need to read signed integers that change sizes between different kernels, it's possible to use BPF_CORE_READ_BITFIELD() macro, which works both with bitfields and non-bitfield integers of any signedness and handles sign-extension properly. Also, bpf_core_read() with proper size and/or use of bpf_core_field_size() relocation could allow to deal with such complicated situations explicitly, if not so conveniently as direct memory reads. Selftests added in a separate patch in progs/test_core_autosize.c demonstrate both direct memory and probed use cases. BPF_CORE_READ() is not changed and it won't deal with such situations as automatically as direct memory reads due to the signedness integer limitations, which are much harder to detect and control with compiler macro magic. So it's encouraged to utilize direct memory reads as much as possible. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201008001025.292064-3-andrii@kernel.org	2020-10-12 14:27:04 -07:00
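The reasoning in 8ef6a6e709 about why resized loads are safe for unsigned fields but not for signed ones can be checked with a small user-space model. The `toy_*` helpers below are hypothetical and merely imitate a zero-extending load into a 64-bit register; they are not part of libbpf.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a memory load that reads `size` bytes at p and
 * zero-extends the result into a 64-bit register. */
static uint64_t toy_load_zext(const void *p, unsigned int size)
{
	uint8_t v8;
	uint16_t v16;
	uint32_t v32;
	uint64_t v64;

	assert(p != NULL);
	switch (size) {
	case 1: memcpy(&v8, p, sizeof(v8)); return v8;
	case 2: memcpy(&v16, p, sizeof(v16)); return v16;
	case 4: memcpy(&v32, p, sizeof(v32)); return v32;
	case 8: memcpy(&v64, p, sizeof(v64)); return v64;
	default: return 0;	/* unsupported size in this sketch */
	}
}

/* Sign-extending 4-byte read, shown only for contrast: this is what
 * a signed field would need, and a plain resized load cannot do it. */
static int64_t toy_load_sext32(const void *p)
{
	int32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}
```

Reading a 32-bit unsigned field holding 0xdeadbeef with a 4-byte zero-extending load yields the same value in the 64-bit register, so shrinking the load is harmless. A 32-bit signed field holding -1 comes back as 4294967295 instead of -1, which is the incorrect sign extension the message warns about.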
Andrii Nakryiko	44d5bc1709	libbpf: Skip CO-RE relocations for not loaded BPF programs Bypass CO-RE relocations step for BPF programs that are not going to be loaded. This allows to have BPF programs compiled in and disabled dynamically if kernel is not supposed to provide enough relocation information. In such case, there won't be unnecessary warnings about failed relocations. Fixes: d929758101fc ("libbpf: Support disabling auto-loading BPF programs") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201008001025.292064-2-andrii@kernel.org	2020-10-12 14:27:04 -07:00
Magnus Karlsson	95848b59b9	libbpf: Fix compatibility problem in xsk_socket__create Fix a compatibility problem when the old XDP_SHARED_UMEM mode is used together with the xsk_socket__create() call. In the old XDP_SHARED_UMEM mode, only sharing of the same device and queue id was allowed, and in this mode, the fill ring and completion ring were shared between the AF_XDP sockets. Therefore, it was perfectly fine to call the xsk_socket__create() API for each socket and not use the new xsk_socket__create_shared() API. This behavior was ruined by the commit introducing XDP_SHARED_UMEM support between different devices and/or queue ids. This patch restores the ability to use xsk_socket__create in these circumstances so that backward compatibility is not broken. Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1602070946-11154-1-git-send-email-magnus.karlsson@gmail.com	2020-10-12 14:27:04 -07:00
Jakub Wilk	1bc08143b5	bpf: Fix typo in uapi/linux/bpf.h Reported-by: Samanta Navarro <ferivoz@riseup.net> Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201007055717.7319-1-jwilk@jwilk.net	2020-10-12 14:27:04 -07:00
Luigi Rizzo	b9682e291d	bpf, libbpf: Use valid btf in bpf_program__set_attach_target bpf_program__set_attach_target(prog, fd, ...) will always fail when fd = 0 (attach to a kernel symbol) because obj->btf_vmlinux is NULL and there is no way to set it (at the moment btf_vmlinux is meant to be temporary storage for use in bpf_object__load_xattr()). Fix this by using libbpf_find_vmlinux_btf_id(). At some point we may want to opportunistically cache btf_vmlinux so it can be reused with multiple programs. Signed-off-by: Luigi Rizzo <lrizzo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Petar Penkov <ppenkov@google.com> Link: https://lore.kernel.org/bpf/20201005224528.389097-1-lrizzo@google.com	2020-10-12 14:27:04 -07:00
Hangbin Liu	54fe2f1e26	libbpf: Check if pin_path was set even map fd exist Say a user reuse map fd after creating a map manually and set the pin_path, then load the object via libbpf. In libbpf bpf_object__create_maps(), bpf_object__reuse_map() will return 0 if there is no pinned map in map->pin_path. Then after checking if map fd exist, we should also check if pin_path was set and do bpf_map__pin() instead of continue the loop. Fix it by creating map if fd not exist and continue checking pin_path after that. Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201006021345.3817033-3-liuhangbin@gmail.com	2020-10-12 14:27:04 -07:00
Hangbin Liu	fd28e0130a	libbpf: Close map fd if init map slots failed Previously we forgot to close the map fd if bpf_map_update_elem() failed during map slot init, which will leak map fd. Let's move map slot initialization to new function init_map_slots() to simplify the code. And close the map fd if init slot failed. Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201006021345.3817033-2-liuhangbin@gmail.com	2020-10-12 14:27:04 -07:00
Hao Luo	f908087023	bpf: Introducte bpf_this_cpu_ptr() Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This helper always returns a valid pointer, therefore no need to check returned value for NULL. Also note that all programs run with preemption disabled, which means that the returned pointer is stable during all the execution of the program. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com	2020-10-12 14:27:04 -07:00
Hao Luo	b3b297aa16	bpf: Introduce bpf_per_cpu_ptr() Add bpf_per_cpu_ptr() to help bpf programs access percpu vars. bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel except that it may return NULL. This happens when the cpu parameter is out of range. So the caller must check the returned value. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com	2020-10-12 14:27:04 -07:00
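The difference in contract between the two per-CPU helpers (b3b297aa16 and f908087023) can be mimicked in user space. This is a toy sketch only: `TOY_NR_CPUS`, `toy_per_cpu_ptr()` and `toy_sum_all_cpus()` are hypothetical names, and a plain array stands in for a kernel per-CPU variable.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4	/* hypothetical number of possible CPUs */

/* Follows the documented contract of bpf_per_cpu_ptr(): the result
 * is NULL when the cpu argument is out of range, so callers must
 * check it before dereferencing. */
static long *toy_per_cpu_ptr(long *percpu_base, unsigned int cpu)
{
	assert(percpu_base != NULL);
	if (cpu >= TOY_NR_CPUS)
		return NULL;
	return &percpu_base[cpu];
}

/* Sum a per-CPU counter, probing up to `limit` CPUs and stopping at
 * the first out-of-range one. */
static long toy_sum_all_cpus(long *percpu_base, unsigned int limit)
{
	long sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < limit; cpu++) {
		long *p = toy_per_cpu_ptr(percpu_base, cpu);

		if (!p)		/* mandatory NULL check */
			break;
		sum += *p;
	}
	return sum;
}
```

A helper with the bpf_this_cpu_ptr() contract would need no such check, since it always refers to the current CPU and therefore always returns a valid pointer.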
Hao Luo	6d0fcc3bd5	bpf/libbpf: BTF support for typed ksyms If a ksym is defined with a type, libbpf will try to find the ksym's btf information from kernel btf. If a valid btf entry for the ksym is found, libbpf can pass in the found btf id to the verifier, which validates the ksym's type and value. Typeless ksyms (i.e. those defined as 'void') will not have such btf_id, but it has the symbol's address (read from kallsyms) and its value is treated as a raw pointer. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-3-haoluo@google.com	2020-10-12 14:27:04 -07:00
Hao Luo	3706bf773b	bpf: Introduce pseudo_btf_id Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a ksym so that further dereferences on the ksym can use the BTF info to validate accesses. Internally, when seeing a pseudo_btf_id ld insn, the verifier reads the btf_id stored in the insn[0]'s imm field and marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND, which is encoded in btf_vmlinux by pahole. If the VAR is not of a struct type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID and the mem_size is resolved to the size of the VAR's type. From the VAR btf_id, the verifier can also read the address of the ksym's corresponding kernel var from kallsyms and use that to fill dst_reg. Therefore, the proper functionality of pseudo_btf_id depends on (1) kallsyms and (2) the encoding of kernel global VARs in pahole, which should be available since pahole v1.18. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com	2020-10-12 14:27:04 -07:00
Song Liu	09718f4ecd	bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array Currently, perf event in perf event array is removed from the array when the map fd used to add the event is closed. This behavior makes it difficult to share perf events with perf event array. Introduce perf event map that keeps the perf event open with a new flag BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not removed when the original map fd is closed. Instead, the perf event will stay in the map until 1) it is explicitly removed from the array; or 2) the array is freed. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com	2020-10-12 14:27:04 -07:00
Andrii Nakryiko	8205f37a56	sync: ignore libc_compat.h Libbpf doesn't rely on libc_compat.h anymore, so ignore it for the purposes of syncing libbpf sources into Github. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2020-10-12 12:18:53 -07:00
Andrii Nakryiko	ecbd504994	makefile: add quiet mode support Add quiet-by-default mode to Makefile, similar to libbpf Makefile in Linux repo. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2020-10-11 00:39:03 -07:00
Andrii Nakryiko	b6dd2f2b7d	vmtests: un-blacklist fixed selftests Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-09-30 18:19:55 -07:00
Andrii Nakryiko	a132697261	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: b0efc216f577997bf563d76d51673ed79c3d5f71 Checkpoint bpf-next commit: f4d385e4d51d035c7f0d68a3e9564c9453c13aa4 Baseline bpf commit: 9cf51446e68607136e42a4e531a30c888c472463 Checkpoint bpf commit: 9cf51446e68607136e42a4e531a30c888c472463 Andrii Nakryiko (1): libbpf: Make btf_dump work with modifiable BTF Daniel Borkmann (3): bpf: Add classid helper only based on skb->sk bpf: Add redirect_neigh helper as redirect drop-in bpf, libbpf: Add bpf_tail_call_static helper for bpf programs include/uapi/linux/bpf.h \| 24 ++++++++++++++ src/bpf_helpers.h \| 46 +++++++++++++++++++++++++++ src/btf.c \| 17 ++++++++++ src/btf_dump.c \| 69 +++++++++++++++++++++++++++------------- src/libbpf_internal.h \| 1 + 5 files changed, 135 insertions(+), 22 deletions(-) -- 2.24.1	2020-09-30 18:19:55 -07:00
Andrii Nakryiko	2d0aa12ea3	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions.	2020-09-30 18:19:55 -07:00
Andrii Nakryiko	317ef1c295	libbpf: Make btf_dump work with modifiable BTF Ensure that btf_dump can accommodate new BTF types being appended to BTF instance after struct btf_dump was created. This came up during attempt to use btf_dump for raw type dumping in selftests, but given changes are not excessive, it's good to not have any gotchas in API usage, so I decided to support such use case in general. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com	2020-09-30 18:19:55 -07:00
Daniel Borkmann	80c7838600	bpf, libbpf: Add bpf_tail_call_static helper for bpf programs Port of tail_call_static() helper function from Cilium's BPF code base [0] to libbpf, so others can easily consume it as well. We've been using this in production code for some time now. The main idea is that we guarantee that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the JITed BPF insns with direct jumps instead of having to fall back to using expensive retpolines. By using inline asm, we guarantee that the compiler won't merge the call from different paths with potentially different content of r2/r3. We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable()) in different places as a neat trick to trigger compilation errors when compiler does not remove code at compilation time. This works for the BPF back end as it does not implement the __builtin_trap(). [0] `f5537c2602` Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net	2020-09-30 18:19:55 -07:00
Daniel Borkmann	750801a0d5	bpf: Add redirect_neigh helper as redirect drop-in Add a redirect_neigh() helper as redirect() drop-in replacement for the xmit side. Main idea for the helper is to be very similar in semantics to the latter just that the skb gets injected into the neighboring subsystem in order to let the stack do the work it knows best anyway to populate the L2 addresses of the packet and then hand over to dev_queue_xmit() as redirect() does. This solves two bigger items: i) skbs don't need to go up to the stack on the host facing veth ingress side for traffic egressing the container to achieve the same for populating L2 which also has the huge advantage that ii) the skb->sk won't get orphaned in ip_rcv_core() when entering the IP routing layer on the host stack. Given that skb->sk neither gets orphaned when crossing the netns as per 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.") the helper can then push the skbs directly to the phys device where FQ scheduler can do its work and TCP stack gets proper backpressure given we hold on to skb->sk as long as skb is still residing in queues. With the helper used in BPF data path to then push the skb to the phys device, I observed a stable/consistent TCP_STREAM improvement on veth devices for traffic going container -> host -> host -> container from ~10Gbps to ~15Gbps for a single stream in my test environment. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net	2020-09-30 18:19:55 -07:00
Daniel Borkmann	b5fd4c774d	bpf: Add classid helper only based on skb->sk Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid from v2 hooks"), add a helper to retrieve cgroup v1 classid solely based on the skb->sk, so it can be used as key as part of BPF map lookups out of tc from host ns, in particular given the skb->sk is retained these days when crossing net ns thanks to 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb."). This is similar to bpf_skb_cgroup_id() which implements the same for v2. Kubernetes ecosystem is still operating on v1 however, hence net_cls needs to be used there until this can be dropped in with the v2 helper of bpf_skb_cgroup_id(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net	2020-09-30 18:19:55 -07:00
Vladimír Čunát	5a10cd2060	remove internal reallocarray() ... as it's covered by libbpf_reallocarray() since commit `dc70da9c70`.	2020-09-30 12:55:50 -07:00
Andrii Nakryiko	ff797cc905	vmtests: blacklist new tests for 5.5 Blacklist new tests that are depending on features in latest kernel. Also temporarily blacklist raw_tp_test_run test, until it is fixed upstream. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-09-29 18:29:49 -07:00
Andrii Nakryiko	21ea184818	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: 2f7de9865ba3cbfcf8b504f07154fdb6124176a4 Checkpoint bpf-next commit: b0efc216f577997bf563d76d51673ed79c3d5f71 Baseline bpf commit: 87f92ac4c12758c4da3bbe4393f1d884b610b8a6 Checkpoint bpf commit: 9cf51446e68607136e42a4e531a30c888c472463 Alan Maguire (2): bpf: Add bpf_snprintf_btf helper bpf: Add bpf_seq_printf_btf helper Andrii Nakryiko (11): libbpf: Refactor internals of BTF type index libbpf: Remove assumption of single contiguous memory for BTF data libbpf: Generalize common logic for managing dynamically-sized arrays libbpf: Extract generic string hashing function for reuse libbpf: Allow modification of BTF and add btf__add_str API libbpf: Add btf__new_empty() to create an empty BTF object libbpf: Add BTF writing APIs libbpf: Add btf__str_by_offset() as a more generic variant of name_by_offset selftests/bpf: Test BTF writing APIs libbpf: Support BTF loading and raw data output in both endianness libbpf: Fix uninitialized variable in btf_parse_type_sec Martin KaFai Lau (4): bpf: Change bpf_sk_release and bpf_sk_cgroup_id to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON bpf: Change bpf_sk_storage_() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON bpf: Change bpf_tcp_*_syncookie to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON Song Liu (3): bpf: Fix comment for helper bpf_current_task_under_cgroup() bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint libbpf: Support test run of raw tracepoint programs Toke Høiland-Jørgensen (2): bpf: Support attaching freplace programs to multiple attach points libbpf: Add support for freplace attachment in bpf_link_create YiFei Zhu (2): bpf: Add BPF_PROG_BIND_MAP syscall libbpf: Add BPF_PROG_BIND_MAP syscall and use it on .rodata section Yonghong Song (1): libbpf: Fix a compilation error with xsk.c for ubuntu 16.04 include/uapi/linux/bpf.h \| 118 ++- 
src/bpf.c \| 67 +- src/bpf.h \| 39 +- src/btf.c \| 1851 ++++++++++++++++++++++++++++++++------ src/btf.h \| 51 ++ src/btf_dump.c \| 9 +- src/hashmap.h \| 12 + src/libbpf.c \| 113 ++- src/libbpf.h \| 3 + src/libbpf.map \| 28 + src/libbpf_internal.h \| 8 + src/xsk.c \| 1 + 12 files changed, 1997 insertions(+), 303 deletions(-) -- 2.24.1	2020-09-29 18:29:49 -07:00
Andrii Nakryiko	760f71ec87	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions.	2020-09-29 18:29:49 -07:00
Andrii Nakryiko	91e666c94c	libbpf: Fix uninitialized variable in btf_parse_type_sec Fix obvious uninitialized variable use that wasn't reported by compiler. libbpf Makefile changes to catch such errors are added separately. Fixes: 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200929220604.833631-1-andriin@fb.com	2020-09-29 18:29:49 -07:00
Toke Høiland-Jørgensen	e40af4de0c	libbpf: Add support for freplace attachment in bpf_link_create This adds support for supplying a target btf ID for the bpf_link_create() operation, and adds a new bpf_program__attach_freplace() high-level API for attaching freplace functions with a target. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355387.48470.18026176785351166890.stgit@toke.dk	2020-09-29 18:29:49 -07:00
Toke Høiland-Jørgensen	5e359219aa	bpf: Support attaching freplace programs to multiple attach points This enables support for attaching freplace programs to multiple attach points. It does this by amending the UAPI for bpf_link_create with a target btf ID that can be used to supply the new attachment point along with the target program fd. The target must be compatible with the target that was supplied at program load time. The implementation reuses the checks that were factored out of check_attach_btf_id() to ensure compatibility between the BTF types of the old and new attachment. If these match, a new bpf_tracing_link will be created for the new attach target, allowing multiple attachments to co-exist simultaneously. The code could theoretically support multiple-attach of other types of tracing programs as well, but since I don't have a use case for any of those, there is no API support for doing so. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk	2020-09-29 18:29:49 -07:00
Andrii Nakryiko	488110df60	libbpf: Support BTF loading and raw data output in both endianness Teach BTF to recognize wrong endianness and transparently convert it internally to host endianness. Original endianness of BTF will be preserved and used during btf__get_raw_data() to convert resulting raw data to the same endianness as the source raw_data. This means that little-endian host can parse big-endian BTF with no issues, all the type data will be presented to the client application in native endianness, but when it's time for emitting BTF to persist it in a file (e.g., after BTF deduplication), original non-native endianness will be preserved and stored. It's possible to query original endianness of BTF data with new btf__endianness() API. It's also possible to override desired output endianness with btf__set_endianness(), so that if application needs to load, say, big-endian BTF and store it as little-endian BTF, it's possible to manually override this. If btf__set_endianness() was used to change endianness, btf__endianness() will reflect overridden endianness. Given there are no known use cases for supporting cross-endianness for .BTF.ext, loading .BTF.ext in non-native endianness is not supported. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200929043046.1324350-3-andriin@fb.com	2020-09-29 18:29:49 -07:00
Andrii Nakryiko	f007a6bfdf	selftests/bpf: Test BTF writing APIs Add selftests for BTF writer APIs. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200929020533.711288-4-andriin@fb.com	2020-09-29 18:29:49 -07:00

838 Commits