libbpf

mirror of https://github.com/netdata/libbpf.git synced 2026-05-08 08:29:11 +08:00

Author	SHA1	Message	Date
Andrii Nakryiko	6384ee1968	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: 2e80be60c465a4f8559327340eaf40845dd7797a Checkpoint bpf-next commit: 95cec14b0308085c028c4d4fb3d09fad3902b4c3 Baseline bpf commit: 7787b6fc938e16aa418613c4a765c1dbb268ed9f Checkpoint bpf commit: e6135df45e21f1815a5948f452593124b1544a3e Alexei Starovoitov (3): bpf: Introduce sleepable BPF programs bpf: Add bpf_copy_from_user() helper. libbpf: Support sleepable progs Andrii Nakryiko (7): libbpf: Ensure ELF symbols table is found before further ELF processing libbpf: Parse multi-function sections into multiple BPF programs libbpf: Support CO-RE relocations for multi-prog sections libbpf: Make RELO_CALL work for multi-prog sections and sub-program calls libbpf: Implement generalized .BTF.ext func/line info adjustment libbpf: Add multi-prog section support for struct_ops libbpf: Deprecate notion of BPF program "title" in favor of "section name" Magnus Karlsson (1): libbpf: Support shared umems between queues and devices Tony Ambardar (1): libbpf: Fix build failure from uninitialized variable warning Yonghong Song (1): bpf: Make bpf_link_info.iter similar to bpf_iter_link_info include/uapi/linux/bpf.h | 22 +- src/btf.h | 18 +- src/libbpf.c | 1314 +++++++++++++++++++++++++------------- src/libbpf.h | 5 +- src/libbpf.map | 2 + src/libbpf_common.h | 2 + src/xsk.c | 376 +++++++---- src/xsk.h | 9 + 8 files changed, 1156 insertions(+), 592 deletions(-) -- 2.24.1	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	3f9447bf92	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions.	2020-09-03 21:21:34 -07:00
Tony Ambardar	3b80b6c77e	libbpf: Fix build failure from uninitialized variable warning While compiling libbpf, some GCC versions (at least 8.4.0) have difficulty determining control flow and emit a warning for potentially uninitialized usage of 'map', which results in a build error if using "-Werror": In file included from libbpf.c:56: libbpf.c: In function '__bpf_object__open': libbpf_internal.h:59:2: warning: 'map' may be used uninitialized in this function [-Wmaybe-uninitialized] libbpf_print(level, "libbpf: " fmt, ##__VA_ARGS__); \ ^~~~~~~~~~~~ libbpf.c:5032:18: note: 'map' was declared here struct bpf_map *map, *targ_map; ^~~ The warning/error is false based on code inspection, so silence it with a NULL initialization. Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support") Reference: 063e68813391 ("libbpf: Fix false uninitialized variable warning") Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200831000304.1696435-1-Tony.Ambardar@gmail.com	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	78cdb58bdf	libbpf: Deprecate notion of BPF program "title" in favor of "section name" BPF program title is an ambiguous and misleading term. It is the ELF section name, so let's just call it that and deprecate the bpf_program__title() API in favor of bpf_program__section_name(). Additionally, using bpf_object__find_program_by_title() is now inherently dangerous and ambiguous, as multiple BPF programs can have the same section name. So deprecate this API as well and recommend switching to the non-ambiguous bpf_object__find_program_by_name(). Internally, clean up usage and mis-usage of BPF program section name for denoting BPF program name. Shorten the field name to prog->sec_name to be consistent with all other prog->sec_* variables. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-11-andriin@fb.com	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	4b60f82516	libbpf: Add multi-prog section support for struct_ops Adjust struct_ops handling code to work with multi-program ELF sections properly. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-7-andriin@fb.com	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	2b28b4fa4d	libbpf: Implement generalized .BTF.ext func/line info adjustment Complete multi-prog sections and multi sub-prog support in libbpf by properly adjusting .BTF.ext's line and function information. Mark exposed btf_ext__reloc_func_info() and btf_ext__reloc_line_info() APIs as deprecated. These APIs make the simplistic assumption that all sub-programs are going to be appended to all main BPF programs, which doesn't hold in real life. It's unlikely there are any users of this API, as it's very libbpf internals-specific. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-6-andriin@fb.com	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	448789ba27	libbpf: Make RELO_CALL work for multi-prog sections and sub-program calls This patch implements general and correct logic for bpf-to-bpf sub-program calls. Only sub-programs used (called into) from entry-point (main) BPF program are going to be appended at the end of main BPF program. This ensures that BPF verifier won't encounter any dead code due to copying unreferenced sub-program. This change means that each entry-point (main) BPF program might have a different set of sub-programs appended to it and potentially in different order. This has implications on how sub-program call relocations need to be handled, described below. All relocations are now split into two categories: data references (maps and global variables) and code references (sub-program calls). This distinction is important because data references need to be relocated just once per each BPF program and sub-program. These relocations are agnostic to instruction locations, because they are not code-relative and they are relocating against static targets (maps, variables with fixed offsets, etc). Sub-program RELO_CALL relocations, on the other hand, are highly-dependent on code position, because they are recorded as instruction-relative offset. So BPF sub-programs (those that do calls into other sub-programs) can't be relocated once, they need to be relocated each time such a sub-program is appended at the end of the main entry-point BPF program. As mentioned above, each main BPF program might have different subset and different order of sub-programs, so call relocations can't be done just once. Splitting data reference and calls relocations as described above allows to do this efficiently and cleanly. bpf_object__find_program_by_name() will now ignore non-entry BPF programs. Previously one could have looked up '.text' fake BPF program, but the existence of such BPF program was always an implementation detail and you can't do much useful with it.
Now, though, all non-entry sub-programs get their own BPF program with name corresponding to a function name, so there is no more '.text' name for BPF program. This means there is no regression, effectively, w.r.t. API behavior. But this is important aspect to highlight, because it's going to be critical once libbpf implements static linking of BPF programs. Non-entry static BPF programs will be allowed to have conflicting names, but global and main-entry BPF program names should be unique. Just like with normal user-space linking process. So it's important to restrict this aspect right now, keep static and non-entry functions as internal implementation details, and not have to deal with regressions in behavior later. This patch leaves .BTF.ext adjustment as is until next patch. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-5-andriin@fb.com	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	a3abae5122	libbpf: Support CO-RE relocations for multi-prog sections Fix up CO-RE relocation code to handle relocations against ELF sections containing multiple BPF programs. This requires lookup of a BPF program by its section name and instruction index it contains. While it could have been done as a simple loop, it could run into performance issues pretty quickly, as number of CO-RE relocations can be quite large in real-world applications, and each CO-RE relocation incurs BPF program look up now. So instead of simple loop, implement a binary search by section name + insn offset. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200903203542.15944-4-andriin@fb.com	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	bb5e70706a	libbpf: Parse multi-function sections into multiple BPF programs Teach libbpf how to parse code sections into potentially multiple bpf_program instances, based on ELF FUNC symbols. Each BPF program will keep track of its position within containing ELF section for translating section instruction offsets into program instruction offsets: regardless of BPF program's location in ELF section, its first instruction is always at local instruction offset 0, so when libbpf is working with relocations (which use section-based instruction offsets) this is critical to make proper translations. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200903203542.15944-3-andriin@fb.com	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	994aae7fc8	libbpf: Ensure ELF symbols table is found before further ELF processing libbpf ELF parsing logic might need symbols available before ELF parsing is completed, so we need to make sure that symbols table section is found in a separate pass before all the subsequent sections are processed. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200903203542.15944-2-andriin@fb.com	2020-09-03 21:21:34 -07:00
Magnus Karlsson	a6e9cf1532	libbpf: Support shared umems between queues and devices Add support for shared umems between hardware queues and devices to the AF_XDP part of libbpf. This is so that zero-copy can be achieved in applications that want to send and receive packets between HW queues on one device or between different devices/netdevs. In order to create sockets that share a umem between hardware queues and devices, a new function has been added called xsk_socket__create_shared(). It takes the same arguments as xsk_socket__create() plus references to a fill ring and a completion ring. So for every socket that shares a umem, you need to have one more set of fill and completion rings. This is in order to maintain the single-producer single-consumer semantics of the rings. You can create all the sockets via the new xsk_socket__create_shared() call, or create the first one with xsk_socket__create() and the rest with xsk_socket__create_shared(). Both methods work. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-14-git-send-email-magnus.karlsson@intel.com	2020-09-03 21:21:34 -07:00
Alexei Starovoitov	06ae1b0e38	libbpf: Support sleepable progs Pass request to load program as sleepable via ".s" suffix in the section name. If it happens in the future that all map types and helpers are allowed with BPF_F_SLEEPABLE flag "fmod_ret/" and "lsm/" can be aliased to "fmod_ret.s/" and "lsm.s/" to make all lsm and fmod_ret programs sleepable by default. The fentry and fexit programs would always need to have sleepable vs non-sleepable distinction, since not all fentry/fexit progs will be attached to sleepable kernel functions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: KP Singh <kpsingh@google.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-5-alexei.starovoitov@gmail.com	2020-09-03 21:21:34 -07:00
Alexei Starovoitov	b228eb84f1	bpf: Add bpf_copy_from_user() helper. Sleepable BPF programs can now use copy_from_user() to access user memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com	2020-09-03 21:21:34 -07:00
Alexei Starovoitov	5bd7cae11d	bpf: Introduce sleepable BPF programs Introduce sleepable BPF programs that can request such property for themselves via BPF_F_SLEEPABLE flag at program load time. In such case they will be able to use helpers like bpf_copy_from_user() that might sleep. At present only fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only when they are attached to kernel functions that are known to allow sleeping. The non-sleepable programs are relying on implicit rcu_read_lock() and migrate_disable() to protect life time of programs, maps that they use and per-cpu kernel structures used to pass info between bpf programs and the kernel. The sleepable programs cannot be enclosed into rcu_read_lock(). migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs should not be enclosed in migrate_disable() as well. Therefore rcu_read_lock_trace is used to protect the life time of sleepable progs. There are many networking and tracing program types. In many cases the 'struct bpf_prog *' pointer itself is rcu protected within some other kernel data structure and the kernel code is using rcu_dereference() to load that program pointer and call BPF_PROG_RUN() on it. All these cases are not touched. Instead sleepable bpf programs are allowed with bpf trampoline only. The program pointers are hard-coded into generated assembly of bpf trampoline and synchronize_rcu_tasks_trace() is used to protect the life time of the program. The same trampoline can hold both sleepable and non-sleepable progs. When rcu_read_lock_trace is held it means that some sleepable bpf program is running from bpf trampoline. Those programs can use bpf arrays and preallocated hash/lru maps. These map types are waiting on programs to complete via synchronize_rcu_tasks_trace(). Updates to trampoline now have to do synchronize_rcu_tasks_trace() and synchronize_rcu_tasks() to wait for sleepable progs to finish and for trampoline assembly to finish.
This is the first step of introducing sleepable progs. Eventually dynamically allocated hash maps can be allowed and networking program types can become sleepable too. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com	2020-09-03 21:21:34 -07:00
Yonghong Song	a454a08f53	bpf: Make bpf_link_info.iter similar to bpf_iter_link_info bpf_link_info.iter is used by link_query to return bpf_iter_link_info to user space. Fields may be different, e.g., map_fd vs. map_id, so we cannot reuse the exact structure. But make them similar, e.g., struct bpf_link_info { /* common fields */ union { struct { ... } raw_tracepoint; struct { ... } tracing; ... struct { /* common fields for iter */ union { struct { __u32 map_id; } map; /* other structs for other targets */ }; }; }; }; so the structure is extensible the same way as bpf_iter_link_info. Fixes: 6b0a249a301e ("bpf: Implement link_query for bpf iterators") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200828051922.758950-1-yhs@fb.com	2020-09-03 21:21:34 -07:00
Andrii Nakryiko	829e50fc15	sync: improve sync script to handle common issues A few recurring issues are fixed. 1. When there are patches in the bpf tree that haven't been synced yet, but bpf was already merged into bpf-next, merged patches would be applied twice, causing failures and requiring manual resolution. Now this is handled smarter and shouldn't happen. 2. When the synced libbpf repo contains fixes from bpf that weren't yet merged into bpf-next, those bpf tree changes would cause inconsistency against bpf-next tree state. That's expected and usually is pretty easy for a human to discard during the consistency check, but is hard for automation. So instead of failing at the very end, ask the human whether discrepancies look good. 3. If the sync script detected no new patches needed syncing, it previously didn't restore linux repo state back. Fixed. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-09-03 20:14:51 -07:00
Andrii Nakryiko	66780a46cb	README.md: update Travis CI badge link Update Travis CI status badge to point to travis-ci.com, now that libbpf was migrated there.	2020-08-27 10:15:29 -07:00
Andrii Nakryiko	7bc52e6602	vmtests: blacklist 2 new feature tests and (temporarily) 3 existing selftest Permanently blacklist 2 new selftest on 5.5 and temporarily blacklist 3 existing selftests. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-26 23:30:55 -07:00
Andrii Nakryiko	7267270f5f	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: 0fcdfffe80346d015b920228203d0269284d8b13 Checkpoint bpf-next commit: 2e80be60c465a4f8559327340eaf40845dd7797a Baseline bpf commit: 7787b6fc938e16aa418613c4a765c1dbb268ed9f Checkpoint bpf commit: 7787b6fc938e16aa418613c4a765c1dbb268ed9f Alex Gartrell (1): libbpf: Fix unintentional success return code in bpf_object__load Andrii Nakryiko (1): libbpf: Fix compilation warnings for 64-bit printf args Jiri Olsa (1): bpf: Add d_path helper KP Singh (3): bpf: Generalize bpf_sk_storage bpf: Implement bpf_local_storage for inodes bpf: Allow local storage to be used from LSM programs include/uapi/linux/bpf.h | 69 +++++++++++++++++++++++++++++++++++++--- src/libbpf.c | 10 +++--- src/libbpf_probes.c | 5 +-- 3 files changed, 73 insertions(+), 11 deletions(-) -- 2.24.1	2020-08-26 23:30:55 -07:00
Andrii Nakryiko	b16bc44bd3	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions.	2020-08-26 23:30:55 -07:00
Andrii Nakryiko	4cdad1b34b	libbpf: Fix compilation warnings for 64-bit printf args Fix compilation warnings due to __u64 defined differently as `unsigned long` or `unsigned long long` on different architectures (e.g., ppc64le differs from x86-64). Also cast one argument to size_t to fix printf warning of similar nature. Fixes: eacaaed784e2 ("libbpf: Implement enum value-based CO-RE relocations") Fixes: 50e09460d9f8 ("libbpf: Skip well-known ELF sections when iterating ELF") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200827041109.3613090-1-andriin@fb.com	2020-08-26 23:30:55 -07:00
Alex Gartrell	f557d9e1fc	libbpf: Fix unintentional success return code in bpf_object__load There are code paths where EINVAL is returned directly without setting errno. In that case, errno could be 0, which would mask the failure. For example, if a careless programmer set log_level to 10000 out of laziness, they would have to spend a long time trying to figure out why. Fixes: 4f33ddb4e3e2 ("libbpf: Propagate EPERM to caller on program load") Signed-off-by: Alex Gartrell <alexgartrell@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200826075549.1858580-1-alexgartrell@gmail.com	2020-08-26 23:30:55 -07:00
Jiri Olsa	e82da07e2d	bpf: Add d_path helper Adding d_path helper function that returns the full path for a given 'struct path' object, which needs to be the kernel BTF 'path' object. The path is returned in the provided buffer 'buf' of size 'sz' and is zero terminated. bpf_d_path(&file->f_path, buf, size); The helper calls the d_path function directly, so there's only a limited set of functions it can be called from. Adding just a very modest set for the start. Also updating the bpf.h tools uapi header and adding 'path' to the bpf_helpers_doc.py script. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-11-jolsa@kernel.org	2020-08-26 23:30:55 -07:00
KP Singh	c42c140954	bpf: Allow local storage to be used from LSM programs Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used in LSM programs. These helpers are not used for tracing programs (currently) as their usage is tied to the life-cycle of the object and should only be used where the owning object won't be freed (when the owning object is passed as an argument to the LSM hook). Thus, they are safer to use in LSM hooks than tracing. Usage of local storage in tracing programs will probably follow a per function based whitelist approach. Since the UAPI helper signature for bpf_sk_storage expects a bpf_sock, which leads to a compilation warning for LSM programs, it is also updated to accept a void * pointer instead. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-7-kpsingh@chromium.org	2020-08-26 23:30:55 -07:00
KP Singh	e565f2bfe9	bpf: Implement bpf_local_storage for inodes Similar to bpf_local_storage for sockets, add local storage for inodes. The life-cycle of storage is managed with the life-cycle of the inode. i.e. the storage is destroyed along with the owning inode. The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the security blob which are now stackable and can co-exist with other LSMs. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200825182919.1118197-6-kpsingh@chromium.org	2020-08-26 23:30:55 -07:00
KP Singh	2bd0d158d4	bpf: Generalize bpf_sk_storage Refactor the functionality in bpf_sk_storage.c so that concept of storage linked to kernel objects can be extended to other objects like inode, task_struct etc. Each new local storage will still be a separate map and provide its own set of helpers. This allows for future object specific extensions and still share a lot of the underlying implementation. This includes the changes suggested by Martin in: https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/ adding new map operations to support bpf_local_storage maps: * storages for different kernel objects to optionally have different memory charging strategy (map_local_storage_charge, map_local_storage_uncharge) * Functionality to extract the storage pointer from a pointer to the owning object (map_owner_storage_ptr) Co-developed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200825182919.1118197-4-kpsingh@chromium.org	2020-08-26 23:30:55 -07:00
Andrii Nakryiko	bbe442da7a	sync: allow 3-way merge for patching to simplify manual conflict resolution Allowing --3way leaves conflicts in the local files, which makes manual conflict resolution so much easier. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-25 00:53:18 -07:00
Andrii Nakryiko	3f7b5b32b8	vmtests: blacklist tcp_hdr_options selftest for 5.5 Blacklist selftests for a new feature, not supported by 5.5 kernel. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-25 00:53:18 -07:00
Andrii Nakryiko	5a913e9401	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: dca5612f8eb9d0cf1dc254eb2adff1f16a588a7d Checkpoint bpf-next commit: 0fcdfffe80346d015b920228203d0269284d8b13 Baseline bpf commit: 4af7b32f84aa4cd60e39b355bc8a1eab6cd8d8a4 Checkpoint bpf commit: 7787b6fc938e16aa418613c4a765c1dbb268ed9f Andrii Nakryiko (6): libbpf: Factor out common ELF operations and improve logging libbpf: Add __noinline macro to bpf_helpers.h libbpf: Skip well-known ELF sections when iterating ELF libbpf: Normalize and improve logging across few functions libbpf: Avoid false unuinitialized variable warning in bpf_core_apply_relo libbpf: Fix type compatibility check copy-paste error Martin KaFai Lau (6): tcp: bpf: Add TCP_BPF_DELACK_MAX setsockopt tcp: bpf: Add TCP_BPF_RTO_MIN for bpf_setsockopt bpf: tcp: Add bpf_skops_parse_hdr() bpf: tcp: Add bpf_skops_hdr_opt_len() and bpf_skops_write_hdr_opt() bpf: tcp: Allow bpf prog to write and parse TCP header option tcp: bpf: Optionally store mac header in TCP_SAVE_SYN include/uapi/linux/bpf.h | 306 ++++++++++++++++++++++- src/bpf_helpers.h | 3 + src/libbpf.c | 525 +++++++++++++++++++++++---------------- 3 files changed, 623 insertions(+), 211 deletions(-) -- 2.24.1	2020-08-25 00:53:18 -07:00
Andrii Nakryiko	cead23ac75	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions.	2020-08-25 00:53:18 -07:00
Andrii Nakryiko	66091d267c	libbpf: Fix type compatibility check copy-paste error Fix copy-paste error in types compatibility check. Local type is accidentally used instead of target type for the very first type check strictness check. This can result in potentially less strict candidate comparison. Fix the error. Fixes: 3fc32f40c402 ("libbpf: Implement type-based CO-RE relocations support") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200821225653.2180782-1-andriin@fb.com	2020-08-25 00:53:18 -07:00
Andrii Nakryiko	2819b00b74	libbpf: Avoid false unuinitialized variable warning in bpf_core_apply_relo Some versions of GCC report uninitialized targ_spec usage. GCC is wrong, but let's avoid unnecessary warnings. Fixes: ddc7c3042614 ("libbpf: implement BPF CO-RE offset relocation algorithm") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200821225556.2178419-1-andriin@fb.com	2020-08-25 00:53:18 -07:00
Martin KaFai Lau	cb4d6d6f1a	tcp: bpf: Optionally store mac header in TCP_SAVE_SYN This patch is adapted from Eric's patch in an earlier discussion [1]. The TCP_SAVE_SYN currently only stores the network header and tcp header. This patch allows it to optionally store the mac header also if the setsockopt's optval is 2. It requires one more bit for the "save_syn" bit field in tcp_sock. This patch achieves this by moving the syn_smc bit next to the is_mptcp. The syn_smc is currently used with the TCP experimental option. Since syn_smc is only used when CONFIG_SMC is enabled, this patch also puts the "IS_ENABLED(CONFIG_SMC)" around it like the is_mptcp did with "IS_ENABLED(CONFIG_MPTCP)". The mac_hdrlen is also stored in the "struct saved_syn" to allow a quick offset from the bpf prog if it chooses to start getting from the network header or the tcp header. [1]: https://lore.kernel.org/netdev/CANn89iLJNWh6bkH7DNhy_kmcAexuUCccqERqe7z2QsvPhGrYPQ@mail.gmail.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190123.2886935-1-kafai@fb.com	2020-08-25 00:53:18 -07:00
Martin KaFai Lau	4f160ed607	bpf: tcp: Allow bpf prog to write and parse TCP header option [ Note: The TCP changes here are mainly to implement the bpf pieces into the bpf_skops_*() functions introduced in the earlier patches. ] The earlier effort in BPF-TCP-CC allows the TCP Congestion Control algorithm to be written in BPF. It opens up opportunities to allow a faster turnaround time in testing/releasing new congestion control ideas to production environment. The same flexibility can be extended to writing TCP header option. It is not uncommon that people want to test new TCP header option to improve the TCP performance. Another use case is for data-center that has a more controlled environment and has more flexibility in putting header options for internal only use. For example, we want to test the idea in putting maximum delay ACK in TCP header option which is similar to a draft RFC proposal [1]. This patch introduces the necessary BPF API and use them in the TCP stack to allow BPF_PROG_TYPE_SOCK_OPS program to parse and write TCP header options. It currently supports most of the TCP packet except RST. Supported TCP header option: ─────────────────────────── This patch allows the bpf-prog to write any option kind. Different bpf-progs can write their own option by calling the new helper bpf_store_hdr_opt(). The helper will ensure there is no duplicated option in the header. By allowing bpf-prog to write any option kind, this gives a lot of flexibility to the bpf-prog. Different bpf-prog can write its own option kind. It could also allow the bpf-prog to support a recently standardized option on an older kernel. Sockops Callback Flags: ────────────────────── The bpf program will only be called to parse/write tcp header option if the following newly added callback flags are enabled in tp->bpf_sock_ops_cb_flags: BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG A few words on the PARSE CB flags.
When the above PARSE CB flags are turned on, the bpf-prog will be called on packets received at a sk that has at least reached the ESTABLISHED state. The parsing of the SYN-SYNACK-ACK will be discussed in the "3 Way HandShake" section. The default is off for all of the above new CB flags, i.e. the bpf prog will not be called to parse or write bpf hdr option. There are detailed comments on these new cb flags in the UAPI bpf.h. sock_ops->skb_data and bpf_load_hdr_opt() ───────────────────────────────────────── sock_ops->skb_data and sock_ops->skb_data_end cover the whole TCP header and its options. They are read only. The new bpf_load_hdr_opt() helps to read a particular option "kind" from the skb_data. Please refer to the comment in UAPI bpf.h. It has details on what skb_data contains under different sock_ops->op. 3 Way HandShake ─────────────── The bpf-prog can learn if it is sending SYN or SYNACK by reading the sock_ops->skb_tcp_flags. Passive side When writing SYNACK (i.e. sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB), the received SYN skb will be available to the bpf prog. The bpf prog can use the SYN skb (which may carry the header option sent from the remote bpf prog) to decide what bpf header option should be written to the outgoing SYNACK skb. The SYN packet can be obtained by getsockopt(TCP_BPF_SYN). More on this later. Also, the bpf prog can learn if it is in syncookie mode (by checking sock_ops->args[0] == BPF_WRITE_HDR_TCP_SYNACK_COOKIE). The bpf prog can store the received SYN pkt by using the existing bpf_setsockopt(TCP_SAVE_SYN). The example in a later patch does it. [ Note that the fullsock here is a listen sk, bpf_sk_storage is not very useful here since the listen sk will be shared by many concurrent connection requests. Extending bpf_sk_storage support to request_sock will add weight to the minisock and it is not necessarily better than storing the whole ~100 bytes SYN pkt.
] When the connection is established, the bpf prog will be called in the existing PASSIVE_ESTABLISHED_CB callback. At that time, the bpf prog can get the header option from the saved syn and then apply the needed operation to the newly established socket. The later patch will use the max delay ack specified in the SYN header and set the RTO of this newly established connection as an example. The received ACK (that concludes the 3WHS) will also be available to the bpf prog during PASSIVE_ESTABLISHED_CB through the sock_ops->skb_data. It could be useful in syncookie scenario. More on this later. There is an existing getsockopt "TCP_SAVED_SYN" to return the whole saved syn pkt which includes the IP[46] header and the TCP header. A few "TCP_BPF_SYN" getsockopt have been added to allow specifying where to start getting from, e.g. starting from TCP header, or from IP[46] header. The new getsockopt(TCP_BPF_SYN) will also know where it can get the SYN's packet from: - (a) the just received syn (available when the bpf prog is writing SYNACK) and it is the only way to get SYN during syncookie mode. or - (b) the saved syn (available in PASSIVE_ESTABLISHED_CB and also other existing CB). The bpf prog does not need to know where the SYN pkt is coming from. The getsockopt(TCP_BPF_SYN) will hide these details. Similarly, a flag "BPF_LOAD_HDR_OPT_TCP_SYN" is also added to bpf_load_hdr_opt() to read a particular header option from the SYN packet. * Fastopen Fastopen should work the same as the regular non fastopen case. There is a test in a later patch. * Syncookie For syncookie, the later example patch asks the active side's bpf prog to resend the header options in ACK. The server can use bpf_load_hdr_opt() to look at the options in this received ACK during PASSIVE_ESTABLISHED_CB. * Active side The bpf prog will get a chance to write the bpf header option in the SYN packet during WRITE_HDR_OPT_CB.
The received SYNACK pkt will also be available to the bpf prog during the existing ACTIVE_ESTABLISHED_CB callback through the sock_ops->skb_data and bpf_load_hdr_opt(). * Turn off header CB flags after 3WHS If the bpf prog does not need to write/parse header options beyond the 3WHS, the bpf prog can clear the bpf_sock_ops_cb_flags to avoid being called for header options. Alternatively, the bpf-prog can choose to leave the UNKNOWN_HDR_OPT_CB_FLAG on so that the kernel will only call it when there is an option that the kernel cannot handle. [1]: draft-wang-tcpm-low-latency-opt-00 https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820190104.2885895-1-kafai@fb.com	2020-08-25 00:53:18 -07:00
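The commit message above describes bpf_load_hdr_opt() as reading a particular option "kind" out of the TCP header. As a rough userspace illustration of that idea only, the sketch below walks a raw TCP option area using the standard kind/length/data encoding. It is not the kernel's implementation; find_tcp_opt() and the DEMO_ constants are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two standard single-byte TCP options (no length field). */
#define DEMO_TCPOPT_EOL 0
#define DEMO_TCPOPT_NOP 1

/*
 * Hypothetical helper: return a pointer to the first option of the given
 * kind inside a raw TCP option area, or NULL if it is absent or the area
 * is malformed. Searching for EOL or NOP themselves always yields NULL.
 */
static const uint8_t *find_tcp_opt(const uint8_t *opts, size_t len, uint8_t kind)
{
	size_t off = 0;

	assert(opts != NULL || len == 0);

	while (off < len) {
		uint8_t cur = opts[off];
		uint8_t optlen;

		if (cur == DEMO_TCPOPT_EOL)
			return NULL;            /* end of option list */
		if (cur == DEMO_TCPOPT_NOP) {
			off++;                  /* one byte of padding */
			continue;
		}
		if (off + 1 >= len)
			return NULL;            /* length byte is missing */
		optlen = opts[off + 1];
		if (optlen < 2 || off + optlen > len)
			return NULL;            /* length is out of range */
		if (cur == kind)
			return &opts[off];
		off += optlen;                  /* skip to the next option */
	}
	return NULL;
}
```

A caller would pass the bytes that follow the fixed 20-byte TCP header; kind 254 in a test buffer is just one of the experimental option kinds and is used here only as sample data.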
Martin KaFai Lau	647df00570	bpf: tcp: Add bpf_skops_hdr_opt_len() and bpf_skops_write_hdr_opt() The bpf prog needs to parse the SYN header to learn what options have been sent by the peer's bpf-prog before writing its options into SYNACK. This patch adds a "syn_skb" arg to tcp_make_synack() and send_synack(). This syn_skb will eventually be made available (as read-only) to the bpf prog. This will be the only SYN packet available to the bpf prog during syncookie. For other regular cases, the bpf prog can also use the saved_syn. When writing options, the bpf prog will first be called to tell the kernel its required number of bytes. It is done by the new bpf_skops_hdr_opt_len(). The bpf prog will only be called when the new BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG is set in tp->bpf_sock_ops_cb_flags. When the bpf prog returns, the kernel will know how many bytes are needed and then update the "remaining" arg accordingly. 4 byte alignment will be included in the "remaining" before this function returns. The 4 byte aligned number of bytes will also be stored into the opts->bpf_opt_len. "bpf_opt_len" is a newly added member to the struct tcp_out_options. Then the new bpf_skops_write_hdr_opt() will call the bpf prog to write the header options. The bpf prog is only called if it has reserved space before (opts->bpf_opt_len > 0). The bpf prog is the last one to get a chance to reserve header space and write the header option. These two functions are half implemented to highlight the changes in the TCP stack. The actual code preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190052.2885316-1-kafai@fb.com	2020-08-25 00:53:18 -07:00
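The space-reservation step described above (round the requested length up to 4 bytes, deduct it from "remaining", store it in bpf_opt_len) can be sketched in plain C as follows. This is an illustration under assumptions, not the kernel code: struct demo_out_options and demo_reserve_hdr_opt() are hypothetical names, and rejecting a request that does not fit is a policy chosen for this sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for the single tcp_out_options field used here. */
struct demo_out_options {
	unsigned int bpf_opt_len;       /* 4-byte aligned bytes reserved */
};

/*
 * Round the requested option length up to a multiple of 4, deduct it from
 * the remaining header space, and record it in opts->bpf_opt_len.
 * Returns 0 on success, or -1 (leaving both outputs untouched) when the
 * aligned request does not fit.
 */
static int demo_reserve_hdr_opt(struct demo_out_options *opts,
				unsigned int *remaining,
				unsigned int requested)
{
	unsigned int aligned = (requested + 3) & ~3u;

	assert(opts && remaining);

	if (aligned > *remaining)
		return -1;
	*remaining -= aligned;
	opts->bpf_opt_len = aligned;
	return 0;
}
```

With 40 bytes remaining, a 5-byte request reserves 8 bytes and leaves 32; the write step would then run only because bpf_opt_len is greater than zero.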
Martin KaFai Lau	44fdfd8e6e	bpf: tcp: Add bpf_skops_parse_hdr() The patch adds a function bpf_skops_parse_hdr(). It will call the bpf prog to parse the TCP header received at a tcp_sock that has at least reached the ESTABLISHED state. For the packets received during the 3WHS (SYN, SYNACK and ACK), the received skb will be available to the bpf prog during the callback in bpf_skops_established() introduced in the previous patch and in the bpf_skops_write_hdr_opt() that will be added in the next patch. Calling the bpf prog to parse the header is controlled by two new flags in tp->bpf_sock_ops_cb_flags: BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG and BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG. When BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG is set, the bpf prog will only be called when there is an unknown option in the TCP header. When BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG is set, the bpf prog will be called on all received TCP headers. This function is half implemented to highlight the changes in the TCP stack. The actual code preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190046.2885054-1-kafai@fb.com	2020-08-25 00:53:18 -07:00
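The two-flag rule in this commit message reduces to a small decision function. The sketch below models it in userspace C. The DEMO_ flag values are placeholders made up for this example and are not the UAPI bit values, and demo_should_call_prog() is a hypothetical name rather than a kernel function.

```c
#include <stdbool.h>

/* Placeholder bits for illustration only; not the real UAPI values. */
#define DEMO_PARSE_ALL_HDR_OPT      (1u << 0)
#define DEMO_PARSE_UNKNOWN_HDR_OPT  (1u << 1)

/*
 * Decide whether the bpf prog would be invoked for a received TCP header:
 * - PARSE_ALL: every received header
 * - PARSE_UNKNOWN: only headers carrying an option the stack did not handle
 * - neither flag: never
 */
static bool demo_should_call_prog(unsigned int cb_flags, bool saw_unknown_opt)
{
	if (cb_flags & DEMO_PARSE_ALL_HDR_OPT)
		return true;
	if (cb_flags & DEMO_PARSE_UNKNOWN_HDR_OPT)
		return saw_unknown_opt;
	return false;
}
```

Leaving only the "unknown" flag set keeps the per-packet cost low, since the program is skipped for headers that contain nothing but options the stack already understands.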
Martin KaFai Lau	75d2adfe84	tcp: bpf: Add TCP_BPF_RTO_MIN for bpf_setsockopt This patch adds bpf_setsockopt(TCP_BPF_RTO_MIN) to allow bpf prog to set the min rto of a connection. It could be used together with the earlier patch which has added bpf_setsockopt(TCP_BPF_DELACK_MAX). A later selftest patch will communicate the max delay ack in a bpf tcp header option and then the receiving side can use bpf_setsockopt(TCP_BPF_RTO_MIN) to set a shorter rto. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190027.2884170-1-kafai@fb.com	2020-08-25 00:53:18 -07:00
Martin KaFai Lau	f0f75f36a7	tcp: bpf: Add TCP_BPF_DELACK_MAX setsockopt This change is mostly from an internal patch and adapts it from sysctl config to the bpf_setsockopt setup. The bpf_prog can set the max delay ack by using bpf_setsockopt(TCP_BPF_DELACK_MAX). This max delay ack can be communicated to its peer through bpf header option. The receiving peer can then use this max delay ack and set a potentially lower rto by using bpf_setsockopt(TCP_BPF_RTO_MIN) which will be introduced in the next patch. Another later selftest patch will also use it like the above to show how to write and parse bpf tcp header option. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190021.2884000-1-kafai@fb.com	2020-08-25 00:53:18 -07:00
Andrii Nakryiko	a8fa8b6eea	libbpf: Normalize and improve logging across few functions Make libbpf logs follow a similar pattern and provide more context, like section name or program name, where appropriate. Also, add a BPF_INSN_SZ constant and use it throughout to clean up the code a little. This commit doesn't have any functional changes and just gets some code changes out of the way before a bigger refactoring of libbpf internals. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-6-andriin@fb.com	2020-08-25 00:53:18 -07:00
Andrii Nakryiko	8a1acb7dfe	libbpf: Skip well-known ELF sections when iterating ELF Skip and don't log ELF sections that libbpf knows about and ignores during ELF processing. This avoids unnecessarily logging details about those ELF sections and cleans up the libbpf debug log. Ignored sections include DWARF data, the string table, an empty .text section, and a few special but useless sections (e.g., .llvm_addrsig). With such ELF sections out of the way, log unrecognized ELF sections at pr_info level to increase visibility. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-5-andriin@fb.com	2020-08-25 00:53:18 -07:00
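A name-based filter of the kind this commit describes might look like the sketch below. It is only an approximation built from the categories listed in the message. The exact set of names libbpf checks is not given here, so the ".debug_" prefix and ".strtab" are assumptions based on common ELF naming, and demo_ignore_elf_section() is a hypothetical function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true for sections a loader could skip silently, following the
 * categories named in the commit message. Section names other than
 * .llvm_addrsig and .text are assumed from common ELF conventions.
 */
static bool demo_ignore_elf_section(const char *name, size_t size)
{
	/* DWARF debug data (assumed ".debug_" prefix) */
	if (strncmp(name, ".debug_", strlen(".debug_")) == 0)
		return true;
	/* string table (assumed ".strtab") */
	if (strcmp(name, ".strtab") == 0)
		return true;
	/* .text is ignored only when it is empty */
	if (strcmp(name, ".text") == 0 && size == 0)
		return true;
	/* special section named in the commit message */
	if (strcmp(name, ".llvm_addrsig") == 0)
		return true;
	return false;
}
```

Anything this filter does not match would fall through to the normal section handling, where an unrecognized section could then be reported at a more visible log level.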
Andrii Nakryiko	b6e179e67c	libbpf: Add __noinline macro to bpf_helpers.h __noinline is used fairly frequently, especially with BPF subprograms, so add it alongside __always_inline for user convenience and completeness. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-4-andriin@fb.com	2020-08-25 00:53:18 -07:00
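For readers unfamiliar with the macro, the sketch below shows how __noinline is typically used. It assumes the macro expands to the GCC/Clang noinline attribute and defines it locally so the example compiles as ordinary userspace C without bpf_helpers.h; demo_clamp() and demo_process() are hypothetical functions.

```c
/* Local definition for this sketch; assumed to match the usual expansion. */
#ifndef __noinline
#define __noinline __attribute__((noinline))
#endif

/*
 * Marked __noinline so the compiler keeps it as a separate function.
 * In a BPF object this is what keeps a static function as its own
 * subprogram instead of being folded into the caller.
 */
static __noinline int demo_clamp(int v, int lo, int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Caller: doubles the input, then clamps it to the range [0, 100]. */
int demo_process(int v)
{
	return demo_clamp(v * 2, 0, 100);
}
```

The attribute changes only code generation, not results, so demo_process() behaves the same whether or not the callee is inlined.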
Andrii Nakryiko	d81d872279	libbpf: Factor out common ELF operations and improve logging Factor out common ELF operations done throughout libbpf. This simplifies usage across multiple places in libbpf, hides error reporting from higher-level functions, and makes error logging more consistent. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-3-andriin@fb.com	2020-08-25 00:53:18 -07:00
Andrii Nakryiko	4001a658e0	vmtests: add log folding Sprinkle log folds around, including timing. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-22 00:57:32 -07:00
Andrii Nakryiko	dc1cd8503f	vmtests: use built-in BPF_PRELOAD_UMD=y config Modules might not be picked up properly in our qemu setup. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-21 19:06:11 -07:00
Andrii Nakryiko	9a3a42608d	vmtests: update latest.config Re-generate latest.config. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-21 18:22:15 -07:00
Andrii Nakryiko	63c78982c7	vmtests: harden fetching kernel sources Ensure that corrupted tar archive won't screw up build. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-21 18:22:15 -07:00
Andrii Nakryiko	28e26bdc3e	sync: add BPF_RAW_INSN macro Add BPF_RAW_INSN macro used by libbpf. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-21 18:22:15 -07:00
Andrii Nakryiko	7297e38474	vmtests: add CONFIG_BPF_PRELOAD=y and CONFIG_BPF_PRELOAD_UMD=m Add new Kconfig values needed for selftests. Signed-off-by: Andrii Nakryiko <andriin@fb.com>	2020-08-21 18:22:15 -07:00
Andrii Nakryiko	a44116bb1f	sync: latest libbpf changes from kernel Syncing latest libbpf commits from kernel repository. Baseline bpf-next commit: 06a4ec1d9dc652e17ee3ac2ceb6c7cf6c2b75cdd Checkpoint bpf-next commit: dca5612f8eb9d0cf1dc254eb2adff1f16a588a7d Baseline bpf commit: 3fb1a96a91120877488071a167d26d76be4be977 Checkpoint bpf commit: 4af7b32f84aa4cd60e39b355bc8a1eab6cd8d8a4 Andrii Nakryiko (17): libbpf: Make kernel feature probing lazy libbpf: Factor out common logic of testing and closing FD libbpf: Sanitize BPF program code for bpf_probe_read_{kernel, user}[_str] libbpf: Switch tracing and CO-RE helper macros to bpf_probe_read_kernel() libbpf: Detect minimal BTF support and skip BTF loading, if missing libbpf: Improve error logging for mismatched BTF kind cases libbpf: Clean up and improve CO-RE reloc logging libbpf: Improve relocation ambiguity detection libbpf: Remove any use of reallocarray() in libbpf tools/bpftool: Remove libbpf_internal.h usage in bpftool libbpf: Centralize poisoning and poison reallocarray() tools: Remove feature-libelf-mmap feature detection libbpf: Implement type-based CO-RE relocations support libbpf: Implement enum value-based CO-RE relocations libbpf: Fix detection of BPF helper call instruction libbpf: Fix libbpf build on compilers missing __builtin_mul_overflow libbpf: Add perf_buffer APIs for better integration with outside epoll loop Tobias Klauser (1): bpf: Fix two typos in uapi/linux/bpf.h Toke Høiland-Jørgensen (1): libbpf: Fix map index used in error message Xu Wang (2): libbpf: Convert comma to semicolon libbpf: Simplify the return expression of build_map_pin_path() Yonghong Song (1): bpf: Implement link_query for bpf iterators include/uapi/linux/bpf.h \| 17 +- src/bpf.c \| 3 - src/bpf_core_read.h \| 120 +++- src/bpf_prog_linfo.c \| 3 - src/bpf_tracing.h \| 4 +- src/btf.c \| 31 +- src/btf.h \| 38 -- src/btf_dump.c \| 9 +- src/hashmap.c \| 3 + src/libbpf.c \| 1177 ++++++++++++++++++++++++++++---------- src/libbpf.h \| 4 
+ src/libbpf.map \| 8 + src/libbpf_internal.h \| 138 ++++- src/libbpf_probes.c \| 3 - src/netlink.c \| 128 +---- src/nlattr.c \| 9 +- src/ringbuf.c \| 8 +- src/xsk.c \| 3 - 18 files changed, 1149 insertions(+), 557 deletions(-) -- 2.24.1	2020-08-21 18:22:15 -07:00
Andrii Nakryiko	4069acb787	sync: auto-generate latest BPF helpers Latest changes to BPF helper definitions.	2020-08-21 18:22:15 -07:00

754 Commits