A new relocation RELO_SUBPROG_ADDR is added to capture
subprog addresses loaded with ld_imm64 insns. Such ld_imm64
insns are marked with BPF_PSEUDO_FUNC and will be passed to
kernel. For bpf_for_each_map_elem() case, kernel will
check that the to-be-used subprog address must be a static
function and replace it with proper actual jited func address.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204930.3885367-1-yhs@fb.com
When libbpf initializes the kernel's struct_ops in
"bpf_map__init_kern_struct_ops()", it enforces all
pointer types must be a function pointer and rejects
others. It turns out to be too strict. For example,
when directly using "struct tcp_congestion_ops" from vmlinux.h,
it has a "struct module *owner" member and it is set to NULL
in a bpf_tcp_cc.o.
Instead, it only needs to ensure the member is a function
pointer if it has been set (relocated) to a bpf-prog.
This patch moves the "btf_is_func_proto(kern_mtype)" check
after the existing "if (!prog) { continue; }". The original debug
message in "if (!prog) { continue; }" is also removed since it is
no longer valid. Beside, there is a later debug message to tell
which function pointer is set.
The "btf_is_func_proto(mtype)" has already been guaranteed
in "bpf_object__collect_st_ops_relos()" which has been run
before "bpf_map__init_kern_struct_ops()". Thus, this check
is removed.
v2:
- Remove outdated debug message (Andrii)
Remove because there is a later debug message to tell
which function pointer is set.
- Following mtype->type is no longer needed. Remove:
"skip_mods_and_typedefs(btf, mtype->type, &mtype_id)"
- Do "if (!prog)" test before skip_mods_and_typedefs.
Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212021030.266932-1-kafai@fb.com
For very large ELF objects (with many sections), we could
get special value SHN_XINDEX (65535) for elf object's string
table index - e_shstrndx.
Call elf_getshdrstrndx to get the proper string table index,
instead of reading it directly from ELF header.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210121202203.9346-4-jolsa@kernel.org
Empty BTFs do come up (e.g., simple kernel modules with no new types and
strings, compared to the vmlinux BTF) and there is nothing technically wrong
with them. So remove unnecessary check preventing loading empty BTFs.
Fixes: d8123624506c ("libbpf: Fix BTF data layout checks and allow empty BTF")
Reported-by: Christopher William Snowhill <chris@kode54.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210110070341.1380086-2-andrii@kernel.org
Add support for searching for ksym externs not just in vmlinux BTF, but across
all module BTFs, similarly to how it's done for CO-RE relocations. Kernels
that expose module BTFs through sysfs are assumed to support new ldimm64
instruction extension with BTF FD provided in insn[1].imm field, so no extra
feature detection is performed.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-7-andrii@kernel.org
Add comments clarifying that USER variants of CO-RE reading macro are still
only going to work with kernel types, defined in kernel or kernel module BTF.
This should help preventing invalid use of those macro to read user-defined
types (which doesn't work with CO-RE).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210108194408.3468860-1-andrii@kernel.org
BPF_CORE_READ(), in addition to handling CO-RE relocations, also allows much
nicer way to read data structures with nested pointers. Instead of writing
a sequence of bpf_probe_read() calls to follow links, one can just write
BPF_CORE_READ(a, b, c, d) to effectively do a->b->c->d read. This is a welcome
ability when porting BCC code, which (in most cases) allows exactly the
intuitive a->b->c->d variant.
This patch adds non-CO-RE variants of BPF_CORE_READ() family of macros for
cases where CO-RE is not supported (e.g., old kernels). In such cases, the
property of shortening a sequence of bpf_probe_read()s to a simple
BPF_PROBE_READ(a, b, c, d) invocation is still desirable, especially when
porting BCC code to libbpf. Yet, no CO-RE relocation is going to be emitted.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201218235614.2284956-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add BPF_CORE_READ_USER(), BPF_CORE_READ_USER_STR() and their _INTO()
variations to allow reading CO-RE-relocatable kernel data structures from the
user-space. One of such cases is reading input arguments of syscalls, while
reaping the benefits of CO-RE relocations w.r.t. handling 32/64 bit
conversions and handling missing/new fields in UAPI data structs.
Suggested-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201218235614.2284956-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Various workflows (--define-prefix, --define-variable=prefix) require variables in
the pc file to use a literal so that it is overridden. Change the Makefile
so that, by default and unless is specified, it is set as expected.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Support finding kernel targets in kernel modules when using
bpf_program__set_attach_target() API. This brings it up to par with what
libbpf supports when doing declarative SEC()-based target determination.
Some minor internal refactoring was needed to make sure vmlinux BTF can be
loaded before bpf_object's load phase.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201211215825.3646154-2-andrii@kernel.org
Fix ring_buffer__poll() to return the number of non-discarded records
consumed, just like its documentation states. It's also consistent with
ring_buffer__consume() return. Fix up selftests with wrong expected results.
Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Fixes: cb1c9ddd5525 ("selftests/bpf: Add BPF ringbuf selftests")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201130223336.904192-1-andrii@kernel.org
Some versions of GCC are really nit-picky about strncpy() use. Use memcpy(),
as they are pretty much equivalent for the case of fixed length strings.
Fixes: e459f49b4394 ("libbpf: Separate XDP program load with xsk socket creation")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203235440.2302137-1-andrii@kernel.org
Teach libbpf to search for candidate types for CO-RE relocations across kernel
modules BTFs, in addition to vmlinux BTF. If at least one candidate type is
found in vmlinux BTF, kernel module BTFs are not iterated. If vmlinux BTF has
no matching candidates, then find all kernel module BTFs and search for all
matching candidates across all of them.
Kernel's support for module BTFs are inferred from the support for BTF name
pointer in BPF UAPI.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-6-andrii@kernel.org
I've seen a situation, where a process that's under pprof constantly
generates SIGPROF which prevents program loading indefinitely.
The right thing to do probably is to disable signals in the upper
layers while loading, but it still would be nice to get some error from
libbpf instead of an endless loop.
Let's add some small retry limit to the program loading:
try loading the program 5 (arbitrary) times and give up.
v2:
* 10 -> 5 retires (Andrii Nakryiko)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201202231332.3923644-1-sdf@google.com
When we added sanitising of map names before loading programs to libbpf, we
still allowed periods in the name. While the kernel will accept these for
the map names themselves, they are not allowed in file names when pinning
maps. This means that bpf_object__pin_maps() will fail if called on an
object that contains internal maps (such as sections .rodata).
Fix this by replacing periods with underscores when constructing map pin
paths. This only affects the paths generated by libbpf when
bpf_object__pin_maps() is called with a path argument. Any pin paths set
by bpf_map__set_pin_path() are unaffected, and it will still be up to the
caller to avoid invalid characters in those.
Fixes: 113e6b7e15e2 ("libbpf: Sanitise internal map names so they are not rejected by the kernel")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201203093306.107676-1-toke@redhat.com
Before this patch, a program with unspecified type
(BPF_PROG_TYPE_UNSPEC) would be passed to the BPF syscall, only to have
the kernel reject it with an opaque invalid argument error. This patch
makes libbpf reject such programs with a nicer error message - in
particular libbpf now tries to diagnose bad ELF section names at both
open time and load time.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201203043410.59699-1-andreimatei1@gmail.com
Add support for separation of eBPF program load and xsk socket
creation.
This is needed for use-case when you want to privide as little
privileges as possible to the data plane application that will
handle xsk socket creation and incoming traffic.
With this patch the data entity container can be run with only
CAP_NET_RAW capability to fulfill its purpose of creating xsk
socket and handling packages. In case your umem is larger or
equal process limit for MEMLOCK you need either increase the
limit or CAP_IPC_LOCK capability.
To resolve privileges issue two APIs are introduced:
- xsk_setup_xdp_prog - loads the built in XDP program. It can
also return xsks_map_fd which is needed by unprivileged process
to update xsks_map with AF_XDP socket "fd"
- xsk_socket__update_xskmap - inserts an AF_XDP socket into an xskmap
for a particular xsk_socket
Signed-off-by: Mariusz Dudek <mariuszx.dudek@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20201203090546.11976-2-mariuszx.dudek@intel.com
Replace size_t with __u32 in the xsk interfaces that contain this.
There is no reason to have size_t since the internal variable that
is manipulated is a __u32. The following APIs are affected:
__u32 xsk_ring_prod__reserve(struct xsk_ring_prod *prod, __u32 nb, __u32 *idx)
void xsk_ring_prod__submit(struct xsk_ring_prod *prod, __u32 nb)
__u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __u32 *idx)
void xsk_ring_cons__cancel(struct xsk_ring_cons *cons, __u32 nb)
void xsk_ring_cons__release(struct xsk_ring_cons *cons, __u32 nb)
The "nb" variable and the return values have been changed from size_t
to __u32.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1606383455-8243-1-git-send-email-magnus.karlsson@gmail.com
Add a new function for returning descriptors the user received
after an xsk_ring_cons__peek call. After the application has
gotten a number of descriptors from a ring, it might not be able
to or want to process them all for various reasons. Therefore,
it would be useful to have an interface for returning or
cancelling a number of them so that they are returned to the ring.
This patch adds a new function called xsk_ring_cons__cancel that
performs this operation on nb descriptors counted from the end of
the batch of descriptors that was received through the peek call.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ Magnus Karlsson: rewrote changelog ]
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/1606202474-8119-1-git-send-email-lirongqing@baidu.com
When operating on split BTF, btf__find_by_name[_kind] will not
iterate over all types since they use btf->nr_types to show
the number of types to iterate over. For split BTF this is
the number of types _on top of base BTF_, so it will
underestimate the number of types to iterate over, especially
for vmlinux + module BTF, where the latter is much smaller.
Use btf__get_nr_types() instead.
Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1605437195-2175-1-git-send-email-alan.maguire@oracle.com
If BPF code contains unused BPF subprogram and there are no other subprogram
calls (which can realistically happen in real-world applications given
sufficiently smart Clang code optimizations), libbpf will erroneously assume
that subprograms are entry-point programs and will attempt to load them with
UNSPEC program type.
Fix by not relying on subcall instructions and rather detect it based on the
structure of BPF object's sections.
Fixes: 9a94f277c4fb ("tools: libbpf: restore the ability to load programs from .text section")
Reported-by: Dmitrii Banshchikov <dbanschikov@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201107000251.256821-1-andrii@kernel.org
If bits is 0, the case when the map is empty, then the >> is the size of
the register which is undefined behavior - on x86 it is the same as a
shift by 0.
Fix by handling the 0 case explicitly and guarding calls to hash_bits for
empty maps in hashmap__for_each_key_entry and hashmap__for_each_entry_safe.
Fixes: e3b924224028 ("libbpf: add resizable non-thread safe internal hashmap")
Suggested-by: Andrii Nakryiko <andriin@fb.com>,
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201029223707.494059-1-irogers@google.com
In some cases compiler seems to generate distinct DWARF types for identical
arrays within the same CU. That seems like a bug, but it's already out there
and breaks type graph equivalence checks, so accommodate it anyway by checking
for identical arrays, regardless of their type ID.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-10-andrii@kernel.org
Add support for deduplication split BTFs. When deduplicating split BTF, base
BTF is considered to be immutable and can't be modified or adjusted. 99% of
BTF deduplication logic is left intact (module some type numbering adjustments).
There are only two differences.
First, each type in base BTF gets hashed (expect VAR and DATASEC, of course,
those are always considered to be self-canonical instances) and added into
a table of canonical table candidates. Hashing is a shallow, fast operation,
so mostly eliminates the overhead of having entire base BTF to be a part of
BTF dedup.
Second difference is very critical and subtle. While deduplicating split BTF
types, it is possible to discover that one of immutable base BTF BTF_KIND_FWD
types can and should be resolved to a full STRUCT/UNION type from the split
BTF part. This is, obviously, can't happen because we can't modify the base
BTF types anymore. So because of that, any type in split BTF that directly or
indirectly references that newly-to-be-resolved FWD type can't be considered
to be equivalent to the corresponding canonical types in base BTF, because
that would result in a loss of type resolution information. So in such case,
split BTF types will be deduplicated separately and will cause some
duplication of type information, which is unavoidable.
With those two changes, the rest of the algorithm manages to deduplicate split
BTF correctly, pointing all the duplicates to their canonical counter-parts in
base BTF, but also is deduplicating whatever unique types are present in split
BTF on their own.
Also, theoretically, split BTF after deduplication could end up with either
empty type section or empty string section. This is handled by libbpf
correctly in one of previous patches in the series.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-9-andrii@kernel.org
Make data section layout checks stricter, disallowing overlap of types and
strings data.
Additionally, allow BTFs with no type data. There is nothing inherently wrong
with having BTF with no types (put potentially with some strings). This could
be a situation with kernel module BTFs, if module doesn't introduce any new
type information.
Also fix invalid offset alignment check for btf->hdr->type_off.
Fixes: 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-8-andrii@kernel.org
Support split BTF operation, in which one BTF (base BTF) provides basic set of
types and strings, while another one (split BTF) builds on top of base's types
and strings and adds its own new types and strings. From API standpoint, the
fact that the split BTF is built on top of the base BTF is transparent.
Type numeration is transparent. If the base BTF had last type ID #N, then all
types in the split BTF start at type ID N+1. Any type in split BTF can
reference base BTF types, but not vice versa. Programmatically construction of
a split BTF on top of a base BTF is supported: one can create an empty split
BTF with btf__new_empty_split() and pass base BTF as an input, or pass raw
binary data to btf__new_split(), or use btf__parse_xxx_split() variants to get
initial set of split types/strings from the ELF file with .BTF section.
String offsets are similarly transparent and are a logical continuation of
base BTF's strings. When building BTF programmatically and adding a new string
(explicitly with btf__add_str() or implicitly through appending new
types/members), string-to-be-added would first be looked up from the base
BTF's string section and re-used if it's there. If not, it will be looked up
and/or added to the split BTF string section. Similarly to type IDs, types in
split BTF can refer to strings from base BTF absolutely transparently (but not
vice versa, of course, because base BTF doesn't "know" about existence of
split BTF).
Internal type index is slightly adjusted to be zero-indexed, ignoring a fake
[0] VOID type. This allows to handle split/base BTF type lookups transparently
by using btf->start_id type ID offset, which is always 1 for base/non-split
BTF and equals btf__get_nr_types(base_btf) + 1 for the split BTF.
BTF deduplication is not yet supported for split BTF and support for it will
be added in separate patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-5-andrii@kernel.org
Revamp BTF dedup's string deduplication to match the approach of writable BTF
string management. This allows to transfer deduplicated strings index back to
BTF object after deduplication without expensive extra memory copying and hash
map re-construction. It also simplifies the code and speeds it up, because
hashmap-based string deduplication is faster than sort + unique approach.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-4-andrii@kernel.org
This avoids compilation warning if `struct bpf_redir_neigh` is not provided by
other kernel headers.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Yaniv reported a compilation error after pulling latest libbpf:
[...]
../libbpf/src/root/usr/include/bpf/bpf_helpers.h:99:10: error:
unknown register name 'r0' in asm
: "r0", "r1", "r2", "r3", "r4", "r5");
[...]
The issue got triggered given Yaniv was compiling tracing programs with native
target (e.g. x86) instead of BPF target, hence no BTF generated vmlinux.h nor
CO-RE used, and later llc with -march=bpf was invoked to compile from LLVM IR
to BPF object file. Given that clang was expecting x86 inline asm and not BPF
one the error complained that these regs don't exist on the former.
Guard bpf_tail_call_static() with defined(__bpf__) where BPF inline asm is valid
to use. BPF tracing programs on more modern kernels use BPF target anyway and
thus the bpf_tail_call_static() function will be available for them. BPF inline
asm is supported since clang 7 (clang <= 6 otherwise throws same above error),
and __bpf_unreachable() since clang 8, therefore include the latter condition
in order to prevent compilation errors for older clang versions. Given even an
old Ubuntu 18.04 LTS has official LLVM packages all the way up to llvm-10, I did
not bother to special case the __bpf_unreachable() inside bpf_tail_call_static()
further.
Also, undo the sockex3_kern's use of bpf_tail_call_static() sample given they
still have the old hacky way to even compile networking progs with native instead
of BPF target so bpf_tail_call_static() won't be defined there anymore.
Fixes: 0e9f6841f664 ("bpf, libbpf: Add bpf_tail_call_static helper for bpf programs")
Reported-by: Yaniv Agman <yanivagman@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Tested-by: Yaniv Agman <yanivagman@gmail.com>
Link: https://lore.kernel.org/bpf/CAMy7=ZUk08w5Gc2Z-EKi4JFtuUCaZYmE4yzhJjrExXpYKR4L8w@mail.gmail.com
Link: https://lore.kernel.org/bpf/20201021203257.26223-1-daniel@iogearbox.net