Commit Graph

1523 Commits

Author SHA1 Message Date
Jiri Olsa
42f78dd5ac libbpf: Add libbpf_kallsyms_parse function
Move the kallsyms parsing in internal libbpf_kallsyms_parse
function, so it can be used from other places.

It will be used in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-8-jolsa@kernel.org
2022-03-19 23:08:50 -07:00
Jiri Olsa
50ae8c25d2 bpf: Add cookie support to programs attached with kprobe multi link
Adding support to call bpf_get_attach_cookie helper from
kprobe programs attached with kprobe multi link.

The cookie is provided by array of u64 values, where each
value is paired with provided function address or symbol
with the same array index.

When cookie array is provided it's sorted together with
addresses (check bpf_kprobe_multi_cookie_swap). This way
we can find cookie based on the address in
bpf_get_attach_cookie helper.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-7-jolsa@kernel.org
2022-03-19 23:08:50 -07:00
Jiri Olsa
e85e26492d bpf: Add multi kprobe link
Adding new link type BPF_LINK_TYPE_KPROBE_MULTI that attaches kprobe
program through fprobe API.

The fprobe API allows to attach probe on multiple functions at once
very fast, because it works on top of ftrace. On the other hand this
limits the probe point to the function entry or return.

The kprobe program gets the same pt_regs input ctx as when it's attached
through the perf API.

Adding new attach type BPF_TRACE_KPROBE_MULTI that allows attachment
kprobe to multiple function with new link.

User provides array of addresses or symbols with count to attach the
kprobe program to. The new link_create uapi interface looks like:

  struct {
          __u32           flags;
          __u32           cnt;
          __aligned_u64   syms;
          __aligned_u64   addrs;
  } kprobe_multi;

The flags field allows single BPF_TRACE_KPROBE_MULTI bit to create
return multi kprobe.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-4-jolsa@kernel.org
2022-03-19 23:08:50 -07:00
Roberto Sassu
9fb154ee77 bpf-lsm: Introduce new helper bpf_ima_file_hash()
ima_file_hash() has been modified to calculate the measurement of a file on
demand, if it has not been already performed by IMA or the measurement is
not fresh. For compatibility reasons, ima_inode_hash() remains unchanged.

Keep the same approach in eBPF and introduce the new helper
bpf_ima_file_hash() to take advantage of the modified behavior of
ima_file_hash().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-4-roberto.sassu@huawei.com
2022-03-19 23:08:50 -07:00
Hengqi Chen
34d57cc0eb bpf: Fix comment for helper bpf_current_task_under_cgroup()
Fix the descriptions of the return values of helper bpf_current_task_under_cgroup().

Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220310155335.1278783-1-hengqi.chen@gmail.com
2022-03-19 23:08:50 -07:00
Martin KaFai Lau
a557610d11 bpf: Remove BPF_SKB_DELIVERY_TIME_NONE and rename s/delivery_time_/tstamp_/
This patch is to simplify the uapi bpf.h regarding to the tstamp type
and use a similar way as the kernel to describe the value stored
in __sk_buff->tstamp.

My earlier thought was to avoid describing the semantic and
clock base for the rcv timestamp until there is more clarity
on the use case, so the __sk_buff->delivery_time_type naming instead
of __sk_buff->tstamp_type.

With some thoughts, it can reuse the UNSPEC naming.  This patch first
removes BPF_SKB_DELIVERY_TIME_NONE and also

rename BPF_SKB_DELIVERY_TIME_UNSPEC to BPF_SKB_TSTAMP_UNSPEC
and    BPF_SKB_DELIVERY_TIME_MONO   to BPF_SKB_TSTAMP_DELIVERY_MONO.

The semantic of BPF_SKB_TSTAMP_DELIVERY_MONO is the same:
__sk_buff->tstamp has delivery time in mono clock base.

BPF_SKB_TSTAMP_UNSPEC means __sk_buff->tstamp has the (rcv)
tstamp at ingress and the delivery time at egress.  At egress,
the clock base could be found from skb->sk->sk_clockid.
__sk_buff->tstamp == 0 naturally means NONE, so NONE is not needed.

With BPF_SKB_TSTAMP_UNSPEC for the rcv tstamp at ingress,
the __sk_buff->delivery_time_type is also renamed to __sk_buff->tstamp_type
which was also suggested in the earlier discussion:
https://lore.kernel.org/bpf/b181acbe-caf8-502d-4b7b-7d96b9fc5d55@iogearbox.net/

The above will then make __sk_buff->tstamp and __sk_buff->tstamp_type
the same as its kernel skb->tstamp and skb->mono_delivery_time
counter part.

The internal kernel function bpf_skb_convert_dtime_type_read() is then
renamed to bpf_skb_convert_tstamp_type_read() and it can be simplified
with the BPF_SKB_DELIVERY_TIME_NONE gone.  A BPF_ALU32_IMM(BPF_AND)
insn is also saved by using BPF_JMP32_IMM(BPF_JSET).

The bpf helper bpf_skb_set_delivery_time() is also renamed to
bpf_skb_set_tstamp().  The arg name is changed from dtime
to tstamp also.  It only allows setting tstamp 0 for
BPF_SKB_TSTAMP_UNSPEC and it could be relaxed later
if there is use case to change mono delivery time to
non mono.

prog->delivery_time_access is also renamed to prog->tstamp_type_access.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090509.3712315-1-kafai@fb.com
2022-03-19 23:08:50 -07:00
Toke Høiland-Jørgensen
5ad674a007 libbpf: Support batch_size option to bpf_prog_test_run
Add support for setting the new batch_size parameter to BPF_PROG_TEST_RUN
to libbpf; just add it as an option and pass it through to the kernel.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-4-toke@redhat.com
2022-03-19 23:08:50 -07:00
Toke Høiland-Jørgensen
d647265e4b bpf: Add "live packet" mode for XDP in BPF_PROG_RUN
This adds support for running XDP programs through BPF_PROG_RUN in a mode
that enables live packet processing of the resulting frames. Previous uses
of BPF_PROG_RUN for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.

The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
ifindex and RXQ number as part of the context object being passed to the
kernel. This patch reuses that code, but adds a new mode with different
semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
flag.

When running BPF_PROG_RUN in this mode, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out that interface. XDP_TX is translated into an
XDP_REDIRECT operation to the same interface, since the real XDP_TX action
is only possible from within the network drivers themselves, not from the
process context where BPF_PROG_RUN is executed.

Internally, this new mode of operation creates a page pool instance while
setting up the test run, and feeds pages from that into the XDP program.
The setup cost of this is amortised over the number of repetitions
specified by userspace.

To support the performance testing use case, we further optimise the setup
step so that all pages in the pool are pre-initialised with the packet
data, and pre-computed context and xdp_frame objects stored at the start of
each page. This makes it possible to entirely avoid touching the page
content on each XDP program invocation, and enables sending up to 9
Mpps/core on my test box.

Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.

Enabling the new flag is only allowed when not setting ctx_out and data_out
in the test specification, since using it means frames will be redirected
somewhere else, so they can't be returned.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
2022-03-19 23:08:50 -07:00
Guo Zhengkui
21cd83a1d1 libbpf: Fix array_size.cocci warning
Fix the following coccicheck warning:
tools/lib/bpf/bpf.c:114:31-32: WARNING: Use ARRAY_SIZE
tools/lib/bpf/xsk.c:484:34-35: WARNING: Use ARRAY_SIZE
tools/lib/bpf/xsk.c:485:35-36: WARNING: Use ARRAY_SIZE

It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220306023426.19324-1-guozhengkui@vivo.com
2022-03-19 23:08:50 -07:00
lic121
6e77ef94f0 libbpf: Unmap rings when umem deleted
xsk_umem__create() does mmap for fill/comp rings, but xsk_umem__delete()
doesn't do the unmap. This works fine for regular cases, because
xsk_socket__delete() does unmap for the rings. But for the case that
xsk_socket__create_shared() fails, umem rings are not unmapped.

fill_save/comp_save are checked to determine if rings have already be
unmapped by xsk. If fill_save and comp_save are NULL, it means that the
rings have already been used by xsk. Then they are supposed to be
unmapped by xsk_socket__delete(). Otherwise, xsk_umem__delete() does the
unmap.

Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220301132623.GA19995@vscode.7~
2022-03-19 23:08:50 -07:00
Andrii Nakryiko
c84815ee37 ci: enable CONFIG_FPROBE=y for multi-attach kprobe tests
Recently landed multi-attach kprobe functionality expects
CONFIG_FPROBE=y.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-18 00:52:43 -07:00
Mykola Lysenko
4282f3cdec ci: Add troubleshooting steps to s390x setup readme
Related to libbpf CI. Added more information on how
to setup and troubleshoot GitHub action runners for
s390x platform.

Signed-off-by: Mykola Lysenko <mykolal@fb.com>
2022-03-17 21:19:03 -07:00
Andrii Nakryiko
3591deb9bc ci: blacklist s390x tests
Blacklist timer_crash_mode as requiring BPF trampoline.

Temporary blacklist sk_lookup due to big-endian problems that haven't
been resolved upstream yet.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
767badc609 Makefile: update libbpf version to 0.8.0
New version cycle, bump LIBBPF_MINOR_VERSION to 8 in Makefile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
8e654d74c4 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b75dacaac4650478ed5a9d33975b91b99016daff
Checkpoint bpf-next commit: c344b9fc2108eeaa347c387219886cf87e520e93
Baseline bpf commit:        75134f16e7dd0007aa474b281935c5f42e79f2c8
Checkpoint bpf commit:      18b1ab7aa76bde181bdb1ab19a87fa9523c32f21

Andrii Nakryiko (2):
  libbpf: Allow BPF program auto-attach handlers to bail out
  libbpf: Support custom SEC() handlers

Hangbin Liu (1):
  bonding: add new option ns_ip6_target

Martin KaFai Lau (1):
  bpf: Add __sk_buff->delivery_time_type and
    bpf_skb_set_skb_delivery_time()

Stijn Tintel (1):
  libbpf: Fix BPF_MAP_TYPE_PERF_EVENT_ARRAY auto-pinning

Xu Kuohai (1):
  libbpf: Skip forward declaration when counting duplicated type names

Yuntao Wang (3):
  libbpf: Remove redundant check in btf_fixup_datasec()
  libbpf: Simplify the find_elf_sec_sz() function
  libbpf: Add a check to ensure that page_cnt is non-zero

 include/uapi/linux/bpf.h     |  41 +++-
 include/uapi/linux/if_link.h |   1 +
 src/btf_dump.c               |   5 +
 src/libbpf.c                 | 388 +++++++++++++++++++++++------------
 src/libbpf.h                 | 109 ++++++++++
 src/libbpf.map               |   6 +
 src/libbpf_version.h         |   2 +-
 7 files changed, 423 insertions(+), 129 deletions(-)

--
2.30.2
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
dac1e23c97 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
dc679587eb libbpf: Support custom SEC() handlers
Allow registering and unregistering custom handlers for BPF program.
This allows user applications and libraries to plug into libbpf's
declarative SEC() definition handling logic. This allows to offload
complex and intricate custom logic into external libraries, but still
provide a great user experience.

One such example is USDT handling library, which has a lot of code and
complexity which doesn't make sense to put into libbpf directly, but it
would be really great for users to be able to specify BPF programs with
something like SEC("usdt/<path-to-binary>:<usdt_provider>:<usdt_name>")
and have correct BPF program type set (BPF_PROGRAM_TYPE_KPROBE, as it is
uprobe) and even support BPF skeleton's auto-attach logic.

In some cases, it might be even good idea to override libbpf's default
handling, like for SEC("perf_event") programs. With custom library, it's
possible to extend logic to support specifying perf event specification
right there in SEC() definition without burdening libbpf with lots of
custom logic or extra library dependecies (e.g., libpfm4). With current
patch it's possible to override libbpf's SEC("perf_event") handling and
specify a completely custom ones.

Further, it's possible to specify a generic fallback handling for any
SEC() that doesn't match any other custom or standard libbpf handlers.
This allows to accommodate whatever legacy use cases there might be, if
necessary.

See doc comments for libbpf_register_prog_handler() and
libbpf_unregister_prog_handler() for detailed semantics.

This patch also bumps libbpf development version to v0.8 and adds new
APIs there.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220305010129.1549719-3-andrii@kernel.org
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
0d834905d8 libbpf: Allow BPF program auto-attach handlers to bail out
Allow some BPF program types to support auto-attach only in subste of
cases. Currently, if some BPF program type specifies attach callback, it
is assumed that during skeleton attach operation all such programs
either successfully attach or entire skeleton attachment fails. If some
program doesn't support auto-attachment from skeleton, such BPF program
types shouldn't have attach callback specified.

This is limiting for cases when, depending on how full the SEC("")
definition is, there could either be enough details to support
auto-attach or there might not be and user has to use some specific API
to provide more details at runtime.

One specific example of such desired behavior might be SEC("uprobe"). If
it's specified as just uprobe auto-attach isn't possible. But if it's
SEC("uprobe/<some_binary>:<some_func>") then there are enough details to
support auto-attach. Note that there is a somewhat subtle difference
between auto-attach behavior of BPF skeleton and using "generic"
bpf_program__attach(prog) (which uses the same attach handlers under the
cover). Skeleton allow some programs within bpf_object to not have
auto-attach implemented and doesn't treat that as an error. Instead such
BPF programs are just skipped during skeleton's (optional) attach step.
bpf_program__attach(), on the other hand, is called when user *expects*
auto-attach to work, so if specified program doesn't implement or
doesn't support auto-attach functionality, that will be treated as an
error.

Another improvement to the way libbpf is handling SEC()s would be to not
require providing dummy kernel function name for kprobe. Currently,
SEC("kprobe/whatever") is necessary even if actual kernel function is
determined by user at runtime and bpf_program__attach_kprobe() is used
to specify it. With changes in this patch, it's possible to support both
SEC("kprobe") and SEC("kprobe/<actual_kernel_function"), while only in
the latter case auto-attach will be performed. In the former one, such
kprobe will be skipped during skeleton attach operation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220305010129.1549719-2-andrii@kernel.org
2022-03-07 22:16:11 -08:00
Yuntao Wang
0a43bc8905 libbpf: Add a check to ensure that page_cnt is non-zero
The page_cnt parameter is used to specify the number of memory pages
allocated for each per-CPU buffer, it must be non-zero and a power of 2.

Currently, the __perf_buffer__new() function attempts to validate that
the page_cnt is a power of 2 but forgets checking for the case where
page_cnt is zero, we can fix it by replacing 'page_cnt & (page_cnt - 1)'
with 'page_cnt == 0 || (page_cnt & (page_cnt - 1))'.

If so, we also don't need to add a check in perf_buffer__new_v0_6_0() to
make sure that page_cnt is non-zero and the check for zero in
perf_buffer__new_raw_v0_6_0() can also be removed.

The code will be cleaner and more readable.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220303005921.53436-1-ytcoode@gmail.com
2022-03-07 22:16:11 -08:00
Xu Kuohai
5d491d5d07 libbpf: Skip forward declaration when counting duplicated type names
Currently if a declaration appears in the BTF before the definition, the
definition is dumped as a conflicting name, e.g.:

    $ bpftool btf dump file vmlinux format raw | grep "'unix_sock'"
    [81287] FWD 'unix_sock' fwd_kind=struct
    [89336] STRUCT 'unix_sock' size=1024 vlen=14

    $ bpftool btf dump file vmlinux format c | grep "struct unix_sock"
    struct unix_sock;
    struct unix_sock___2 {	<--- conflict, the "___2" is unexpected
		    struct unix_sock___2 *unix_sk;

This causes a compilation error if the dump output is used as a header file.

Fix it by skipping declaration when counting duplicated type names.

Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220301053250.1464204-2-xukuohai@huawei.com
2022-03-07 22:16:11 -08:00
Stijn Tintel
9b53decb02 libbpf: Fix BPF_MAP_TYPE_PERF_EVENT_ARRAY auto-pinning
When a BPF map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY doesn't have the
max_entries parameter set, the map will be created with max_entries set
to the number of available CPUs. When we try to reuse such a pinned map,
map_is_reuse_compat will return false, as max_entries in the map
definition differs from max_entries of the existing map, causing the
following error:

  libbpf: couldn't reuse pinned map at '/sys/fs/bpf/m_logging': parameter mismatch

Fix this by overwriting max_entries in the map definition. For this to
work, we need to do this in bpf_object__create_maps, before calling
bpf_object__reuse_map.

Fixes: 57a00f41644f ("libbpf: Add auto-pinning of maps when loading BPF objects")
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220225152355.315204-1-stijn@linux-ipv6.be
2022-03-07 22:16:11 -08:00
Yuntao Wang
426672106e libbpf: Simplify the find_elf_sec_sz() function
The check in the last return statement is unnecessary, we can just return
the ret variable.

But we can simplify the function further by returning 0 immediately if we
find the section size and -ENOENT otherwise.

Thus we can also remove the ret variable.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220223085244.3058118-1-ytcoode@gmail.com
2022-03-07 22:16:11 -08:00
Yuntao Wang
c85a8bbe9c libbpf: Remove redundant check in btf_fixup_datasec()
The check 't->size && t->size != size' is redundant because if t->size
compares unequal to 0, we will just skip straight to sorting variables.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220220072750.209215-1-ytcoode@gmail.com
2022-03-07 22:16:11 -08:00
Martin KaFai Lau
e7997e49ea bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_skb_delivery_time()
* __sk_buff->delivery_time_type:
This patch adds __sk_buff->delivery_time_type.  It tells if the
delivery_time is stored in __sk_buff->tstamp or not.

It will be most useful for ingress to tell if the __sk_buff->tstamp
has the (rcv) timestamp or delivery_time.  If delivery_time_type
is 0 (BPF_SKB_DELIVERY_TIME_NONE), it has the (rcv) timestamp.

Two non-zero types are defined for the delivery_time_type,
BPF_SKB_DELIVERY_TIME_MONO and BPF_SKB_DELIVERY_TIME_UNSPEC.  For UNSPEC,
it can only happen in egress because only mono delivery_time can be
forwarded to ingress now.  The clock of UNSPEC delivery_time
can be deduced from the skb->sk->sk_clockid which is how
the sch_etf doing it also.

* Provide forwarded delivery_time to tc-bpf@ingress:
With the help of the new delivery_time_type, the tc-bpf has a way
to tell if the __sk_buff->tstamp has the (rcv) timestamp or
the delivery_time.  During bpf load time, the verifier will learn if
the bpf prog has accessed the new __sk_buff->delivery_time_type.
If it does, it means the tc-bpf@ingress is expecting the
skb->tstamp could have the delivery_time.  The kernel will then
read the skb->tstamp as-is during bpf insn rewrite without
checking the skb->mono_delivery_time.  This is done by adding a
new prog->delivery_time_access bit.  The same goes for
writing skb->tstamp.

* bpf_skb_set_delivery_time():
The bpf_skb_set_delivery_time() helper is added to allow setting both
delivery_time and the delivery_time_type at the same time.  If the
tc-bpf does not need to change the delivery_time_type, it can directly
write to the __sk_buff->tstamp as the existing tc-bpf has already been
doing.  It will be most useful at ingress to change the
__sk_buff->tstamp from the (rcv) timestamp to
a mono delivery_time and then bpf_redirect_*().

bpf only has mono clock helper (bpf_ktime_get_ns), and
the current known use case is the mono EDT for fq, and
only mono delivery time can be kept during forward now,
so bpf_skb_set_delivery_time() only supports setting
BPF_SKB_DELIVERY_TIME_MONO.  It can be extended later when use cases
come up and the forwarding path also supports other clock bases.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-07 22:16:11 -08:00
Hangbin Liu
4c560383a6 bonding: add new option ns_ip6_target
This patch add a new bonding option ns_ip6_target, which correspond
to the arp_ip_target. With this we set IPv6 targets and send IPv6 NS
request to determine the health of the link.

For other related options like the validation, we still use
arp_validate, and will change to ns_validate later.

Note: the sysfs configuration support was removed based on
https://lore.kernel.org/netdev/8863.1645071997@famine

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
9c44c8a8e0 LICENSE: fix BSD-2-Clause by adding year and authors
Seems like 2015 is the year of the first libbpf commit. So use Lorenz's
suggestion and add "(c) 2015 The Libbpf Authors".

Closes: https://github.com/libbpf/libbpf/issues/461
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-02-23 17:55:21 -08:00
Andrii Nakryiko
1c173e5fc8 libbpf: fix libbpf.pc generation w.r.t. patch versions
Ensure that libbpf.pc gets full libbpf's version, including patch
releases. Also add some mechanism to ensure that official released
version (e.g., 0.7.1) and the one recorded in libbpf.map (which never
bumps patch version, so will be 0.7.0) are in sync up to major and minor
versions. This should ensure that major mistakes are captured. We'll
still need to be very careful with zeroing out patch version on minor
version bumps.

Closes: https://github.com/libbpf/libbpf/issues/455
Reported-by: Michel Salim <michel@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-02-22 20:06:42 -08:00
Andrii Nakryiko
93c570ca4b sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2e3f7bed28376a1a41ce4a58b7163b586e97a546
Checkpoint bpf-next commit: b75dacaac4650478ed5a9d33975b91b99016daff
Baseline bpf commit:        45ce4b4f9009102cd9f581196d480a59208690c1
Checkpoint bpf commit:      75134f16e7dd0007aa474b281935c5f42e79f2c8

Andrii Nakryiko (1):
  libbpf: Fix memleak in libbpf_netlink_recv()

 src/netlink.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--
2.30.2
2022-02-17 11:33:57 -08:00
Andrii Nakryiko
33201b7ebd libbpf: Fix memleak in libbpf_netlink_recv()
Ensure that libbpf_netlink_recv() frees dynamically allocated buffer in
all code paths.

Fixes: 9c3de619e13e ("libbpf: Use dynamically allocated buffer when receiving netlink messages")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20220217073958.276959-1-andrii@kernel.org
2022-02-17 11:33:57 -08:00
Andrii Nakryiko
6edaacad4f sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   8cbf062a250ed52148badf6f3ffd03657dd4a3f0
Checkpoint bpf-next commit: 2e3f7bed28376a1a41ce4a58b7163b586e97a546
Baseline bpf commit:        61d06f01f9710b327a53492e5add9f972eb909b3
Checkpoint bpf commit:      45ce4b4f9009102cd9f581196d480a59208690c1

Mauricio Vásquez (2):
  libbpf: Split bpf_core_apply_relo()
  libbpf: Expose bpf_core_{add,free}_cands() to bpftool

 src/libbpf.c          | 88 ++++++++++++++++++++++++-------------------
 src/libbpf_internal.h |  9 +++++
 src/relo_core.c       | 79 +++++++++++---------------------------
 src/relo_core.h       | 42 ++++++++++++++++++---
 4 files changed, 118 insertions(+), 100 deletions(-)

--
2.30.2
2022-02-16 13:58:30 -08:00
Mauricio Vásquez
af29a83fe2 libbpf: Expose bpf_core_{add,free}_cands() to bpftool
Expose bpf_core_add_cands() and bpf_core_free_cands() to handle
candidates list.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com>
Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220215225856.671072-3-mauricio@kinvolk.io
2022-02-16 13:58:30 -08:00
Mauricio Vásquez
6387d3900f libbpf: Split bpf_core_apply_relo()
BTFGen needs to run the core relocation logic in order to understand
what are the types involved in a given relocation.

Currently bpf_core_apply_relo() calculates and **applies** a relocation
to an instruction. Having both operations in the same function makes it
difficult to only calculate the relocation without patching the
instruction. This commit splits that logic in two different phases: (1)
calculate the relocation and (2) patch the instruction.

For the first phase bpf_core_apply_relo() is renamed to
bpf_core_calc_relo_insn() who is now only on charge of calculating the
relocation, the second phase uses the already existing
bpf_core_patch_insn(). bpf_object__relocate_core() uses both of them and
the BTFGen will use only bpf_core_calc_relo_insn().

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com>
Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220215225856.671072-2-mauricio@kinvolk.io
2022-02-16 13:58:30 -08:00
Andrii Nakryiko
196da61f1d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   dc37dc617fabfb1c3a16d49f5d8cc20e9e3608ca
Checkpoint bpf-next commit: 8cbf062a250ed52148badf6f3ffd03657dd4a3f0
Baseline bpf commit:        fe68195daf34d5dddacd3f93dd3eafc4beca3a0e
Checkpoint bpf commit:      61d06f01f9710b327a53492e5add9f972eb909b3

Alexei Starovoitov (1):
  libbpf: Prepare light skeleton for the kernel.

Jakub Sitnicki (1):
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup

Marco Elver (1):
  perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit
    architectures

Toke Høiland-Jørgensen (1):
  libbpf: Use dynamically allocated buffer when receiving netlink
    messages

 include/uapi/linux/bpf.h        |   3 +-
 include/uapi/linux/perf_event.h |   2 +
 src/gen_loader.c                |  15 ++-
 src/netlink.c                   |  55 +++++++++-
 src/skel_internal.h             | 185 ++++++++++++++++++++++++++++----
 5 files changed, 234 insertions(+), 26 deletions(-)

--
2.30.2
2022-02-15 22:32:04 -08:00
Marco Elver
db8dc47ce8 perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
Due to the alignment requirements of siginfo_t, as described in
3ddb3fd8cdb0 ("signal, perf: Fix siginfo_t by avoiding u64 on 32-bit
architectures"), siginfo_t::si_perf_data is limited to an unsigned long.

However, perf_event_attr::sig_data is an u64, to avoid having to deal
with compat conversions. Due to being an u64, it may not immediately be
clear to users that sig_data is truncated on 32 bit architectures.

Add a comment to explicitly point this out, and hopefully help some
users save time by not having to deduce themselves what's happening.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/20220131103407.1971678-3-elver@google.com
2022-02-15 22:32:04 -08:00
Toke Høiland-Jørgensen
f7d89c3910 libbpf: Use dynamically allocated buffer when receiving netlink messages
When receiving netlink messages, libbpf was using a statically allocated
stack buffer of 4k bytes. This happened to work fine on systems with a 4k
page size, but on systems with larger page sizes it can lead to truncated
messages. The user-visible impact of this was that libbpf would insist no
XDP program was attached to some interfaces because that bit of the netlink
message got chopped off.

Fix this by switching to a dynamically allocated buffer; we borrow the
approach from iproute2 of using recvmsg() with MSG_PEEK|MSG_TRUNC to get
the actual size of the pending message before receiving it, adjusting the
buffer as necessary. While we're at it, also add retries on interrupted
system calls around the recvmsg() call.

v2:
  - Move peek logic to libbpf_netlink_recv(), don't double free on ENOMEM.

Fixes: 8bbb77b7c7a2 ("libbpf: Add various netlink helpers")
Reported-by: Zhiqian Guan <zhguan@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20220211234819.612288-1-toke@redhat.com
2022-02-15 22:32:04 -08:00
Alexei Starovoitov
0d6262ad0a libbpf: Prepare light skeleton for the kernel.
Prepare light skeleton to be used in the kernel module and in the user space.
The look and feel of lskel.h is mostly the same with the difference that for
user space the skel->rodata is the same pointer before and after skel_load
operation, while in the kernel the skel->rodata after skel_open and the
skel->rodata after skel_load are different pointers.
Typical usage of skeleton remains the same for kernel and user space:
skel = my_bpf__open();
skel->rodata->my_global_var = init_val;
err = my_bpf__load(skel);
err = my_bpf__attach(skel);
// access skel->rodata->my_global_var;
// access skel->bss->another_var;

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209232001.27490-3-alexei.starovoitov@gmail.com
2022-02-15 22:32:04 -08:00
Jakub Sitnicki
7593fc7a85 selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
Extend the context access tests for sk_lookup prog to cover the surprising
case of a 4-byte load from the remote_port field, where the expected value
is actually shifted by 16 bits.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220209184333.654927-3-jakub@cloudflare.com
2022-02-15 22:32:04 -08:00
Andrii Nakryiko
67f813c8a8 README: add libbpf distro packaging badge
Add badge displaying libbpf's packaging status across various Linux distros.
2022-02-11 21:21:10 -08:00
Andrii Nakryiko
2cd2d03f63 libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0
Ensure that LIBBPF_0.7.0 inherits everything from LIBBPF_0.6.0.

Fixes: dbdd2c7f8cec ("libbpf: Add API to get/set log_level at per-program level")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220211205235.2089104-1-andrii@kernel.org
2022-02-11 13:01:37 -08:00
Andrii Nakryiko
528094c0c1 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   227a0713b319e7a8605312dee1c97c97a719a9fc
Checkpoint bpf-next commit: dc37dc617fabfb1c3a16d49f5d8cc20e9e3608ca
Baseline bpf commit:        77b1b8b43ec3c060ecf7e926a92b0f8772171046
Checkpoint bpf commit:      fe68195daf34d5dddacd3f93dd3eafc4beca3a0e

Andrii Nakryiko (1):
  libbpf: Fix compilation warning due to mismatched printf format

Dan Carpenter (1):
  libbpf: Fix signedness bug in btf_dump_array_data()

Hengqi Chen (1):
  libbpf: Add BPF_KPROBE_SYSCALL macro

Ilya Leoshkevich (7):
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  libbpf: Fix accessing syscall arguments on powerpc
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Fix accessing the first syscall argument on s390

Mauricio Vásquez (1):
  libbpf: Remove mode check in libbpf_set_strict_mode()

 src/bpf_tracing.h | 85 +++++++++++++++++++++++++++++++++++++++++------
 src/btf_dump.c    |  6 ++--
 src/libbpf.c      |  8 -----
 3 files changed, 79 insertions(+), 20 deletions(-)

--
2.30.2
2022-02-09 09:48:32 -08:00
Andrii Nakryiko
37493e639f libbpf: Fix compilation warning due to mismatched printf format
On ppc64le architecture __s64 is long int and requires %ld. Cast to
ssize_t and use %zd to avoid architecture-specific specifiers.

Fixes: 4172843ed4a3 ("libbpf: Fix signedness bug in btf_dump_array_data()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220209063909.1268319-1-andrii@kernel.org
2022-02-09 09:48:32 -08:00
Hengqi Chen
b0a3b9e8fe libbpf: Add BPF_KPROBE_SYSCALL macro
Add syscall-specific variant of BPF_KPROBE named BPF_KPROBE_SYSCALL ([0]).
The new macro hides the underlying way of getting syscall input arguments.
With the new macro, the following code:

    SEC("kprobe/__x64_sys_close")
    int BPF_KPROBE(do_sys_close, struct pt_regs *regs)
    {
        int fd;

        fd = PT_REGS_PARM1_CORE(regs);
        /* do something with fd */
    }

can be written as:

    SEC("kprobe/__x64_sys_close")
    int BPF_KPROBE_SYSCALL(do_sys_close, int fd)
    {
        /* do something with fd */
    }

  [0] Closes: https://github.com/libbpf/libbpf/issues/425

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220207143134.2977852-2-hengqi.chen@gmail.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
f7e08b4a8f libbpf: Fix accessing the first syscall argument on s390
On s390, the first syscall argument should be accessed via orig_gpr2
(see arch/s390/include/asm/syscall.h). Currently gpr[2] is used
instead, leading to bpf_syscall_macro test failure.

orig_gpr2 cannot be added to user_pt_regs, since its layout is a part
of the ABI. Therefore provide access to it only through
PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-11-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
9f6e3a7a59 libbpf: Fix accessing the first syscall argument on arm64
On arm64, the first syscall argument should be accessed via orig_x0
(see arch/arm64/include/asm/syscall.h). Currently regs[0] is used
instead, leading to bpf_syscall_macro test failure.

orig_x0 cannot be added to struct user_pt_regs, since its layout is a
part of the ABI. Therefore provide access to it only through
PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-10-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
f1a756d793 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
arm64 and s390 need a special way to access the first syscall argument.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-9-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
32c19d8505 libbpf: Fix accessing syscall arguments on riscv
riscv does not select ARCH_HAS_SYSCALL_WRAPPER, so its syscall
handlers take "unpacked" syscall arguments. Indicate this to libbpf
using PT_REGS_SYSCALL_REGS macro.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-7-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
497ec1d35c libbpf: Fix riscv register names
riscv registers are accessed via struct user_regs_struct, not struct
pt_regs. The program counter member in this struct is called pc, not
epc. The frame pointer is called s0, not fp.

Fixes: 3cc31d794097 ("libbpf: Normalize PT_REGS_xxx() macro definitions")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-6-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
8a28842a20 libbpf: Fix accessing syscall arguments on powerpc
powerpc does not select ARCH_HAS_SYSCALL_WRAPPER, so its syscall
handlers take "unpacked" syscall arguments. Indicate this to libbpf
using PT_REGS_SYSCALL_REGS macro.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-5-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
50b4d99bbc libbpf: Add PT_REGS_SYSCALL_REGS macro
Architectures that select ARCH_HAS_SYSCALL_WRAPPER pass a pointer to
struct pt_regs to syscall handlers, others unpack it into individual
function parameters. Introduce a macro to describe what a particular
arch does.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-3-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Dan Carpenter
f5bd7054f9 libbpf: Fix signedness bug in btf_dump_array_data()
The btf__resolve_size() function returns negative error codes so
"elem_size" must be signed for the error handling to work.

Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220208071552.GB10495@kili
2022-02-09 09:48:32 -08:00