Compare commits

...

1125 Commits

Author SHA1 Message Date
thiagoftsm
11eaf2b0b7 netdata_patch_1_4_1: Add patch to run on Debian 10 2024-05-02 11:25:12 +00:00
thiagoftsm
b39b7f426f Merge branch 'libbpf:master' into master 2024-05-02 11:20:27 +00:00
Quentin Monnet
e055420033 sync: Commit .mailmap changes from script when sync-ing repo
In commit 4794f18bf4 ("sync: Sync .mailmap entries"), we updated the
sync-up script to automatically update libbpf's .mailmap; however, the
script would not take care of committing the changes. Let's address
this.

The code is copied and adapted from the part where we commit changes to
src/bpf_helper_defs.h.

Signed-off-by: Quentin Monnet <qmo@kernel.org>
2024-05-01 17:38:26 -07:00
Andrii Nakryiko
255b705a16 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   1bba3b3d373dbafae891e7cb06b8c82c8d62aba1
Checkpoint bpf-next commit: 0737df6de94661ae55fd3343ce9abec32c687e62
Baseline bpf commit:        b867247555c4181bf84eb10b72b176862c29112d
Checkpoint bpf commit:      3e9bc0472b910d4115e16e9c2d684c7757cb6c60

Andrii Nakryiko (1):
  libbpf: better fix for handling nulled-out struct_ops program

Jiri Olsa (3):
  bpf: Add support for kprobe session attach
  libbpf: Add support for kprobe session attach
  libbpf: Add kprobe session attach type name to attach_type_name

Viktor Malik (1):
  libbpf: support "module: Function" syntax for tracing programs

 include/uapi/linux/bpf.h |   1 +
 src/bpf.c                |   1 +
 src/libbpf.c             | 112 +++++++++++++++++++++++++++++++--------
 src/libbpf.h             |   4 +-
 4 files changed, 95 insertions(+), 23 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-05-01 15:20:15 -07:00
Andrii Nakryiko
6a41f02ad4 libbpf: better fix for handling nulled-out struct_ops program
Previous attempt to fix the handling of nulled-out (from skeleton)
struct_ops program is working well only if struct_ops program is defined
as non-autoloaded by default (i.e., has SEC("?struct_ops") annotation,
with question mark).

Unfortunately, that fix is incomplete due to how
bpf_object_adjust_struct_ops_autoload() is marking referenced or
non-referenced struct_ops program as autoloaded (or not). Because
bpf_object_adjust_struct_ops_autoload() is run after
bpf_map__init_kern_struct_ops() step, which sets program slot to NULL,
such programs won't be considered "referenced", and so its autoload
property won't be changed.

This all sounds convoluted and it is, but the desire is to have as
natural behavior (as far as struct_ops usage is concerned) as possible.

This fix is redoing the original fix but makes it work for
autoloaded-by-default struct_ops programs as well. We achieve this by
forcing prog->autoload to false if prog was declaratively set for some
struct_ops map, but then nulled-out from skeleton (programmatically).
This achieves desired effect of not autoloading it. If such program is
still referenced somewhere else (different struct_ops map or different
callback field), it will get its autoload property adjusted by
bpf_object_adjust_struct_ops_autoload() later.

We also fix selftest, which accidentally used SEC("?struct_ops")
annotation. It was meant to use autoload-by-default program from the
very beginning.

Fixes: f973fccd43d3 ("libbpf: handle nulled-out program in struct_ops correctly")
Cc: Kui-Feng Lee <thinker.li@gmail.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240501041706.3712608-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-01 15:20:15 -07:00
Viktor Malik
dd589c3b31 libbpf: support "module: Function" syntax for tracing programs
In some situations, it is useful to explicitly specify a kernel module
to search for a tracing program target (e.g. when a function of the same
name exists in multiple modules or in vmlinux).

This patch enables that by allowing the "module:function" syntax for the
find_kernel_btf_id function. Thanks to this, the syntax can be used both
from a SEC macro (i.e. `SEC(fentry/module:function)`) and via the
bpf_program__set_attach_target API call.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/9085a8cb9a552de98e554deb22ff7e977d025440.1714469650.git.vmalik@redhat.com
2024-05-01 15:20:15 -07:00
Jiri Olsa
045a0372ef libbpf: Add kprobe session attach type name to attach_type_name
Adding kprobe session attach type name to attach_type_name,
so libbpf_bpf_attach_type_str returns proper string name for
BPF_TRACE_KPROBE_SESSION attach type.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240430112830.1184228-6-jolsa@kernel.org
2024-05-01 15:20:15 -07:00
Jiri Olsa
6c3cf5108e libbpf: Add support for kprobe session attach
Adding support to attach program in kprobe session mode
with bpf_program__attach_kprobe_multi_opts function.

Adding session bool to bpf_kprobe_multi_opts struct that allows
to load and attach the bpf program via kprobe session.
the attachment to create kprobe multi session.

Also adding new program loader section that allows:
 SEC("kprobe.session/bpf_fentry_test*")

and loads/attaches kprobe program as kprobe session.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240430112830.1184228-5-jolsa@kernel.org
2024-05-01 15:20:15 -07:00
Jiri Olsa
b63d2945ff bpf: Add support for kprobe session attach
Adding support to attach bpf program for entry and return probe
of the same function. This is common use case which at the moment
requires to create two kprobe multi links.

Adding new BPF_TRACE_KPROBE_SESSION attach type that instructs
kernel to attach single link program to both entry and exit probe.

It's possible to control execution of the bpf program on return
probe simply by returning zero or non zero from the entry bpf
program execution to execute or not the bpf program on return
probe respectively.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240430112830.1184228-2-jolsa@kernel.org
2024-05-01 15:20:15 -07:00
Andrii Nakryiko
d3e18fceec ci: remove tcp_rtt test from 5.5 ALLOWLIST
It's been updated to expecte the very latest kernel, can't succeed on
5.5 anymore.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-04-30 09:09:32 -07:00
Andrii Nakryiko
22bd976613 ci: update vmlinux.h
Regenerate vmlinux.h to get all the latest types.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-04-30 09:09:32 -07:00
Andrii Nakryiko
f9f3fbf72d sync: update .mailmap
Update .mailmap generated during sync.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-04-30 09:09:32 -07:00
Andrii Nakryiko
37b8e0eb2d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   82e38a505c9868e784ec31e743fd8a9fa5ca1084
Checkpoint bpf-next commit: 1bba3b3d373dbafae891e7cb06b8c82c8d62aba1
Baseline bpf commit:        5bcf0dcbf9066348058b88a510c57f70f384c92c
Checkpoint bpf commit:      b867247555c4181bf84eb10b72b176862c29112d

Andrii Nakryiko (1):
  libbpf: handle nulled-out program in struct_ops correctly

Jose E. Marchesi (1):
  bpf_helpers.h: Define bpf_tail_call_static when building with GCC

Philo Lu (1):
  bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args

 include/uapi/linux/bpf.h | 2 ++
 src/bpf_helpers.h        | 4 +++-
 src/libbpf.c             | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-04-30 09:09:32 -07:00
Andrii Nakryiko
f28271ab72 libbpf: handle nulled-out program in struct_ops correctly
If struct_ops has one of program callbacks set declaratively and host
kernel is old and doesn't support this callback, libbpf will allow to
load such struct_ops as long as that callback was explicitly nulled-out
(presumably through skeleton). This is all working correctly, except we
won't reset corresponding program slot to NULL before bailing out, which
will lead to libbpf not detecting that BPF program has to be not
auto-loaded. Fix this by unconditionally resetting corresponding program
slot to NULL.

Fixes: c911fc61a7ce ("libbpf: Skip zeroed or null fields if not found in the kernel type.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240428030954.3918764-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-30 09:09:32 -07:00
Jose E. Marchesi
b1051d9361 bpf_helpers.h: Define bpf_tail_call_static when building with GCC
The definition of bpf_tail_call_static in tools/lib/bpf/bpf_helpers.h
is guarded by a preprocessor check to assure that clang is recent
enough to support it.  This patch updates the guard so the function is
compiled when using GCC 13 or later as well.

Tested in bpf-next master. No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240426145158.14409-1-jose.marchesi@oracle.com
2024-04-30 09:09:32 -07:00
Philo Lu
43df08cd17 bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
Two important arguments in RTT estimation, mrtt and srtt, are passed to
tcp_bpf_rtt(), so that bpf programs get more information about RTT
computation in BPF_SOCK_OPS_RTT_CB.

The difference between bpf_sock_ops->srtt_us and the srtt here is: the
former is an old rtt before update, while srtt passed by tcp_bpf_rtt()
is that after update.

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240425161724.73707-2-lulie@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-30 09:09:32 -07:00
Quentin Monnet
4794f18bf4 sync: Sync .mailmap entries
The kernel repository has a .mailmap file to remap author names and
email addresses to their desired format in Git logs (for details, see
gitmailmap documentation [0]). Alas, this is only visible for author
information when looking at the logs locally, as GitHub does not support
mailmaps at the moment [1].

This commit adds a .mailmap file for libbpf, automatically generated
from the kernel's version. The script to generate the .mailmap is added,
too: it works by grepping email addresses from authors in the
repository, and collecting all lines ending with this address in the
kernel's .mailmap - in other words, all lines where this address is used
as a pattern for a remapping.

To keep the .mailmap up-to-date, add a call to the script to
sync-kernel.sh.

[0] https://git-scm.com/docs/gitmailmap
[1] https://github.com/orgs/community/discussions/22518

Signed-off-by: Quentin Monnet <qmo@kernel.org>
2024-04-25 22:42:05 -07:00
Yonghong Song
2fdcc365a0 ci: regenerate latest vmlinux.h
Update vmlinux.h to make BPF selftests compile.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2024-04-24 15:16:35 -07:00
Yonghong Song
52c37177cc Makefile: Ensure github libbpf version the same as the kernel one
The kernel libbpf version is 1.5 now. So change github libbpf
version to be 1.5 as well.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2024-04-24 15:16:35 -07:00
Yonghong Song
7cbfddfdf2 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5
Checkpoint bpf-next commit: 82e38a505c9868e784ec31e743fd8a9fa5ca1084
Baseline bpf commit:        443574b033876c85a35de4c65c14f7fe092222b2
Checkpoint bpf commit:      5bcf0dcbf9066348058b88a510c57f70f384c92c

Andrea Righi (3):
  libbpf: Start v1.5 development cycle
  libbpf: ringbuf: Allow to consume up to a certain amount of items
  libbpf: Add ring__consume_n / ring_buffer__consume_n

Anton Protopopov (2):
  bpf: Add support for passing mark with bpf_fib_lookup
  bpf: Pack struct bpf_fib_lookup

Benjamin Tissoires (1):
  tools: sync include/uapi/linux/bpf.h

David Lechner (1):
  bpf: Fix typo in uapi doc comments

Mykyta Yatsenko (1):
  bpf: improve error message for unsupported helper

Quentin Deslandes (2):
  libbpf: Fix misaligned array closing bracket
  libbpf: Fix dump of subsequent char arrays

Tobias Böhm (1):
  libbpf: Use local bpf_helpers.h include

Yonghong Song (4):
  libbpf: Mark libbpf_kallsyms_parse static function
  libbpf: Handle <orig_name>.llvm.<hash> symbol properly
  bpf: Add bpf_link support for sk_msg and sk_skb progs
  libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP

 include/uapi/linux/bpf.h | 35 +++++++++++++++++++++----
 src/bpf_core_read.h      |  2 +-
 src/btf_dump.c           |  5 ++++
 src/libbpf.c             | 33 ++++++++++++++++++++++--
 src/libbpf.h             | 14 ++++++++++
 src/libbpf.map           |  7 +++++
 src/libbpf_internal.h    |  5 ----
 src/libbpf_probes.c      |  6 +++--
 src/libbpf_version.h     |  2 +-
 src/ringbuf.c            | 55 +++++++++++++++++++++++++++++++++-------
 10 files changed, 139 insertions(+), 25 deletions(-)

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2024-04-24 15:16:35 -07:00
Yonghong Song
f2fe16ec95 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2024-04-24 15:16:35 -07:00
Benjamin Tissoires
a911ca1e3e tools: sync include/uapi/linux/bpf.h
cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h

Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-6-6c986a5a741f@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Quentin Deslandes
24924003c6 libbpf: Fix dump of subsequent char arrays
When dumping a character array, libbpf will watch for a '\0' and set
is_array_terminated=true if found. This prevents libbpf from printing
the remaining characters of the array, treating it as a nul-terminated
string.

However, once this flag is set, it's never reset, leading to subsequent
characters array not being printed properly:

.str_multi = (__u8[2][16])[
    [
        'H',
        'e',
        'l',
    ],
],

This patch saves the is_array_terminated flag and restores its
default (false) value before looping over the elements of an array,
then restores it afterward. This way, libbpf's behavior is unchanged
when dumping the characters of an array, but subsequent arrays are
printed properly:

.str_multi = (__u8[2][16])[
    [
        'H',
        'e',
        'l',
    ],
    [
        'l',
        'o',
    ],
],

Signed-off-by: Quentin Deslandes <qde@naccy.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240413211258.134421-3-qde@naccy.de
2024-04-24 15:16:35 -07:00
Quentin Deslandes
2c6f445a8e libbpf: Fix misaligned array closing bracket
In btf_dump_array_data(), libbpf will call btf_dump_dump_type_data() for
each element. For an array of characters, each element will be
processed the following way:

- btf_dump_dump_type_data() is called to print the character
- btf_dump_data_pfx() prefixes the current line with the proper number
  of indentations
- btf_dump_int_data() is called to print the character
- After the last character is printed, btf_dump_dump_type_data() calls
  btf_dump_data_pfx() before writing the closing bracket

However, for an array containing characters, btf_dump_int_data() won't
print any '\0' and subsequent characters. This leads to situations where
the line prefix is written, no character is added, then the prefix is
written again before adding the closing bracket:

(struct sk_metadata){
    .str_array = (__u8[14])[
        'H',
        'e',
        'l',
        'l',
        'o',
                ],

This change solves this issue by printing the '\0' character, which
has two benefits:

- The bracket closing the array is properly aligned
- It's clear from a user point of view that libbpf uses '\0' as a
  terminator for arrays of characters.

Signed-off-by: Quentin Deslandes <qde@naccy.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240413211258.134421-2-qde@naccy.de
2024-04-24 15:16:35 -07:00
Yonghong Song
09397e309a libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP
Introduce a libbpf API function bpf_program__attach_sockmap()
which allow user to get a bpf_link for their corresponding programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240410043532.3737722-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Yonghong Song
62217fb32a bpf: Add bpf_link support for sk_msg and sk_skb progs
Add bpf_link support for sk_msg and sk_skb programs. We have an
internal request to support bpf_link for sk_msg programs so user
space can have a uniform handling with bpf_link based libbpf
APIs. Using bpf_link based libbpf API also has a benefit which
makes system robust by decoupling prog life cycle and
attachment life cycle.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240410043527.3737160-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Andrea Righi
b521a722b9 libbpf: Add ring__consume_n / ring_buffer__consume_n
Introduce a new API to consume items from a ring buffer, limited to a
specified amount, and return to the caller the actual number of items
consumed.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/lkml/20240310154726.734289-1-andrea.righi@canonical.com/T
Link: https://lore.kernel.org/bpf/20240406092005.92399-4-andrea.righi@canonical.com
2024-04-24 15:16:35 -07:00
Andrea Righi
98de9ace4d libbpf: ringbuf: Allow to consume up to a certain amount of items
In some cases, instead of always consuming all items from ring buffers
in a greedy way, we may want to consume up to a certain amount of items,
for example when we need to copy items from the BPF ring buffer to a
limited user buffer.

This change allows to set an upper limit to the amount of items consumed
from one or more ring buffers.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240406092005.92399-3-andrea.righi@canonical.com
2024-04-24 15:16:35 -07:00
Andrea Righi
26d9ab5f78 libbpf: Start v1.5 development cycle
Bump libbpf.map to v1.5.0 to start a new libbpf version cycle.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240406092005.92399-2-andrea.righi@canonical.com
2024-04-24 15:16:35 -07:00
Anton Protopopov
c5219d1b3d bpf: Pack struct bpf_fib_lookup
The struct bpf_fib_lookup is supposed to be of size 64. A recent commit
59b418c7063d ("bpf: Add a check for struct bpf_fib_lookup size") added
a static assertion to check this property so that future changes to the
structure will not accidentally break this assumption.

As it immediately turned out, on some 32-bit arm systems, when AEABI=n,
the total size of the structure was equal to 68, see [1]. This happened
because the bpf_fib_lookup structure contains a union of two 16-bit
fields:

    union {
            __u16 tot_len;
            __u16 mtu_result;
    };

which was supposed to compile to a 16-bit-aligned 16-bit field. On the
aforementioned setups it was instead both aligned and padded to 32-bits.

Declare this inner union as __attribute__((packed, aligned(2))) such
that it always is of size 2 and is aligned to 16 bits.

  [1] https://lore.kernel.org/all/CA+G9fYtsoP51f-oP_Sp5MOq-Ffv8La2RztNpwvE6+R1VtFiLrw@mail.gmail.com/#t

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: e1850ea9bd9e ("bpf: bpf_fib_lookup return MTU value as output when looked up")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240403123303.1452184-1-aspsk@isovalent.com
2024-04-24 15:16:35 -07:00
Tobias Böhm
8d3a3e138b libbpf: Use local bpf_helpers.h include
Commit 20d59ee55172fdf6 ("libbpf: add bpf_core_cast() macro") added a
bpf_helpers include in bpf_core_read.h as a system include. Usually, the
includes are local, though, like in bpf_tracing.h. This commit adjusts
the include to be local as well.

Signed-off-by: Tobias Böhm <tobias@aibor.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/q5d5bgc6vty2fmaazd5e73efd6f5bhiru2le6fxn43vkw45bls@fhlw2s5ootdb
2024-04-24 15:16:35 -07:00
David Lechner
9f2853a352 bpf: Fix typo in uapi doc comments
In a few places in the bpf uapi headers, EOPNOTSUPP is missing a "P" in
the doc comments. This adds the missing "P".

Signed-off-by: David Lechner <dlechner@baylibre.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240329152900.398260-2-dlechner@baylibre.com
2024-04-24 15:16:35 -07:00
Yonghong Song
d2f83fb976 libbpf: Handle <orig_name>.llvm.<hash> symbol properly
With CONFIG_LTO_CLANG_THIN enabled, with some of previous
version of kernel code base ([1]), I hit the following
error:
   test_ksyms:PASS:kallsyms_fopen 0 nsec
   test_ksyms:FAIL:ksym_find symbol 'bpf_link_fops' not found
   #118     ksyms:FAIL

The reason is that 'bpf_link_fops' is renamed to
   bpf_link_fops.llvm.8325593422554671469
Due to cross-file inlining, the static variable 'bpf_link_fops'
in syscall.c is used by a function in another file. To avoid
potential duplicated names, the llvm added suffix
'.llvm.<hash>' ([2]) to 'bpf_link_fops' variable.
Such renaming caused a problem in libbpf if 'bpf_link_fops'
is used in bpf prog as a ksym but 'bpf_link_fops' does not
match any symbol in /proc/kallsyms.

To fix this issue, libbpf needs to understand that suffix '.llvm.<hash>'
is caused by clang lto kernel and to process such symbols properly.

With latest bpf-next code base built with CONFIG_LTO_CLANG_THIN,
I cannot reproduce the above failure any more. But such an issue
could happen with other symbols or in the future for bpf_link_fops symbol.

For example, with my current kernel, I got the following from
/proc/kallsyms:
  ffffffff84782154 d __func__.net_ratelimit.llvm.6135436931166841955
  ffffffff85f0a500 d tk_core.llvm.726630847145216431
  ffffffff85fdb960 d __fs_reclaim_map.llvm.10487989720912350772
  ffffffff864c7300 d fake_dst_ops.llvm.54750082607048300

I could not easily create a selftest to test newly-added
libbpf functionality with a static C test since I do not know
which symbol is cross-file inlined. But based on my particular kernel,
the following test change can run successfully.

>  diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms.c b/tools/testing/selftests/bpf/prog_tests/ksyms.c
>  index 6a86d1f07800..904a103f7b1d 100644
>  --- a/tools/testing/selftests/bpf/prog_tests/ksyms.c
>  +++ b/tools/testing/selftests/bpf/prog_tests/ksyms.c
>  @@ -42,6 +42,7 @@ void test_ksyms(void)
>          ASSERT_EQ(data->out__bpf_link_fops, link_fops_addr, "bpf_link_fops");
>          ASSERT_EQ(data->out__bpf_link_fops1, 0, "bpf_link_fops1");
>          ASSERT_EQ(data->out__btf_size, btf_size, "btf_size");
>  +       ASSERT_NEQ(data->out__fake_dst_ops, 0, "fake_dst_ops");
>          ASSERT_EQ(data->out__per_cpu_start, per_cpu_start_addr, "__per_cpu_start");
>
>   cleanup:
>  diff --git a/tools/testing/selftests/bpf/progs/test_ksyms.c b/tools/testing/selftests/bpf/progs/test_ksyms.c
>  index 6c9cbb5a3bdf..fe91eef54b66 100644
>  --- a/tools/testing/selftests/bpf/progs/test_ksyms.c
>  +++ b/tools/testing/selftests/bpf/progs/test_ksyms.c
>  @@ -9,11 +9,13 @@ __u64 out__bpf_link_fops = -1;
>   __u64 out__bpf_link_fops1 = -1;
>   __u64 out__btf_size = -1;
>   __u64 out__per_cpu_start = -1;
>  +__u64 out__fake_dst_ops = -1;
>
>   extern const void bpf_link_fops __ksym;
>   extern const void __start_BTF __ksym;
>   extern const void __stop_BTF __ksym;
>   extern const void __per_cpu_start __ksym;
>  +extern const void fake_dst_ops __ksym;
>   /* non-existing symbol, weak, default to zero */
>   extern const void bpf_link_fops1 __ksym __weak;
>
>  @@ -23,6 +25,7 @@ int handler(const void *ctx)
>          out__bpf_link_fops = (__u64)&bpf_link_fops;
>          out__btf_size = (__u64)(&__stop_BTF - &__start_BTF);
>          out__per_cpu_start = (__u64)&__per_cpu_start;
>  +       out__fake_dst_ops = (__u64)&fake_dst_ops;
>
>          out__bpf_link_fops1 = (__u64)&bpf_link_fops1;

This patch fixed the issue in libbpf such that
the suffix '.llvm.<hash>' will be ignored during comparison of
bpf prog ksym vs. symbols in /proc/kallsyms, this resolved the issue.
Currently, only static variables in /proc/kallsyms are checked
with '.llvm.<hash>' suffix since in bpf programs function ksyms
with '.llvm.<hash>' suffix are most likely kfunc's and unlikely
to be cross-file inlined.

Note that currently kernel does not support gcc build with lto.

  [1] https://lore.kernel.org/bpf/20240302165017.1627295-1-yonghong.song@linux.dev/
  [2] https://github.com/llvm/llvm-project/blob/release/18.x/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1714-L1719

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240326041458.1198161-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Yonghong Song
8b9cb7d479 libbpf: Mark libbpf_kallsyms_parse static function
Currently libbpf_kallsyms_parse() function is declared as a global
function but actually it is not a API and there is no external
users in bpftool/bpf-selftests. So let us mark the function as
static.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240326041453.1197949-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Mykyta Yatsenko
b062410166 bpf: improve error message for unsupported helper
BPF verifier emits "unknown func" message when given BPF program type
does not support BPF helper. This message may be confusing for users, as
important context that helper is unknown only to current program type is
not provided.

This patch changes message to "program of this type cannot use helper "
and aligns dependent code in libbpf and tests. Any suggestions on
improving/changing this message are welcome.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20240325152210.377548-1-yatsenko@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Anton Protopopov
89d8cdf741 bpf: Add support for passing mark with bpf_fib_lookup
Extend the bpf_fib_lookup() helper by making it to utilize mark if
the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark the
four bytes of struct bpf_fib_lookup are used, shared with the
output-only smac/dmac fields.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Song Liu
8a2054f417 ci/diffs: Add temporary fix for mitigation config
Upstream is discussing the exact config to ship. In the meanwhile, which
would unblock CI.

More discussions here:

https://lore.kernel.org/lkml/20240423045548.1324969-1-song@kernel.org/T/#u

Signed-off-by: Song Liu <song@kernel.org>
2024-04-24 10:42:01 -07:00
thiagoftsm
b981a3a138 Merge branch 'libbpf:master' into master 2024-04-21 14:42:55 +00:00
Geyslan Gregório
46eafba62e ci: bump run-on-arch action to v2.7.1
More info: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/

Signed-off-by: Geyslan Gregório <geyslan@gmail.com>
2024-04-02 21:51:25 -07:00
Geyslan Gregório
6d3595d215 ci: bump checkout action to v4
Due to the transition from Node 16 to Node 20, the checkout action
needs to be updated to v4.

More info: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/

Signed-off-by: Geyslan Gregório <geyslan@gmail.com>
2024-04-02 10:25:11 -07:00
Andrii Nakryiko
20ea95b450 ci: sync DENYLISTs with BPF CI
Keep all the denylisted tests in sync.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
902af6913a ci: clean up temporary patch
It's already applied upstream.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
25a9cc27d7 ci: regenerate latest vmlinux.h
Update vmlinux.h to make BPF selftests compile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
8db4a2feeb sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   e63985ecd22681c7f5975f2e8637187a326b6791
Checkpoint bpf-next commit: 14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5
Baseline bpf commit:        2487007aa3b9fafbd2cb14068f49791ce1d7ede5
Checkpoint bpf commit:      443574b033876c85a35de4c65c14f7fe092222b2

Alexei Starovoitov (6):
  libbpf: Allow specifying 64-bit integers in map BTF.
  bpf: Introduce bpf_arena.
  bpf: Disasm support for addr_space_cast instruction.
  libbpf: Add __arg_arena to bpf_helpers.h
  libbpf: Add support for bpf_arena.
  libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM

Andrii Nakryiko (4):
  libbpf: Recognize __arena global variables.
  bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
  libbpf: add support for BPF cookie for raw_tp/tp_btf programs
  libbpf: fix u64-to-pointer cast on 32-bit arches

Arnaldo Carvalho de Melo (1):
  libbpf: Define MFD_CLOEXEC if not available

Jakub Kicinski (2):
  netdev: add per-queue statistics
  netdev: add queue stat for alloc failures

Kui-Feng Lee (1):
  libbpf: Skip zeroed or null fields if not found in the kernel type.

Mykyta Yatsenko (1):
  libbpbpf: Check bpf_map/bpf_program fd validity

Quentin Monnet (1):
  libbpf: Prevent null-pointer dereference when prog to load has no BTF

Yonghong Song (2):
  libbpf: Add new sec_def "sk_skb/verdict"
  bpf: Sync uapi bpf.h to tools directory

 include/uapi/linux/bpf.h    |  20 ++-
 include/uapi/linux/netdev.h |  20 +++
 src/bpf.c                   |  16 +-
 src/bpf.h                   |   9 +
 src/bpf_helpers.h           |   2 +
 src/libbpf.c                | 322 +++++++++++++++++++++++++++++++-----
 src/libbpf.h                |  13 +-
 src/libbpf.map              |   2 +
 src/libbpf_probes.c         |   7 +
 9 files changed, 366 insertions(+), 45 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Arnaldo Carvalho de Melo
7fee466676 libbpf: Define MFD_CLOEXEC if not available
Since its going directly to the syscall to avoid not having
memfd_create() available in some systems, do the same for its
MFD_CLOEXEC flags, defining it if not available.

This fixes the build in those systems, noticed while building perf on a
set of build containers.

Fixes: 9fa5e1a180aa639f ("libbpf: Call memfd_create() syscall directly")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/ZfxZ9nCyKvwmpKkE@x1
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
ddf722fb5c libbpf: fix u64-to-pointer cast on 32-bit arches
It's been reported that (void *)map->map_extra is causing compilation
warnings on 32-bit architectures. It's easy enough to fix this by
casting to long first.

Fixes: 79ff13e99169 ("libbpf: Add support for bpf_arena.")
Reported-by: Ryan Eatmon <reatmon@ti.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319215143.1279312-1-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
137193b655 libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM
The selftests use
to tell LLVM about special pointers. For LLVM there is nothing "arena"
about them. They are simply pointers in a different address space.
Hence LLVM diff https://github.com/llvm/llvm-project/pull/85161 renamed:
. macro __BPF_FEATURE_ARENA_CAST -> __BPF_FEATURE_ADDR_SPACE_CAST
. global variables in __attribute__((address_space(N))) are now
  placed in section named ".addr_space.N" instead of ".arena.N".

Adjust libbpf, bpftool, and selftests to match LLVM.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240315021834.62988-3-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Yonghong Song
d2676a58de bpf: Sync uapi bpf.h to tools directory
There is a difference between kernel uapi bpf.h and tools
uapi bpf.h. There is no functionality difference, but let
us sync properly to make it easy for later bpf.h update.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240325033842.1693553-1-yonghong.song@linux.dev
2024-03-25 21:58:26 -07:00
Yonghong Song
4d95d8b7f0 libbpf: Add new sec_def "sk_skb/verdict"
The new sec_def specifies sk_skb program type with
BPF_SK_SKB_VERDICT attachment type. This way, libbpf
will set expected_attach_type properly for the program.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240319175412.2941149-1-yonghong.song@linux.dev
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
f5828cc352 libbpf: add support for BPF cookie for raw_tp/tp_btf programs
Wire up BPF cookie passing or raw_tp and tp_btf programs, both in
low-level and high-level APIs.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319233852.1977493-5-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
cbd6e3596c bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
aware variants). This brings them up to part w.r.t. BPF cookie usage
with classic tracepoint and fentry/fexit programs.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319233852.1977493-4-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Mykyta Yatsenko
7cfc365995 libbpbpf: Check bpf_map/bpf_program fd validity
libbpf creates bpf_program/bpf_map structs for each program/map that
user defines, but it allows to disable creating/loading those objects in
kernel, in that case they won't have associated file descriptor
(fd < 0). Such functionality is used for backward compatibility
with some older kernels.

Nothing prevents users from passing these maps or programs with no
kernel counterpart to libbpf APIs. This change introduces explicit
checks for kernel objects existence, aiming to improve visibility of
those edge cases and provide meaningful warnings to users.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240318131808.95959-1-yatsenko@meta.com
2024-03-25 21:58:26 -07:00
Kui-Feng Lee
a5459eac49 libbpf: Skip zeroed or null fields if not found in the kernel type.
Accept additional fields of a struct_ops type with all zero values even if
these fields are not in the corresponding type in the kernel. This provides
a way to be backward compatible. User space programs can use the same map
on a machine running an old kernel by clearing fields that do not exist in
the kernel.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240313214139.685112-2-thinker.li@gmail.com
2024-03-25 21:58:26 -07:00
Quentin Monnet
f84ee80801 libbpf: Prevent null-pointer dereference when prog to load has no BTF
In bpf_objec_load_prog(), there's no guarantee that obj->btf is non-NULL
when passing it to btf__fd(), and this function does not perform any
check before dereferencing its argument (as bpf_object__btf_fd() used to
do). As a consequence, we get segmentation fault errors in bpftool (for
example) when trying to load programs that come without BTF information.

v2: Keep btf__fd() in the fix instead of reverting to bpf_object__btf_fd().

Fixes: df7c3f7d3a3d ("libbpf: make uniform use of btf__fd() accessor inside libbpf")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240314150438.232462-1-qmo@kernel.org
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
2d042d22a7 libbpf: Recognize __arena global variables.
LLVM automatically places __arena variables into ".arena.1" ELF section.
In order to use such global variables bpf program must include definition
of arena map in ".maps" section, like:
struct {
       __uint(type, BPF_MAP_TYPE_ARENA);
       __uint(map_flags, BPF_F_MMAPABLE);
       __uint(max_entries, 1000);         /* number of pages */
       __ulong(map_extra, 2ull << 44);    /* start of mmap() region */
} arena SEC(".maps");

libbpf recognizes both uses of arena and creates single `struct bpf_map *`
instance in libbpf APIs.
".arena.1" ELF section data is used as initial data image, which is exposed
through skeleton and bpf_map__initial_value() to the user, if they need to tune
it before the load phase. During load phase, this initial image is copied over
into mmap()'ed region corresponding to arena, and discarded.

Few small checks here and there had to be added to make sure this
approach works with bpf_map__initial_value(), mostly due to hard-coded
assumption that map->mmaped is set up with mmap() syscall and should be
munmap()'ed. For arena, .arena.1 can be (much) smaller than maximum
arena size, so this smaller data size has to be tracked separately.
Given it is enforced that there is only one arena for entire bpf_object
instance, we just keep it in a separate field. This can be generalized
if necessary later.

All global variables from ".arena.1" section are accessible from user space
via skel->arena->name_of_var.

For bss/data/rodata the skeleton/libbpf perform the following sequence:
1. addr = mmap(MAP_ANONYMOUS)
2. user space optionally modifies global vars
3. map_fd = bpf_create_map()
4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
5. mmap(addr, MAP_FIXED, map_fd)
after step 5 user spaces see the values it wrote at step 2 at the same addresses

arena doesn't support update_map_elem. Hence skeleton/libbpf do:
1. addr = malloc(sizeof SEC ".arena.1")
2. user space optionally modifies global vars
3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
5. memcpy(real_addr, addr) // this will fault-in and allocate pages

At the end look and feel of global data vs __arena global data is the same from
bpf prog pov.

Another complication is:
struct {
  __uint(type, BPF_MAP_TYPE_ARENA);
} arena SEC(".maps");

int __arena foo;
int bar;

  ptr1 = &foo;   // relocation against ".arena.1" section
  ptr2 = &arena; // relocation against ".maps" section
  ptr3 = &bar;   // relocation against ".bss" section

Fo the kernel ptr1 and ptr2 has point to the same arena's map_fd
while ptr3 points to a different global array's map_fd.
For the verifier:
ptr1->type == unknown_scalar
ptr2->type == const_ptr_to_map
ptr3->type == ptr_to_map_value

After verification, from JIT pov all 3 ptr-s are normal ld_imm64 insns.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-11-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
4524a45a2a libbpf: Add support for bpf_arena.
mmap() bpf_arena right after creation, since the kernel needs to
remember the address returned from mmap. This is user_vm_start.
LLVM will generate bpf_arena_cast_user() instructions where
necessary and JIT will add upper 32-bit of user_vm_start
to such pointers.

Fix up bpf_map_mmap_sz() to compute mmap size as
map->value_size * map->max_entries for arrays and
PAGE_SIZE * map->max_entries for arena.

Don't set BTF at arena creation time, since it doesn't support it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-9-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
086825355f libbpf: Add __arg_arena to bpf_helpers.h
Add __arg_arena to bpf_helpers.h

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-8-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
6de941bc1e bpf: Disasm support for addr_space_cast instruction.
LLVM generates rX = addr_space_cast(rY, dst_addr_space, src_addr_space)
instruction when pointers in non-zero address space are used by the bpf
program. Recognize this insn in uapi and in bpf disassembler.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-3-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
1675c13fae bpf: Introduce bpf_arena.
Introduce bpf_arena, which is a sparse shared memory region between the bpf
program and user space.

Use cases:
1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
   anonymous region, like memcached or any key/value storage. The bpf
   program implements an in-kernel accelerator. XDP prog can search for
   a key in bpf_arena and return a value without going to user space.
2. The bpf program builds arbitrary data structures in bpf_arena (hash
   tables, rb-trees, sparse arrays), while user space consumes it.
3. bpf_arena is a "heap" of memory from the bpf program's point of view.
   The user space may mmap it, but bpf program will not convert pointers
   to user base at run-time to improve bpf program speed.

Initially, the kernel vm_area and user vma are not populated. User space
can fault in pages within the range. While servicing a page fault,
bpf_arena logic will insert a new page into the kernel and user vmas. The
bpf program can allocate pages from that region via
bpf_arena_alloc_pages(). This kernel function will insert pages into the
kernel vm_area. The subsequent fault-in from user space will populate that
page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
can be used to prevent fault-in from user space. In such a case, if a page
is not allocated by the bpf program and not present in the kernel vm_area,
the user process will segfault. This is useful for use cases 2 and 3 above.

bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
either at a specific address within the arena or allocates a range with the
maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
pages and removes the range from the kernel vm_area and from user process
vmas.

bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of
bpf program is more important than ease of sharing with user space. This is
use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
instruction as a 32-bit move wX = wY, which will improve bpf prog
performance. Otherwise, bpf_arena_cast_user is translated by JIT to
conditionally add the upper 32 bits of user vm_start (if the pointer is not
NULL) to arena pointers before they are stored into memory. This way, user
space sees them as valid 64-bit pointers.

Diff https://github.com/llvm/llvm-project/pull/84410 enables LLVM BPF
backend generate the bpf_addr_space_cast() instruction to cast pointers
between address_space(1) which is reserved for bpf_arena pointers and
default address space zero. All arena pointers in a bpf program written in
C language are tagged as __attribute__((address_space(1))). Hence, clang
provides helpful diagnostics when pointers cross address space. Libbpf and
the kernel support only address_space == 1. All other address space
identifiers are reserved.

rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
verifier that rX->type = PTR_TO_ARENA. Any further operations on
PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
to copy_from_kernel_nofault() except that no address checks are necessary.
The address is guaranteed to be in the 4GB range. If the page is not
present, the destination register is zeroed on read, and the operation is
ignored on write.

rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
verifier converts such cast instructions to mov32. Otherwise, JIT will emit
native code equivalent to:
rX = (u32)rY;
if (rY)
  rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

After such conversion, the pointer becomes a valid user pointer within
bpf_arena range. The user process can access data structures created in
bpf_arena without any additional computations. For example, a linked list
built by a bpf program can be walked natively by user space.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
385d344839 libbpf: Allow specifying 64-bit integers in map BTF.
__uint() macro that is used to specify map attributes like:
  __uint(type, BPF_MAP_TYPE_ARRAY);
  __uint(map_flags, BPF_F_MMAPABLE);
It is limited to 32-bit, since BTF_KIND_ARRAY has u32 "number of elements"
field in "struct btf_array".

Introduce __ulong() macro that allows specifying values bigger than 32-bit.
In map definition "map_extra" is the only u64 field, so far.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20240307031228.42896-5-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-25 21:58:26 -07:00
Jakub Kicinski
d71c0ed2ef netdev: add queue stat for alloc failures
Rx alloc failures are commonly counted by drivers.
Support reporting those via netdev-genl queue stats.

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 21:58:26 -07:00
Jakub Kicinski
5e80833e50 netdev: add per-queue statistics
The ethtool-nl family does a good job exposing various protocol
related and IEEE/IETF statistics which used to get dumped under
ethtool -S, with creative names. Queue stats don't have a netlink
API, yet, and remain a lion's share of ethtool -S output for new
drivers. Not only is that bad because the names differ driver to
driver but it's also bug-prone. Intuitively drivers try to report
only the stats for active queues, but querying ethtool stats
involves multiple system calls, and the number of stats is
read separately from the stats themselves. Worse still when user
space asks for values of the stats, it doesn't inform the kernel
how big the buffer is. If number of stats increases in the meantime
kernel will overflow user buffer.

Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there.
Support per-queue and sum-for-device dumps. Latter will be useful
when subsequent patches add more interesting common stats than
just bytes and packets.

The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
2778cbce60 ci: add xdp_bonding fixes from bpf/master
bpf tree has fixes for xdp_bonding selftests which are not yet in
bpf-next, so add them as temporary CI-only patches.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-06 15:22:13 -08:00
Andrii Nakryiko
4f875865b7 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2ab256e93249f5ac1da665861aa0f03fb4208d9c
Checkpoint bpf-next commit: 7d763bc4a44a51e48dde406d6c6c8a26a60ec647
Baseline bpf commit:        dced881ead78e4d6add3735d02a9186ba2415630
Checkpoint bpf commit:      2487007aa3b9fafbd2cb14068f49791ce1d7ede5

Aahil Awatramani (1):
  bonding: Add independent control state machine

Alexei Starovoitov (1):
  bpf: Introduce may_goto instruction

Chen Shen (1):
  libbpf: Correct debug message in btf__load_vmlinux_btf

Eduard Zingerman (7):
  libbpf: Allow version suffixes (___smth) for struct_ops types
  libbpf: Tie struct_ops programs to kernel BTF ids, not to local ids
  libbpf: Honor autocreate flag for struct_ops maps
  libbpf: Sync progs autoload with maps autocreate for struct_ops maps
  libbpf: Replace elf_state->st_ops_* fields with SEC_ST_OPS sec_type
  libbpf: Struct_ops in SEC("?.struct_ops") / SEC("?.struct_ops.link")
  libbpf: Rewrite btf datasec names starting from '?'

Kees Cook (1):
  bpf: Replace bpf_lpm_trie_key 0-length array with flexible array

Kui-Feng Lee (2):
  libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
  libbpf: Convert st_ops->data to shadow type.

 include/uapi/linux/bpf.h     |  24 +++-
 include/uapi/linux/if_link.h |   1 +
 src/btf.c                    |   2 +-
 src/features.c               |  22 +++
 src/libbpf.c                 | 252 +++++++++++++++++++++++++++--------
 src/libbpf_internal.h        |   2 +
 6 files changed, 242 insertions(+), 61 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-06 15:22:13 -08:00
Eduard Zingerman
438adf417d libbpf: Rewrite btf datasec names starting from '?'
Optional struct_ops maps are defined using question mark at the start
of the section name, e.g.:

    SEC("?.struct_ops")
    struct test_ops optional_map = { ... };

This commit teaches libbpf to detect if kernel allows '?' prefix
in datasec names, and if it doesn't then to rewrite such names
by replacing '?' with '_', e.g.:

    DATASEC ?.struct_ops -> DATASEC _.struct_ops

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-13-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
fc8b86bda2 libbpf: Struct_ops in SEC("?.struct_ops") / SEC("?.struct_ops.link")
Allow using two new section names for struct_ops maps:
- SEC("?.struct_ops")
- SEC("?.struct_ops.link")

To specify maps that have bpf_map->autocreate == false after open.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-12-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
d5d0b6e920 libbpf: Replace elf_state->st_ops_* fields with SEC_ST_OPS sec_type
The next patch would add two new section names for struct_ops maps.
To make working with multiple struct_ops sections more convenient:
- remove fields like elf_state->st_ops_{shndx,link_shndx};
- mark section descriptions hosting struct_ops as
  elf_sec_desc->sec_type == SEC_ST_OPS;

After these changes struct_ops sections could be processed uniformly
by iterating bpf_object->efile.secs entries.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-11-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
060c604db8 libbpf: Sync progs autoload with maps autocreate for struct_ops maps
Automatically select which struct_ops programs to load depending on
which struct_ops maps are selected for automatic creation.
E.g. for the BPF code below:

    SEC("struct_ops/test_1") int BPF_PROG(foo) { ... }
    SEC("struct_ops/test_2") int BPF_PROG(bar) { ... }

    SEC(".struct_ops.link")
    struct test_ops___v1 A = {
        .foo = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 B = {
        .foo = (void *)foo,
        .bar = (void *)bar,
    };

And the following libbpf API calls:

    bpf_map__set_autocreate(skel->maps.A, true);
    bpf_map__set_autocreate(skel->maps.B, false);

The autoload would be enabled for program 'foo' and disabled for
program 'bar'.

During load, for each struct_ops program P, referenced from some
struct_ops map M:
- set P.autoload = true if M.autocreate is true for some M;
- set P.autoload = false if M.autocreate is false for all M;
- don't change P.autoload, if P is not referenced from any map.

Do this after bpf_object__init_kern_struct_ops_maps()
to make sure that shadow vars assignment is done.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-9-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
cb426140d0 libbpf: Honor autocreate flag for struct_ops maps
Skip load steps for struct_ops maps not marked for automatic creation.
This should allow to load bpf object in situations like below:

    SEC("struct_ops/foo") int BPF_PROG(foo) { ... }
    SEC("struct_ops/bar") int BPF_PROG(bar) { ... }

    struct test_ops___v1 {
    	int (*foo)(void);
    };

    struct test_ops___v2 {
    	int (*foo)(void);
    	int (*does_not_exist)(void);
    };

    SEC(".struct_ops.link")
    struct test_ops___v1 map_for_old = {
    	.test_1 = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 map_for_new = {
    	.test_1 = (void *)foo,
    	.does_not_exist = (void *)bar
    };

Suppose program is loaded on old kernel that does not have definition
for 'does_not_exist' struct_ops member. After this commit it would be
possible to load such object file after the following tweaks:

    bpf_program__set_autoload(skel->progs.bar, false);
    bpf_map__set_autocreate(skel->maps.map_for_new, false);

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240306104529.6453-4-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
aded62b120 libbpf: Tie struct_ops programs to kernel BTF ids, not to local ids
Enforce the following existing limitation on struct_ops programs based
on kernel BTF id instead of program-local BTF id:

    struct_ops BPF prog can be re-used between multiple .struct_ops &
    .struct_ops.link as long as it's the same struct_ops struct
    definition and the same function pointer field

This allows reusing same BPF program for versioned struct_ops map
definitions, e.g.:

    SEC("struct_ops/test")
    int BPF_PROG(foo) { ... }

    struct some_ops___v1 { int (*test)(void); };
    struct some_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link") struct some_ops___v1 a = { .test = foo }
    SEC(".struct_ops.link") struct some_ops___v2 b = { .test = foo }

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-3-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
f7fd5dbc07 libbpf: Allow version suffixes (___smth) for struct_ops types
E.g. allow the following struct_ops definitions:

    struct bpf_testmod_ops___v1 { int (*test)(void); };
    struct bpf_testmod_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v1 a = { .test = ... }
    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v2 b = { .test = ... }

Where both bpf_testmod_ops__v1 and bpf_testmod_ops__v2 would be
resolved as 'struct bpf_testmod_ops' from kernel BTF.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-2-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Alexei Starovoitov
00b08dceea bpf: Introduce may_goto instruction
Introduce may_goto instruction that from the verifier pov is similar to
open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it
doesn't iterate any objects.
In assembly 'may_goto' is a nop most of the time until bpf runtime has to
terminate the program for whatever reason. In the current implementation
may_goto has a hidden counter, but other mechanisms can be used.
For programs written in C the later patch introduces 'cond_break' macro
that combines 'may_goto' with 'break' statement and has similar semantics:
cond_break is a nop until bpf runtime has to break out of this loop.
It can be used in any normal "for" or "while" loop, like

  for (i = zero; i < cnt; cond_break, i++) {

The verifier recognizes that may_goto is used in the program, reserves
additional 8 bytes of stack, initializes them in subprog prologue, and
replaces may_goto instruction with:
aux_reg = *(u64 *)(fp - 40)
if aux_reg == 0 goto pc+off
aux_reg -= 1
*(u64 *)(fp - 40) = aux_reg

may_goto instruction can be used by LLVM to implement __builtin_memcpy,
__builtin_strcmp.

may_goto is not a full substitute for bpf_for() macro.
bpf_for() doesn't have induction variable that verifiers sees,
so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.

But when the code is written as:
for (i = 0; i < 100; cond_break, i++)
the verifier see 'i' as precise constant zero,
hence cond_break (aka may_goto) doesn't help to converge the loop.
A static or global variable can be used as a workaround:
static int zero = 0;
for (i = zero; i < 100; cond_break, i++) // works!

may_goto works well with arena pointers that don't need to be bounds
checked on access. Load/store from arena returns imprecise unbounded
scalar and loops with may_goto pass the verifier.

Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn.
JCOND stands for conditional pseudo jump.
Since goto_or_nop insn was proposed, it may use the same opcode.
may_goto vs goto_or_nop can be distinguished by src_reg:
code = BPF_JMP | BPF_JCOND
src_reg = 0 - may_goto
src_reg = 1 - goto_or_nop

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com
2024-03-06 13:58:27 -08:00
Chen Shen
bf52494e2b libbpf: Correct debug message in btf__load_vmlinux_btf
In the function btf__load_vmlinux_btf, the debug message incorrectly
refers to 'path' instead of 'sysfs_btf_path'.

Signed-off-by: Chen Shen <peterchenshen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240302062218.3587-1-peterchenshen@gmail.com
2024-03-06 13:58:27 -08:00
Kui-Feng Lee
acfaeffeaa libbpf: Convert st_ops->data to shadow type.
Convert st_ops->data to the shadow type of the struct_ops map. The shadow
type of a struct_ops type is a variant of the original struct type
providing a way to access/change the values in the maps of the struct_ops
type.

bpf_map__initial_value() will return st_ops->data for struct_ops types. The
skeleton is going to use it as the pointer to the shadow type of the
original struct type.

One of the main differences between the original struct type and the shadow
type is that all function pointers of the shadow type are converted to
pointers of struct bpf_program. Users can replace these bpf_program
pointers with other BPF programs. The st_ops->progs[] will be updated
before updating the value of a map to reflect the changes made by users.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-3-thinker.li@gmail.com
2024-03-06 13:58:27 -08:00
Kui-Feng Lee
0758d8b0f2 libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
For a struct_ops map, btf_value_type_id is the type ID of it's struct
type. This value is required by bpftool to generate skeleton including
pointers of shadow types. The code generator gets the type ID from
bpf_map__btf_value_type_id() in order to get the type information of the
struct type of a map.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-2-thinker.li@gmail.com
2024-03-06 13:58:27 -08:00
Kees Cook
fa4d00254d bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Replace deprecated 0-length array in struct bpf_lpm_trie_key with
flexible array. Found with GCC 13:

../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
  207 |                                        *(__be16 *)&key->data[i]);
      |                                                   ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
  102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
      |                                                      ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
   97 | #define be16_to_cpu __be16_to_cpu
      |                     ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
  206 |                 u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
^
      |                            ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
   82 |         __u8    data[0];        /* Arbitrary size */
      |                 ^~~~

And found at run-time under CONFIG_FORTIFY_SOURCE:

  UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
  index 0 is out of range for type '__u8 [*]'

Changing struct bpf_lpm_trie_key is difficult since has been used by
userspace. For example, in Cilium:

	struct egress_gw_policy_key {
	        struct bpf_lpm_trie_key lpm_key;
	        __u32 saddr;
	        __u32 daddr;
	};

While direct references to the "data" member haven't been found, there
are static initializers what include the final member. For example,
the "{}" here:

        struct egress_gw_policy_key in_key = {
                .lpm_key = { 32 + 24, {} },
                .saddr   = CLIENT_IP,
                .daddr   = EXTERNAL_SVC_IP & 0Xffffff,
        };

To avoid the build time and run time warnings seen with a 0-sized
trailing array for struct bpf_lpm_trie_key, introduce a new struct
that correctly uses a flexible array for the trailing bytes,
struct bpf_lpm_trie_key_u8. As part of this, include the "header"
portion (which is just the "prefixlen" member), so it can be used
by anything building a bpf_lpr_trie_key that has trailing members that
aren't a u8 flexible array (like the self-test[1]), which is named
struct bpf_lpm_trie_key_hdr.

Unfortunately, C++ refuses to parse the __struct_group() helper, so
it is not possible to define struct bpf_lpm_trie_key_hdr directly in
struct bpf_lpm_trie_key_u8, so we must open-code the union directly.

Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out,
and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment
to the UAPI header directing folks to the two new options.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
2024-03-06 13:58:27 -08:00
Aahil Awatramani
f749be80b7 bonding: Add independent control state machine
Add support for the independent control state machine per IEEE
802.1AX-2008 5.4.15 in addition to the existing implementation of the
coupled control state machine.

Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in
the LACP MUX state machine for separated handling of an initial
Collecting state before the Collecting and Distributing state. This
enables a port to be in a state where it can receive incoming packets
while not still distributing. This is useful for reducing packet loss when
a port begins distributing before its partner is able to collect.

Added new functions such as bond_set_slave_tx_disabled_flags and
bond_set_slave_rx_enabled_flags to precisely manage the port's collecting
and distributing states. Previously, there was no dedicated method to
disable TX while keeping RX enabled, which this patch addresses.

Note that the regular flow process in the kernel's bonding driver remains
unaffected by this patch. The extension requires explicit opt-in by the
user (in order to ensure no disruptions for existing setups) via netlink
support using the new bonding parameter coupled_control. The default value
for coupled_control is set to 1 so as to preserve existing behaviour.

Signed-off-by: Aahil Awatramani <aahila@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-06 13:58:27 -08:00
Andrii Nakryiko
fb98d4bd25 include: fix BPF_CALL_REL definition
Fix our Github-specific definition of BPF_CALL_REL macro. It was missing
the code part.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-01 15:39:45 -08:00
Kui-Feng Lee
f4e9b606f4 ci: clean up bpf_test_no_cfi.ko for v5.5.0 and v4.9.0.
bpf_test_no_cfi.ko is not available for v5.5.0 and v4.9.0.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
2024-02-27 10:14:31 -08:00
Kui-Feng Lee
ff95bd6238 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   92a871ab9fa59a74d013bc04f321026a057618e7
Checkpoint bpf-next commit: 2ab256e93249f5ac1da665861aa0f03fb4208d9c
Baseline bpf commit:        577e4432f3ac810049cb7e6b71f4d96ec7c6e894
Checkpoint bpf commit:      dced881ead78e4d6add3735d02a9186ba2415630

Arnaldo Carvalho de Melo (1):
  tools headers UAPI: Sync linux/fcntl.h with the kernel sources

Cupertino Miranda (1):
  libbpf: Add support to GCC in CORE macro definitions

Martin Kelly (1):
  bpf: Clarify batch lookup/lookup_and_delete semantics

Matt Bobrowski (1):
  libbpf: Make remark about zero-initializing bpf_*_info structs

 include/uapi/linux/bpf.h   |  6 ++++-
 include/uapi/linux/fcntl.h |  3 +++
 src/bpf.h                  | 39 ++++++++++++++++++++++++---------
 src/bpf_core_read.h        | 45 ++++++++++++++++++++++++++++++++------
 4 files changed, 75 insertions(+), 18 deletions(-)

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
2024-02-27 10:14:31 -08:00
Arnaldo Carvalho de Melo
a894b0cb9b tools headers UAPI: Sync linux/fcntl.h with the kernel sources
To get the changes in:

  8a924db2d7b5eb69 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")

That don't add anything that is handled by existing hard coded tables or
table generation scripts.

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stefan Berger <stefanb@linux.ibm.com>
Link: https://lore.kernel.org/lkml/ZbJv9fGF_k2xXEdr@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-02-27 10:14:31 -08:00
Martin Kelly
afa81fb1cb bpf: Clarify batch lookup/lookup_and_delete semantics
The batch lookup and lookup_and_delete APIs have two parameters,
in_batch and out_batch, to facilitate iterative
lookup/lookup_and_deletion operations for supported maps. Except NULL
for in_batch at the start of these two batch operations, both parameters
need to point to memory equal or larger than the respective map key
size, except for various hashmaps (hash, percpu_hash, lru_hash,
lru_percpu_hash) where the in_batch/out_batch memory size should be
at least 4 bytes.

Document these semantics to clarify the API.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240221211838.1241578-1-martin.kelly@crowdstrike.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-27 10:14:31 -08:00
Matt Bobrowski
16e68ab13c libbpf: Make remark about zero-initializing bpf_*_info structs
In some situations, if you fail to zero-initialize the
bpf_{prog,map,btf,link}_info structs supplied to the set of LIBBPF
helpers bpf_{prog,map,btf,link}_get_info_by_fd(), you can expect the
helper to return an error. This can possibly leave people in a
situation where they're scratching their heads for an unnnecessary
amount of time. Make an explicit remark about the requirement of
zero-initializing the supplied bpf_{prog,map,btf,link}_info structs
for the respective LIBBPF helpers.

Internally, LIBBPF helpers bpf_{prog,map,btf,link}_get_info_by_fd()
call into bpf_obj_get_info_by_fd() where the bpf(2)
BPF_OBJ_GET_INFO_BY_FD command is used. This specific command is
effectively backed by restrictions enforced by the
bpf_check_uarg_tail_zero() helper. This function ensures that if the
size of the supplied bpf_{prog,map,btf,link}_info structs are larger
than what the kernel can handle, trailing bits are zeroed. This can be
a problem when compiling against UAPI headers that don't necessarily
match the sizes of the same underlying types known to the kernel.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/ZcyEb8x4VbhieWsL@google.com
2024-02-27 10:14:31 -08:00
Cupertino Miranda
b19fdbf1be libbpf: Add support to GCC in CORE macro definitions
Due to internal differences between LLVM and GCC the current
implementation for the CO-RE macros does not fit GCC parser, as it will
optimize those expressions even before those would be accessible by the
BPF backend.

As examples, the following would be optimized out with the original
definitions:
  - As enums are converted to their integer representation during
  parsing, the IR would not know how to distinguish an integer
  constant from an actual enum value.
  - Types need to be kept as temporary variables, as the existing type
  casts of the 0 address (as expanded for LLVM), are optimized away by
  the GCC C parser, never really reaching GCCs IR.

Although, the macros appear to add extra complexity, the expanded code
is removed from the compilation flow very early in the compilation
process, not really affecting the quality of the generated assembly.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240213173543.1397708-1-cupertino.miranda@oracle.com
2024-02-27 10:14:31 -08:00
Manu Bretelle
445486dcbf ci: Pass arch parameter to setup-build-env
Since 1bc40aecb3
arch parameter needs to be passed to `setup-build-env`

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2024-02-15 10:44:45 -08:00
Andrii Nakryiko
820bca2cb6 ci: verifier_global_subprogs can't be run on 5.5
We get:

  libbpf: struct_ops init_kern: struct bpf_dummy_ops is not found in kernel BTF

So even though it's irrelevant to the subtests we do want to test,
entire test has to be skipped, unfortunately.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-06 11:52:00 -08:00
Andrii Nakryiko
8a8feae5f4 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   943b043aeecce9accb6d367af47791c633e95e4d
Checkpoint bpf-next commit: 92a871ab9fa59a74d013bc04f321026a057618e7
Baseline bpf commit:        577e4432f3ac810049cb7e6b71f4d96ec7c6e894
Checkpoint bpf commit:      577e4432f3ac810049cb7e6b71f4d96ec7c6e894

Andrii Nakryiko (1):
  libbpf: fix return value for PERF_EVENT __arg_ctx type fix up check

Toke Høiland-Jørgensen (1):
  libbpf: Use OPTS_SET() macro in bpf_xdp_query()

 src/libbpf.c  | 6 +++---
 src/netlink.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-06 11:52:00 -08:00
Toke Høiland-Jørgensen
a20b60f971 libbpf: Use OPTS_SET() macro in bpf_xdp_query()
When the feature_flags and xdp_zc_max_segs fields were added to the libbpf
bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro.
This causes libbpf to write to those fields unconditionally, which means
that programs compiled against an older version of libbpf (with a smaller
size of the bpf_xdp_query_opts struct) will have its stack corrupted by
libbpf writing out of bounds.

The patch adding the feature_flags field has an early bail out if the
feature_flags field is not part of the opts struct (via the OPTS_HAS)
macro, but the patch adding xdp_zc_max_segs does not. For consistency, this
fix just changes the assignments to both fields to use the OPTS_SET()
macro.

Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com
2024-02-06 11:52:00 -08:00
Andrii Nakryiko
b24a6277cc libbpf: fix return value for PERF_EVENT __arg_ctx type fix up check
If PERF_EVENT program has __arg_ctx argument with matching
architecture-specific pt_regs/user_pt_regs/user_regs_struct pointer
type, libbpf should still perform type rewrite for old kernels, but not
emit the warning. Fix copy/paste from kernel code where 0 is meant to
signify "no error" condition. For libbpf we need to return "true" to
proceed with type rewrite (which for PERF_EVENT program will be
a canonical `struct bpf_perf_event_data *` type).

Fixes: 9eea8fafe33e ("libbpf: fix __arg_ctx type enforcement for perf_event programs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206002243.1439450-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06 11:52:00 -08:00
Andrii Nakryiko
25fe467af4 ci: allowlist tests validating libbpf's __arg_ctx type rewrite logic
Allowlist test_global_funcs/arg_tag_ctx* and a few of
verifier_global_subprogs subtests that validate libbpf's logic for
rewriting __arg_ctx globl subprog argument types on kernels that don't
natively support __arg_ctx.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-06 10:17:28 -08:00
Andrii Nakryiko
f11758a780 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   ced33f2cfa21a14a292a00e31dc9f85c1bfbda1c
Checkpoint bpf-next commit: 943b043aeecce9accb6d367af47791c633e95e4d
Baseline bpf commit:        577e4432f3ac810049cb7e6b71f4d96ec7c6e894
Checkpoint bpf commit:      577e4432f3ac810049cb7e6b71f4d96ec7c6e894

Andrii Nakryiko (8):
  libbpf: integrate __arg_ctx feature detector into kernel_supports()
  libbpf: fix __arg_ctx type enforcement for perf_event programs
  libbpf: add __arg_trusted and __arg_nullable tag macros
  libbpf: add bpf_core_cast() macro
  libbpf: Call memfd_create() syscall directly
  libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim
    API
  libbpf: Add btf__new_split() API that was declared but not implemented
  libbpf: Add missed btf_ext__raw_data() API

Eduard Zingerman (1):
  libbpf: Remove unnecessary null check in kernel_supports()

Ian Rogers (1):
  libbpf: Add some details for BTF parsing failures

 src/bpf.h             |  2 +-
 src/bpf_core_read.h   | 13 ++++++
 src/bpf_helpers.h     |  2 +
 src/btf.c             | 33 ++++++++++++---
 src/features.c        | 58 +++++++++++++++++++++++++
 src/libbpf.c          | 99 ++++++++++++++-----------------------------
 src/libbpf.map        |  5 ++-
 src/libbpf_internal.h |  2 +
 src/linker.c          |  2 +-
 9 files changed, 140 insertions(+), 76 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
cbb8ba352d sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
95b4beb502 libbpf: Add missed btf_ext__raw_data() API
Another API that was declared in libbpf.map but actual implementation
was missing. btf_ext__get_raw_data() was intended as a discouraged alias
to consistently-named btf_ext__raw_data(), so make this an actuality.

Fixes: 20eccf29e297 ("libbpf: hide and discourage inconsistently named getters")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-5-andrii@kernel.org
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
5b7613e50f libbpf: Add btf__new_split() API that was declared but not implemented
Seems like original commit adding split BTF support intended to add
btf__new_split() API, and even declared it in libbpf.map, but never
added (trivial) implementation. Fix this.

Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-4-andrii@kernel.org
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
245394fb36 libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim API
LIBBPF_API annotation seems missing on libbpf_set_memlock_rlim API, so
add it to make this API callable from libbpf's shared library version.

Fixes: e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
Fixes: ab9a5a05dc48 ("libbpf: fix up few libbpf.map problems")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-3-andrii@kernel.org
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
3b19b1bb55 libbpf: Call memfd_create() syscall directly
Some versions of Android do not implement memfd_create() wrapper in
their libc implementation, leading to build failures ([0]). On the other
hand, memfd_create() is available as a syscall on quite old kernels
(3.17+, while bpf() syscall itself is available since 3.18+), so it is
ok to assume that syscall availability and call into it with syscall()
helper to avoid Android-specific workarounds.

Validated in libbpf-bootstrap's CI ([1]).

  [0] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7701003207/job/20986080319#step:5:83
  [1] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7715988887/job/21031767212?pr=253

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-2-andrii@kernel.org
2024-02-01 15:10:17 -08:00
Eduard Zingerman
7529e0c4c7 libbpf: Remove unnecessary null check in kernel_supports()
After recent changes, Coverity complained about inconsistent null checks
in kernel_supports() function:

    kernel_supports(const struct bpf_object *obj, ...)
    [...]
    // var_compare_op: Comparing obj to null implies that obj might be null
    if (obj && obj->gen_loader)
        return true;

    // var_deref_op: Dereferencing null pointer obj
    if (obj->token_fd)
        return feat_supported(obj->feat_cache, feat_id);
    [...]

- The original null check was introduced by commit [0], which introduced
  a call `kernel_supports(NULL, ...)` in function bump_rlimit_memlock();
- This call was refactored to use `feat_supported(NULL, ...)` in commit [1].

Looking at all places where kernel_supports() is called:

- There is either `obj->...` access before the call;
- Or `obj` comes from `prog->obj` expression, where `prog` comes from
  enumeration of programs in `obj`;
- Or `obj` comes from `prog->obj`, where `prog` is a parameter to one
  of the API functions:
  - bpf_program__attach_kprobe_opts;
  - bpf_program__attach_kprobe;
  - bpf_program__attach_ksyscall.

Assuming correct API usage, it appears that `obj` can never be null when
passed to kernel_supports(). Silence the Coverity warning by removing
redundant null check.

  [0] e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
  [1] d6dd1d49367a ("libbpf: Further decouple feature checking logic from bpf_object")

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240131212615.20112-1-eddyz87@gmail.com
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
688879fb01 libbpf: add bpf_core_cast() macro
Add bpf_core_cast() macro that wraps bpf_rdonly_cast() kfunc. It's more
ergonomic than kfunc, as it automatically extracts btf_id with
bpf_core_type_id_kernel(), and works with type names. It also casts result
to (T *) pointer. See the definition of the macro, it's self-explanatory.

libbpf declares bpf_rdonly_cast() extern as __weak __ksym and should be
safe to not conflict with other possible declarations in user code.

But we do have a conflict with current BPF selftests that declare their
externs with first argument as `void *obj`, while libbpf opts into more
permissive `const void *obj`. This causes conflict, so we fix up BPF
selftests uses in the same patch.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
0303e25be3 libbpf: add __arg_trusted and __arg_nullable tag macros
Add __arg_trusted to annotate global func args that accept trusted
PTR_TO_BTF_ID arguments.

Also add __arg_nullable to combine with __arg_trusted (and maybe other
tags in the future) to force global subprog itself (i.e., callee) to do
NULL checks, as opposed to default non-NULL semantics (and thus caller's
responsibility to ensure non-NULL values).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-01 15:10:17 -08:00
Ian Rogers
9b306ac9be libbpf: Add some details for BTF parsing failures
As CONFIG_DEBUG_INFO_BTF is default off the existing "failed to find
valid kernel BTF" message makes diagnosing the kernel build issue somewhat
cryptic. Add a little more detail with the hope of helping users.

Before:
```
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not accessible:
```
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not readable:
```
libbpf: failed to read kernel BTF from (/sys/kernel/btf/vmlinux): -1
```

Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@mail.gmail.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240125231840.1647951-1-irogers@google.com
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
c57fb75864 libbpf: fix __arg_ctx type enforcement for perf_event programs
Adjust PERF_EVENT type enforcement around __arg_ctx to match exactly
what kernel is doing.

Fixes: 76ec90a996e3 ("libbpf: warn on unexpected __arg_ctx type when rewriting BTF")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
0b412d1918 libbpf: integrate __arg_ctx feature detector into kernel_supports()
Now that feature detection code is in bpf-next tree, integrate __arg_ctx
kernel-side support into kernel_supports() framework.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-01 15:10:17 -08:00
Andrii Nakryiko
3b09738928 sync: remove NETDEV_XSK_FLAGS_MASK which is not in bpf/bpf-next anymore
This part of code is not present in either bpf or bpf-next trees
anymore, so manually remove it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-29 10:48:12 -08:00
Andrii Nakryiko
5139f12ef1 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c8632acf193beac64bbdaebef013368c480bf74f
Checkpoint bpf-next commit: ced33f2cfa21a14a292a00e31dc9f85c1bfbda1c
Baseline bpf commit:        0a5bd0ffe790511d802e7f40898429a89e2487df
Checkpoint bpf commit:      577e4432f3ac810049cb7e6b71f4d96ec7c6e894

Andrii Nakryiko (1):
  libbpf: Fix faccessat() usage on Android

 src/libbpf_internal.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-29 10:48:12 -08:00
Andrii Nakryiko
830e0d017b libbpf: Fix faccessat() usage on Android
Android implementation of libc errors out with -EINVAL in faccessat() if
passed AT_EACCESS ([0]), this leads to ridiculous issue with libbpf
refusing to load /sys/kernel/btf/vmlinux on Androids ([1]). Fix by
detecting Android and redefining AT_EACCESS to 0, it's equivalent on
Android.

  [0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50
  [1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250

Fixes: 6a4ab8869d0b ("libbpf: Fix the case of running as non-root with capabilities")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240126220944.2497665-1-andrii@kernel.org
2024-01-29 10:48:12 -08:00
Andrii Nakryiko
fad5d91381 libbpf: make sure linux/kernel.h includes linux/compiler.h
This replicates kernel upstream setup and brings READ_ONCE() and
WRITE_ONCE() macros anywhere where linux/kernel.h is included, which is
assumption libbpf code makes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
8ca30626cc Makefile: add features.o to Makefile
Libbpf got new source code file, features.c, we need to add it to
Makefile here on Github version as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
274d6037f8 libbpf: add BPF_CALL_REL() macro implementation
Add BPF_CALL_REL() macro implementation into include/linux/filter.h
header, which is now used by libbpf code for feature detection.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
0f84f3bef6 ci: regenerate vmlinux.h
Update vmlinux.h for old kernel CI workflows.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
3dea2db84b ci: drop custom patches for fixing upstream kernel issues
All the issues should be fixed upstream already.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
2f81310ec0 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   98e20e5e13d2811898921f999288be7151a11954
Checkpoint bpf-next commit: c8632acf193beac64bbdaebef013368c480bf74f
Baseline bpf commit:        7c5e046bdcb2513f9decb3765d8bf92d604279cf
Checkpoint bpf commit:      0a5bd0ffe790511d802e7f40898429a89e2487df

Andrey Grafin (1):
  libbpf: Apply map_set_def_max_entries() for inner_maps on creation

Andrii Nakryiko (17):
  libbpf: feature-detect arg:ctx tag support in kernel
  libbpf: warn on unexpected __arg_ctx type when rewriting BTF
  libbpf: call dup2() syscall directly
  bpf: Introduce BPF token object
  bpf: Add BPF token support to BPF_MAP_CREATE command
  bpf: Add BPF token support to BPF_BTF_LOAD command
  bpf: Add BPF token support to BPF_PROG_LOAD command
  libbpf: Add bpf_token_create() API
  libbpf: Add BPF token support to bpf_map_create() API
  libbpf: Add BPF token support to bpf_btf_load() API
  libbpf: Add BPF token support to bpf_prog_load() API
  libbpf: Split feature detectors definitions from cached results
  libbpf: Further decouple feature checking logic from bpf_object
  libbpf: Move feature detection code into its own file
  libbpf: Wire up token_fd into feature probing logic
  libbpf: Wire up BPF token support at BPF object level
  libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH
    envvar

Daniel Borkmann (1):
  bpf: Sync uapi bpf.h header for the tooling infra

Dima Tisnek (1):
  libbpf: Correct bpf_core_read.h comment wrt bpf_core_relo struct

Jiri Olsa (2):
  bpf: Add cookie to perf_event bpf_link_info records
  bpf: Store cookies in kprobe_multi bpf_link_info data

Kan Liang (2):
  perf: Add branch stack counters
  perf/x86/intel: Support branch counters logging

Kui-Feng Lee (3):
  bpf: pass btf object id in bpf_map_info.
  bpf: pass attached BTF to the bpf_struct_ops subsystem
  libbpf: Find correct module BTFs for struct_ops maps and progs.

Martin KaFai Lau (1):
  libbpf: Ensure undefined bpf_attr field stays 0

 include/uapi/linux/bpf.h        |  79 +++-
 include/uapi/linux/perf_event.h |  13 +
 src/bpf.c                       |  42 +-
 src/bpf.h                       |  38 +-
 src/bpf_core_read.h             |   2 +-
 src/btf.c                       |  10 +-
 src/elf.c                       |   2 -
 src/features.c                  | 503 +++++++++++++++++++++
 src/libbpf.c                    | 744 ++++++++++++--------------------
 src/libbpf.h                    |  21 +-
 src/libbpf.map                  |   1 +
 src/libbpf_internal.h           |  50 ++-
 src/libbpf_probes.c             |  12 +-
 src/str_error.h                 |   3 +
 14 files changed, 1019 insertions(+), 501 deletions(-)
 create mode 100644 src/features.c

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
0e57fade4e sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
a36646e2b3 libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
To allow external admin authority to override default BPF FS location
(/sys/fs/bpf) for implicit BPF token creation, teach libbpf to recognize
LIBBPF_BPF_TOKEN_PATH envvar. If it is specified and user application
didn't explicitly specify bpf_token_path option, it will be treated
exactly like bpf_token_path option, overriding default /sys/fs/bpf
location and making BPF token mandatory.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-29-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
e1a43809a9 libbpf: Wire up BPF token support at BPF object level
Add BPF token support to BPF object-level functionality.

BPF token is supported by BPF object logic either as an explicitly
provided BPF token from outside (through BPF FS path), or implicitly
(unless prevented through bpf_object_open_opts).

Implicit mode is assumed to be the most common one for user namespaced
unprivileged workloads. The assumption is that privileged container
manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token
delegation options (delegate_{cmds,maps,progs,attachs} mount options).
BPF object during loading will attempt to create BPF token from
/sys/fs/bpf location, and pass it for all relevant operations
(currently, map creation, BTF load, and program load).

In this implicit mode, if BPF token creation fails due to whatever
reason (BPF FS is not mounted, or kernel doesn't support BPF token,
etc), this is not considered an error. BPF object loading sequence will
proceed with no BPF token.

In explicit BPF token mode, user provides explicitly custom BPF FS mount
point path. In such case, BPF object will attempt to create BPF token
from provided BPF FS location. If BPF token creation fails, that is
considered a critical error and BPF object load fails with an error.

Libbpf provides a way to disable implicit BPF token creation, if it
causes any troubles (BPF token is designed to be completely optional and
shouldn't cause any problems even if provided, but in the world of BPF
LSM, custom security logic can be installed that might change outcome
depending on the presence of BPF token). To disable libbpf's default BPF
token creation behavior user should provide either invalid BPF token FD
(negative), or empty bpf_token_path option.

BPF token presence can influence libbpf's feature probing, so if BPF
object has associated BPF token, feature probing is instructed to use
BPF object-specific feature detection cache and token FD.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-26-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
a3b317a9c0 libbpf: Wire up token_fd into feature probing logic
Adjust feature probing callbacks to take into account optional token_fd.
In unprivileged contexts, some feature detectors would fail to detect
kernel support just because BPF program, BPF map, or BTF object can't be
loaded due to privileged nature of those operations. So when BPF object
is loaded with BPF token, this token should be used for feature probing.

This patch is setting support for this scenario, but we don't yet pass
non-zero token FD. This will be added in the next patch.

We also switched BPF cookie detector from using kprobe program to
tracepoint one, as tracepoint is somewhat less dangerous BPF program
type and has higher likelihood of being allowed through BPF token in the
future. This change has no effect on detection behavior.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-25-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
d42f0b8943 libbpf: Move feature detection code into its own file
It's quite a lot of well isolated code, so it seems like a good
candidate to move it out of libbpf.c to reduce its size.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-24-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
9bf95048b7 libbpf: Further decouple feature checking logic from bpf_object
Add feat_supported() helper that accepts feature cache instead of
bpf_object. This allows low-level code in bpf.c to not know or care
about higher-level concept of bpf_object, yet it will be able to utilize
custom feature checking in cases where BPF token might influence the
outcome.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-23-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
9454419946 libbpf: Split feature detectors definitions from cached results
Split a list of supported feature detectors with their corresponding
callbacks from actual cached supported/missing values. This will allow
to have more flexible per-token or per-object feature detectors in
subsequent refactorings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-22-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
8082a311d3 libbpf: Add BPF token support to bpf_prog_load() API
Wire through token_fd into bpf_prog_load().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-16-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
ac4a66ea12 libbpf: Add BPF token support to bpf_btf_load() API
Allow user to specify token_fd for bpf_btf_load() API that wraps
kernel's BPF_BTF_LOAD command. This allows loading BTF from unprivileged
process as long as it has BPF token allowing BPF_BTF_LOAD command, which
can be created and delegated by privileged process.

Wire through new btf_flags as well, so that user can provide
BPF_F_TOKEN_FD flag, if necessary.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-15-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
8002c052f3 libbpf: Add BPF token support to bpf_map_create() API
Add ability to provide token_fd for BPF_MAP_CREATE command through
bpf_map_create() API.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-14-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
5cc8482fe2 libbpf: Add bpf_token_create() API
Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-13-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
21fb08cb35 bpf: Add BPF token support to BPF_PROG_LOAD command
Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag
should be set in prog_flags field when providing prog_token_fd.

Wire through a set of allowed BPF program types and attach types,
derived from BPF FS at BPF token creation time. Then make sure we
perform bpf_token_capable() checks everywhere where it's relevant.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
eb9d10835c bpf: Add BPF token support to BPF_BTF_LOAD command
Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
through delegated BPF token. BPF_F_TOKEN_FD flag has to be specified
when passing BPF token FD. Given BPF_BTF_LOAD command didn't have flags
field before, we also add btf_flags field.

BTF loading is a pretty straightforward operation, so as long as BPF
token is created with allow_cmds granting BPF_BTF_LOAD command, kernel
proceeds to parsing BTF data and creating BTF object.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-6-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
1386c15b7b bpf: Add BPF token support to BPF_MAP_CREATE command
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled
BPF map creation from unprivileged process through delegated BPF token.
New BPF_F_TOKEN_FD flag is added to specify together with BPF token FD
for BPF_MAP_CREATE command.

Wire through a set of allowed BPF map types to BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds
allows to create a narrowly-focused BPF token (controlled by privileged
agent) with a restrictive set of BPF maps that application can attempt
to create.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-5-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
5cd6a8493b bpf: Introduce BPF token object
Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations could be performed using provided BPF token.

This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
FS FD, which can be attained through open() API by opening BPF FS mount
point. Currently, BPF token "inherits" delegated command, map types,
prog type, and attach type bit sets from BPF FS as is. In the future,
having an BPF token as a separate object with its own FD, we can allow
to further restrict BPF token's allowable set of things either at the
creation time or after the fact, allowing the process to guard itself
further from unintentionally trying to load undesired kind of BPF
programs. But for now we keep things simple and just copy bit sets as is.

When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.

Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).

Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
within the BPF FS owning user namespace, rounding up the ns_capable()
story of BPF token. Also creating BPF token in init user namespace is
currently not supported, given BPF token doesn't have any effect in init
user namespace anyways.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org
2024-01-26 18:12:29 -05:00
Martin KaFai Lau
385ae492fa libbpf: Ensure undefined bpf_attr field stays 0
The commit 9e926acda0c2 ("libbpf: Find correct module BTFs for struct_ops maps and progs.")
sets a newly added field (value_type_btf_obj_fd) to -1 in libbpf when
the caller of the libbpf's bpf_map_create did not define this field by
passing a NULL "opts" or passing in a "opts" that does not cover this
new field. OPT_HAS(opts, field) is used to decide if the field is
defined or not:

	((opts) && opts->sz >= offsetofend(typeof(*(opts)), field))

Once OPTS_HAS decided the field is not defined, that field should
be set to 0. For this particular new field (value_type_btf_obj_fd),
its corresponding map_flags "BPF_F_VTYPE_BTF_OBJ_FD" is not set.
Thus, the kernel does not treat it as an fd field.

Fixes: 9e926acda0c2 ("libbpf: Find correct module BTFs for struct_ops maps and progs.")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240124224418.2905133-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Dima Tisnek
517ff8d823 libbpf: Correct bpf_core_read.h comment wrt bpf_core_relo struct
Past commit ([0]) removed the last vestiges of struct bpf_field_reloc,
it's called struct bpf_core_relo now.

  [0] 28b93c64499a ("libbpf: Clean up and improve CO-RE reloc logging")

Signed-off-by: Dima Tisnek <dimaqq@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240121060126.15650-1-dimaqq@gmail.com
2024-01-26 18:12:29 -05:00
Kui-Feng Lee
0b0dfaf1be libbpf: Find correct module BTFs for struct_ops maps and progs.
Locate the module BTFs for struct_ops maps and progs and pass them to the
kernel. This ensures that the kernel correctly resolves type IDs from the
appropriate module BTFs.

For the map of a struct_ops object, the FD of the module BTF is set to
bpf_map to keep a reference to the module BTF. The FD is passed to the
kernel as value_type_btf_obj_fd when the struct_ops object is loaded.

For a bpf_struct_ops prog, attach_btf_obj_fd of bpf_prog is the FD of a
module BTF in the kernel.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240119225005.668602-13-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-26 18:12:29 -05:00
Kui-Feng Lee
d767ead51f bpf: pass attached BTF to the bpf_struct_ops subsystem
Pass the fd of a btf from the userspace to the bpf() syscall, and then
convert the fd into a btf. The btf is generated from the module that
defines the target BPF struct_ops type.

In order to inform the kernel about the module that defines the target
struct_ops type, the userspace program needs to provide a btf fd for the
respective module's btf. This btf contains essential information on the
types defined within the module, including the target struct_ops type.

A btf fd must be provided to the kernel for struct_ops maps and for the bpf
programs attached to those maps.

In the case of the bpf programs, the attach_btf_obj_fd parameter is passed
as part of the bpf_attr and is converted into a btf. This btf is then
stored in the prog->aux->attach_btf field. Here, it just let the verifier
access attach_btf directly.

In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd
of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a
btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added
for map_flags to indicate that the value of value_type_btf_obj_fd is set.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-26 18:12:29 -05:00
Kui-Feng Lee
d70e2071e2 bpf: pass btf object id in bpf_map_info.
Include btf object id (btf_obj_id) in bpf_map_info so that tools (ex:
bpftools struct_ops dump) know the correct btf from the kernel to look up
type information of struct_ops types.

Since struct_ops types can be defined and registered in a module. The
type information of a struct_ops type are defined in the btf of the
module defining it.  The userspace tools need to know which btf is for
the module defining a struct_ops type.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-26 18:12:29 -05:00
Jiri Olsa
fe508381b4 bpf: Store cookies in kprobe_multi bpf_link_info data
Storing cookies in kprobe_multi bpf_link_info data. The cookies
field is optional and if provided it needs to be an array of
__u64 with kprobe_multi.count length.

Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240119110505.400573-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Jiri Olsa
de2f366450 bpf: Add cookie to perf_event bpf_link_info records
At the moment we don't store cookie for perf_event probes,
while we do that for the rest of the probes.

Adding cookie fields to struct bpf_link_info perf event
probe records:

  perf_event.uprobe
  perf_event.kprobe
  perf_event.tracepoint
  perf_event.perf_event

And the code to store that in bpf_link_info struct.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20240119110505.400573-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
390c6234eb libbpf: call dup2() syscall directly
We've ran into issues with using dup2() API in production setting, where
libbpf is linked into large production environment and ends up calling
unintended custom implementations of dup2(). These custom implementations
don't provide atomic FD replacement guarantees of dup2() syscall,
leading to subtle and hard to debug issues.

To prevent this in the future and guarantee that no libc implementation
will do their own custom non-atomic dup2() implementation, call dup2()
syscall directly with syscall(SYS_dup2).

Note that some architectures don't seem to provide dup2 and have dup3
instead. Try to detect and pick best syscall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240119210201.1295511-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Andrey Grafin
89ca11a79b libbpf: Apply map_set_def_max_entries() for inner_maps on creation
This patch allows to auto create BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS with values of BPF_MAP_TYPE_PERF_EVENT_ARRAY
by bpf_object__load().

Previous behaviour created a zero filled btf_map_def for inner maps and
tried to use it for a map creation but the linux kernel forbids to create
a BPF_MAP_TYPE_PERF_EVENT_ARRAY map with max_entries=0.

Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support")
Signed-off-by: Andrey Grafin <conquistador@yandex-team.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20240117130619.9403-1-conquistador@yandex-team.ru
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Daniel Borkmann
4c3742d9c1 bpf: Sync uapi bpf.h header for the tooling infra
Both commit 91051f003948 ("tcp: Dump bound-only sockets in inet_diag.")
and commit 985b8ea9ec7e ("bpf, docs: Fix bpf_redirect_peer header doc")
missed the tooling header sync. Fix it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
f4211a704f libbpf: warn on unexpected __arg_ctx type when rewriting BTF
On kernel that don't support arg:ctx tag, before adjusting global
subprog BTF information to match kernel's expected canonical type names,
make sure that types used by user are meaningful, and if not, warn and
don't do BTF adjustments.

This is similar to checks that kernel performs, but narrower in scope,
as only a small subset of BPF program types can be accommodated by
libbpf using canonical type names.

Libbpf unconditionally allows `struct pt_regs *` for perf_event program
types, unlike kernel, which supports that conditionally on architecture.
This is done to keep things simple and not cause unnecessary false
positives. This seems like a minor and harmless deviation, which in
real-world programs will be caught by kernels with arg:ctx tag support
anyways. So KISS principle.

This logic is hard to test (especially on latest kernels), so manual
testing was performed instead. Libbpf emitted the following warning for
perf_event program with wrong context argument type:

  libbpf: prog 'arg_tag_ctx_perf': subprog 'subprog_ctx_tag' arg#0 is expected to be of `struct bpf_perf_event_data *` type

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240118033143.3384355-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
939ab641b8 libbpf: feature-detect arg:ctx tag support in kernel
Add feature detector of kernel-side arg:ctx (__arg_ctx) tag support. If
this is detected, libbpf will avoid doing any __arg_ctx-related BTF
rewriting and checks in favor of letting kernel handle this completely.

test_global_funcs/ctx_arg_rewrite subtest is adjusted to do the same
feature detection (albeit in much simpler, though round-about and
inefficient, way), and skip the tests. This is done to still be able to
execute this test on older kernels (like in libbpf CI).

Note, BPF token series ([0]) does a major refactor and code moving of
libbpf-internal feature detection "framework", so to avoid unnecessary
conflicts we keep newly added feature detection stand-alone with ad-hoc
result caching. Once things settle, there will be a small follow up to
re-integrate everything back and move code into its final place in
newly-added (by BPF token series) features.c file.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=814209&state=*

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240118033143.3384355-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-26 18:12:29 -05:00
Kan Liang
82ebbd97c8 perf/x86/intel: Support branch counters logging
The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. It can
provide a means to attribute exposed retirement latency to combinations
of events across a block of instructions. It also provides a means of
attributing Timed LBR latencies to events.

The feature is first introduced on SRF/GRR. It is an enhancement of the
ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences
of events on the GP counters. The information is displayed by the order
of counters.

The design proposed in this patch requires that the events which are
logged must be in a group with the event that has LBR. If there are
more than one LBR group, the counters logging information only from the
current group (overflowed) are stored for the perf tool, otherwise the
perf tool cannot know which and when other groups are scheduled
especially when multiplexing is triggered. The user can ensure it uses
the maximum number of counters that support LBR info (4 by now) by
making the group large enough.

The HW only logs events by the order of counters. The order may be
different from the order of enabling which the perf tool can understand.
When parsing the information of each branch entry, convert the counter
order to the enabled order, and store the enabled order in the extension
space.

Unconditionally reset LBRs for an LBR event group when it's deleted. The
logged counter information is only valid for the current LBR group. If
another LBR group is scheduled later, the information from the stale
LBRs would be otherwise wrongly interpreted.

Add a sanity check in intel_pmu_hw_config(). Disable the feature if other
counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode
is enabled. (For the LBR call stack mode, we cannot simply flush the
LBR, since it will break the call stack. Also, there is no obvious usage
with the call stack mode for now.)

Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch
stack setup.

Expose the maximum number of supported counters and the width of the
counters into the sysfs. The perf tool can use the information to parse
the logged counters in each branch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-5-kan.liang@linux.intel.com
2024-01-26 18:12:29 -05:00
Kan Liang
9705c4c622 perf: Add branch stack counters
Currently, the additional information of a branch entry is stored in a
u64 space. With more and more information added, the space is running
out. For example, the information of occurrences of events will be added
for each branch.

Two places were suggested to append the counters.
https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
One place is right after the flags of each branch entry. It changes the
existing struct perf_branch_entry. The later ARCH specific
implementation has to be really careful to consistently pick
the right struct.
The other place is right after the entire struct perf_branch_stack.
The disadvantage is that the pointer of the extra space has to be
recorded. The common interface perf_sample_save_brstack() has to be
updated.

The latter is much straightforward, and should be easily understood and
maintained. It is implemented in the patch.

Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate
the event which is recorded in the branch info.

The "u64 counters" may store the occurrences of several events. The
information regarding the number of events/counters and the width of
each counter should be exposed via sysfs as a reference for the perf
tool. Define the branch_counter_nr and branch_counter_width ABI here.
The support will be implemented later in the Intel-specific patch.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-1-kan.liang@linux.intel.com
2024-01-26 18:12:29 -05:00
Andrii Nakryiko
528cb9d3e9 README: update Ubuntu link
Do what [0] proposed to do, but with properly formatted commit message
and Signed-off-by.

  [0] https://github.com/libbpf/libbpf/pull/742

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-25 16:47:44 -08:00
thiagoftsm
f8f9df60e0 Merge branch 'libbpf:master' into master 2024-01-24 12:31:08 +00:00
Andrii Nakryiko
f81eef23b3 ci: skip two tests failing due to kernel bug
Add lwt_reroute and tc_links_ingress to DENYLIST, as they are currently
broken due to kernel bug. Fix is underreview and should make it into
bpf-next soon.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
feabd96e00 ci: regenerate vmlinux.h
Need bpf_xfrm_state_opts and others.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
1570d568a0 Makefile: bump to v1.4.0 dev version
Bump Github-only Makefile to match 1.4 development version.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
e2203b3057 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   750011e239a50873251c16207b0fe78eabf8577e
Checkpoint bpf-next commit: 98e20e5e13d2811898921f999288be7151a11954
Baseline bpf commit:        bc4fbf022c68967cb49b2b820b465cf90de974b8
Checkpoint bpf commit:      7c5e046bdcb2513f9decb3765d8bf92d604279cf

Alyssa Ross (1):
  libbpf: Skip DWARF sections in linker sanity check

Amritha Nambiar (4):
  netdev-genl: spec: Extend netdev netlink spec in YAML for queue
  netdev-genl: spec: Extend netdev netlink spec in YAML for NAPI
  netdev-genl: spec: Add irq in netdev netlink YAML spec
  netdev-genl: spec: Add PID in netdev netlink YAML spec

Andrii Nakryiko (24):
  bpf: introduce BPF token object
  bpf: add BPF token support to BPF_MAP_CREATE command
  bpf: add BPF token support to BPF_BTF_LOAD command
  bpf: add BPF token support to BPF_PROG_LOAD command
  libbpf: add bpf_token_create() API
  libbpf: add BPF token support to bpf_map_create() API
  libbpf: add BPF token support to bpf_btf_load() API
  libbpf: add BPF token support to bpf_prog_load() API
  bpf: rename MAX_BPF_LINK_TYPE into __MAX_BPF_LINK_TYPE for consistency
  libbpf: split feature detectors definitions from cached results
  libbpf: further decouple feature checking logic from bpf_object
  libbpf: move feature detection code into its own file
  libbpf: wire up token_fd into feature probing logic
  libbpf: wire up BPF token support at BPF object level
  libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH
    envvar
  Revert BPF token-related functionality
  libbpf: add __arg_xxx macros for annotating global func args
  libbpf: make uniform use of btf__fd() accessor inside libbpf
  libbpf: use explicit map reuse flag to skip map creation steps
  libbpf: don't rely on map->fd as an indicator of map being created
  libbpf: use stable map placeholder FDs
  libbpf: move exception callbacks assignment logic into relocation step
  libbpf: move BTF loading step after relocation step
  libbpf: implement __arg_ctx fallback logic

Daniel Xu (1):
  libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

David Vernet (1):
  bpf: Load vmlinux btf for any struct_ops map

Eduard Zingerman (1):
  libbpf: Start v1.4 development cycle

Jakub Kicinski (1):
  tools: ynl: add sample for getting page-pool information

Jamal Hadi Salim (5):
  net/sched: Remove uapi support for rsvp classifier
  net/sched: Remove uapi support for tcindex classifier
  net/sched: Remove uapi support for dsmark qdisc
  net/sched: Remove uapi support for ATM qdisc
  net/sched: Remove uapi support for CBQ qdisc

Jiri Olsa (2):
  libbpf: Add st_type argument to elf_resolve_syms_offsets function
  bpf: Add link_info support for uprobe multi link

Larysa Zaremba (1):
  xdp: Add VLAN tag hint

Mingyi Zhang (1):
  libbpf: Fix NULL pointer dereference in bpf_object__collect_prog_relos

Sergei Trofimovich (1):
  libbpf: Add pr_warn() for EINVAL cases in linker_sanity_check_elf

Stanislav Fomichev (3):
  xsk: Support tx_metadata_len
  xsk: Add TX timestamp and TX checksum offload support
  xsk: Add option to calculate TX checksum in SW

 include/uapi/linux/bpf.h       |  14 +-
 include/uapi/linux/if_xdp.h    |  61 +++-
 include/uapi/linux/netdev.h    |  81 ++++-
 include/uapi/linux/pkt_cls.h   |  47 ---
 include/uapi/linux/pkt_sched.h | 109 ------
 src/bpf_core_read.h            |  32 ++
 src/bpf_helpers.h              |   3 +
 src/elf.c                      |   5 +-
 src/libbpf.c                   | 585 +++++++++++++++++++++++++--------
 src/libbpf.map                 |   3 +
 src/libbpf_internal.h          |  17 +-
 src/libbpf_version.h           |   2 +-
 src/linker.c                   |  27 +-
 13 files changed, 673 insertions(+), 313 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
3102067b4e libbpf: implement __arg_ctx fallback logic
Out of all special global func arg tag annotations, __arg_ctx is
practically is the most immediately useful and most critical to have
working across multitude kernel version, if possible. This would allow
end users to write much simpler code if __arg_ctx semantics worked for
older kernels that don't natively understand btf_decl_tag("arg:ctx") in
verifier logic.

Luckily, it is possible to ensure __arg_ctx works on old kernels through
a bit of extra work done by libbpf, at least in a lot of common cases.

To explain the overall idea, we need to go back at how context argument
was supported in global funcs before __arg_ctx support was added. This
was done based on special struct name checks in kernel. E.g., for
BPF_PROG_TYPE_PERF_EVENT the expectation is that argument type `struct
bpf_perf_event_data *` mark that argument as PTR_TO_CTX. This is all
good as long as global function is used from the same BPF program types
only, which is often not the case. If the same subprog has to be called
from, say, kprobe and perf_event program types, there is no single
definition that would satisfy BPF verifier. Subprog will have context
argument either for kprobe (if using bpf_user_pt_regs_t struct name) or
perf_event (with bpf_perf_event_data struct name), but not both.

This limitation was the reason to add btf_decl_tag("arg:ctx"), making
the actual argument type not important, so that user can just define
"generic" signature:

  __noinline int global_subprog(void *ctx __arg_ctx) { ... }

I won't belabor how libbpf is implementing subprograms, see a huge
comment next to bpf_object_relocate_calls() function. The idea is that
each main/entry BPF program gets its own copy of global_subprog's code
appended.

This per-program copy of global subprog code *and* associated func_info
.BTF.ext information, pointing to FUNC -> FUNC_PROTO BTF type chain
allows libbpf to simulate __arg_ctx behavior transparently, even if the
kernel doesn't yet support __arg_ctx annotation natively.

The idea is straightforward: each time we append global subprog's code
and func_info information, we adjust its FUNC -> FUNC_PROTO type
information, if necessary (that is, libbpf can detect the presence of
btf_decl_tag("arg:ctx") just like BPF verifier would do it).

The rest is just mechanical and somewhat painful BTF manipulation code.
It's painful because we need to clone FUNC -> FUNC_PROTO, instead of
reusing it, as same FUNC -> FUNC_PROTO chain might be used by another
main BPF program within the same BPF object, so we can't just modify it
in-place (and cloning BTF types within the same struct btf object is
painful due to constant memory invalidation, see comments in code).
Uploaded BPF object's BTF information has to work for all BPF
programs at the same time.

Once we have FUNC -> FUNC_PROTO clones, we make sure that instead of
using some `void *ctx` parameter definition, we have an expected `struct
bpf_perf_event_data *ctx` definition (as far as BPF verifier and kernel
is concerned), which will mark it as context for BPF verifier. Same
global subprog relocated and copied into another main BPF program will
get different type information according to main program's type. It all
works out in the end in a completely transparent way for end user.

Libbpf maintains internal program type -> expected context struct name
mapping internally. Note, not all BPF program types have named context
struct, so this approach won't work for such programs (just like it
didn't before __arg_ctx). So native __arg_ctx is still important to have
in kernel to have generic context support across all BPF program types.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
a4f0740b3d libbpf: move BTF loading step after relocation step
With all the preparations in previous patches done we are ready to
postpone BTF loading and sanitization step until after all the
relocations are performed.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
94470256c1 libbpf: move exception callbacks assignment logic into relocation step
Move the logic of finding and assigning exception callback indices from
BTF sanitization step to program relocations step, which seems more
logical and will unblock moving BTF loading to after relocation step.

Exception callbacks discovery and assignment has no dependency on BTF
being loaded into the kernel, it only uses BTF information. It does need
to happen before subprogram relocations happen, though. Which is why the
split.

No functional changes.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
4d68ea90c2 libbpf: use stable map placeholder FDs
Move map creation to later during BPF object loading by pre-creating
stable placeholder FDs (utilizing memfd_create()). Use dup2()
syscall to then atomically make those placeholder FDs point to real
kernel BPF map objects.

This change allows to delay BPF map creation to after all the BPF
program relocations. That, in turn, allows to delay BTF finalization and
loading into kernel to after all the relocations as well. We'll take
advantage of the latter in subsequent patches to allow libbpf to adjust
BTF in a way that helps with BPF global function usage.

Clean up a few places where we close map->fd, which now shouldn't
happen, because map->fd should be a valid FD regardless of whether map
was created or not. Surprisingly and nicely it simplifies a bunch of
error handling code. If this change doesn't backfire, I'm tempted to
pre-create such stable FDs for other entities (progs, maybe even BTF).
We previously did some manipulations to make gen_loader work with fake
map FDs, with stable map FDs this hack is not necessary for maps (we
still have it for BTF, but I left it as is for now).

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
2ea3d8042f libbpf: don't rely on map->fd as an indicator of map being created
With the upcoming switch to preallocated placeholder FDs for maps,
switch various getters/setter away from checking map->fd. Use
map_is_created() helper that detect whether BPF map can be modified based
on map->obj->loaded state, with special provision for maps set up with
bpf_map__reuse_fd().

For backwards compatibility, we take map_is_created() into account in
bpf_map__fd() getter as well. This way before bpf_object__load() phase
bpf_map__fd() will always return -1, just as before the changes in
subsequent patches adding stable map->fd placeholders.

We also get rid of all internal uses of bpf_map__fd() getter, as it's
more oriented for uses external to libbpf. The above map_is_created()
check actually interferes with some of the internal uses, if map FD is
fetched through bpf_map__fd().

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
e9ce55197b libbpf: use explicit map reuse flag to skip map creation steps
Instead of inferring whether map already point to previously
created/pinned BPF map (which user can specify with bpf_map__reuse_fd()) API),
use explicit map->reused flag that is set in such case.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
3fb45d3761 libbpf: make uniform use of btf__fd() accessor inside libbpf
It makes future grepping and code analysis a bit easier.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
2e49eb8bf6 net/sched: Remove uapi support for CBQ qdisc
Commit 051d44209842 ("net/sched: Retire CBQ qdisc") retired the CBQ qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
5473fe6aef net/sched: Remove uapi support for ATM qdisc
Commit fb38306ceb9e ("net/sched: Retire ATM qdisc") retired the ATM qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
c04d1b669d net/sched: Remove uapi support for dsmark qdisc
Commit bbe77c14ee61 ("net/sched: Retire dsmark qdisc") retired the dsmark
classifier. Remove UAPI support for it.
Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
717798e2f9 net/sched: Remove uapi support for tcindex classifier
commit 8c710f75256b ("net/sched: Retire tcindex classifier") retired the TC
tcindex classifier.
Remove UAPI for it.  Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Jamal Hadi Salim
f2c790ca1a net/sched: Remove uapi support for rsvp classifier
commit 265b4da82dbf ("net/sched: Retire rsvp classifier") retired the TC RSVP
classifier.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 19:15:17 -05:00
Mingyi Zhang
c008eb921e libbpf: Fix NULL pointer dereference in bpf_object__collect_prog_relos
An issue occurred while reading an ELF file in libbpf.c during fuzzing:

	Program received signal SIGSEGV, Segmentation fault.
	0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	4206 in libbpf.c
	(gdb) bt
	#0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	#1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706
	#2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437
	#3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497
	#4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16
	#5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one ()
	#6 0x000000000087ad92 in tracing::span::Span::in_scope ()
	#7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir ()
	#8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} ()
	#9 0x00000000005f2601 in main ()
	(gdb)

scn_data was null at this code(tools/lib/bpf/src/libbpf.c):

	if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) {

The scn_data is derived from the code above:

	scn = elf_sec_by_idx(obj, sec_idx);
	scn_data = elf_sec_data(obj, scn);

	relo_sec_name = elf_sec_str(obj, shdr->sh_name);
	sec_name = elf_sec_name(obj, scn);
	if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL
		return -EINVAL;

In certain special scenarios, such as reading a malformed ELF file,
it is possible that scn_data may be a null pointer

Signed-off-by: Mingyi Zhang <zhangmingyi5@huawei.com>
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Changye Wu <wuchangye@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231221033947.154564-1-liuxin350@huawei.com
2024-01-04 19:15:17 -05:00
Alyssa Ross
6252a2fdcc libbpf: Skip DWARF sections in linker sanity check
clang can generate (with -g -Wa,--compress-debug-sections) 4-byte
aligned DWARF sections that declare themselves to be 8-byte aligned in
the section header.  Since DWARF sections are dropped during linking
anyway, just skip running the sanity checks on them.

Reported-by: Sergei Trofimovich <slyich@gmail.com>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/bpf/ZXcFRJVKbKxtEL5t@nz.home/
Link: https://lore.kernel.org/bpf/20231219110324.8989-1-hi@alyssa.is
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
c378eff58c libbpf: add __arg_xxx macros for annotating global func args
Add a set of __arg_xxx macros which can be used to augment BPF global
subprogs/functions with extra information for use by BPF verifier.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231215011334.2307144-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
c65b319c04 Revert BPF token-related functionality
This patch includes the following revert (one  conflicting BPF FS
patch and three token patch sets, represented by merge commits):
  - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'";
  - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs";
  - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'";
  - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'".

Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-04 19:15:17 -05:00
Larysa Zaremba
43e7309228 xdp: Add VLAN tag hint
Implement functionality that enables drivers to expose VLAN tag
to XDP code.

VLAN tag is represented by 2 variables:
- protocol ID, which is passed to bpf code in BE
- VLAN TCI, in host byte order

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20231205210847.28460-10-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
b166b99eed libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
To allow external admin authority to override default BPF FS location
(/sys/fs/bpf) for implicit BPF token creation, teach libbpf to recognize
LIBBPF_BPF_TOKEN_PATH envvar. If it is specified and user application
didn't explicitly specify neither bpf_token_path nor bpf_token_fd
option, it will be treated exactly like bpf_token_path option,
overriding default /sys/fs/bpf location and making BPF token mandatory.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
5df9eba06a libbpf: wire up BPF token support at BPF object level
Add BPF token support to BPF object-level functionality.

BPF token is supported by BPF object logic either as an explicitly
provided BPF token from outside (through BPF FS path or explicit BPF
token FD), or implicitly (unless prevented through
bpf_object_open_opts).

Implicit mode is assumed to be the most common one for user namespaced
unprivileged workloads. The assumption is that privileged container
manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token
delegation options (delegate_{cmds,maps,progs,attachs} mount options).
BPF object during loading will attempt to create BPF token from
/sys/fs/bpf location, and pass it for all relevant operations
(currently, map creation, BTF load, and program load).

In this implicit mode, if BPF token creation fails due to whatever
reason (BPF FS is not mounted, or kernel doesn't support BPF token,
etc), this is not considered an error. BPF object loading sequence will
proceed with no BPF token.

In explicit BPF token mode, user provides explicitly either custom BPF
FS mount point path or creates BPF token on their own and just passes
token FD directly. In such case, BPF object will either dup() token FD
(to not require caller to hold onto it for entire duration of BPF object
lifetime) or will attempt to create BPF token from provided BPF FS
location. If BPF token creation fails, that is considered a critical
error and BPF object load fails with an error.

Libbpf provides a way to disable implicit BPF token creation, if it
causes any troubles (BPF token is designed to be completely optional and
shouldn't cause any problems even if provided, but in the world of BPF
LSM, custom security logic can be installed that might change outcome
dependin on the presence of BPF token). To disable libbpf's default BPF
token creation behavior user should provide either invalid BPF token FD
(negative), or empty bpf_token_path option.

BPF token presence can influence libbpf's feature probing, so if BPF
object has associated BPF token, feature probing is instructed to use
BPF object-specific feature detection cache and token FD.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
b14daa8b9b libbpf: wire up token_fd into feature probing logic
Adjust feature probing callbacks to take into account optional token_fd.
In unprivileged contexts, some feature detectors would fail to detect
kernel support just because BPF program, BPF map, or BTF object can't be
loaded due to privileged nature of those operations. So when BPF object
is loaded with BPF token, this token should be used for feature probing.

This patch is setting support for this scenario, but we don't yet pass
non-zero token FD. This will be added in the next patch.

We also switched BPF cookie detector from using kprobe program to
tracepoint one, as tracepoint is somewhat less dangerous BPF program
type and has higher likelihood of being allowed through BPF token in the
future. This change has no effect on detection behavior.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
fab327c888 libbpf: move feature detection code into its own file
It's quite a lot of well isolated code, so it seems like a good
candidate to move it out of libbpf.c to reduce its size.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
feda0728e0 libbpf: further decouple feature checking logic from bpf_object
Add feat_supported() helper that accepts feature cache instead of
bpf_object. This allows low-level code in bpf.c to not know or care
about higher-level concept of bpf_object, yet it will be able to utilize
custom feature checking in cases where BPF token might influence the
outcome.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
11c977ffaf libbpf: split feature detectors definitions from cached results
Split a list of supported feature detectors with their corresponding
callbacks from actual cached supported/missing values. This will allow
to have more flexible per-token or per-object feature detectors in
subsequent refactorings.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Daniel Xu
9d2f8aaf21 libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
=== Motivation ===

Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
writing wrapper to make the verifier happy.

Two alternatives to this approach are:

1. Use the upcoming `preserve_static_offset` [0] attribute to disable
   CO-RE on specific structs.
2. Use broader byte-sized writes to write to bitfields.

(1) is a bit hard to use. It requires specific and not-very-obvious
annotations to bpftool generated vmlinux.h. It's also not generally
available in released LLVM versions yet.

(2) makes the code quite hard to read and write. And especially if
BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
to have an inverse helper for writing.

=== Implementation details ===

Since the logic is a bit non-obvious, I thought it would be helpful
to explain exactly what's going on.

To start, it helps by explaining what LSHIFT_U64 (lshift) and RSHIFT_U64
(rshift) is designed to mean. Consider the core of the
BPF_CORE_READ_BITFIELD() algorithm:

        val <<= __CORE_RELO(s, field, LSHIFT_U64);
        val = val >> __CORE_RELO(s, field, RSHIFT_U64);

Basically what happens is we lshift to clear the non-relevant (blank)
higher order bits. Then we rshift to bring the relevant bits (bitfield)
down to LSB position (while also clearing blank lower order bits). To
illustrate:

        Start:    ........XXX......
        Lshift:   XXX......00000000
        Rshift:   00000000000000XXX

where `.` means blank bit, `0` means 0 bit, and `X` means bitfield bit.

After the two operations, the bitfield is ready to be interpreted as a
regular integer.

Next, we want to build an alternative (but more helpful) mental model
on lshift and rshift. That is, to consider:

* rshift as the total number of blank bits in the u64
* lshift as number of blank bits left of the bitfield in the u64

Take a moment to consider why that is true by consulting the above
diagram.

With this insight, we can now define the following relationship:

              bitfield
                 _
                | |
        0.....00XXX0...00
        |      |   |    |
        |______|   |    |
         lshift    |    |
                   |____|
              (rshift - lshift)

That is, we know the number of higher order blank bits is just lshift.
And the number of lower order blank bits is (rshift - lshift).

Finally, we can examine the core of the write side algorithm:

        mask = (~0ULL << rshift) >> lshift;              // 1
        val = (val & ~mask) | ((nval << rpad) & mask);   // 2

1. Compute a mask where the set bits are the bitfield bits. The first
   left shift zeros out exactly the number of blank bits, leaving a
   bitfield sized set of 1s. The subsequent right shift inserts the
   correct amount of higher order blank bits.

2. On the left of the `|`, mask out the bitfield bits. This creates
   0s where the new bitfield bits will go. On the right of the `|`,
   bring nval into the correct bit position and mask out any bits
   that fall outside of the bitfield. Finally, by bor'ing the two
   halves, we get the final set of bits to write back.

[0]: https://reviews.llvm.org/D133361
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jonathan Lemon <jlemon@aviatrix.com>
Signed-off-by: Jonathan Lemon <jlemon@aviatrix.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/4d3dd215a4fd57d980733886f9c11a45e1a9adf3.1702325874.git.dxu@dxuuu.xyz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-04 19:15:17 -05:00
Sergei Trofimovich
5f68c571c8 libbpf: Add pr_warn() for EINVAL cases in linker_sanity_check_elf
Before the change on `i686-linux` `systemd` build failed as:

    $ bpftool gen object src/core/bpf/socket_bind/socket-bind.bpf.o src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    Error: failed to link 'src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o': Invalid argument (22)

After the change it fails as:

    $ bpftool gen object src/core/bpf/socket_bind/socket-bind.bpf.o src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    libbpf: ELF section #9 has inconsistent alignment addr=8 != d=4 in src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    Error: failed to link 'src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o': Invalid argument (22)

Now it's slightly easier to figure out what is wrong with an ELF file.

Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231208215100.435876-1-slyich@gmail.com
2024-01-04 19:15:17 -05:00
David Vernet
235ea85487 bpf: Load vmlinux btf for any struct_ops map
In libbpf, when determining whether we need to load vmlinux btf, we're
currently (among other things) checking whether there is any struct_ops
program present in the object. This works for most realistic struct_ops
maps, as a struct_ops map is of course typically composed of one or more
struct_ops programs. However, that technically need not be the case. A
struct_ops interface could be defined which allows a map to be specified
which one or more non-prog fields, and which provides default behavior
if no struct_ops progs is actually provided otherwise. For sched_ext,
for example, you technically only need to specify the name of the
scheduler in the struct_ops map, with the core scheduler logic providing
default behavior if no prog is actually specified.

If we were to define and try to load such a struct_ops map, we would
crash in libbpf when initializing it as obj->btf_vmlinux will be NULL:

Reading symbols from minimal...
(gdb) r
Starting program: minimal_example
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x000055555558308c in btf__type_cnt (btf=0x0) at btf.c:612
612             return btf->start_id + btf->nr_types;
(gdb) bt
    type_name=0x5555555d99e3 "sched_ext_ops", kind=4) at btf.c:914
    kind=4) at btf.c:942
    type=0x7fffffffe558, type_id=0x7fffffffe548, ...
    data_member=0x7fffffffe568) at libbpf.c:948
    kern_btf=0x0) at libbpf.c:1017
    at libbpf.c:8059

So as to account for such bare-bones struct_ops maps, let's update
obj_needs_vmlinux_btf() to also iterate over an obj's maps and check
whether any of them are struct_ops maps.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20231208061704.400463-1-void@manifault.com
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
400cbd6148 bpf: rename MAX_BPF_LINK_TYPE into __MAX_BPF_LINK_TYPE for consistency
To stay consistent with the naming pattern used for similar cases in BPF
UAPI (__MAX_BPF_ATTACH_TYPE, etc), rename MAX_BPF_LINK_TYPE into
__MAX_BPF_LINK_TYPE.

Also similar to MAX_BPF_ATTACH_TYPE and MAX_BPF_REG, add:

  #define MAX_BPF_LINK_TYPE __MAX_BPF_LINK_TYPE

Not all __MAX_xxx enums have such #define, so I'm not sure if we should
add it or not, but I figured I'll start with a completely backwards
compatible way, and we can drop that, if necessary.

Also adjust a selftest that used MAX_BPF_LINK_TYPE enum.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231206190920.1651226-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
ec1cab73a7 libbpf: add BPF token support to bpf_prog_load() API
Wire through token_fd into bpf_prog_load().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-16-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
207b6ebb60 libbpf: add BPF token support to bpf_btf_load() API
Allow user to specify token_fd for bpf_btf_load() API that wraps
kernel's BPF_BTF_LOAD command. This allows loading BTF from unprivileged
process as long as it has BPF token allowing BPF_BTF_LOAD command, which
can be created and delegated by privileged process.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-15-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
a23b8ffcf6 libbpf: add BPF token support to bpf_map_create() API
Add ability to provide token_fd for BPF_MAP_CREATE command through
bpf_map_create() API.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-14-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
f8954ca692 libbpf: add bpf_token_create() API
Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-13-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
1ebea57322 bpf: add BPF token support to BPF_PROG_LOAD command
Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of
allowed BPF program types and attach types, derived from BPF FS at BPF
token creation time. Then make sure we perform bpf_token_capable()
checks everywhere where it's relevant.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
544acb9af6 bpf: add BPF token support to BPF_BTF_LOAD command
Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
through delegated BPF token. BTF loading is a pretty straightforward
operation, so as long as BPF token is created with allow_cmds granting
BPF_BTF_LOAD command, kernel proceeds to parsing BTF data and creating
BTF object.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
9abcc5efc8 bpf: add BPF token support to BPF_MAP_CREATE command
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled
BPF map creation from unprivileged process through delegated BPF token.

Wire through a set of allowed BPF map types to BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds
allows to create a narrowly-focused BPF token (controlled by privileged
agent) with a restrictive set of BPF maps that application can attempt
to create.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Andrii Nakryiko
33de35fd83 bpf: introduce BPF token object
Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations could be performed using provided BPF token.

This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
FS FD, which can be attained through open() API by opening BPF FS mount
point. Currently, BPF token "inherits" delegated command, map types,
prog type, and attach type bit sets from BPF FS as is. In the future,
having an BPF token as a separate object with its own FD, we can allow
to further restrict BPF token's allowable set of things either at the
creation time or after the fact, allowing the process to guard itself
further from unintentionally trying to load undesired kind of BPF
programs. But for now we keep things simple and just copy bit sets as is.

When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.

Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).

Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
within the BPF FS owning user namespace, rounding up the ns_capable()
story of BPF token.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Amritha Nambiar
ac9cd25de9 netdev-genl: spec: Add PID in netdev netlink YAML spec
Add support in netlink spec(netdev.yaml) for PID of the
NAPI thread. Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147335301.5260.11872351477120434501.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 19:15:17 -05:00
Amritha Nambiar
cfa6e420f4 netdev-genl: spec: Add irq in netdev netlink YAML spec
Add support in netlink spec(netdev.yaml) for interrupt number
among the NAPI attributes. Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147334210.5260.18178387869057516983.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 19:15:17 -05:00
Amritha Nambiar
36f30e4c30 netdev-genl: spec: Extend netdev netlink spec in YAML for NAPI
Add support in netlink spec(netdev.yaml) for napi related information.
Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147333119.5260.7050639053080529108.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 19:15:17 -05:00
Amritha Nambiar
e4fcfe7db7 netdev-genl: spec: Extend netdev netlink spec in YAML for queue
Add support in netlink spec(netdev.yaml) for queue information.
Add code generated from the spec.

Note: The "queue-type" attribute takes values 0 and 1 for rx
and tx queue type respectively.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147330963.5260.2576294626647300472.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 19:15:17 -05:00
Stanislav Fomichev
419eab9ec7 xsk: Add option to calculate TX checksum in SW
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM
to call skb_checksum_help in transmit path. Might be useful
to debugging issues with real hardware. I also use this mode
in the selftests.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Stanislav Fomichev
95134be22e xsk: Add TX timestamp and TX checksum offload support
This change actually defines the (initial) metadata layout
that should be used by AF_XDP userspace (xsk_tx_metadata).
The first field is flags which requests appropriate offloads,
followed by the offload-specific fields. The supported per-device
offloads are exported via netlink (new xsk-flags).

The offloads themselves are still implemented in a bit of a
framework-y fashion that's left from my initial kfunc attempt.
I'm introducing new xsk_tx_metadata_ops which drivers are
supposed to implement. The drivers are also supposed
to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
the right places. Since xsk_tx_metadata_{request,_complete}
are static inline, we don't incur any extra overhead doing
indirect calls.

The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- don't need any extra flags to maintain to keep track of what
  offloads are implemented; if the callback is implemented - the offload
  is supported (used by netlink reporting code)

Two offloads are defined right now:
1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata
   area upon completion (tx_timestamp field)

XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes
SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass
metadata pointer).

The struct is forward-compatible and can be extended in the future
by appending more fields.

Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Stanislav Fomichev
2f95d28664 xsk: Support tx_metadata_len
For zerocopy mode, tx_desc->addr can point to an arbitrary offset
and carry some TX metadata in the headroom. For copy mode, there
is no way currently to populate skb metadata.

Introduce new tx_metadata_len umem config option that indicates how many
bytes to treat as metadata. Metadata bytes come prior to tx_desc address
(same as in RX case).

The size of the metadata has mostly the same constraints as XDP:
- less than 256 bytes
- 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte
  timestamp in the completion)
- non-zero

This data is not interpreted in any way right now.

Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 19:15:17 -05:00
Jiri Olsa
afb384f685 bpf: Add link_info support for uprobe multi link
Adding support to get uprobe_link details through bpf_link_info
interface.

Adding new struct uprobe_multi to struct bpf_link_info to carry
the uprobe_multi link details.

The uprobe_multi.count is passed from user space to denote size
of array fields (offsets/ref_ctr_offsets/cookies). The actual
array size is stored back to uprobe_multi.count (allowing user
to find out the actual array size) and array fields are populated
up to the user passed size.

All the non-array fields (path/count/flags/pid) are always set.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20231125193130.834322-4-jolsa@kernel.org
2024-01-04 19:15:17 -05:00
Jiri Olsa
467dd7bda5 libbpf: Add st_type argument to elf_resolve_syms_offsets function
We need to get offsets for static variables in following changes,
so making elf_resolve_syms_offsets to take st_type value as argument
and passing it to elf_sym_iter_new.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20231125193130.834322-2-jolsa@kernel.org
2024-01-04 19:15:17 -05:00
Eduard Zingerman
9c794e5ab4 libbpf: Start v1.4 development cycle
Bump libbpf.map to v1.4.0 to start a new libbpf version cycle.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231123000439.12025-1-eddyz87@gmail.com
2024-01-04 19:15:17 -05:00
Jakub Kicinski
eb40a93a10 tools: ynl: add sample for getting page-pool information
Regenerate the tools/ code after netdev spec changes.

Add sample to query page-pool info in a concise fashion:

$ ./page-pool
    eth0[2]	page pools: 10 (zombies: 0)
		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-04 19:15:17 -05:00
Eduard Zingerman
1baa3e2355 ci: move /dev/kvm permissions setup from to actions/vmtest.yml
The vmtest action is used by several workflows: test, pahole, ondemand.
At the same time, vmtest action requires valid access rights to /dev/kvm
and is the only action that uses it.
This commit moves /dev/kvm permissions setup from test workflow to
vmtest action, in order to make sure that setup logic is shared by all
workflows that run vmtest.
Should fix CI failures like [1].

[1] https://github.com/libbpf/libbpf/actions/runs/7104762048/job/19340484589

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-12-13 15:50:08 -05:00
Andrii Nakryiko
1b2ae67c1d ci: custom patch to patch out BPF_F_TEST_REG_INVARIANTS flag
Without needing to modify tons of BPF selftests file, make sure we don't
pass BPF_F_TEST_REG_INVARIANTS to kernel, to make BPF selftests work on
4.9 and 5.5 kernels.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-12-05 12:51:08 -05:00
Andrii Nakryiko
20c0a9e3d7 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   155addf0814a92d08fce26a11b27e3315cdba977
Checkpoint bpf-next commit: 750011e239a50873251c16207b0fe78eabf8577e
Baseline bpf commit:        83b9dda8afa4e968d9cce253f390b01c0612a2a5
Checkpoint bpf commit:      bc4fbf022c68967cb49b2b820b465cf90de974b8

Andrii Nakryiko (2):
  bpf: add register bounds sanity checks and sanitization
  bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS

Jordan Rome (1):
  bpf: Add crosstask check to __bpf_get_stack

 include/uapi/linux/bpf.h | 6 ++++++
 1 file changed, 6 insertions(+)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-11-22 16:20:56 -05:00
Andrii Nakryiko
b88b3ac09d sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-11-22 16:20:56 -05:00
Andrii Nakryiko
96ed1c508f bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS
Rename verifier internal flag BPF_F_TEST_SANITY_STRICT to more neutral
BPF_F_TEST_REG_INVARIANTS. This is a follow up to [0].

A few selftests and veristat need to be adjusted in the same patch as
well.

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231112010609.848406-5-andrii@kernel.org/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231117171404.225508-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-22 16:20:56 -05:00
Andrii Nakryiko
7ccc41c138 bpf: add register bounds sanity checks and sanitization
Add simple sanity checks that validate well-formed ranges (min <= max)
across u64, s64, u32, and s32 ranges. Also for cases when the value is
constant (either 64-bit or 32-bit), we validate that ranges and tnums
are in agreement.

These bounds checks are performed at the end of BPF_ALU/BPF_ALU64
operations, on conditional jumps, and for LDX instructions (where subreg
zero/sign extension is probably the most important to check). This
covers most of the interesting cases.

Also, we validate the sanity of the return register when manually
adjusting it for some special helpers.

By default, sanity violation will trigger a warning in verifier log and
resetting register bounds to "unbounded" ones. But to aid development
and debugging, BPF_F_TEST_SANITY_STRICT flag is added, which will
trigger hard failure of verification with -EFAULT on register bounds
violations. This allows selftests to catch such issues. veristat will
also gain a CLI option to enable this behavior.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20231112010609.848406-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-22 16:20:56 -05:00
Jordan Rome
785a079966 bpf: Add crosstask check to __bpf_get_stack
Currently get_perf_callchain only supports user stack walking for
the current task. Passing the correct *crosstask* param will return
0 frames if the task passed to __bpf_get_stack isn't the current
one instead of a single incorrect frame/address. This change
passes the correct *crosstask* param but also does a preemptive
check in __bpf_get_stack if the task is current and returns
-EOPNOTSUPP if it is not.

This issue was found using bpf_get_task_stack inside a BPF
iterator ("iter/task"), which iterates over all tasks.
bpf_get_task_stack works fine for fetching kernel stacks
but because get_perf_callchain relies on the caller to know
if the requested *task* is the current one (via *crosstask*)
it was failing in a confusing way.

It might be possible to get user stacks for all tasks utilizing
something like access_process_vm but that requires the bpf
program calling bpf_get_task_stack to be sleepable and would
therefore be a breaking change.

Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Signed-off-by: Jordan Rome <jordalgo@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231108112334.3433136-1-jordalgo@meta.com
2023-11-22 16:20:56 -05:00
Eduard Zingerman
a6b990991c ci: disable sockopt selftest for 5.5 kernel
The following 'sockopt' selftests fail on libbpf CI for kernel 5.5:
- sockopt/getsockopt: read ctx->optlen:FAIL
- sockopt/getsockopt: support smaller ctx->optlen:FAIL
- sockopt/setsockopt: read ctx->level:FAIL
- sockopt/setsockopt: read ctx->optname:FAIL
- sockopt/setsockopt: read ctx->optlen:FAIL
- sockopt/setsockopt: ctx->optlen == -1 is ok:FAIL

Examples of failing CI runs:
- https://github.com/libbpf/libbpf/actions/runs/6961182067
- https://github.com/libbpf/libbpf/actions/runs/6961088131

The failures are strange as all tests were added quite a while ago
(Jun 27 2019) by commit:

  9ec8a4c9489d ("selftests/bpf: add sockopt test")

But seem to be unrelated to libbpf.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-22 16:20:43 -05:00
Eduard Zingerman
4161e1f41d ci: disable a number of selftest causing CI for LATEST kernel
All tests disabled in this commit pass on main kernel CI and fail or
flip/flop on libbpf CI. Failures do not seem to be related to libbpf.
It appears that common theme for all failing tests is that hardware
perf events are not delivered as expected on github CI worker
machines.

Examples of failed CI runs:
- https://github.com/libbpf/libbpf/actions/runs/6961182067
- https://github.com/libbpf/libbpf/actions/runs/6961088131

Fails with the following log:

  test_send_signal_common:FAIL:incorrect result \
    unexpected incorrect result: actual 48 != expected 50

Test mode of operation:
- fork'
- child:
  - install handler for SIGUSR1;
  - send ready message to parent;
  - wait for SIGUSR1 in busy loop;
  - send message '2' (50) to parent if SIGUSR1 occured;
  - send message '0' (48) to parent if no SIGUSR1 occured.
- parent:
  - wait for ready message from child;
  - install perf_event or tracepoint bpf program that uses
    bpf_send_signal() to send SIGUSR1;
  - wait for message '0' or '2' from child, '2' is expected for test
    success.

It appears that perf event that should be triggered by parent never
happens, thus message 48 is received by parent and test fails.

Fails with the following log:

  test_and_reset_skel:FAIL:found_vm_exec \
    unexpected found_vm_exec: actual 0 != expected 1

Such log is printed if variables set from BPF program are not set
after some timeout. The program that should set the variable is
SEC("perf_event") int handle_pe(void), it appears that it is never run.

Fails with the following log:

  pe_subtest:FAIL:pe_res1 unexpected pe_res1: actual 0 != expected 1048576

Variable pe_res1 should be triggered by program
SEC("perf_event") int handle_pe(struct pt_regs *ctx),
it appears that it is never run.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-22 16:20:43 -05:00
Eduard Zingerman
93f360cf4b ci: don't set /dev/kvm permissions when CI user is root
s390 tests are executed on selfhosted runner using root user,
avoid setting /dev/kvm permissions in such case.
This should fix CI failures like [0].
(Still necessary for x86 tests executed on standard github runners).

[0] https://github.com/libbpf/libbpf/actions/runs/6898545987/job/18768732980?pr=752

Fixes: 168630f852 ("ci: give /dev/kvm 0666 permissions inside CI runner")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-17 15:36:52 -05:00
Eduard Zingerman
5ff0102329 ci: use config.vm for kernel config when present
Recent kernel commit [0] changed selftests config snippets structure
by extracting VM specific options to the file 'config.vm'. This file
has to be used in .github/actions/vmtest/action.yml at step
'Prepare to build BPF selftests', otherwise drivers necessary for e.g.
root file system access are not compiled into the kernel, leading to
CI failures like [1].

[0] b0cf0dcde8ca ("selftests/bpf: Consolidate VIRTIO/9P configs in config.vm file")
[1] https://github.com/libbpf/libbpf/actions/runs/6830439839/job/18578379328?pr=747

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-16 20:25:07 -05:00
Andrii Nakryiko
0c54691bae ci: apply temporary patch to make bpf-next build
Apply fe69a1b1b6ed ("selftests: bpf: xskxceiver: ksft_print_msg: fix
format type error") to make bpf-next build.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-11-13 21:51:02 -05:00
Eduard Zingerman
168630f852 ci: give /dev/kvm 0666 permissions inside CI runner
Starting recently libbpf CI runs started failing with the following
error:

    ##[group]vm_init - Starting virtual machine...
    Starting VM with 4 CPUs...
    INFO: /dev/kvm exists
    KVM acceleration can be used
    Could not access KVM kernel module: Permission denied
    qemu-system-x86_64: failed to initialize KVM: Permission denied
    ##[error]Process completed with exit code 2.

E.g. see here [0]. The error happens because CI user has not enough
rights to access /dev/kvm. On a regular machine the solution would be
to add user to group 'kvm', however that would require a re-login,
which is cumbersome to achieve in CI setting.
Instead, use a recipe described in [1] to make udev set 0666 access
permissions for /dev/kvm.

[0] https://github.com/libbpf/libbpf/actions/runs/6819530119/job/18547589967?pr=746
[1] https://stackoverflow.com/questions/37300811/android-studio-dev-kvm-device-permission-denied/61984745#61984745

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-13 18:21:02 -08:00
Eduard Zingerman
5d4237d52d ci: regenerate vmlinux.h
Regenerate latest vmlinux.h for old kernel CI tests.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-13 18:21:02 -08:00
Eduard Zingerman
fa0e866373 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   0e133a13370389d3894891eafe54fec2c44ad735
Checkpoint bpf-next commit: e80742d917492f10926b46b0caca050c6c9231d6
Baseline bpf commit:        8f8abb863fa5a4cc18955c6a0e17af0ded3e4a76
Checkpoint bpf commit:      83b9dda8afa4e968d9cce253f390b01c0612a2a5

Daniel Borkmann (3):
  netkit, bpf: Add bpf programmable net device
  tools: Sync if_link uapi header
  libbpf: Add link-based API for netkit

Yonghong Song (2):
  libbpf: Fix potential uninitialized tail padding with
    LIBBPF_OPTS_RESET
  bpf: Use named fields for certain bpf uapi structs

 include/uapi/linux/bpf.h     |  37 +++++----
 include/uapi/linux/if_link.h | 141 +++++++++++++++++++++++++++++++++++
 src/bpf.c                    |  16 ++++
 src/bpf.h                    |   5 ++
 src/libbpf.c                 |  39 ++++++++++
 src/libbpf.h                 |  15 ++++
 src/libbpf.map               |   1 +
 src/libbpf_common.h          |  13 ++--
 8 files changed, 246 insertions(+), 21 deletions(-)

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
2023-11-13 18:21:02 -08:00
Yonghong Song
0fa5ff4f54 bpf: Use named fields for certain bpf uapi structs
Martin and Vadim reported a verifier failure with bpf_dynptr usage.
The issue is mentioned but Vadim workarounded the issue with source
change ([1]). The below describes what is the issue and why there
is a verification failure.

  int BPF_PROG(skb_crypto_setup) {
    struct bpf_dynptr algo, key;
    ...

    bpf_dynptr_from_mem(..., ..., 0, &algo);
    ...
  }

The bpf program is using vmlinux.h, so we have the following definition in
vmlinux.h:
  struct bpf_dynptr {
        long: 64;
        long: 64;
  };
Note that in uapi header bpf.h, we have
  struct bpf_dynptr {
        long: 64;
        long: 64;
} __attribute__((aligned(8)));

So we lost alignment information for struct bpf_dynptr by using vmlinux.h.
Let us take a look at a simple program below:
  $ cat align.c
  typedef unsigned long long __u64;
  struct bpf_dynptr_no_align {
        __u64 :64;
        __u64 :64;
  };
  struct bpf_dynptr_yes_align {
        __u64 :64;
        __u64 :64;
  } __attribute__((aligned(8)));

  void bar(void *, void *);
  int foo() {
    struct bpf_dynptr_no_align a;
    struct bpf_dynptr_yes_align b;
    bar(&a, &b);
    return 0;
  }
  $ clang --target=bpf -O2 -S -emit-llvm align.c

Look at the generated IR file align.ll:
  ...
  %a = alloca %struct.bpf_dynptr_no_align, align 1
  %b = alloca %struct.bpf_dynptr_yes_align, align 8
  ...

The compiler dictates the alignment for struct bpf_dynptr_no_align is 1 and
the alignment for struct bpf_dynptr_yes_align is 8. So theoretically compiler
could allocate variable %a with alignment 1 although in reallity the compiler
may choose a different alignment by considering other local variables.

In [1], the verification failure happens because variable 'algo' is allocated
on the stack with alignment 4 (fp-28). But the verifer wants its alignment
to be 8.

To fix the issue, the RFC patch ([1]) tried to add '__attribute__((aligned(8)))'
to struct bpf_dynptr plus other similar structs. Andrii suggested that
we could directly modify uapi struct with named fields like struct 'bpf_iter_num':
  struct bpf_iter_num {
        /* opaque iterator state; having __u64 here allows to preserve correct
         * alignment requirements in vmlinux.h, generated from BTF
         */
        __u64 __opaque[1];
  } __attribute__((aligned(8)));

Indeed, adding named fields for those affected structs in this patch can preserve
alignment when bpf program references them in vmlinux.h. With this patch,
the verification failure in [1] can also be resolved.

  [1] https://lore.kernel.org/bpf/1b100f73-7625-4c1f-3ae5-50ecf84d3ff0@linux.dev/
  [2] https://lore.kernel.org/bpf/20231103055218.2395034-1-yonghong.song@linux.dev/

Cc: Vadim Fedorenko <vadfed@meta.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231104024900.1539182-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-10 13:27:01 -08:00
Yonghong Song
2d5df9f626 libbpf: Fix potential uninitialized tail padding with LIBBPF_OPTS_RESET
Martin reported that there is a libbpf complaining of non-zero-value tail
padding with LIBBPF_OPTS_RESET macro if struct bpf_netkit_opts is modified
to have a 4-byte tail padding. This only happens to clang compiler.
The commend line is: ./test_progs -t tc_netkit_multi_links
Martin and I did some investigation and found this indeed the case and
the following are the investigation details.

Clang:
  clang version 18.0.0
  <I tried clang15/16/17 and they all have similar results>

tools/lib/bpf/libbpf_common.h:
  #define LIBBPF_OPTS_RESET(NAME, ...)                                      \
        do {                                                                \
                memset(&NAME, 0, sizeof(NAME));                             \
                NAME = (typeof(NAME)) {                                     \
                        .sz = sizeof(NAME),                                 \
                        __VA_ARGS__                                         \
                };                                                          \
        } while (0)

  #endif

tools/lib/bpf/libbpf.h:
  struct bpf_netkit_opts {
        /* size of this struct, for forward/backward compatibility */
        size_t sz;
        __u32 flags;
        __u32 relative_fd;
        __u32 relative_id;
        __u64 expected_revision;
        size_t :0;
  };
  #define bpf_netkit_opts__last_field expected_revision
In the above struct bpf_netkit_opts, there is no tail padding.

prog_tests/tc_netkit.c:
  static void serial_test_tc_netkit_multi_links_target(int mode, int target)
  {
        ...
        LIBBPF_OPTS(bpf_netkit_opts, optl);
        ...
        LIBBPF_OPTS_RESET(optl,
                .flags = BPF_F_BEFORE,
                .relative_fd = bpf_program__fd(skel->progs.tc1),
        );
        ...
  }

Let us make the following source change, note that we have a 4-byte
tailing padding now.
  diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
  index 6cd9c501624f..0dd83910ae9a 100644
  --- a/tools/lib/bpf/libbpf.h
  +++ b/tools/lib/bpf/libbpf.h
  @@ -803,13 +803,13 @@ bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
   struct bpf_netkit_opts {
        /* size of this struct, for forward/backward compatibility */
        size_t sz;
  -       __u32 flags;
        __u32 relative_fd;
        __u32 relative_id;
        __u64 expected_revision;
  +       __u32 flags;
        size_t :0;
   };
  -#define bpf_netkit_opts__last_field expected_revision
  +#define bpf_netkit_opts__last_field flags

The clang 18 generated asm code looks like below:
    ;       LIBBPF_OPTS_RESET(optl,
    55e3: 48 8d 7d 98                   leaq    -0x68(%rbp), %rdi
    55e7: 31 f6                         xorl    %esi, %esi
    55e9: ba 20 00 00 00                movl    $0x20, %edx
    55ee: e8 00 00 00 00                callq   0x55f3 <serial_test_tc_netkit_multi_links_target+0x18d3>
    55f3: 48 c7 85 10 fd ff ff 20 00 00 00      movq    $0x20, -0x2f0(%rbp)
    55fe: 48 8b 85 68 ff ff ff          movq    -0x98(%rbp), %rax
    5605: 48 8b 78 18                   movq    0x18(%rax), %rdi
    5609: e8 00 00 00 00                callq   0x560e <serial_test_tc_netkit_multi_links_target+0x18ee>
    560e: 89 85 18 fd ff ff             movl    %eax, -0x2e8(%rbp)
    5614: c7 85 1c fd ff ff 00 00 00 00 movl    $0x0, -0x2e4(%rbp)
    561e: 48 c7 85 20 fd ff ff 00 00 00 00      movq    $0x0, -0x2e0(%rbp)
    5629: c7 85 28 fd ff ff 08 00 00 00 movl    $0x8, -0x2d8(%rbp)
    5633: 48 8b 85 10 fd ff ff          movq    -0x2f0(%rbp), %rax
    563a: 48 89 45 98                   movq    %rax, -0x68(%rbp)
    563e: 48 8b 85 18 fd ff ff          movq    -0x2e8(%rbp), %rax
    5645: 48 89 45 a0                   movq    %rax, -0x60(%rbp)
    5649: 48 8b 85 20 fd ff ff          movq    -0x2e0(%rbp), %rax
    5650: 48 89 45 a8                   movq    %rax, -0x58(%rbp)
    5654: 48 8b 85 28 fd ff ff          movq    -0x2d8(%rbp), %rax
    565b: 48 89 45 b0                   movq    %rax, -0x50(%rbp)
    ;       link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);

At -O0 level, the clang compiler creates an intermediate copy.
We have below to store 'flags' with 4-byte store and leave another 4 byte
in the same 8-byte-aligned storage undefined,
    5629: c7 85 28 fd ff ff 08 00 00 00 movl    $0x8, -0x2d8(%rbp)
and later we store 8-byte to the original zero'ed buffer
    5654: 48 8b 85 28 fd ff ff          movq    -0x2d8(%rbp), %rax
    565b: 48 89 45 b0                   movq    %rax, -0x50(%rbp)

This caused a problem as the 4-byte value at [%rbp-0x2dc, %rbp-0x2e0)
may be garbage.

gcc (gcc 11.4) does not have this issue as it does zeroing struct first before
doing assignments:
  ;       LIBBPF_OPTS_RESET(optl,
    50fd: 48 8d 85 40 fc ff ff          leaq    -0x3c0(%rbp), %rax
    5104: ba 20 00 00 00                movl    $0x20, %edx
    5109: be 00 00 00 00                movl    $0x0, %esi
    510e: 48 89 c7                      movq    %rax, %rdi
    5111: e8 00 00 00 00                callq   0x5116 <serial_test_tc_netkit_multi_links_target+0x1522>
    5116: 48 8b 45 f0                   movq    -0x10(%rbp), %rax
    511a: 48 8b 40 18                   movq    0x18(%rax), %rax
    511e: 48 89 c7                      movq    %rax, %rdi
    5121: e8 00 00 00 00                callq   0x5126 <serial_test_tc_netkit_multi_links_target+0x1532>
    5126: 48 c7 85 40 fc ff ff 00 00 00 00      movq    $0x0, -0x3c0(%rbp)
    5131: 48 c7 85 48 fc ff ff 00 00 00 00      movq    $0x0, -0x3b8(%rbp)
    513c: 48 c7 85 50 fc ff ff 00 00 00 00      movq    $0x0, -0x3b0(%rbp)
    5147: 48 c7 85 58 fc ff ff 00 00 00 00      movq    $0x0, -0x3a8(%rbp)
    5152: 48 c7 85 40 fc ff ff 20 00 00 00      movq    $0x20, -0x3c0(%rbp)
    515d: 89 85 48 fc ff ff             movl    %eax, -0x3b8(%rbp)
    5163: c7 85 58 fc ff ff 08 00 00 00 movl    $0x8, -0x3a8(%rbp)
  ;       link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);

It is not clear how to resolve the compiler code generation as the compiler
generates correct code w.r.t. how to handle unnamed padding in C standard.
So this patch changed LIBBPF_OPTS_RESET macro to avoid uninitialized tail
padding. We already knows LIBBPF_OPTS macro works on both gcc and clang,
even with tail padding. So LIBBPF_OPTS_RESET is changed to be a
LIBBPF_OPTS followed by a memcpy(), thus avoiding uninitialized tail padding.

The below is asm code generated with this patch and with clang compiler:
    ;       LIBBPF_OPTS_RESET(optl,
    55e3: 48 8d bd 10 fd ff ff          leaq    -0x2f0(%rbp), %rdi
    55ea: 31 f6                         xorl    %esi, %esi
    55ec: ba 20 00 00 00                movl    $0x20, %edx
    55f1: e8 00 00 00 00                callq   0x55f6 <serial_test_tc_netkit_multi_links_target+0x18d6>
    55f6: 48 c7 85 10 fd ff ff 20 00 00 00      movq    $0x20, -0x2f0(%rbp)
    5601: 48 8b 85 68 ff ff ff          movq    -0x98(%rbp), %rax
    5608: 48 8b 78 18                   movq    0x18(%rax), %rdi
    560c: e8 00 00 00 00                callq   0x5611 <serial_test_tc_netkit_multi_links_target+0x18f1>
    5611: 89 85 18 fd ff ff             movl    %eax, -0x2e8(%rbp)
    5617: c7 85 1c fd ff ff 00 00 00 00 movl    $0x0, -0x2e4(%rbp)
    5621: 48 c7 85 20 fd ff ff 00 00 00 00      movq    $0x0, -0x2e0(%rbp)
    562c: c7 85 28 fd ff ff 08 00 00 00 movl    $0x8, -0x2d8(%rbp)
    5636: 48 8b 85 10 fd ff ff          movq    -0x2f0(%rbp), %rax
    563d: 48 89 45 98                   movq    %rax, -0x68(%rbp)
    5641: 48 8b 85 18 fd ff ff          movq    -0x2e8(%rbp), %rax
    5648: 48 89 45 a0                   movq    %rax, -0x60(%rbp)
    564c: 48 8b 85 20 fd ff ff          movq    -0x2e0(%rbp), %rax
    5653: 48 89 45 a8                   movq    %rax, -0x58(%rbp)
    5657: 48 8b 85 28 fd ff ff          movq    -0x2d8(%rbp), %rax
    565e: 48 89 45 b0                   movq    %rax, -0x50(%rbp)
    ;       link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);

In the above code, a temporary buffer is zeroed and then has proper value assigned.
Finally, values in temporary buffer are copied to the original variable buffer,
hence tail padding is guaranteed to be 0.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20231107201511.2548645-1-yonghong.song@linux.dev
2023-11-10 13:27:01 -08:00
Daniel Borkmann
2cb0236318 libbpf: Add link-based API for netkit
This adds bpf_program__attach_netkit() API to libbpf. Overall it is very
similar to tcx. The API looks as following:

  LIBBPF_API struct bpf_link *
  bpf_program__attach_netkit(const struct bpf_program *prog, int ifindex,
                             const struct bpf_netkit_opts *opts);

The struct bpf_netkit_opts is done in similar way as struct bpf_tcx_opts
for supporting bpf_mprog control parameters. The attach location for the
primary and peer device is derived from the program section "netkit/primary"
and "netkit/peer", respectively.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-4-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-10 13:27:01 -08:00
Daniel Borkmann
cc7f085286 tools: Sync if_link uapi header
Sync if_link uapi header to the latest version as we need the refresher
in tooling for netkit device. Given it's been a while since the last sync
and the diff is fairly big, it has been done as its own commit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-3-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-10 13:27:01 -08:00
Daniel Borkmann
62b1e4905b netkit, bpf: Add bpf programmable net device
This work adds a new, minimal BPF-programmable device called "netkit"
(former PoC code-name "meta") we recently presented at LSF/MM/BPF. The
core idea is that BPF programs are executed within the drivers xmit routine
and therefore e.g. in case of containers/Pods moving BPF processing closer
to the source.

One of the goals was that in case of Pod egress traffic, this allows to
move BPF programs from hostns tcx ingress into the device itself, providing
earlier drop or forward mechanisms, for example, if the BPF program
determines that the skb must be sent out of the node, then a redirect to
the physical device can take place directly without going through per-CPU
backlog queue. This helps to shift processing for such traffic from softirq
to process context, leading to better scheduling decisions/performance (see
measurements in the slides).

In this initial version, the netkit device ships as a pair, but we plan to
extend this further so it can also operate in single device mode. The pair
comes with a primary and a peer device. Only the primary device, typically
residing in hostns, can manage BPF programs for itself and its peer. The
peer device is designated for containers/Pods and cannot attach/detach
BPF programs. Upon the device creation, the user can set the default policy
to 'pass' or 'drop' for the case when no BPF program is attached.

Additionally, the device can be operated in L3 (default) or L2 mode. The
management of BPF programs is done via bpf_mprog, so that multi-attach is
supported right from the beginning with similar API and dependency controls
as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic
attach/detach/query API for multi-progs"). tc BPF compatibility is provided,
so that existing programs can be easily migrated.

Going forward, we plan to use netkit devices in Cilium as the main device
type for connecting Pods. They will be operated in L3 mode in order to
simplify a Pod's neighbor management and the peer will operate in default
drop mode, so that no traffic is leaving between the time when a Pod is
brought up by the CNI plugin and programs attached by the agent.
Additionally, the programs we attach via tcx on the physical devices are
using bpf_redirect_peer() for inbound traffic into netkit device, hence the
latter is also supporting the ndo_get_peer_dev callback. Similarly, we use
bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device
directly. Also, BIG TCP is supported on netkit device. For the follow-up
work in single device mode, we plan to convert Cilium's cilium_host/_net
devices into a single one.

An extensive test suite for checking device operations and the BPF program
and link management API comes as BPF selftests in this series.

Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://github.com/borkmann/iproute2/tree/pr/netkit
Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-10 13:27:01 -08:00
Andrii Nakryiko
3189f70538 docs: attempt to fix .readthedocs.yaml
Seems like we need to update the config ([0],[1]).

  [0] https://blog.readthedocs.com/migrate-configuration-v2/
  [1] https://blog.readthedocs.com/use-build-os-config/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-10-27 14:07:51 -07:00
Yonghong Song
6a5776066c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2147c8d07e1abc8dfc3433ca18eed5295e230ede
Checkpoint bpf-next commit: 0e133a13370389d3894891eafe54fec2c44ad735
Baseline bpf commit:        9ff8d2717fc8f63e5cb226ddbda20649eefa2728
Checkpoint bpf commit:      9ff8d2717fc8f63e5cb226ddbda20649eefa2728

Alexandre Ghiti (1):
  libbpf: Fix syscall access arguments on riscv

Andrii Nakryiko (1):
  libbpf: Don't assume SHT_GNU_verdef presence for SHT_GNU_versym
    section

Daan De Meyer (3):
  bpf: Implement cgroup sockaddr hooks for unix sockets
  libbpf: Add support for cgroup unix socket address hooks
  documentation/bpf: Document cgroup unix socket address hooks

David Vernet (1):
  bpf: Add ability to pin bpf timer to calling CPU

Martynas Pumputis (1):
  bpf: Derive source IP addr via bpf_*_fib_lookup()

 docs/program_types.rst   | 10 ++++++++++
 include/uapi/linux/bpf.h | 27 +++++++++++++++++++++++----
 src/bpf_tracing.h        |  2 --
 src/elf.c                | 16 ++++++++++------
 src/libbpf.c             | 10 ++++++++++
 5 files changed, 53 insertions(+), 12 deletions(-)

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2023-10-26 09:00:01 -07:00
Yonghong Song
acecaf855d sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2023-10-19 11:36:22 -07:00
Andrii Nakryiko
365cefa149 libbpf: Don't assume SHT_GNU_verdef presence for SHT_GNU_versym section
Fix too eager assumption that SHT_GNU_verdef ELF section is going to be
present whenever binary has SHT_GNU_versym section. It seems like either
SHT_GNU_verdef or SHT_GNU_verneed can be used, so failing on missing
SHT_GNU_verdef actually breaks use cases in production.

One specific reported issue, which was used to manually test this fix,
was trying to attach to `readline` function in BASH binary.

Fixes: bb7fa09399b9 ("libbpf: Support symbol versioning for uprobe")
Reported-by: Liam Wisehart <liamwisehart@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Manu Bretelle <chantr4@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/20231016182840.4033346-1-andrii@kernel.org
2023-10-19 11:36:22 -07:00
Daan De Meyer
f4b6dcfca1 documentation/bpf: Document cgroup unix socket address hooks
Update the documentation to mention the new cgroup unix sockaddr
hooks.

Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-8-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-19 11:36:22 -07:00
Daan De Meyer
748787456b libbpf: Add support for cgroup unix socket address hooks
Add the necessary plumbing to hook up the new cgroup unix sockaddr
hooks into libbpf.

Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-6-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-19 11:36:22 -07:00
Daan De Meyer
8a08d63f29 bpf: Implement cgroup sockaddr hooks for unix sockets
These hooks allows intercepting connect(), getsockname(),
getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
socket hooks get write access to the address length because the
address length is not fixed when dealing with unix sockets and
needs to be modified when a unix socket address is modified by
the hook. Because abstract socket unix addresses start with a
NUL byte, we cannot recalculate the socket address in kernelspace
after running the hook by calculating the length of the unix socket
path using strlen().

These hooks can be used when users want to multiplex syscall to a
single unix socket to multiple different processes behind the scenes
by redirecting the connect() and other syscalls to process specific
sockets.

We do not implement support for intercepting bind() because when
using bind() with unix sockets with a pathname address, this creates
an inode in the filesystem which must be cleaned up. If we rewrite
the address, the user might try to clean up the wrong file, leaking
the socket in the filesystem where it is never cleaned up. Until we
figure out a solution for this (and a use case for intercepting bind()),
we opt to not allow rewriting the sockaddr in bind() calls.

We also implement recvmsg() support for connected streams so that
after a connect() that is modified by a sockaddr hook, any corresponding
recmvsg() on the connected socket can also be modified to make the
connected program think it is connected to the "intended" remote.

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-19 11:36:22 -07:00
Martynas Pumputis
c9f8eb5310 bpf: Derive source IP addr via bpf_*_fib_lookup()
Extend the bpf_fib_lookup() helper by making it to return the source
IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set.

For example, the following snippet can be used to derive the desired
source IP address:

    struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };

    ret = bpf_skb_fib_lookup(skb, p, sizeof(p),
            BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
    if (ret != BPF_FIB_LKUP_RET_SUCCESS)
        return TC_ACT_SHOT;

    /* the p.ipv4_src now contains the source address */

The inability to derive the proper source address may cause malfunctions
in BPF-based dataplanes for hosts containing netdevs with more than one
routable IP address or for multi-homed hosts.

For example, Cilium implements packet masquerading in BPF. If an
egressing netdev to which the Cilium's BPF prog is attached has
multiple IP addresses, then only one [hardcoded] IP address can be used for
masquerading. This breaks connectivity if any other IP address should have
been selected instead, for example, when a public and private addresses
are attached to the same egress interface.

The change was tested with Cilium [1].

Nikolay Aleksandrov helped to figure out the IPv6 addr selection.

[1]: https://github.com/cilium/cilium/pull/28283

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-19 11:36:22 -07:00
David Vernet
1c0358823c bpf: Add ability to pin bpf timer to calling CPU
BPF supports creating high resolution timers using bpf_timer_* helper
functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which
specifies that the timeout should be interpreted as absolute time. It
would also be useful to be able to pin that timer to a core. For
example, if you wanted to make a subset of cores run without timer
interrupts, and only have the timer be invoked on a single core.

This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag.
When specified, the HRTIMER_MODE_PINNED flag is passed to
hrtimer_start(). A subsequent patch will update selftests to validate.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com
2023-10-19 11:36:22 -07:00
Alexandre Ghiti
20c1170ea4 libbpf: Fix syscall access arguments on riscv
Since commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers"), riscv
selects ARCH_HAS_SYSCALL_WRAPPER so let's use the generic implementation
of PT_REGS_SYSCALL_REGS().

Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/bpf/20231004110905.49024-2-bjorn@kernel.org
2023-10-19 11:36:22 -07:00
Yonghong Song
b44eb3a8fa libbpf: fix bpf-checkpoint-commit
The previous sync bpf-checkpoint-commit becomes invalid
due to upstream bpf tree force-push. This patch picked
a new valid commit as the bpf-checkpoint-commit so
the sync script can work with newer changes.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2023-10-19 11:36:22 -07:00
Yonghong Song
14648264b1 ci: Regenerate latest vmlinux.h for old kernel CI testts
Without the change, we will have failures like below:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'
      progs/getsockname_unix_prog.c:27:15: error: no member named 'uaddrlen' in 'struct bpf_sock_addr_kern'
              if (sa_kern->uaddrlen != unaddrlen)
                  ~~~~~~~  ^
      1 error generated.
      make: *** [Makefile:605: /home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/getsockname_unix_prog.bpf.o] Error 1
      make: *** Waiting for unfinished jobs....
      Error: Process completed with exit code 2.

    in Kernel 5.5.0 on ubuntu-20.04 + selftests

    Manu Bretelle kindly helped regenerate the vmlinux.h from latest
    bpf-next kernel for me.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2023-10-19 11:36:22 -07:00
Song Liu
e26b84dc33 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   45ee73a0722b9e1d0b7a524d06756291b13b5912
Checkpoint bpf-next commit: 2147c8d07e1abc8dfc3433ca18eed5295e230ede
Baseline bpf commit:        57eb5e1c5c57972c95e8efab6bc81b87161b0b07
Checkpoint bpf commit:      4cb893e89221be9c791e43cab6a8e937cd57e17f

Hengqi Chen (3):
  libbpf: Resolve symbol conflicts at the same offset for uprobe
  libbpf: Support symbol versioning for uprobe
  libbpf: Allow Golang symbols in uprobe secdef

Jiri Olsa (2):
  bpf: Add missed value to kprobe_multi link info
  bpf: Add missed value to kprobe perf link info

Kumar Kartikeya Dwivedi (2):
  libbpf: Refactor bpf_object__reloc_code
  libbpf: Add support for custom exception callbacks

Martin Kelly (8):
  libbpf: Refactor cleanup in ring_buffer__add
  libbpf: Switch rings to array of pointers
  libbpf: Add ring_buffer__ring
  libbpf: Add ring__producer_pos, ring__consumer_pos
  libbpf: Add ring__avail_data_size
  libbpf: Add ring__size
  libbpf: Add ring__map_fd
  libbpf: Add ring__consume

 include/uapi/linux/bpf.h |   2 +
 src/elf.c                | 139 ++++++++++++++++++++++++++---
 src/libbpf.c             | 188 ++++++++++++++++++++++++++++++++-------
 src/libbpf.h             |  73 +++++++++++++++
 src/libbpf.map           |   7 ++
 src/ringbuf.c            |  85 +++++++++++++++---
 6 files changed, 439 insertions(+), 55 deletions(-)

Signed-off-by: Song Liu <song@kernel.org>
2023-10-02 11:17:48 -07:00
Hengqi Chen
9a3a2e9303 libbpf: Allow Golang symbols in uprobe secdef
Golang symbols in ELF files are different from C/C++
which contains special characters like '*', '(' and ')'.
With generics, things get more complicated, there are
symbols like:

  github.com/cilium/ebpf/internal.(*Deque[go.shape.interface { Format(fmt.State, int32); TypeName() string;github.com/cilium/ebpf/btf.copy() github.com/cilium/ebpf/btf.Type}]).Grow

Matching such symbols using `%m[^\n]` in sscanf, this
excludes newline which typically does not appear in ELF
symbols. This should work in most use-cases and also
work for unicode letters in identifiers. If newline do
show up in ELF symbols, users can still attach to such
symbol by specifying bpf_uprobe_opts::func_name.

A working example can be found at this repo ([0]).

  [0]: https://github.com/chenhengqi/libbpf-go-symbols

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230929155954.92448-1-hengqi.chen@gmail.com
2023-10-02 11:17:48 -07:00
Jiri Olsa
96d70a52ad bpf: Add missed value to kprobe perf link info
Add missed value to kprobe attached through perf link info to
hold the stats of missed kprobe handler execution.

The kprobe's missed counter gets incremented when kprobe handler
is not executed due to another kprobe running on the same cpu.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-4-jolsa@kernel.org
2023-10-02 11:17:48 -07:00
Jiri Olsa
de02cb1697 bpf: Add missed value to kprobe_multi link info
Add missed value to kprobe_multi link info to hold the stats of missed
kprobe_multi probe.

The missed counter gets incremented when fprobe fails the recursion
check or there's no rethook available for return probe. In either
case the attached bpf program is not executed.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-3-jolsa@kernel.org
2023-10-02 11:17:48 -07:00
Martin Kelly
b520bcd7d8 libbpf: Add ring__consume
Add ring__consume to consume a single ringbuffer, analogous to
ring_buffer__consume.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-14-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
6413c2d063 libbpf: Add ring__map_fd
Add ring__map_fd to get the file descriptor underlying a given
ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-12-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
cd3fe56c75 libbpf: Add ring__size
Add ring__size to get the total size of a given ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-10-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
3e675ed6ab libbpf: Add ring__avail_data_size
Add ring__avail_data_size for querying the currently available data in
the ringbuffer, similar to the BPF_RB_AVAIL_DATA flag in
bpf_ringbuf_query. This is racy during ongoing operations but is still
useful for overall information on how a ringbuffer is behaving.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-8-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
2ad16b970a libbpf: Add ring__producer_pos, ring__consumer_pos
Add APIs to get the producer and consumer position for a given
ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-6-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
a20576f5f2 libbpf: Add ring_buffer__ring
Add a new function ring_buffer__ring, which exposes struct ring * to the
user, representing a single ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-4-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
bfa471bc85 libbpf: Switch rings to array of pointers
Switch rb->rings to be an array of pointers instead of a contiguous
block. This allows for each ring pointer to be stable after
ring_buffer__add is called, which allows us to expose struct ring * to
the user without gotchas. Without this change, the realloc in
ring_buffer__add could invalidate a struct ring *, making it unsafe to
give to the user.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-3-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Martin Kelly
64f2b4ab49 libbpf: Refactor cleanup in ring_buffer__add
Refactor the cleanup code in ring_buffer__add to use a unified err_out
label. This reduces code duplication, as well as plugging a potential
leak if mmap_sz != (__u64)(size_t)mmap_sz (currently this would miss
unmapping tmp because ringbuf_unmap_ring isn't called).

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-2-martin.kelly@crowdstrike.com
2023-10-02 11:17:48 -07:00
Hengqi Chen
cd91ca8f99 libbpf: Support symbol versioning for uprobe
In current implementation, we assume that symbol found in .dynsym section
would have a version suffix and use it to compare with symbol user supplied.
According to the spec ([0]), this assumption is incorrect, the version info
of dynamic symbols are stored in .gnu.version and .gnu.version_d sections
of ELF objects. For example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
      706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
      2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
      2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

In this case, specify pthread_rwlock_wrlock@@GLIBC_2.34 or
pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't work.
Because the qualified name does NOT match `pthread_rwlock_wrlock` (without
version suffix) in .dynsym sections.

This commit implements the symbol versioning for dynsym and allows user to
specify symbol in the following forms:
  - func
  - func@LIB_VERSION
  - func@@LIB_VERSION

In case of symbol conflicts, error out and users should resolve it by
specifying a qualified name.

  [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230918024813.237475-3-hengqi.chen@gmail.com
2023-10-02 11:17:48 -07:00
Hengqi Chen
df9cd9f69c libbpf: Resolve symbol conflicts at the same offset for uprobe
Dynamic symbols in shared library may have the same name, for example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
     706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
    2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
    2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

Currently, users can't attach a uprobe to pthread_rwlock_wrlock because
there are two symbols named pthread_rwlock_wrlock and both are global
bind. And libbpf considers it as a conflict.

Since both of them are at the same offset we could accept one of them
harmlessly. Note that we already does this in elf_resolve_syms_offsets.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230918024813.237475-2-hengqi.chen@gmail.com
2023-10-02 11:17:48 -07:00
Kumar Kartikeya Dwivedi
713d1f5a83 libbpf: Add support for custom exception callbacks
Add support to libbpf to append exception callbacks when loading a
program. The exception callback is found by discovering the declaration
tag 'exception_callback:<value>' and finding the callback in the value
of the tag.

The process is done in two steps. First, for each main program, the
bpf_object__sanitize_and_load_btf function finds and marks its
corresponding exception callback as defined by the declaration tag on
it. Second, bpf_object__reloc_code is modified to append the indicated
exception callback at the end of the instruction iteration (since
exception callback will never be appended in that loop, as it is not
directly referenced).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-16-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-02 11:17:48 -07:00
Kumar Kartikeya Dwivedi
998213a1e3 libbpf: Refactor bpf_object__reloc_code
Refactor bpf_object__append_subprog_code out of bpf_object__reloc_code
to be able to reuse it to append subprog related code for the exception
callback to the main program.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-15-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-02 11:17:48 -07:00
Andrii Nakryiko
56069cda78 ci: denylist empty_skb temporary
The fix is in bpf tree. Needs to be merged to bpf-next, on which libbpf
CI is tested.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
aadf88d4f6 ci: remove outdated temporary patches
Remove patches, they don't apply and are not needed anymore.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
10da3d2384 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   9e3b47abeb8f76c39c570ffc924ac0b35f132274
Checkpoint bpf-next commit: 45ee73a0722b9e1d0b7a524d06756291b13b5912
Baseline bpf commit:        23d775f12dcd23d052a4927195f15e970e27ab26
Checkpoint bpf commit:      57eb5e1c5c57972c95e8efab6bc81b87161b0b07

Andrii Nakryiko (1):
  libbpf: Add basic BTF sanity validation

Ravi Bangoria (1):
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC

Stanislav Fomichev (2):
  bpf: expose information about supported xdp metadata kfunc
  bpf: Clarify error expectations from bpf_clone_redirect

Yonghong Song (2):
  libbpf: Add __percpu_kptr macro definition
  bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated

 include/uapi/linux/bpf.h        |  13 ++-
 include/uapi/linux/netdev.h     |  16 ++++
 include/uapi/linux/perf_event.h |   3 +-
 src/bpf_helpers.h               |   1 +
 src/btf.c                       | 160 ++++++++++++++++++++++++++++++++
 5 files changed, 190 insertions(+), 3 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
d2838b2be3 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-15 15:57:14 -07:00
Stanislav Fomichev
aa44abfdd2 bpf: Clarify error expectations from bpf_clone_redirect
Commit 151e887d8ff9 ("veth: Fixing transmit return status for dropped
packets") exposed the fact that bpf_clone_redirect is capable of
returning raw NET_XMIT_XXX return codes.

This is in the conflict with its UAPI doc which says the following:
"0 on success, or a negative error in case of failure."

Update the UAPI to reflect the fact that bpf_clone_redirect can
return positive error numbers, but don't explicitly define
their meaning.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230911194731.286342-1-sdf@google.com
2023-09-15 15:57:14 -07:00
Stanislav Fomichev
6070b1bdcf bpf: expose information about supported xdp metadata kfunc
Add new xdp-rx-metadata-features member to netdev netlink
which exports a bitmask of supported kfuncs. Most of the patch
is autogenerated (headers), the only relevant part is netdev.yaml
and the changes in netdev-genl.c to marshal into netlink.

Example output on veth:

$ ip link add veth0 type veth peer name veth1 # ifndex == 12
$ ./tools/net/ynl/samples/netdev 12

Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12
   veth1[12]    xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 15:57:14 -07:00
Yonghong Song
6f30f1a00a bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr'
can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
Also make changes in selftests/bpf/test_bpftool_synctypes.py
and selftest libbpf_str to fix otherwise test errors.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230827152837.2003563-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 15:57:14 -07:00
Yonghong Song
332198af03 libbpf: Add __percpu_kptr macro definition
Add __percpu_kptr macro definition in bpf_helpers.h.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230827152800.1998492-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
2dbdd3b564 libbpf: Add basic BTF sanity validation
Implement a simple and straightforward BTF sanity check when parsing BTF
data. Right now it's very basic and just validates that all the string
offsets and type IDs are within valid range. For FUNC we also check that
it points to FUNC_PROTO kinds.

Even with such simple checks it fixes a bunch of crashes found by OSS
fuzzer ([0]-[5]) and will allow fuzzer to make further progress.

Some other invariants will be checked in follow up patches (like
ensuring there is no infinite type loops), but this seems like a good
start already.

Adding FUNC -> FUNC_PROTO check revealed that one of selftests has
a problem with FUNC pointing to VAR instead, so fix it up in the same
commit.

  [0] https://github.com/libbpf/libbpf/issues/482
  [1] https://github.com/libbpf/libbpf/issues/483
  [2] https://github.com/libbpf/libbpf/issues/485
  [3] https://github.com/libbpf/libbpf/issues/613
  [4] https://github.com/libbpf/libbpf/issues/618
  [5] https://github.com/libbpf/libbpf/issues/619

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Song Liu <song@kernel.org>
Closes: https://github.com/libbpf/libbpf/issues/617
Link: https://lore.kernel.org/bpf/20230825202152.1813394-1-andrii@kernel.org
2023-09-15 15:57:14 -07:00
Ravi Bangoria
d8a4b198da perf/mem: Introduce PERF_MEM_LVLNUM_UNC
Older API PERF_MEM_LVL_UNC can be replaced by PERF_MEM_LVLNUM_UNC.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230725150206.184-2-ravi.bangoria@amd.com
2023-09-15 15:57:14 -07:00
Andrii Nakryiko
5fc0677111 ci: update list of tests/subtests for 5.5 kernel
Some tests can't succeed on 5.5, which is very old.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-07 09:11:51 -07:00
Daniel Müller
295b5726f0 Introduce pull request template
This change introduces a pull request template that hopefully helps
prevent more libbpf-specific pull requests that should really be
submitted to the BPF mailing from being opened against this repository.
Recent examples include [0] [1].

[0] https://github.com/libbpf/libbpf/pull/712
[1] https://github.com/libbpf/libbpf/pull/723

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-09-05 11:08:57 -07:00
Andrii Nakryiko
5a46421ad8 ci: deny newly added tc_bpf/tc_bpf_non_root for 5.5
It doesn't work on 5.5 and was just recently introduced as a new subtest
to already existing test. Add subtest to denylist.

Also clean up old denylist, leaving only "exception" relative to
ALLOWLIST.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
942a0b8056 Makefile: silence GCC's bogus complaint about possible NULL in printf
GCC started complaining that some of libbpf pr_warn() statements might
be passing NULL for map name. Map name is never NULL for non-NULL map
pointer, so this is a false positive which triggers build failures.
Silence format-overflow warning altogether to avoid this in the future
as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
fcc940e6b2 Makefile: add elf.c to a list of built files
Libbpf now has one more .c file, make sure Github Makefile builds it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
2e6b54e5ea sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   0a55264cf966fb95ebf9d03d9f81fa992f069312
Checkpoint bpf-next commit: 9e3b47abeb8f76c39c570ffc924ac0b35f132274
Baseline bpf commit:        23d775f12dcd23d052a4927195f15e970e27ab26
Checkpoint bpf commit:      23d775f12dcd23d052a4927195f15e970e27ab26

Andrii Nakryiko (1):
  libbpf: fix signedness determination in CO-RE relo handling logic

Daniel Xu (1):
  libbpf: Add bpf_object__unpin()

Hao Luo (1):
  libbpf: Free btf_vmlinux when closing bpf_object

Jiri Olsa (15):
  bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum
  bpf: Add multi uprobe link
  bpf: Add cookies support for uprobe_multi link
  bpf: Add pid filter support for uprobe_multi link
  libbpf: Add uprobe_multi attach type and link names
  libbpf: Move elf_find_func_offset* functions to elf object
  libbpf: Add elf_open/elf_close functions
  libbpf: Add elf symbol iterator
  libbpf: Add elf_resolve_syms_offsets function
  libbpf: Add elf_resolve_pattern_offsets function
  libbpf: Add bpf_link_create support for multi uprobes
  libbpf: Add bpf_program__attach_uprobe_multi function
  libbpf: Add support for u[ret]probe.multi[.s] program sections
  libbpf: Add uprobe multi link detection
  libbpf: Add uprobe multi link support to bpf_program__attach_usdt

 include/uapi/linux/bpf.h |  22 +-
 src/bpf.c                |  11 +
 src/bpf.h                |  11 +-
 src/elf.c                | 440 +++++++++++++++++++++++++++++++++++++++
 src/libbpf.c             | 404 ++++++++++++++++++-----------------
 src/libbpf.h             |  52 +++++
 src/libbpf.map           |   2 +
 src/libbpf_internal.h    |  21 ++
 src/relo_core.c          |   2 +-
 src/usdt.c               | 116 +++++++----
 10 files changed, 853 insertions(+), 228 deletions(-)
 create mode 100644 src/elf.c

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
b4c8def45f libbpf: fix signedness determination in CO-RE relo handling logic
Extracting btf_int_encoding() is only meaningful for BTF_KIND_INT, so we
need to check that first before inferring signedness.

Closes: https://github.com/libbpf/libbpf/issues/704
Reported-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230824000016.2658017-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-25 11:51:28 -07:00
Daniel Xu
62a186ea68 libbpf: Add bpf_object__unpin()
For bpf_object__pin_programs() there is bpf_object__unpin_programs().
Likewise bpf_object__unpin_maps() for bpf_object__pin_maps().

But no bpf_object__unpin() for bpf_object__pin(). Adding the former adds
symmetry to the API.

It's also convenient for cleanup in application code. It's an API I
would've used if it was available for a repro I was writing earlier.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/b2f9d41da4a350281a0b53a804d11b68327e14e5.1692832478.git.dxu@dxuuu.xyz
2023-08-25 11:51:28 -07:00
Hao Luo
a687461867 libbpf: Free btf_vmlinux when closing bpf_object
I hit a memory leak when testing bpf_program__set_attach_target().
Basically, set_attach_target() may allocate btf_vmlinux, for example,
when setting attach target for bpf_iter programs. But btf_vmlinux
is freed only in bpf_object_load(), which means if we only open
bpf object but not load it, setting attach target may leak
btf_vmlinux.

So let's free btf_vmlinux in bpf_object__close() anyway.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230822193840.1509809-1-haoluo@google.com
2023-08-25 11:51:28 -07:00
Jiri Olsa
74188c1740 libbpf: Add uprobe multi link support to bpf_program__attach_usdt
Adding support for usdt_manager_attach_usdt to use uprobe_multi
link to attach to usdt probes.

The uprobe_multi support is detected before the usdt program is
loaded and its expected_attach_type is set accordingly.

If uprobe_multi support is detected the usdt_manager_attach_usdt
gathers uprobes info and calls bpf_program__attach_uprobe to
create all needed uprobes.

If uprobe_multi support is not detected the old behaviour stays.

Also adding usdt.s program section for sleepable usdt probes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-18-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
60cf42249b libbpf: Add uprobe multi link detection
Adding uprobe-multi link detection. It will be used later in
bpf_program__attach_usdt function to check and use uprobe_multi
link over standard uprobe links.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-17-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
bc829bac06 libbpf: Add support for u[ret]probe.multi[.s] program sections
Adding support for several uprobe_multi program sections
to allow auto attach of multi_uprobe programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-16-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
9f76dd6dd0 libbpf: Add bpf_program__attach_uprobe_multi function
Adding bpf_program__attach_uprobe_multi function that
allows to attach multiple uprobes with uprobe_multi link.

The user can specify uprobes with direct arguments:

  binary_path/func_pattern/pid

or with struct bpf_uprobe_multi_opts opts argument fields:

  const char **syms;
  const unsigned long *offsets;
  const unsigned long *ref_ctr_offsets;
  const __u64 *cookies;

User can specify 2 mutually exclusive set of inputs:

 1) use only path/func_pattern/pid arguments

 2) use path/pid with allowed combinations of:
    syms/offsets/ref_ctr_offsets/cookies/cnt

    - syms and offsets are mutually exclusive
    - ref_ctr_offsets and cookies are optional

Any other usage results in error.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-15-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
cd21cc08cc libbpf: Add bpf_link_create support for multi uprobes
Adding new uprobe_multi struct to bpf_link_create_opts object
to pass multiple uprobe data to link_create attr uapi.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-14-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
c7ef3a169e libbpf: Add elf_resolve_pattern_offsets function
Adding elf_resolve_pattern_offsets function that looks up
offsets for symbols specified by pattern argument.

The 'pattern' argument allows wildcards (*?' supported).

Offsets are returned in allocated array together with its
size and needs to be released by the caller.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-13-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
91fd655644 libbpf: Add elf_resolve_syms_offsets function
Adding elf_resolve_syms_offsets function that looks up
offsets for symbols specified in syms array argument.

Offsets are returned in allocated array with the 'cnt' size,
that needs to be released by the caller.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-12-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
b7ec9d9669 libbpf: Add elf symbol iterator
Adding elf symbol iterator object (and some functions) that follow
open-coded iterator pattern and some functions to ease up iterating
elf object symbols.

The idea is to iterate single symbol section with:

  struct elf_sym_iter iter;
  struct elf_sym *sym;

  if (elf_sym_iter_new(&iter, elf, binary_path, SHT_DYNSYM))
        goto error;

  while ((sym = elf_sym_iter_next(&iter))) {
        ...
  }

I considered opening the elf inside the iterator and iterate all symbol
sections, but then it gets more complicated wrt user checks for when
the next section is processed.

Plus side is the we don't need 'exit' function, because caller/user is
in charge of that.

The returned iterated symbol object from elf_sym_iter_next function
is placed inside the struct elf_sym_iter, so no extra allocation or
argument is needed.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
1f8929293e libbpf: Add elf_open/elf_close functions
Adding elf_open/elf_close functions and using it in
elf_find_func_offset_from_file function. It will be
used in following changes to save some common code.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-10-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
0cd5b05f53 libbpf: Move elf_find_func_offset* functions to elf object
Adding new elf object that will contain elf related functions.
There's no functional change.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
a1c2e05c4f libbpf: Add uprobe_multi attach type and link names
Adding new uprobe_multi attach type and link names,
so the functions can resolve the new values.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
c1a12134bd bpf: Add pid filter support for uprobe_multi link
Adding support to specify pid for uprobe_multi link and the uprobes
are created only for task with given pid value.

Using the consumer.filter filter callback for that, so the task gets
filtered during the uprobe installation.

We still need to check the task during runtime in the uprobe handler,
because the handler could get executed if there's another system
wide consumer on the same uprobe (thanks Oleg for the insight).

Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
12466f75db bpf: Add cookies support for uprobe_multi link
Adding support to specify cookies array for uprobe_multi link.

The cookies array share indexes and length with other uprobe_multi
arrays (offsets/ref_ctr_offsets).

The cookies[i] value defines cookie for i-the uprobe and will be
returned by bpf_get_attach_cookie helper when called from ebpf
program hooked to that specific uprobe.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
ba4a10d764 bpf: Add multi uprobe link
Adding new multi uprobe link that allows to attach bpf program
to multiple uprobes.

Uprobes to attach are specified via new link_create uprobe_multi
union:

  struct {
    __aligned_u64   path;
    __aligned_u64   offsets;
    __aligned_u64   ref_ctr_offsets;
    __u32           cnt;
    __u32           flags;
  } uprobe_multi;

Uprobes are defined for single binary specified in path and multiple
calling sites specified in offsets array with optional reference
counters specified in ref_ctr_offsets array. All specified arrays
have length of 'cnt'.

The 'flags' supports single bit for now that marks the uprobe as
return probe.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Jiri Olsa
8765ef8276 bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum
Switching BPF_F_KPROBE_MULTI_RETURN macro to anonymous enum,
so it'd show up in vmlinux.h. There's not functional change
compared to having this as macro.

Acked-by: Yafang Shao <laoar.shao@gmail.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-25 11:51:28 -07:00
Andrii Nakryiko
6a91da19fe fuzz: use https-based URL for elfutils
For environments behind proxies, having https:// URL for pulling GIT is
more convenient.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-24 14:14:18 -07:00
Andrii Nakryiko
383198dc49 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   a3e7e6b17946f48badce98d7ac360678a0ea7393
Checkpoint bpf-next commit: 0a55264cf966fb95ebf9d03d9f81fa992f069312
Baseline bpf commit:        496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9
Checkpoint bpf commit:      23d775f12dcd23d052a4927195f15e970e27ab26

Alan Maguire (1):
  bpf: sync tools/ uapi header with

Arnaldo Carvalho de Melo (1):
  tools headers uapi: Sync linux/fcntl.h with the kernel sources

Daniel Borkmann (5):
  bpf: Add generic attach/detach/query API for multi-progs
  bpf: Add fd-based tcx multi-prog infra with link support
  libbpf: Add opts-based attach/detach/query API for tcx
  libbpf: Add link-based API for tcx
  libbpf: Add helper macro to clear opts structs

Daniel Xu (1):
  netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link

Dave Marchevsky (1):
  libbpf: Support triple-underscore flavors for kfunc relocation

Jiri Olsa (1):
  bpf: Add support for bpf_get_func_ip helper for uprobe program

Lorenz Bauer (1):
  bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign

Maciej Fijalkowski (1):
  xsk: add new netlink attribute dedicated for ZC max frags

Magnus Karlsson (2):
  selftests/xsk: transmit and receive multi-buffer packets
  selftests/xsk: add basic multi-buffer test

Marco Vedovati (1):
  libbpf: Set close-on-exec flag on gzopen

Sergey Kacheev (1):
  libbpf: Use local includes inside the library

Stanislav Fomichev (1):
  ynl: regenerate all headers

Yafang Shao (2):
  bpf: Support ->fill_link_info for kprobe_multi
  bpf: Support ->fill_link_info for perf_event

Yonghong Song (1):
  bpf: Support new sign-extension load insns

 include/uapi/linux/bpf.h    | 128 +++++++++++++++++++++++++++++++-----
 include/uapi/linux/fcntl.h  |   5 ++
 include/uapi/linux/if_xdp.h |   9 +++
 include/uapi/linux/netdev.h |   4 +-
 src/bpf.c                   | 127 ++++++++++++++++++++++++-----------
 src/bpf.h                   |  97 +++++++++++++++++++++++----
 src/bpf_tracing.h           |   2 +-
 src/libbpf.c                |  94 +++++++++++++++++++++-----
 src/libbpf.h                |  18 ++++-
 src/libbpf.map              |   2 +
 src/libbpf_common.h         |  16 +++++
 src/netlink.c               |   5 ++
 src/usdt.bpf.h              |   4 +-
 13 files changed, 423 insertions(+), 88 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-21 13:27:45 -07:00
Andrii Nakryiko
839c08a6d8 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-08-21 13:27:45 -07:00
Dave Marchevsky
6d704c7ffd libbpf: Support triple-underscore flavors for kfunc relocation
The function signature of kfuncs can change at any time due to their
intentional lack of stability guarantees. As kfuncs become more widely
used, BPF program writers will need facilities to support calling
different versions of a kfunc from a single BPF object. Consider this
simplified example based on a real scenario we ran into at Meta:

  /* initial kfunc signature */
  int some_kfunc(void *ptr)

  /* Oops, we need to add some flag to modify behavior. No problem,
    change the kfunc. flags = 0 retains original behavior */
  int some_kfunc(void *ptr, long flags)

If the initial version of the kfunc is deployed on some portion of the
fleet and the new version on the rest, a fleetwide service that uses
some_kfunc will currently need to load different BPF programs depending
on which some_kfunc is available.

Luckily CO-RE provides a facility to solve a very similar problem,
struct definition changes, by allowing program writers to declare
my_struct___old and my_struct___new, with ___suffix being considered a
'flavor' of the non-suffixed name and being ignored by
bpf_core_type_exists and similar calls.

This patch extends the 'flavor' facility to the kfunc extern
relocation process. BPF program writers can now declare

  extern int some_kfunc___old(void *ptr)
  extern int some_kfunc___new(void *ptr, int flags)

then test which version of the kfunc exists with bpf_ksym_exists.
Relocation and verifier's dead code elimination will work in concert as
expected, allowing this pattern:

  if (bpf_ksym_exists(some_kfunc___old))
    some_kfunc___old(ptr);
  else
    some_kfunc___new(ptr, 0);

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230817225353.2570845-1-davemarchevsky@fb.com
2023-08-21 13:27:45 -07:00
Marco Vedovati
20699ecf61 libbpf: Set close-on-exec flag on gzopen
Enable the close-on-exec flag when using gzopen. This is especially important
for multithreaded programs making use of libbpf, where a fork + exec could
race with libbpf library calls, potentially resulting in a file descriptor
leaked to the new process. This got missed in 59842c5451fe ("libbpf: Ensure
libbpf always opens files with O_CLOEXEC").

Fixes: 59842c5451fe ("libbpf: Ensure libbpf always opens files with O_CLOEXEC")
Signed-off-by: Marco Vedovati <marco.vedovati@crowdstrike.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230810214350.106301-1-martin.kelly@crowdstrike.com
2023-08-21 13:27:45 -07:00
Jiri Olsa
cd85f34103 bpf: Add support for bpf_get_func_ip helper for uprobe program
Adding support for bpf_get_func_ip helper for uprobe program to return
probed address for both uprobe and return uprobe.

We discussed this in [1] and agreed that uprobe can have special use
of bpf_get_func_ip helper that differs from kprobe.

The kprobe bpf_get_func_ip returns:
  - address of the function if probe is attach on function entry
    for both kprobe and return kprobe
  - 0 if the probe is not attach on function entry

The uprobe bpf_get_func_ip returns:
  - address of the probe for both uprobe and return uprobe

The reason for this semantic change is that kernel can't really tell
if the probe user space address is function entry.

The uprobe program is actually kprobe type program attached as uprobe.
One of the consequences of this design is that uprobes do not have its
own set of helpers, but share them with kprobes.

As we need different functionality for bpf_get_func_ip helper for uprobe,
I'm adding the bool value to the bpf_trace_run_ctx, so the helper can
detect that it's executed in uprobe context and call specific code.

The is_uprobe bool is set as true in bpf_prog_run_array_sleepable, which
is currently used only for executing bpf programs in uprobe.

Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe
to address that it's only used for uprobes and that it sets the
run_ctx.is_uprobe as suggested by Yafang Shao.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
[1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-21 13:27:45 -07:00
Sergey Kacheev
26e32f542b libbpf: Use local includes inside the library
In our monrepo, we try to minimize special processing when importing
(aka vendor) third-party source code. Ideally, we try to import
directly from the repositories with the code without changing it, we
try to stick to the source code dependency instead of the artifact
dependency. In the current situation, a patch has to be made for
libbpf to fix the includes in bpf headers so that they work directly
from libbpf/src.

Signed-off-by: Sergey Kacheev <s.kacheev@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/CAJVhQqUg6OKq6CpVJP5ng04Dg+z=igevPpmuxTqhsR3dKvd9+Q@mail.gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Xu
c5f64030de netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
This commit adds support for enabling IP defrag using pre-existing
netfilter defrag support. Basically all the flag does is bump a refcnt
while the link the active. Checks are also added to ensure the prog
requesting defrag support is run _after_ netfilter defrag hooks.

We also take care to avoid any issues w.r.t. module unloading -- while
defrag is active on a link, the module is prevented from unloading.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Yonghong Song
3d0e1c5a3a bpf: Support new sign-extension load insns
Add interpreter/jit support for new sign-extension load insns
which adds a new mode (BPF_MEMSX).
Also add verifier support to recognize these insns and to
do proper verification with new insns. In verifier, besides
to deduce proper bounds for the dst_reg, probed memory access
is also properly handled.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Lorenz Bauer
36cabf8a4a bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy,
which configures SO_REUSEPORT on its sockets. In turn, we're blocked
from removing TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the
bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
one of the reuseport sockets is selected by hash. This could cause
dispatch to the "wrong" socket:

    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regards to refcounting as bpf_sk_assign will put both
refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straight forward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.

Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign")
Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@cilium.io>
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-21 13:27:45 -07:00
Stanislav Fomichev
1180ab4066 ynl: regenerate all headers
Also add support to pass topdir to ynl-regen.sh (Jakub) and call
it from the makefile to update the UAPI headers.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230727163001.3952878-4-sdf@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21 13:27:45 -07:00
Arnaldo Carvalho de Melo
e6ab647970 tools headers uapi: Sync linux/fcntl.h with the kernel sources
To get the changes in:

  96b2b072ee62be8a ("exportfs: allow exporting non-decodeable file handles to userspace")

That don't add anything that is handled by existing hard coded tables or
table generation scripts.

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZK11P5AwRBUxxutI@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 13:27:45 -07:00
Alan Maguire
71d8eadb90 bpf: sync tools/ uapi header with
Seeing the following:

Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'

...so sync tools version missing some list_node/rb_tree fields.

Fixes: c3c510ce431c ("bpf: Add 'owner' field to bpf_{list,rb}_node")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20230719162257.20818-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
488031955d libbpf: Add helper macro to clear opts structs
Add a small and generic LIBBPF_OPTS_RESET() helper macros which clears an
opts structure and reinitializes its .sz member to place the structure
size. Additionally, the user can pass option-specific data to reinitialize
via varargs.

I found this very useful when developing selftests, but it is also generic
enough as a macro next to the existing LIBBPF_OPTS() which hides the .sz
initialization, too.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230719140858.13224-6-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
0fadd4ba39 libbpf: Add link-based API for tcx
Implement tcx BPF link support for libbpf.

The bpf_program__attach_fd() API has been refactored slightly in order to pass
bpf_link_create_opts pointer as input.

A new bpf_program__attach_tcx() has been added on top of this which allows for
passing all relevant data via extensible struct bpf_tcx_opts.

The program sections tcx/ingress and tcx/egress correspond to the hook locations
for tc ingress and egress, respectively.

For concrete usage examples, see the extensive selftests that have been
developed as part of this series.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-5-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
bb5d7c1be8 libbpf: Add opts-based attach/detach/query API for tcx
Extend libbpf attach opts and add a new detach opts API so this can be used
to add/remove fd-based tcx BPF programs. The old-style bpf_prog_detach() and
bpf_prog_detach2() APIs are refactored to reuse the new bpf_prog_detach_opts()
internally.

The bpf_prog_query_opts() API got extended to be able to handle the new
link_ids, link_attach_flags and revision fields.

For concrete usage examples, see the extensive selftests that have been
developed as part of this series.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
b064c40d94 bpf: Add fd-based tcx multi-prog infra with link support
This work refactors and adds a lightweight extension ("tcx") to the tc BPF
ingress and egress data path side for allowing BPF program management based
on fds via bpf() syscall through the newly added generic multi-prog API.
The main goal behind this work which we also presented at LPC [0] last year
and a recent update at LSF/MM/BPF this year [3] is to support long-awaited
BPF link functionality for tc BPF programs, which allows for a model of safe
ownership and program detachment.

Given the rise in tc BPF users in cloud native environments, this becomes
necessary to avoid hard to debug incidents either through stale leftover
programs or 3rd party applications accidentally stepping on each others toes.
As a recap, a BPF link represents the attachment of a BPF program to a BPF
hook point. The BPF link holds a single reference to keep BPF program alive.
Moreover, hook points do not reference a BPF link, only the application's
fd or pinning does. A BPF link holds meta-data specific to attachment and
implements operations for link creation, (atomic) BPF program update,
detachment and introspection. The motivation for BPF links for tc BPF programs
is multi-fold, for example:

  - From Meta: "It's especially important for applications that are deployed
    fleet-wide and that don't "control" hosts they are deployed to. If such
    application crashes and no one notices and does anything about that, BPF
    program will keep running draining resources or even just, say, dropping
    packets. We at FB had outages due to such permanent BPF attachment
    semantics. With fd-based BPF link we are getting a framework, which allows
    safe, auto-detachable behavior by default, unless application explicitly
    opts in by pinning the BPF link." [1]

  - From Cilium-side the tc BPF programs we attach to host-facing veth devices
    and phys devices build the core datapath for Kubernetes Pods, and they
    implement forwarding, load-balancing, policy, EDT-management, etc, within
    BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently
    experienced hard-to-debug issues in a user's staging environment where
    another Kubernetes application using tc BPF attached to the same prio/handle
    of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath
    it. The goal is to establish a clear/safe ownership model via links which
    cannot accidentally be overridden. [0,2]

BPF links for tc can co-exist with non-link attachments, and the semantics are
in line also with XDP links: BPF links cannot replace other BPF links, BPF
links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
lastly only non-BPF links can replace non-BPF links. In case of Cilium, this
would solve mentioned issue of safe ownership model as 3rd party applications
would not be able to accidentally wipe Cilium programs, even if they are not
BPF link aware.

Earlier attempts [4] have tried to integrate BPF links into core tc machinery
to solve cls_bpf, which has been intrusive to the generic tc kernel API with
extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could
be wiped from the qdisc also. Locking a tc BPF program in place this way, is
getting into layering hacks given the two object models are vastly different.

We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF
attach API, so that the BPF link implementation blends in naturally similar to
other link types which are fd-based and without the need for changing core tc
internal APIs. BPF programs for tc can then be successively migrated from classic
cls_bpf to the new tc BPF link without needing to change the program's source
code, just the BPF loader mechanics for attaching is sufficient.

For the current tc framework, there is no change in behavior with this change
and neither does this change touch on tc core kernel APIs. The gist of this
patch is that the ingress and egress hook have a lightweight, qdisc-less
extension for BPF to attach its tc BPF programs, in other words, a minimal
entry point for tc BPF. The name tcx has been suggested from discussion of
earlier revisions of this work as a good fit, and to more easily differ between
the classic cls_bpf attachment and the fd-based one.

For the ingress and egress tcx points, the device holds a cache-friendly array
with program pointers which is separated from control plane (slow-path) data.
Earlier versions of this work used priority to determine ordering and expression
of dependencies similar as with classic tc, but it was challenged that for
something more future-proof a better user experience is required. Hence this
resulted in the design and development of the generic attach/detach/query API
for multi-progs. See prior patch with its discussion on the API design. tcx is
the first user and later we plan to integrate also others, for example, one
candidate is multi-prog support for XDP which would benefit and have the same
'look and feel' from API perspective.

The goal with tcx is to have maximum compatibility to existing tc BPF programs,
so they don't need to be rewritten specifically. Compatibility to call into
classic tcf_classify() is also provided in order to allow successive migration
or both to cleanly co-exist where needed given its all one logical tc layer and
the tcx plus classic tc cls/act build one logical overall processing pipeline.

tcx supports the simplified return codes TCX_NEXT which is non-terminating (go
to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT.
The fd-based API is behind a static key, so that when unused the code is also
not entered. The struct tcx_entry's program array is currently static, but
could be made dynamic if necessary at a point in future. The a/b pair swap
design has been chosen so that for detachment there are no allocations which
otherwise could fail.

The work has been tested with tc-testing selftest suite which all passes, as
well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB.

Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
of this work.

  [0] https://lpc.events/event/16/contributions/1353/
  [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
  [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
  [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
  [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Daniel Borkmann
d7e583a6ea bpf: Add generic attach/detach/query API for multi-progs
This adds a generic layer called bpf_mprog which can be reused by different
attachment layers to enable multi-program attachment and dependency resolution.
In-kernel users of the bpf_mprog don't need to care about the dependency
resolution internals, they can just consume it with few API calls.

The initial idea of having a generic API sparked out of discussion [0] from an
earlier revision of this work where tc's priority was reused and exposed via
BPF uapi as a way to coordinate dependencies among tc BPF programs, similar
as-is for classic tc BPF. The feedback was that priority provides a bad user
experience and is hard to use [1], e.g.:

  I cannot help but feel that priority logic copy-paste from old tc, netfilter
  and friends is done because "that's how things were done in the past". [...]
  Priority gets exposed everywhere in uapi all the way to bpftool when it's
  right there for users to understand. And that's the main problem with it.

  The user don't want to and don't need to be aware of it, but uapi forces them
  to pick the priority. [...] Your cover letter [0] example proves that in
  real life different service pick the same priority. They simply don't know
  any better. Priority is an unnecessary magic that apps _have_ to pick, so
  they just copy-paste and everyone ends up using the same.

The course of the discussion showed more and more the need for a generic,
reusable API where the "same look and feel" can be applied for various other
program types beyond just tc BPF, for example XDP today does not have multi-
program support in kernel, but also there was interest around this API for
improving management of cgroup program types. Such common multi-program
management concept is useful for BPF management daemons or user space BPF
applications coordinating internally about their attachments.

Both from Cilium and Meta side [2], we've collected the following requirements
for a generic attach/detach/query API for multi-progs which has been implemented
as part of this work:

  - Support prog-based attach/detach and link API
  - Dependency directives (can also be combined):
    - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none}
      - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user
        space application does not need CAP_SYS_ADMIN to retrieve foreign fds
        via bpf_*_get_fd_by_id()
      - BPF_F_LINK flag as {prog,link} toggle
      - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and
        BPF_F_AFTER will just append for attaching
      - Enforced only at attach time
    - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their
      own infra for replacing their internal prog
    - If no flags are set, then it's default append behavior for attaching
  - Internal revision counter and optionally being able to pass expected_revision
  - User space application can query current state with revision, and pass it
    along for attachment to assert current state before doing updates
  - Query also gets extension for link_ids array and link_attach_flags:
    - prog_ids are always filled with program IDs
    - link_ids are filled with link IDs when link was used, otherwise 0
    - {prog,link}_attach_flags for holding {prog,link}-specific flags
  - Must be easy to integrate/reuse for in-kernel users

The uapi-side changes needed for supporting bpf_mprog are rather minimal,
consisting of the additions of the attachment flags, revision counter, and
expanding existing union with relative_{fd,id} member.

The bpf_mprog framework consists of an bpf_mprog_entry object which holds
an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path
structure) is part of bpf_mprog_bundle. Both have been separated, so that
fast-path gets efficient packing of bpf_prog pointers for maximum cache
efficiency. Also, array has been chosen instead of linked list or other
structures to remove unnecessary indirections for a fast point-to-entry in
tc for BPF.

The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of
updates the peer bpf_mprog_entry is populated and then just swapped which
avoids additional allocations that could otherwise fail, for example, in
detach case. bpf_mprog_{fp,cp} arrays are currently static, but they could
be converted to dynamic allocation if necessary at a point in future.
Locking is deferred to the in-kernel user of bpf_mprog, for example, in case
of tcx which uses this API in the next patch, it piggybacks on rtnl.

An extensive test suite for checking all aspects of this API for prog-based
attach/detach and link API comes as BPF selftests in this series.

Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog
management.

  [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net
  [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com
  [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Magnus Karlsson
071630384b selftests/xsk: add basic multi-buffer test
Add the first basic multi-buffer test that sends a stream of 9K
packets and validates that they are received at the other end. In
order to enable sending and receiving multi-buffer packets, code that
sets the MTU is introduced as well as modifications to the XDP
programs so that they signal that they are multi-buffer enabled.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-20-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Magnus Karlsson
658b107d4d selftests/xsk: transmit and receive multi-buffer packets
Add the ability to send and receive packets that are larger than the
size of a umem frame, using the AF_XDP /XDP multi-buffer
support. There are three pieces of code that need to be changed to
achieve this: the Rx path, the Tx path, and the validation logic.

Both the Rx path and Tx could only deal with a single fragment per
packet. The Tx path is extended with a new function called
pkt_nb_frags() that can be used to retrieve the number of fragments a
packet will consume. We then create these many fragments in a loop and
fill the N-1 first ones to the max size limit to use the buffer space
efficiently, and the Nth one with whatever data that is left. This
goes on until we have filled in at the most BATCH_SIZE worth of
descriptors and fragments. If we detect that the next packet would
lead to BATCH_SIZE number of fragments sent being exceeded, we do not
send this packet and finish the batch. This packet is instead sent in
the next iteration of BATCH_SIZE fragments.

For Rx, we loop over all fragments we receive as usual, but for every
descriptor that we receive we call a new validation function called
is_frag_valid() to validate the consistency of this fragment. The code
then checks if the packet continues in the next frame. If so, it loops
over the next packet and performs the same validation. once we have
received the last fragment of the packet we also call the function
is_pkt_valid() to validate the packet as a whole. If we get to the end
of the batch and we are not at the end of the current packet, we back
out the partial packet and end the loop. Once we get into the receive
loop next time, we start over from the beginning of that packet. This
so the code becomes simpler at the cost of some performance.

The validation function is_frag_valid() checks that the sequence and
packet numbers are correct at the start and end of each fragment.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-19-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Maciej Fijalkowski
8ae70bcbdf xsk: add new netlink attribute dedicated for ZC max frags
Introduce new netlink attribute NETDEV_A_DEV_XDP_ZC_MAX_SEGS that will
carry maximum fragments that underlying ZC driver is able to handle on
TX side. It is going to be included in netlink response only when driver
supports ZC. Any value higher than 1 implies multi-buffer ZC support on
underlying device.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-11-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Yafang Shao
4cd8e50d37 bpf: Support ->fill_link_info for perf_event
By introducing support for ->fill_link_info to the perf_event link, users
gain the ability to inspect it using `bpftool link show`. While the current
approach involves accessing this information via `bpftool perf show`,
consolidating link information for all link types in one place offers
greater convenience. Additionally, this patch extends support to the
generic perf event, which is not currently accommodated by
`bpftool perf show`. While only the perf type and config are exposed to
userspace, other attributes such as sample_period and sample_freq are
ignored. It's important to note that if kptr_restrict is not permitted, the
probed address will not be exposed, maintaining security measures.

A new enum bpf_perf_event_type is introduced to help the user understand
which struct is relevant.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-9-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
Yafang Shao
b89ede420b bpf: Support ->fill_link_info for kprobe_multi
With the addition of support for fill_link_info to the kprobe_multi link,
users will gain the ability to inspect it conveniently using the
`bpftool link show`. This enhancement provides valuable information to the
user, including the count of probed functions and their respective
addresses. It's important to note that if the kptr_restrict setting is not
permitted, the probed address will not be exposed, ensuring security.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 13:27:45 -07:00
thiagoftsm
360a2fd909 Merge branch 'libbpf:master' into master 2023-07-12 12:10:00 +00:00
Andrii Nakryiko
05f94ddbb8 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c628747cc8800cf6d33d09f7f42c8b6f91e64dc7
Checkpoint bpf-next commit: a3e7e6b17946f48badce98d7ac360678a0ea7393
Baseline bpf commit:        496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9
Checkpoint bpf commit:      496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9

Andrii Nakryiko (1):
  libbpf: Fix realloc API handling in zero-sized edge cases

John Sanpe (1):
  libbpf: Remove HASHMAP_INIT static initialization helper

 src/hashmap.h | 10 ----------
 src/libbpf.c  | 15 ++++++++++++---
 src/usdt.c    |  5 ++++-
 3 files changed, 16 insertions(+), 14 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-11 10:03:25 -07:00
John Sanpe
bf88aaa6fe libbpf: Remove HASHMAP_INIT static initialization helper
Remove the wrong HASHMAP_INIT. It's not used anywhere in libbpf.

Signed-off-by: John Sanpe <sanpeqf@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230711070712.2064144-1-sanpeqf@gmail.com
2023-07-11 10:03:25 -07:00
Andrii Nakryiko
f117080307 libbpf: Fix realloc API handling in zero-sized edge cases
realloc() and reallocarray() can either return NULL or a special
non-NULL pointer, if their size argument is zero. This requires a bit
more care to handle NULL-as-valid-result situation differently from
NULL-as-error case. This has caused real issues before ([0]), and just
recently bit again in production when performing bpf_program__attach_usdt().

This patch fixes 4 places that do or potentially could suffer from this
mishandling of NULL, including the reported USDT-related one.

There are many other places where realloc()/reallocarray() is used and
NULL is always treated as an error value, but all those have guarantees
that their size is always non-zero, so those spot don't need any extra
handling.

  [0] d08ab82f59d5 ("libbpf: Fix double-free when linker processes empty sections")

Fixes: 999783c8bbda ("libbpf: Wire up spec management and other arch-independent USDT logic")
Fixes: b63b3c490eee ("libbpf: Add bpf_program__set_insns function")
Fixes: 697f104db8a6 ("libbpf: Support custom SEC() handlers")
Fixes: b12688267280 ("libbpf: Change the order of data and text relocations.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230711024150.1566433-1-andrii@kernel.org
2023-07-11 10:03:25 -07:00
thiagoftsm
8b905090e8 Merge branch 'libbpf:master' into master 2023-07-10 22:36:12 +00:00
Andrii Nakryiko
6c020e6c47 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   856fe03d929205b4c8c8fa51296342cd85592e3f
Checkpoint bpf-next commit: c628747cc8800cf6d33d09f7f42c8b6f91e64dc7
Baseline bpf commit:        496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9
Checkpoint bpf commit:      496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9

Andrii Nakryiko (1):
  libbpf: only reset sec_def handler when necessary

 src/libbpf.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-10 14:24:42 -07:00
Andrii Nakryiko
1743bd1e40 libbpf: only reset sec_def handler when necessary
Don't reset recorded sec_def handler unconditionally on
bpf_program__set_type(). There are two situations where this is wrong.

First, if the program type didn't actually change. In that case original
SEC handler should work just fine.

Second, catch-all custom SEC handler is supposed to work with any BPF
program type and SEC() annotation, so it also doesn't make sense to
reset that.

This patch fixes both issues. This was reported recently in the context
of breaking perf tool, which uses custom catch-all handler for fancy BPF
prologue generation logic. This patch should fix the issue.

  [0] https://lore.kernel.org/linux-perf-users/ab865e6d-06c5-078e-e404-7f90686db50d@amd.com/

Fixes: d6e6286a12e7 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call")
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230707231156.1711948-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-10 14:24:42 -07:00
Andrii Nakryiko
a2258003f2 ci: install headers before building selftests
Ensure latest kernel headers are available. Similar to [0].

  [0] https://github.com/libbpf/ci/pull/102

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-07 18:55:44 -07:00
Andrii Nakryiko
add1aac281 ci: add kprobe_multi_bench_attach to DENYLIST
It is suspected to be causing kernel crashes in libbpf CI, which we
don't see in kernel-patches CI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-07 18:55:44 -07:00
Andrii Nakryiko
ea27ebcffd sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   25085b4e9251c77758964a8e8651338972353642
Checkpoint bpf-next commit: 856fe03d929205b4c8c8fa51296342cd85592e3f
Baseline bpf commit:        ad96f1c9138e0897bee7f7c5e54b3e24f8b62f57
Checkpoint bpf commit:      496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9

Andrea Terzolo (1):
  libbpf: Skip modules BTF loading when CAP_SYS_ADMIN is missing

Florian Westphal (1):
  libbpf: Add netfilter link attach helper

Jackie Liu (2):
  libbpf: Cross-join available_filter_functions and kallsyms for
    multi-kprobes
  libbpf: Use available_filter_functions_addrs with multi-kprobes

 src/bpf.c      |   8 ++
 src/bpf.h      |   6 ++
 src/libbpf.c   | 216 ++++++++++++++++++++++++++++++++++++++++++++++---
 src/libbpf.h   |  15 ++++
 src/libbpf.map |   1 +
 5 files changed, 233 insertions(+), 13 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-07 18:55:44 -07:00
Jackie Liu
b9c4ad5468 libbpf: Use available_filter_functions_addrs with multi-kprobes
Now that kernel provides a new available_filter_functions_addrs file
which can help us avoid the need to cross-validate
available_filter_functions and kallsyms, we can improve efficiency of
multi-attach kprobes. For example, on my device, the sample program [1]
of start time:

$ sudo ./funccount "tcp_*"

before   after
1.2s     1.0s

  [1]: https://github.com/JackieLiu1/ketones/tree/master/src/funccount

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230705091209.3803873-2-liu.yun@linux.dev
2023-07-07 18:55:44 -07:00
Jackie Liu
732c4c6df2 libbpf: Cross-join available_filter_functions and kallsyms for multi-kprobes
When using regular expression matching with "kprobe multi", it scans all
the functions under "/proc/kallsyms" that can be matched. However, not all
of them can be traced by kprobe.multi. If any one of the functions fails
to be traced, it will result in the failure of all functions. The best
approach is to filter out the functions that cannot be traced to ensure
proper tracking of the functions.

Closes: https://lore.kernel.org/oe-kbuild-all/202307030355.TdXOHklM-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230705091209.3803873-1-liu.yun@linux.dev
2023-07-07 18:55:44 -07:00
Florian Westphal
6bec18258c libbpf: Add netfilter link attach helper
Add new api function: bpf_program__attach_netfilter.

It takes a bpf program (netfilter type), and a pointer to a option struct
that contains the desired attachment (protocol family, priority, hook
location, ...).

It returns a pointer to a 'bpf_link' structure or NULL on error.

Next patch adds new netfilter_basic test that uses this function to
attach a program to a few pf/hook/priority combinations.

v2: change name and use bpf_link_create.

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/CAEf4BzZrmUv27AJp0dDxBDMY_B8e55-wLs8DUKK69vCWsCG_pQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/CAEf4BzZ69YgrQW7DHCJUT_X+GqMq_ZQQPBwopaJJVGFD5=d5Vg@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230628152738.22765-2-fw@strlen.de
2023-07-07 18:55:44 -07:00
Andrea Terzolo
3f33f9a6b8 libbpf: Skip modules BTF loading when CAP_SYS_ADMIN is missing
If during CO-RE relocations libbpf is not able to find the target type
in the running kernel BTF, it searches for it in modules' BTF.
The downside of this approach is that loading modules' BTF requires
CAP_SYS_ADMIN and this prevents BPF applications from running with more
granular capabilities (e.g. CAP_BPF) when they don't need to search
types into modules' BTF.

This patch skips by default modules' BTF loading phase when
CAP_SYS_ADMIN is missing.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Co-developed-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/CAGQdkDvYU_e=_NX+6DRkL_-TeH3p+QtsdZwHkmH0w3Fuzw0C4w@mail.gmail.com
Link: https://lore.kernel.org/bpf/20230626093614.21270-1-andreaterzolo3@gmail.com
2023-07-07 18:55:44 -07:00
Manu Bretelle
ec6f716eda ci: Add bpf_nf/{xdp,tc-bpf}-ct to denylist for x86
This test is consistently failing on x86 for unknown reasons.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2023-06-17 00:07:28 +00:00
Manu Bretelle
3c7fcfe0ce sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   fcf1fa29c8ea75bf104c35ce29b65ce2ba6a6a9d
Checkpoint bpf-next commit: 25085b4e9251c77758964a8e8651338972353642
Baseline bpf commit:        f726e03564ef4e754dd93beb54303e2e1671049e
Checkpoint bpf commit:      ad96f1c9138e0897bee7f7c5e54b3e24f8b62f57

Andrii Nakryiko (2):
  libbpf: Ensure libbpf always opens files with O_CLOEXEC
  libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()

Florian Westphal (1):
  bpf: netfilter: Add BPF_NETFILTER bpf_attach_type

JP Kobryn (1):
  libbpf: Change var type in datasec resize func

Louis DeLosSantos (1):
  bpf: Add table ID to bpf_fib_lookup BPF helper

 include/uapi/linux/bpf.h | 22 +++++++++++++++++++---
 src/btf.c                |  2 +-
 src/libbpf.c             | 26 +++++++++++++-------------
 src/libbpf_probes.c      |  4 +++-
 src/usdt.c               |  5 ++---
 5 files changed, 38 insertions(+), 21 deletions(-)

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2023-06-17 00:07:28 +00:00
Manu Bretelle
ef3e2ef82a sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2023-06-17 00:07:28 +00:00
Florian Westphal
45188d0d01 bpf: netfilter: Add BPF_NETFILTER bpf_attach_type
Andrii Nakryiko writes:

 And we currently don't have an attach type for NETLINK BPF link.
 Thankfully it's not too late to add it. I see that link_create() in
 kernel/bpf/syscall.c just bypasses attach_type check. We shouldn't
 have done that. Instead we need to add BPF_NETLINK attach type to enum
 bpf_attach_type. And wire all that properly throughout the kernel and
 libbpf itself.

This adds BPF_NETFILTER and uses it.  This breaks uabi but this
wasn't in any non-rc release yet, so it should be fine.

v2: check link_attack prog type in link_create too

Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs")
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzZ69YgrQW7DHCJUT_X+GqMq_ZQQPBwopaJJVGFD5=d5Vg@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230605131445.32016-1-fw@strlen.de
2023-06-17 00:07:28 +00:00
Louis DeLosSantos
f02ec78083 bpf: Add table ID to bpf_fib_lookup BPF helper
Add ability to specify routing table ID to the `bpf_fib_lookup` BPF
helper.

A new field `tbid` is added to `struct bpf_fib_lookup` used as
parameters to the `bpf_fib_lookup` BPF helper.

When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` and
`BPF_FIB_LOOKUP_TBID` flags the `tbid` field in `struct bpf_fib_lookup`
will be used as the table ID for the fib lookup.

If the `tbid` does not exist the fib lookup will fail with
`BPF_FIB_LKUP_RET_NOT_FWDED`.

The `tbid` field becomes a union over the vlan related output fields
in `struct bpf_fib_lookup` and will be zeroed immediately after usage.

This functionality is useful in containerized environments.

For instance, if a CNI wants to dictate the next-hop for traffic leaving
a container it can create a container-specific routing table and perform
a fib lookup against this table in a "host-net-namespace-side" TC program.

This functionality also allows `ip rule` like functionality at the TC
layer, allowing an eBPF program to pick a routing table based on some
aspect of the sk_buff.

As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN
datapath.

When egress traffic leaves a Pod an eBPF program attached by Cilium will
determine which VRF the egress traffic should target, and then perform a
FIB lookup in a specific table representing this VRF's FIB.

Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v2-1-0a31c22c748c@gmail.com
2023-06-17 00:07:28 +00:00
Andrii Nakryiko
fa1a18d38b libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()
Improve bpf_map__reuse_fd() logic and ensure that dup'ed map FD is
"good" (>= 3) and has O_CLOEXEC flags. Use fcntl(F_DUPFD_CLOEXEC) for
that, similarly to ensure_good_fd() helper we already use in low-level
APIs that work with bpf() syscall.

Suggested-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230525221311.2136408-2-andrii@kernel.org
2023-06-17 00:07:28 +00:00
Andrii Nakryiko
ba7a44da68 libbpf: Ensure libbpf always opens files with O_CLOEXEC
Make sure that libbpf code always gets FD with O_CLOEXEC flag set,
regardless if file is open through open() or fopen(). For the latter
this means to add "e" to mode string, which is supported since pretty
ancient glibc v2.7.

Also drop the outdated TODO comment in usdt.c, which was already completed.

Suggested-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230525221311.2136408-1-andrii@kernel.org
2023-06-17 00:07:28 +00:00
Manu Bretelle
cb23f981c3 ci: Dump kconfig before running tests
This helps troubleshooting by validating what the Kconfig of the testing
environment is.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2023-06-15 14:04:53 -07:00
Daniel Müller
f7eb43b90f ci: add fix for sockopt sub-tests
Sockopt sub-tests currently don't honor denylisting properly. Fix them.
Upstream fix was found at [0].

[0] https://lore.kernel.org/bpf/20230525232248.640465-1-deso@posteo.net/T/#u

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
9710829e78 ci: Gracefully handle test names with spaces inside
Cherry pick of pieces of f909f8bf110d ("ci: temporarily disable
test_btf_dump_case") from vmtest to handle spaces in test names
properly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
JP Kobryn
e021ccbd7d libbpf: Change var type in datasec resize func
This changes a local variable type that stores a new array id to match
the return type of btf__add_array().

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230525001323.8554-1-inwardvessel@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
0755b497cf ci: add fix for multi-kprobe as temporary patch
This fixes 39d954200bf6 ("fprobe: Skip exit_handler if entry_handler
returns !0"), which causes multiple multi-kprobe tests to fail. Upstream
fix was found at [0].

[0] https://lore.kernel.org/all/168100731160.79534.374827110083836722.stgit@devnote2/#r

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
c4ffdf1e72 ci: Adjust allow/deny lists for most recent sync
Adjust the allow & deny lists for use after the most recent sync with
upstream.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
c850306199 ci: Regenerate latest vmlinux.h for old kernel CI tests.
CI will fail without it.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
fb6998382d libbpf: Bump version to v1.3 in Makefile
Bump LIBBPF_MINOR_VERSION to 3 for v1.3 dev cycle.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Daniel Müller
9aea1da2bb sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2ddade322925641ee2a75f13665c51f2e74d7791
Checkpoint bpf-next commit: fcf1fa29c8ea75bf104c35ce29b65ce2ba6a6a9d
Baseline bpf commit:        71b547f561247897a0a14f3082730156c0533fed
Checkpoint bpf commit:      f726e03564ef4e754dd93beb54303e2e1671049e

Alexey Dobriyan (1):
  ELF: fix all "Elf" typos

Andrii Nakryiko (4):
  libbpf: fix offsetof() and container_of() to work with CO-RE
  libbpf: Start v1.3 development cycle
  bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
  libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd

Florian Westphal (1):
  tools: bpftool: print netfilter link info

JP Kobryn (1):
  libbpf: Add capability for resizing datasec maps

Jiri Olsa (1):
  libbpf: Store zero fd to fd_array for loader kfunc relocation

Kenjiro Nakayama (1):
  libbpf: Fix comment about arc and riscv arch in bpf_tracing.h

Martin KaFai Lau (1):
  libbpf: btf_dump_type_data_check_overflow needs to consider
    BTF_MEMBER_BITFIELD_SIZE

 include/uapi/linux/bpf.h |  24 +++++++
 src/bpf.c                |  17 ++++-
 src/bpf.h                |  18 ++++-
 src/bpf_helpers.h        |  15 +++--
 src/bpf_tracing.h        |   3 +-
 src/btf_dump.c           |  22 +++++-
 src/gen_loader.c         |  14 ++--
 src/libbpf.c             | 140 ++++++++++++++++++++++++++++++++++++---
 src/libbpf.h             |  18 ++++-
 src/libbpf.map           |   5 ++
 src/libbpf_probes.c      |   1 +
 src/libbpf_version.h     |   2 +-
 src/usdt.c               |   2 +-
 13 files changed, 246 insertions(+), 35 deletions(-)

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
JP Kobryn
8b4e1b39a4 libbpf: Add capability for resizing datasec maps
This patch updates bpf_map__set_value_size() so that if the given map is
memory mapped, it will attempt to resize the mapped region. Initial
contents of the mapped region are preserved. BTF is not required, but
after the mapping is resized an attempt is made to adjust the associated
BTF information if the following criteria is met:
 - BTF info is present
 - the map is a datasec
 - the final variable in the datasec is an array

... the resulting BTF info will be updated so that the final array
variable is associated with a new BTF array type sized to cover the
requested size.

Note that the initial resizing of the memory mapped region can succeed
while the subsequent BTF adjustment can fail. In this case, BTF info is
dropped from the map by clearing the key and value type.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230524004537.18614-2-inwardvessel@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
a50544ef45 libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd
Add path_fd support for bpf_obj_pin() and bpf_obj_get() operations
(through their opts-based variants). This allows to take advantage of
new kernel-side support for O_PATH-based pin/get location specification.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230523170013.728457-4-andrii@kernel.org
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
bfb0454244 bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
forces users to specify pinning location as a string-based absolute or
relative (to current working directory) path. This has various
implications related to security (e.g., symlink-based attacks), forces
BPF FS to be exposed in the file system, which can cause races with
other applications.

One of the feedbacks we got from folks working with containers heavily
was that inability to use purely FD-based location specification was an
unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET
commands. This patch closes this oversight, adding path_fd field to
BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
*at() syscalls for dirfd + pathname combinations.

This now allows interesting possibilities like working with detached BPF
FS mount (e.g., to perform multiple pinnings without running a risk of
someone interfering with them), and generally making pinning/getting
more secure and not prone to any races and/or security attacks.

This is demonstrated by a selftest added in subsequent patch that takes
advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
creating detached BPF FS mount, pinning, and then getting BPF map out of
it, all while never exposing this private instance of BPF FS to outside
worlds.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20230523170013.728457-4-andrii@kernel.org
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
79811cad50 libbpf: Start v1.3 development cycle
Bump libbpf.map to v1.3.0 to start a new libbpf version cycle.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230523170013.728457-3-andrii@kernel.org
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Jiri Olsa
4bb0b0ca09 libbpf: Store zero fd to fd_array for loader kfunc relocation
When moving some of the test kfuncs to bpf_testmod I hit an issue
when some of the kfuncs that object uses are in module and some
in vmlinux.

The problem is that both vmlinux and module kfuncs get allocated
btf_fd_idx index into fd_array, but we store to it the BTF fd value
only for module's kfunc, not vmlinux's one because (it's zero).

Then after the program is loaded we check if fd_array[btf_fd_idx] != 0
and close the fd.

When the object has kfuncs from both vmlinux and module, the fd from
fd_array[btf_fd_idx] from previous load will be stored in there for
vmlinux's kfunc, so we close unrelated fd (of the program we just
loaded in my case).

Fixing this by storing zero to fd_array[btf_fd_idx] for vmlinux
kfuncs, so the we won't close stale fd.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
ac42790129 libbpf: fix offsetof() and container_of() to work with CO-RE
It seems like __builtin_offset() doesn't preserve CO-RE field
relocations properly. So if offsetof() macro is defined through
__builtin_offset(), CO-RE-enabled BPF code using container_of() will be
subtly and silently broken.

To avoid this problem, redefine offsetof() and container_of() in the
form that works with CO-RE relocations more reliably.

Fixes: 5fbc220862fc ("tools/libpf: Add offsetof/container_of macro in bpf_helpers.h")
Reported-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230509065502.2306180-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Kenjiro Nakayama
6a6cf6dcdc libbpf: Fix comment about arc and riscv arch in bpf_tracing.h
To make comments about arc and riscv arch in bpf_tracing.h accurate,
this patch fixes the comment about arc and adds the comment for riscv.

Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230504035443.427927-1-nakayamakenjiro@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Martin KaFai Lau
b9711e7015 libbpf: btf_dump_type_data_check_overflow needs to consider BTF_MEMBER_BITFIELD_SIZE
The btf_dump/struct_data selftest is failing with:

  [...]
  test_btf_dump_struct_data:FAIL:unexpected return value dumping fs_context unexpected unexpected return value dumping fs_context: actual -7 != expected 264
  [...]

The reason is in btf_dump_type_data_check_overflow(). It does not use
BTF_MEMBER_BITFIELD_SIZE from the struct's member (btf_member). Instead,
it is using the enum size which is 4. It had been working till the recent
commit 4e04143c869c ("fs_context: drop the unused lsm_flags member")
removed an integer member which also removed the 4 bytes padding at the
end of the fs_context. Missing this 4 bytes padding exposed this bug. In
particular, when btf_dump_type_data_check_overflow() reaches the member
'phase', -E2BIG is returned.

The fix is to pass bit_sz to btf_dump_type_data_check_overflow(). In
btf_dump_type_data_check_overflow(), it does a different size check when
bit_sz is not zero.

The current fs_context:

[3600] ENUM 'fs_context_purpose' encoding=UNSIGNED size=4 vlen=3
	'FS_CONTEXT_FOR_MOUNT' val=0
	'FS_CONTEXT_FOR_SUBMOUNT' val=1
	'FS_CONTEXT_FOR_RECONFIGURE' val=2
[3601] ENUM 'fs_context_phase' encoding=UNSIGNED size=4 vlen=7
	'FS_CONTEXT_CREATE_PARAMS' val=0
	'FS_CONTEXT_CREATING' val=1
	'FS_CONTEXT_AWAITING_MOUNT' val=2
	'FS_CONTEXT_AWAITING_RECONF' val=3
	'FS_CONTEXT_RECONF_PARAMS' val=4
	'FS_CONTEXT_RECONFIGURING' val=5
	'FS_CONTEXT_FAILED' val=6
[3602] STRUCT 'fs_context' size=264 vlen=21
	'ops' type_id=3603 bits_offset=0
	'uapi_mutex' type_id=235 bits_offset=64
	'fs_type' type_id=872 bits_offset=1216
	'fs_private' type_id=21 bits_offset=1280
	'sget_key' type_id=21 bits_offset=1344
	'root' type_id=781 bits_offset=1408
	'user_ns' type_id=251 bits_offset=1472
	'net_ns' type_id=984 bits_offset=1536
	'cred' type_id=1785 bits_offset=1600
	'log' type_id=3621 bits_offset=1664
	'source' type_id=42 bits_offset=1792
	'security' type_id=21 bits_offset=1856
	's_fs_info' type_id=21 bits_offset=1920
	'sb_flags' type_id=20 bits_offset=1984
	'sb_flags_mask' type_id=20 bits_offset=2016
	's_iflags' type_id=20 bits_offset=2048
	'purpose' type_id=3600 bits_offset=2080 bitfield_size=8
	'phase' type_id=3601 bits_offset=2088 bitfield_size=8
	'need_free' type_id=67 bits_offset=2096 bitfield_size=1
	'global' type_id=67 bits_offset=2097 bitfield_size=1
	'oldapi' type_id=67 bits_offset=2098 bitfield_size=1

Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230428013638.1581263-1-martin.lau@linux.dev
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Alexey Dobriyan
4c484d662c ELF: fix all "Elf" typos
ELF is acronym and therefore should be spelled in all caps.

I left one exception at Documentation/arm/nwfpe/nwfpe.rst which looks like
being written in the first person.

Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Florian Westphal
1c9aa4791a tools: bpftool: print netfilter link info
Dump protocol family, hook and priority value:
$ bpftool link
2: netfilter  prog 14
        ip input prio -128
        pids install(3264)
5: netfilter  prog 14
        ip6 forward prio 21
        pids a.out(3387)
9: netfilter  prog 14
        ip prerouting prio 123
        pids a.out(5700)
10: netfilter  prog 14
        ip input prio 21
        pids test2(5701)

v2: Quentin Monnet suggested to also add 'bpftool net' support:

$ bpftool net
xdp:

tc:

flow_dissector:

netfilter:

        ip prerouting prio 21 prog_id 14
        ip input prio -128 prog_id 14
        ip input prio 21 prog_id 14
        ip forward prio 21 prog_id 14
        ip output prio 21 prog_id 14
        ip postrouting prio 21 prog_id 14

'bpftool net' only dumps netfilter link type, links are sorted by protocol
family, hook and priority.

v5: fix bpf ci failure: libbpf needs small update to prog_type_name[]
    and probe_prog_load helper.
v4: don't fail with -EOPNOTSUPP in libbpf probe_prog_load, update
    prog_type_name[] with "netfilter" entry (bpf ci)
v3: fix bpf.h copy, 'reserved' member was removed (Alexei)
    use p_err, not fprintf (Quentin)

Suggested-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/eeeaac99-9053-90c2-aa33-cc1ecb1ae9ca@isovalent.com/
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-6-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-05-25 16:44:19 -07:00
Andrii Nakryiko
3f591a6610 git: make .gitattributes compatible with git-archive-all action
As reported by Quentin, using Github Action to archive all submodules
(e.g., for retsnoop release packaging) is impacted by it not supporting
"<glob>/" pattern in .gitattributes. Use "<glob>/**" instead.

  [0] https://github.com/anakryiko/retsnoop/pull/42#issuecomment-1560797837

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-05-25 13:14:58 -07:00
Evgeny Vereshchagin
532293bdf4 fuzz: bump elfutils to 0.189
The elfutils project has fixed several issues found by fuzz targets so it
should help to prevent the libbpf fuzz target from running into them.

Signed-off-by: Evgeny Vereshchagin <evvers@ya.ru>
2023-05-12 14:29:41 -07:00
thiagoftsm
dd7dd01114 Merge branch 'libbpf:master' into master 2023-05-04 16:40:08 +00:00
Song Liu
fbd60dbff5 ci: Fix test_progs failure
Fix test_progs failure xdp_bonding/xdp_bonding_redirect_multi with a
missing commit (in bpf, but not in bpf-next yet).

Signed-off-by: Song Liu <song@kernel.org>
2023-04-20 12:01:06 -07:00
Song Liu
44b0bc9ad7 ci: Regenerate latest vmlinux.h for old kernel CI tests.
CI fails without it.

Signed-off-by: Song Liu <song@kernel.org>
2023-04-19 16:15:07 -07:00
Song Liu
f0e39b4946 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   4ca13d1002f37c10038ff4ed3cfdc70dbe049d60
Checkpoint bpf-next commit: 2ddade322925641ee2a75f13665c51f2e74d7791
Baseline bpf commit:        a6f6a95f25803500079513780d11a911ce551d76
Checkpoint bpf commit:      71b547f561247897a0a14f3082730156c0533fed

Andrii Nakryiko (9):
  libbpf: Don't enforce unnecessary verifier log restrictions on libbpf
    side
  bpf: Add log_true_size output field to return necessary log buffer
    size
  libbpf: Wire through log_true_size returned from kernel for
    BPF_PROG_LOAD
  libbpf: Wire through log_true_size for bpf_btf_load() API
  libbpf: misc internal libbpf clean ups around log fixup
  libbpf: report vmlinux vs module name when dealing with ksyms
  libbpf: improve handling of unresolved kfuncs
  libbpf: move bpf_for(), bpf_for_each(), and bpf_repeat() into
    bpf_helpers.h
  libbpf: mark bpf_iter_num_{new,next,destroy} as __weak

Arnaldo Carvalho de Melo (1):
  tools include UAPI: Synchronize linux/fcntl.h with the kernel sources

Dave Marchevsky (1):
  bpf: Introduce opaque bpf_refcount struct and add btf_record plumbing

Herbert Xu (1):
  macvlan: Add netlink attribute for broadcast cutoff

Lorenzo Bianconi (1):
  xdp: add xdp_set_features_flag utility routine

 include/uapi/linux/bpf.h     |  16 +++++-
 include/uapi/linux/fcntl.h   |   1 +
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/netdev.h  |   2 +
 src/bpf.c                    |  17 +++---
 src/bpf.h                    |  22 +++++--
 src/bpf_helpers.h            | 103 +++++++++++++++++++++++++++++++++
 src/libbpf.c                 | 107 ++++++++++++++++++++++++++++-------
 8 files changed, 237 insertions(+), 32 deletions(-)

Signed-off-by: Song Liu <song@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
294c85e9b3 libbpf: mark bpf_iter_num_{new,next,destroy} as __weak
Mark bpf_iter_num_{new,next,destroy}() kfuncs declared for
bpf_for()/bpf_repeat() macros as __weak to allow users to feature-detect
their presence and guard bpf_for()/bpf_repeat() loops accordingly for
backwards compatibility with old kernels.

Now that libbpf supports kfunc calls poisoning and better reporting of
unresolved (but called) kfuncs, declaring number iterator kfuncs in
bpf_helpers.h won't degrade user experience and won't cause unnecessary
kernel feature dependencies.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
2293c20f82 libbpf: move bpf_for(), bpf_for_each(), and bpf_repeat() into bpf_helpers.h
To make it easier for bleeding-edge BPF applications, such as sched_ext,
to utilize open-coded iterators, move bpf_for(), bpf_for_each(), and
bpf_repeat() macros from selftests/bpf-internal bpf_misc.h helper, to
libbpf-provided bpf_helpers.h header.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
e6cc30f445 libbpf: improve handling of unresolved kfuncs
Currently, libbpf leaves `call #0` instruction for __weak unresolved
kfuncs, which might lead to a confusing verifier log situations, where
invalid `call #0` will be treated as successfully validated.

We can do better. Libbpf already has an established mechanism of
poisoning instructions that failed some form of resolution (e.g., CO-RE
relocation and BPF map set to not be auto-created). Libbpf doesn't fail
them outright to allow users to guard them through other means, and as
long as BPF verifier can prove that such poisoned instructions cannot be
ever reached, this doesn't consistute an invalid BPF program. If user
didn't guard such code, libbpf will extract few pieces of information to
tie such poisoned instructions back to additional information about what
entitity wasn't resolved (e.g., BPF map name, or CO-RE relocation
information).

__weak unresolved kfuncs fit this model well, so this patch extends
libbpf with poisioning and log fixup logic for kfunc calls.

Note, this poisoning is done only for kfunc *calls*, not kfunc address
resolution (ldimm64 instructions). The former cannot be ever valid, if
reached, so it's safe to poison them. The latter is a valid mechanism to
check if __weak kfunc ksym was resolved, and do necessary guarding and
work arounds based on this result, supported in most recent kernels. As
such, libbpf keeps such ldimm64 instructions as loading zero, never
poisoning them.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
6fd310547d libbpf: report vmlinux vs module name when dealing with ksyms
Currently libbpf always reports "kernel" as a source of ksym BTF type,
which is ambiguous given ksym's BTF can come from either vmlinux or
kernel module BTFs. Make this explicit and log module name, if used BTF
is from kernel module.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
0db753a9f8 libbpf: misc internal libbpf clean ups around log fixup
Normalize internal constants, field names, and comments related to log
fixup. Also add explicit `ext_idx` alias for relocation where relocation
is pointing to extern description for additional information.

No functional changes, just a clean up before subsequent additions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Dave Marchevsky
44f59ec077 bpf: Introduce opaque bpf_refcount struct and add btf_record plumbing
A 'struct bpf_refcount' is added to the set of opaque uapi/bpf.h types
meant for use in BPF programs. Similarly to other opaque types like
bpf_spin_lock and bpf_rbtree_node, the verifier needs to know where in
user-defined struct types a bpf_refcount can be located, so necessary
btf_record plumbing is added to enable this. bpf_refcount is sized to
hold a refcount_t.

Similarly to bpf_spin_lock, the offset of a bpf_refcount is cached in
btf_record as refcount_off in addition to being in the field array.
Caching refcount_off makes sense for this field because further patches
in the series will modify functions that take local kptrs (e.g.
bpf_obj_drop) to change their behavior if the type they're operating on
is refcounted. So enabling fast "is this type refcounted?" checks is
desirable.

No such verifier behavior changes are introduced in this patch, just
logic to recognize 'struct bpf_refcount' in btf_record.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
2f01564c50 libbpf: Wire through log_true_size for bpf_btf_load() API
Similar to what we did for bpf_prog_load() in previous patch, wire
returning of log_true_size value from kernel back to the user through
OPTS out field.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230406234205.323208-17-andrii@kernel.org
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
c2fe7adb33 libbpf: Wire through log_true_size returned from kernel for BPF_PROG_LOAD
Add output-only log_true_size field to bpf_prog_load_opts to return
bpf_attr->log_true_size value back from bpf() syscall.

Note, that we have to drop const modifier from opts in bpf_prog_load().
This could potentially cause compilation error for some users. But
the usual practice is to define bpf_prog_load_ops
as a local variable next to bpf_prog_load() call and pass pointer to it,
so const vs non-const makes no difference and won't even come up in most
(if not all) cases.

There are no runtime and ABI backwards/forward compatibility issues at all.
If user provides old struct bpf_prog_load_opts, libbpf won't set new
fields. If old libbpf is provided new bpf_prog_load_opts, nothing will
happen either as old libbpf doesn't yet know about this new field.

Adding a new variant of bpf_prog_load() just for this seems like a big
and unnecessary overkill. As a corroborating evidence is the fact that
entire selftests/bpf code base required not adjustment whatsoever.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230406234205.323208-16-andrii@kernel.org
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
88004dd87a bpf: Add log_true_size output field to return necessary log buffer size
Add output-only log_true_size and btf_log_true_size field to
BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. It will return
the size of log buffer necessary to fit in all the log contents at
specified log_level. This is very useful for BPF loader libraries like
libbpf to be able to size log buffer correctly, but could be used by
users directly, if necessary, as well.

This patch plumbs all this through the code, taking into account actual
bpf_attr size provided by user to determine if these new fields are
expected by users. And if they are, set them from kernel on return.

We refactory btf_parse() function to accommodate this, moving attr and
uattr handling inside it. The rest is very straightforward code, which
is split from the logging accounting changes in the previous patch to
make it simpler to review logic vs UAPI changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-13-andrii@kernel.org
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
a22abb9c85 libbpf: Don't enforce unnecessary verifier log restrictions on libbpf side
This basically prevents any forward compatibility. And we either way
just return -EINVAL, which would otherwise be returned from bpf()
syscall anyways.

Similarly, drop enforcement of non-NULL log_buf when log_level > 0. This
won't be true anymore soon.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-5-andrii@kernel.org
2023-04-19 16:15:07 -07:00
Herbert Xu
2c0c927a38 macvlan: Add netlink attribute for broadcast cutoff
Make the broadcast cutoff configurable through netlink.  Note
that macvlan is weird because there is no central device for
us to configure (the lowerdev could be anything).  So all the
options are duplicated over what could be thousands of child
devices.

IFLA_MACVLAN_BC_QUEUE_LEN took the approach of taking the maximum
of all child device settings.  This is unnecessary as we could
simply store the option in the port device and take the last
child device that gets updated as the value to use.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-19 16:15:07 -07:00
Andrii Nakryiko
d9d17f6d71 git: add .gitattributes file ignoring assets/ during archiving
We don't need to archive assets/ subdir when packaging libbpf sources in
retsnoop and veristat repos. Mark assets/ as export-ignore to skip it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-04-01 15:36:54 -07:00
Daniel Müller
3783577161 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   226bc6ae6405c46a6e9865835c36a1d45fc0b3bf
Checkpoint bpf-next commit: 4ca13d1002f37c10038ff4ed3cfdc70dbe049d60
Baseline bpf commit:        915efd8a446b74442039d31689d5d863caf82517
Checkpoint bpf commit:      a6f6a95f25803500079513780d11a911ce551d76

Andrii Nakryiko (1):
  libbpf: disassociate section handler on explicit
    bpf_program__set_type() call

Arnaldo Carvalho de Melo (1):
  tools include UAPI: Synchronize linux/fcntl.h with the kernel sources

Eduard Zingerman (1):
  libbpf: Fix double-free when linker processes empty sections

JP Kobryn (1):
  libbpf: Ensure print callback usage is thread-safe

Jakub Kicinski (1):
  ynl: broaden the license even more

Lorenzo Bianconi (1):
  xdp: add xdp_set_features_flag utility routine

 include/uapi/linux/fcntl.h  |  1 +
 include/uapi/linux/netdev.h |  4 +++-
 src/libbpf.c                | 10 +++++++---
 src/libbpf.h                |  2 ++
 src/linker.c                | 14 +++++++++++++-
 5 files changed, 26 insertions(+), 5 deletions(-)

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Jakub Kicinski
75c14163b9 ynl: broaden the license even more
I relicensed Netlink spec code to GPL-2.0 OR BSD-3-Clause but
we still put a slightly different license on the uAPI header
than the rest of the code. Use the Linux-syscall-note on all
the specs and all generated code. It's moot for kernel code,
but should not hurt. This way the licenses match everywhere.

Cc: Chuck Lever <chuck.lever@oracle.com>
Fixes: 37d9df224d1e ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Lorenzo Bianconi
056e9bcc19 xdp: add xdp_set_features_flag utility routine
Introduce xdp_set_features_flag utility routine in order to update
dynamically xdp_features according to the dynamic hw configuration via
ethtool (e.g. changing number of hw rx/tx queues).
Add xdp_clear_features_flag() in order to clear all xdp_feature flag.

Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Arnaldo Carvalho de Melo
14ae9422db tools include UAPI: Synchronize linux/fcntl.h with the kernel sources
To pick up the changes in:

  6fd7353829cafc40 ("mm/memfd: add F_SEAL_EXEC")

That doesn't add or change any perf tools functionality, only addresses
these build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/fcntl.h' differs from latest version at 'include/uapi/linux/fcntl.h'
  diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Andrii Nakryiko
3fd6eebb2d libbpf: disassociate section handler on explicit bpf_program__set_type() call
If user explicitly overrides programs's type with
bpf_program__set_type() API call, we need to disassociate whatever
SEC_DEF handler libbpf determined initially based on program's SEC()
definition, as it's not goind to be valid anymore and could lead to
crashes and/or confusing failures.

Also, fix up bpf_prog_test_load() helper in selftests/bpf, which is
force-setting program type (even if that's completely unnecessary; this
is quite a legacy piece of code), and thus should expect auto-attach to
not work, yet one of the tests explicitly relies on auto-attach for
testing.

Instead, force-set program type only if it differs from the desired one.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230327185202.1929145-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Eduard Zingerman
4218389b1e libbpf: Fix double-free when linker processes empty sections
Double-free error in bpf_linker__free() was reported by James Hilliard.
The error is caused by miss-use of realloc() in extend_sec().
The error occurs when two files with empty sections of the same name
are linked:
- when first file is processed:
  - extend_sec() calls realloc(dst->raw_data, dst_align_sz)
    with dst->raw_data == NULL and dst_align_sz == 0;
  - dst->raw_data is set to a special pointer to a memory block of
    size zero;
- when second file is processed:
  - extend_sec() calls realloc(dst->raw_data, dst_align_sz)
    with dst->raw_data == <special pointer> and dst_align_sz == 0;
  - realloc() "frees" dst->raw_data special pointer and returns NULL;
  - extend_sec() exits with -ENOMEM, and the old dst->raw_data value
    is preserved (it is now invalid);
  - eventually, bpf_linker__free() attempts to free dst->raw_data again.

This patch fixes the bug by avoiding -ENOMEM exit for dst_align_sz == 0.
The fix was suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>.

Reported-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: James Hilliard <james.hilliard1@gmail.com>
Link: https://lore.kernel.org/bpf/CADvTj4o7ZWUikKwNTwFq0O_AaX+46t_+Ca9gvWMYdWdRtTGeHQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230328004738.381898-3-eddyz87@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
JP Kobryn
ae32d7169d libbpf: Ensure print callback usage is thread-safe
This patch prevents races on the print function pointer, allowing the
libbpf_set_print() function to become thread-safe.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230325010845.46000-1-inwardvessel@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-30 16:24:24 -07:00
Andrii Nakryiko
b362bb6e10 ci: update libbpf/ci references to use "main"
Seems like deafult branch was renamed s/master/main/, adopt libbpf CI to
not fail.

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-27 10:45:47 -07:00
Andrii Nakryiko
f8cd00f613 ci: fallback to llvm-16 and clang-16 again
Seems like upstream LLVM/Clang packaging still has issues with
llvm/clang 17. Fallback to 16 again, for now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-23 13:10:17 -07:00
Andrii Nakryiko
dc4e7076ad sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b8a2e3f93d412114a1539ea97b59b3e6ed6e1f9a
Checkpoint bpf-next commit: 226bc6ae6405c46a6e9865835c36a1d45fc0b3bf
Baseline bpf commit:        a33a6eaa19d3af261e8708bfc8ba62020703117f
Checkpoint bpf commit:      915efd8a446b74442039d31689d5d863caf82517

Alexei Starovoitov (5):
  libbpf: Fix relocation of kfunc ksym in ld_imm64 insn.
  libbpf: Introduce bpf_ksym_exists() macro.
  libbpf: Fix ld_imm64 copy logic for ksym in light skeleton.
  libbpf: Rename RELO_EXTERN_VAR/FUNC.
  libbpf: Support kfunc detection in light skeleton.

Daniel Müller (1):
  libbpf: Ignore warnings about "inefficient alignment"

Kui-Feng Lee (5):
  bpf: Create links for BPF struct_ops maps.
  libbpf: Create a bpf_link in bpf_map__attach_struct_ops().
  bpf: Update the struct_ops of a bpf_link.
  libbpf: Update a bpf_link with another struct_ops.
  libbpf: Use .struct_ops.link section to indicate a struct_ops with a
    link.

Liu Pan (1):
  libbpf: Explicitly call write to append content to file

Sreevani Sreejith (1):
  bpf, docs: Libbpf overview documentation

 docs/index.rst           |  25 +++--
 docs/libbpf_overview.rst | 228 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h |  33 +++++-
 src/bpf.c                |   8 +-
 src/bpf.h                |   3 +-
 src/bpf_gen_internal.h   |   4 +-
 src/bpf_helpers.h        |   5 +
 src/gen_loader.c         |  48 ++++----
 src/libbpf.c             | 235 +++++++++++++++++++++++++++++----------
 src/libbpf.h             |   1 +
 src/libbpf.map           |   1 +
 src/zip.c                |   6 +
 12 files changed, 501 insertions(+), 96 deletions(-)
 create mode 100644 docs/libbpf_overview.rst

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
465a73051d libbpf: Use .struct_ops.link section to indicate a struct_ops with a link.
Flags a struct_ops is to back a bpf_link by putting it to the
".struct_ops.link" section.  Once it is flagged, the created
struct_ops can be used to create a bpf_link or update a bpf_link that
has been backed by another struct_ops.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-8-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
e51cdaaca0 libbpf: Update a bpf_link with another struct_ops.
Introduce bpf_link__update_map(), which allows to atomically update
underlying struct_ops implementation for given struct_ops BPF link.

Also add old_map_fd to struct bpf_link_update_opts to handle
BPF_F_REPLACE feature.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-7-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
055cbdcc9f bpf: Update the struct_ops of a bpf_link.
By improving the BPF_LINK_UPDATE command of bpf(), it should allow you
to conveniently switch between different struct_ops on a single
bpf_link. This would enable smoother transitions from one struct_ops
to another.

The struct_ops maps passing along with BPF_LINK_UPDATE should have the
BPF_F_LINK flag.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-6-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
c6893dccd9 libbpf: Create a bpf_link in bpf_map__attach_struct_ops().
bpf_map__attach_struct_ops() was creating a dummy bpf_link as a
placeholder, but now it is constructing an authentic one by calling
bpf_link_create() if the map has the BPF_F_LINK flag.

You can flag a struct_ops map with BPF_F_LINK by calling
bpf_map__set_map_flags().

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-5-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Kui-Feng Lee
077bf73900 bpf: Create links for BPF struct_ops maps.
Make bpf_link support struct_ops.  Previously, struct_ops were always
used alone without any associated links. Upon updating its value, a
struct_ops would be activated automatically. Yet other BPF program
types required to make a bpf_link with their instances before they
could become active. Now, however, you can create an inactive
struct_ops, and create a link to activate it later.

With bpf_links, struct_ops has a behavior similar to other BPF program
types. You can pin/unpin them from their links and the struct_ops will
be deactivated when its link is removed while previously need someone
to delete the value for it to be deactivated.

bpf_links are responsible for registering their associated
struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag
set to create a bpf_link, while a structs without this flag behaves in
the same manner as before and is registered upon updating its value.

The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it
used to craft the links for BPF struct_ops programs, but also to
create links for BPF struct_ops them-self.  Since the links of BPF
struct_ops programs are only used to create trampolines internally,
they are never seen in other contexts. Thus, they can be reused for
struct_ops themself.

To maintain a reference to the map supporting this link, we add
bpf_struct_ops_link as an additional type. The pointer of the map is
RCU and won't be necessary until later in the patchset.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-4-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
68cd7cd386 libbpf: Support kfunc detection in light skeleton.
Teach gen_loader to find {btf_id, btf_obj_fd} of kernel variables and kfuncs
and populate corresponding ld_imm64 and bpf_call insns.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230321203854.3035-4-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
a5464a5b0e libbpf: Rename RELO_EXTERN_VAR/FUNC.
RELO_EXTERN_VAR/FUNC names are not correct anymore. RELO_EXTERN_VAR represent
ksym symbol in ld_imm64 insn. It can point to kernel variable or kfunc.
Rename RELO_EXTERN_VAR->RELO_EXTERN_LD64 and RELO_EXTERN_FUNC->RELO_EXTERN_CALL
to match what they actually represent.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230321203854.3035-2-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Liu Pan
753e4d07d1 libbpf: Explicitly call write to append content to file
Write data to fd by calling "vdprintf", in most implementations
of the standard library, the data is finally written by the writev syscall.
But "uprobe_events/kprobe_events" does not allow segmented writes,
so switch the "append_to_file" function to explicit write() call.

Signed-off-by: Liu Pan <patteliu@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230320030720.650-1-patteliu@gmail.com
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
5b45c90c49 libbpf: Fix ld_imm64 copy logic for ksym in light skeleton.
Unlike normal libbpf the light skeleton 'loader' program is doing
btf_find_by_name_kind() call at run-time to find ksym in the kernel and
populate its {btf_id, btf_obj_fd} pair in ld_imm64 insn. To avoid doing the
search multiple times for the same ksym it remembers the first patched ld_imm64
insn and copies {btf_id, btf_obj_fd} from it into subsequent ld_imm64 insn.
Fix a bug in copying logic, since it may incorrectly clear BPF_PSEUDO_BTF_ID flag.

Also replace always true if (btf_obj_fd >= 0) check with unconditional JMP_JA
to clarify the code.

Fixes: d995816b77eb ("libbpf: Avoid reload of imm for weak, unresolved, repeating ksym")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230319203014.55866-1-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Sreevani Sreejith
2db620d982 bpf, docs: Libbpf overview documentation
This patch documents overview of libbpf, including its features for
developing BPF programs.

Signed-off-by: Sreevani Sreejith <ssreevani@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230315195405.2051559-1-ssreevani@meta.com
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
c401b96718 libbpf: Introduce bpf_ksym_exists() macro.
Introduce bpf_ksym_exists() macro that can be used by BPF programs
to detect at load time whether particular ksym (either variable or kfunc)
is present in the kernel.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317201920.62030-4-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Alexei Starovoitov
fd28ca4b5b libbpf: Fix relocation of kfunc ksym in ld_imm64 insn.
void *p = kfunc; -> generates ld_imm64 insn.
kfunc() -> generates bpf_call insn.

libbpf patches bpf_call insn correctly while only btf_id part of ld_imm64 is
set in the former case. Which means that pointers to kfuncs in modules are not
patched correctly and the verifier rejects load of such programs due to btf_id
being out of range. Fix libbpf to patch ld_imm64 for kfunc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317201920.62030-3-alexei.starovoitov@gmail.com
2023-03-23 13:10:17 -07:00
Daniel Müller
c722f76593 libbpf: Ignore warnings about "inefficient alignment"
Some consumers of libbpf compile the code base with different warnings
enabled. In a report for perf, for example, -Wpacked was set which
caused warnings about "inefficient alignment" to be emitted on a subset
of supported architectures.

With this change we silence specifically those warnings, as we intentionally
worked with packed structs.

This is a similar resolution as in b2f10cd4e805 ("perf cpumap: Fix alignment
for masks in event encoding").

Fixes: 1eebcb60633f ("libbpf: Implement basic zip archive parsing support")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/bpf/CA+G9fYtBnwxAWXi2+GyNByApxnf_DtP1-6+_zOKAdJKnJBexjg@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230315171550.1551603-1-deso@posteo.net
2023-03-23 13:10:17 -07:00
David Vernet
b5e9722ec2 ci: Regenerate latest vmlinux.h for old kernel CI tests.
CI will fail without it.

Signed-off-by: David Vernet <void@manifault.com>
2023-03-15 13:18:34 -07:00
David Vernet
7fdf16de6d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   db55174d05ee6bed9d0583ba08e99c891ef0ed05
Checkpoint bpf-next commit: b8a2e3f93d412114a1539ea97b59b3e6ed6e1f9a
Baseline bpf commit:        d900f3d20cc3169ce42ec72acc850e662a4d4db2
Checkpoint bpf commit:      a33a6eaa19d3af261e8708bfc8ba62020703117f

Andrii Nakryiko (1):
  bpf: implement numbers iterator

Daniel Müller (1):
  libbpf: Fix theoretical u32 underflow in find_cd() function

Jakub Kicinski (1):
  ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause

Jesus Sanchez-Palencia (1):
  libbpf: Revert poisoning of strlcpy

Menglong Dong (1):
  libbpf: Add support to set kprobe/uprobe attach mode

Michael Weiß (1):
  bpf: Fix a typo for BPF_F_ANY_ALIGNMENT in bpf.h

Puranjay Mohan (2):
  libbpf: Refactor parse_usdt_arg() to re-use code
  libbpf: USDT arm arg parsing support

Ross Zwisler (1):
  bpf: use canonical ftrace path

 include/uapi/linux/bpf.h    |  18 +++-
 include/uapi/linux/netdev.h |   2 +-
 src/libbpf.c                |  48 ++++++++-
 src/libbpf.h                |  50 ++++++---
 src/libbpf_internal.h       |   4 +-
 src/usdt.c                  | 196 ++++++++++++++++++++++--------------
 src/zip.c                   |   3 +-
 7 files changed, 219 insertions(+), 102 deletions(-)

Signed-off-by: David Vernet <void@manifault.com>
2023-03-15 13:18:34 -07:00
David Vernet
faae78aac4 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: David Vernet <void@manifault.com>
2023-03-15 13:18:34 -07:00
Jesus Sanchez-Palencia
950cffc036 libbpf: Revert poisoning of strlcpy
This reverts commit 6d0c4b11e743("libbpf: Poison strlcpy()").

It added the pragma poison directive to libbpf_internal.h to protect
against accidental usage of strlcpy but ended up breaking the build for
toolchains based on libcs which provide the strlcpy() declaration from
string.h (e.g. uClibc-ng). The include order which causes the issue is:

    string.h,
    from Iibbpf_common.h:12,
    from libbpf.h:20,
    from libbpf_internal.h:26,
    from strset.c:9:

Fixes: 6d0c4b11e743 ("libbpf: Poison strlcpy()")
Signed-off-by: Jesus Sanchez-Palencia <jesussanp@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309004836.2808610-1-jesussanp@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-15 13:18:34 -07:00
Jakub Kicinski
bdc7c5e217 ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
I was intending to make all the Netlink Spec code BSD-3-Clause
to ease the adoption but it appears that:
 - I fumbled the uAPI and used "GPL WITH uAPI note" there
 - it gives people pause as they expect GPL in the kernel
As suggested by Chuck re-license under dual. This gives us benefit
of full BSD freedom while fulfilling the broad "kernel is under GPL"
expectations.

Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/
Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 13:18:34 -07:00
Ross Zwisler
e8107c3959 bpf: use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many comments and samples in the bpf code still refer to this older
debugfs path, so let's update them to avoid confusion.  There are a few
spots where the bpf code explicitly checks both tracefs and debugfs
(tools/bpf/bpftool/tracelog.c and tools/lib/api/fs/fs.c) and I've left
those alone so that the tools can continue to work with both paths.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230313205628.1058720-2-zwisler@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-15 13:18:34 -07:00
Michael Weiß
c5be1b0770 bpf: Fix a typo for BPF_F_ANY_ALIGNMENT in bpf.h
Fix s/BPF_PROF_LOAD/BPF_PROG_LOAD/ typo in the documentation comment
for BPF_F_ANY_ALIGNMENT in bpf.h.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309133823.944097-1-michael.weiss@aisec.fraunhofer.de
2023-03-15 13:18:34 -07:00
Andrii Nakryiko
32d34a9415 bpf: implement numbers iterator
Implement the first open-coded iterator type over a range of integers.

It's public API consists of:
  - bpf_iter_num_new() constructor, which accepts [start, end) range
    (that is, start is inclusive, end is exclusive).
  - bpf_iter_num_next() which will keep returning read-only pointer to int
    until the range is exhausted, at which point NULL will be returned.
    If bpf_iter_num_next() is kept calling after this, NULL will be
    persistently returned.
  - bpf_iter_num_destroy() destructor, which needs to be called at some
    point to clean up iterator state. BPF verifier enforces that iterator
    destructor is called at some point before BPF program exits.

Note that `start = end = X` is a valid combination to setup an empty
iterator. bpf_iter_num_new() will return 0 (success) for any such
combination.

If bpf_iter_num_new() detects invalid combination of input arguments, it
returns error, resets iterator state to, effectively, empty iterator, so
any subsequent call to bpf_iter_num_next() will keep returning NULL.

BPF verifier has no knowledge that returned integers are in the
[start, end) value range, as both `start` and `end` are not statically
known and enforced: they are runtime values.

While the implementation is pretty trivial, some care needs to be taken
to avoid overflows and underflows. Subsequent selftests will validate
correctness of [start, end) semantics, especially around extremes
(INT_MIN and INT_MAX).

Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can
be specified.

bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded
BPF loops and bpf_loop() helper and is the basis for implementing
ergonomic BPF loops with no statically known or verified bounds.
Subsequent patches implement bpf_for() macro, demonstrating how this can
be wrapped into something that works and feels like a normal for() loop
in C language.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-15 13:18:34 -07:00
Puranjay Mohan
aab5f194e1 libbpf: USDT arm arg parsing support
Parsing of USDT arguments is architecture-specific; on arm it is
relatively easy since registers used are r[0-10], fp, ip, sp, lr,
pc. Format is slightly different compared to aarch64; forms are

- "size @ [ reg, #offset ]" for dereferences, for example
  "-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]"
- "size @ reg" for register values; for example
  "-4@r0"
- "size @ #value" for raw values; for example
  "-8@#1"

Add support for parsing USDT arguments for ARM architecture.

To test the above changes QEMU's virt[1] board with cortex-a15
CPU was used. libbpf-bootstrap's usdt example[2] was modified to attach
to a test program with DTRACE_PROBE1/2/3/4... probes to test different
combinations.

[1] https://www.qemu.org/docs/master/system/arm/virt.html
[2] https://github.com/libbpf/libbpf-bootstrap/blob/master/examples/c/usdt.bpf.c

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-3-puranjay12@gmail.com
2023-03-15 13:18:34 -07:00
Puranjay Mohan
c5fe344018 libbpf: Refactor parse_usdt_arg() to re-use code
The parse_usdt_arg() function is defined differently for each
architecture but the last part of the function is repeated
verbatim for each architecture.

Refactor parse_usdt_arg() to fill the arg_sz and then do the repeated
post-processing in parse_usdt_spec().

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-2-puranjay12@gmail.com
2023-03-15 13:18:34 -07:00
Daniel Müller
232f42135a libbpf: Fix theoretical u32 underflow in find_cd() function
Coverity reported a potential underflow of the offset variable used in
the find_cd() function. Switch to using a signed 64 bit integer for the
representation of offset to make sure we can never underflow.

Fixes: 1eebcb60633f ("libbpf: Implement basic zip archive parsing support")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307215504.837321-1-deso@posteo.net
2023-03-15 13:18:34 -07:00
Menglong Dong
cc7177624f libbpf: Add support to set kprobe/uprobe attach mode
By default, libbpf will attach the kprobe/uprobe BPF program in the
latest mode that supported by kernel. In this patch, we add the support
to let users manually attach kprobe/uprobe in legacy or perf mode.

There are 3 mode that supported by the kernel to attach kprobe/uprobe:

  LEGACY: create perf event in legacy way and don't use bpf_link
  PERF: create perf event with perf_event_open() and don't use bpf_link

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: create perf event with perf_event_open() and use bpf_link
Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com/
Link: https://lore.kernel.org/bpf/20230306064833.7932-2-imagedong@tencent.com

Users now can manually choose the mode with
bpf_program__attach_uprobe_opts()/bpf_program__attach_kprobe_opts().
2023-03-15 13:18:34 -07:00
Daniel Müller
cf46d44f0a sync: Add section about need for Makefile adjustments
When performing a sync with the kernel repository using the
sync-kernel.sh script, it may be necessary to manually adjust the
library's Makefile if:
- new source files were added upstream
- new public headers were added upstream

This change adds a new section to `SYNC.md` to spell out this need.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 13:06:39 -08:00
Daniel Müller
a41e6ef325 Stop running l4lb_all test on 5.5.0
The l4lb_all/l4lb_noinline_dynptr test no does not run on kernel 5.5.0,
because functionality is missing there. Do not allow running it.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Daniel Müller
c2495832ce libbpf: Properly build zip.o
The sync script does not seem to be automatically adding newly added
files added to the kernel repo build to the local Makefile. Do that now.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Daniel Müller
bfb1e97426 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c8ee37bde4021a275d2e4f33bd48d54912bb00c4
Checkpoint bpf-next commit: db55174d05ee6bed9d0583ba08e99c891ef0ed05
Baseline bpf commit:        2d311f480b52eeb2e1fd432d64b78d82952c3808
Checkpoint bpf commit:      d900f3d20cc3169ce42ec72acc850e662a4d4db2

Alexei Starovoitov (1):
  bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.

Daniel Müller (3):
  libbpf: Implement basic zip archive parsing support
  libbpf: Introduce elf_find_func_offset_from_file() function
  libbpf: Add support for attaching uprobes to shared objects in APKs

Joanne Koong (3):
  bpf: Add skb dynptrs
  bpf: Add xdp dynptrs
  bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr

Tero Kristo (1):
  bpf: Add support for absolute value BPF timers

Viktor Malik (3):
  libbpf: Remove unnecessary ternary operator
  libbpf: Remove several dead assignments
  libbpf: Cleanup linker_append_elf_relos

 include/uapi/linux/bpf.h |  33 +++-
 src/bpf_helpers.h        |   2 +-
 src/btf.c                |   2 -
 src/libbpf.c             | 149 ++++++++++++++----
 src/linker.c             |  11 +-
 src/relo_core.c          |   3 -
 src/zip.c                | 328 +++++++++++++++++++++++++++++++++++++++
 src/zip.h                |  47 ++++++
 8 files changed, 529 insertions(+), 46 deletions(-)
 create mode 100644 src/zip.c
 create mode 100644 src/zip.h

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Daniel Müller
a468b16788 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Alexei Starovoitov
6c673bb00b bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.
__kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
The concept felt useful, but didn't get much traction,
since bpf_rdonly_cast() was added soon after and bpf programs received
a simpler way to access PTR_UNTRUSTED kernel pointers
without going through restrictive __kptr usage.

Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
its intended usage.
The main goal of __kptr_untrusted was to read/write such pointers
directly while bpf_kptr_xchg was a mechanism to access refcnted
kernel pointers. The next patch will allow RCU protected __kptr access
with direct read. At that point __kptr_untrusted will be deprecated.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Tero Kristo
b6c58f7619 bpf: Add support for absolute value BPF timers
Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start()
to start an absolute value timer instead of the default relative value.
This makes the timer expire at an exact point in time, instead of a time
with latencies induced by both the BPF and timer subsystems.

Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Daniel Müller
db26142ffb libbpf: Add support for attaching uprobes to shared objects in APKs
This change adds support for attaching uprobes to shared objects located
in APKs, which is relevant for Android systems where various libraries
may reside in APKs. To make that happen, we extend the syntax for the
"binary path" argument to attach to with that supported by various
Android tools:
  <archive>!/<binary-in-archive>

For example:
  /system/app/test-app/test-app.apk!/lib/arm64-v8a/libc++_shared.so

APKs need to be specified via full path, i.e., we do not attempt to
resolve mere file names by searching system directories.

We cannot currently test this functionality end-to-end in an automated
fashion, because it relies on an Android system being present, but there
is no support for that in CI. I have tested the functionality manually,
by creating a libbpf program containing a uretprobe, attaching it to a
function inside a shared object inside an APK, and verifying the sanity
of the returned values.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-4-deso@posteo.net
2023-03-06 09:47:37 -08:00
Daniel Müller
47eb62005a libbpf: Introduce elf_find_func_offset_from_file() function
This change splits the elf_find_func_offset() function in two:
elf_find_func_offset(), which now accepts an already opened Elf object
instead of a path to a file that is to be opened, as well as
elf_find_func_offset_from_file(), which opens a binary based on a
path and then invokes elf_find_func_offset() on the Elf object. Having
this split in responsibilities will allow us to call
elf_find_func_offset() from other code paths on Elf objects that did not
necessarily come from a file on disk.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-3-deso@posteo.net
2023-03-06 09:47:37 -08:00
Daniel Müller
9ca6f946cd libbpf: Implement basic zip archive parsing support
This change implements support for reading zip archives, including
opening an archive, finding an entry based on its path and name in it,
and closing it.
The code was copied from https://github.com/iovisor/bcc/pull/4440, which
implements similar functionality for bcc. The author confirmed that he
is fine with this usage and the corresponding relicensing. I adjusted it
to adhere to libbpf coding standards.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Michał Gregorczyk <michalgr@meta.com>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-2-deso@posteo.net
2023-03-06 09:47:37 -08:00
Viktor Malik
87695e9723 libbpf: Cleanup linker_append_elf_relos
Clang Static Analyser (scan-build) reports some unused symbols and dead
assignments in the linker_append_elf_relos function. Clean these up.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/c5c8fe9f411b69afada8399d23bb048ef2a70535.1677658777.git.vmalik@redhat.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Viktor Malik
3706449b1b libbpf: Remove several dead assignments
Clang Static Analyzer (scan-build) reports several dead assignments in
libbpf where the assigned value is unconditionally overridden by another
value before it is read. Remove these assignments.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/5503d18966583e55158471ebbb2f67374b11bf5e.1677658777.git.vmalik@redhat.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Viktor Malik
4c75268933 libbpf: Remove unnecessary ternary operator
Coverity reports that the first check of 'err' in bpf_object__init_maps
is always false as 'err' is initialized to 0 at that point. Remove the
unnecessary ternary operator.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/78a3702f2ea9f32a84faaae9b674c56269d330a7.1677658777.git.vmalik@redhat.com
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Joanne Koong
3fe3cccb06 bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
The user must pass in a buffer to store the contents of the data slice
if a direct pointer to the data cannot be obtained.

For skb and xdp type dynptrs, these two APIs are the only way to obtain
a data slice. However, for other types of dynptrs, there is no
difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.

For skb type dynptrs, the data is copied into the user provided buffer
if any of the data is not in the linear portion of the skb. For xdp type
dynptrs, the data is copied into the user provided buffer if the data is
between xdp frags.

If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then
the skb will be uncloned (see bpf_unclone_prologue()).

Please note that any bpf_dynptr_write() automatically invalidates any prior
data slices of the skb dynptr. This is because the skb may be cloned or
may need to pull its paged buffer into the head. As such, any
bpf_dynptr_write() will automatically have its prior data slices
invalidated, even if the write is to data in the skb head of an uncloned
skb. Please note as well that any other helper calls that change the
underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
slices of the skb dynptr as well, for the same reasons.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Joanne Koong
0c5b5b5d91 bpf: Add xdp dynptrs
Add xdp dynptrs, which are dynptrs whose underlying pointer points
to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of xdp->data and xdp->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For reads and writes on the dynptr, this includes reading/writing
from/to and across fragments. Data slices through the bpf_dynptr_data
API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() should be used.

For examples of how xdp dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Joanne Koong
d16fc1f0f5 bpf: Add skb dynptrs
Add skb dynptrs, which are dynptrs whose underlying pointer points
to a skb. The dynptr acts on skb data. skb dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of skb->data and skb->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For bpf prog types that don't support writes on skb data, the dynptr is
read-only (bpf_dynptr_write() will return an error)

For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
interfaces, reading and writing from/to data in the head as well as from/to
non-linear paged buffers is supported. Data slices through the
bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used.

For examples of how skb dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
2023-03-06 09:47:37 -08:00
Andrii Nakryiko
37922c6fb2 sync: add sync process documentation at SYNC.md
Explain sync setup expectations, necessary steps, common gotchas and
necessary manual adjustments.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-02-28 09:22:25 -08:00
Yonghong Song
19cd9a1d4b sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   951bce29c8988209cc359e1fa35a4aaa35542fd5
Checkpoint bpf-next commit: c8ee37bde4021a275d2e4f33bd48d54912bb00c4
Baseline bpf commit:        3a70e0d4c9d74cb00f7c0ec022f5599f9f7ba07d
Checkpoint bpf commit:      2d311f480b52eeb2e1fd432d64b78d82952c3808

Ilya Leoshkevich (1):
  libbpf: Document bpf_{btf,link,map,prog}_get_info_by_fd()

Puranjay Mohan (1):
  libbpf: Fix arm syscall regs spec in bpf_tracing.h

Rob Herring (1):
  perf: Add perf_event_attr::config3

Tariq Toukan (1):
  netdev-genl: fix repeated typo oflloading -> offloading

Tiezhu Yang (1):
  libbpf: Use struct user_pt_regs to define __PT_REGS_CAST() for
    LoongArch

Yonghong Song (1):
  libbpf: Fix bpf_xdp_query() in old kernels

 include/uapi/linux/netdev.h     |  2 +-
 include/uapi/linux/perf_event.h |  3 ++
 src/bpf.h                       | 69 ++++++++++++++++++++++++++++++---
 src/bpf_tracing.h               |  3 ++
 src/netlink.c                   |  8 +++-
 5 files changed, 78 insertions(+), 7 deletions(-)

Signed-off-by: Yonghong Song <yhs@fb.com>
2023-02-28 09:17:25 -08:00
Tariq Toukan
a6c64dbfa2 netdev-genl: fix repeated typo oflloading -> offloading
Fix a repeated copy/paste typo.

Fixes: d3d854fd6a1d ("netdev-genl: create a simple family for netdev stuff")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-28 09:17:25 -08:00
Yonghong Song
0d7ac28818 libbpf: Fix bpf_xdp_query() in old kernels
Commit 04d58f1b26a4("libbpf: add API to get XDP/XSK supported features")
added feature_flags to struct bpf_xdp_query_opts. If a user uses
bpf_xdp_query_opts with feature_flags member, the bpf_xdp_query()
will check whether 'netdev' family exists or not in the kernel.
If it does not exist, the bpf_xdp_query() will return -ENOENT.

But 'netdev' family does not exist in old kernels as it is
introduced in the same patch set as Commit 04d58f1b26a4.
So old kernel with newer libbpf won't work properly with
bpf_xdp_query() api call.

To fix this issue, if the return value of
libbpf_netlink_resolve_genl_family_id() is -ENOENT, bpf_xdp_query()
will just return 0, skipping the rest of xdp feature query.
This preserves backward compatibility.

Fixes: 04d58f1b26a4 ("libbpf: add API to get XDP/XSK supported features")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230227224943.1153459-1-yhs@fb.com
2023-02-28 09:17:25 -08:00
Ilya Leoshkevich
3fdc11b883 libbpf: Document bpf_{btf,link,map,prog}_get_info_by_fd()
Replace the short informal description with the proper doc comments.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230220234958.764997-1-iii@linux.ibm.com
2023-02-28 09:17:25 -08:00
Puranjay Mohan
e198fdc928 libbpf: Fix arm syscall regs spec in bpf_tracing.h
The syscall register definitions for ARM in bpf_tracing.h doesn't define
the fifth parameter for the syscalls. Because of this some KPROBES based
selftests fail to compile for ARM architecture.

Define the fifth parameter that is passed in the R5 register (uregs[4]).

Fixes: 3a95c42d65d5 ("libbpf: Define arm syscall regs spec in bpf_tracing.h")
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230223095346.10129-1-puranjay12@gmail.com
2023-02-28 09:17:25 -08:00
Tiezhu Yang
e114bd2657 libbpf: Use struct user_pt_regs to define __PT_REGS_CAST() for LoongArch
LoongArch provides struct user_pt_regs instead of struct pt_regs
to userspace, use struct user_pt_regs to define __PT_REGS_CAST()
to fix the following build error:

     CLNG-BPF [test_maps] loop1.bpf.o
  progs/loop1.c:22:9: error: incomplete definition of type 'struct pt_regs'
                                  m = PT_REGS_RC(ctx);
                                      ^~~~~~~~~~~~~~~
  tools/testing/selftests/bpf/tools/include/bpf/bpf_tracing.h:493:41: note: expanded from macro 'PT_REGS_RC'
  #define PT_REGS_RC(x) (__PT_REGS_CAST(x)->__PT_RC_REG)
                         ~~~~~~~~~~~~~~~~~^
  tools/testing/selftests/bpf/tools/include/bpf/bpf_helper_defs.h:20:8: note: forward declaration of 'struct pt_regs'
  struct pt_regs;
         ^
  1 error generated.
  make: *** [Makefile:572: tools/testing/selftests/bpf/loop1.bpf.o] Error 1
  make: Leaving directory 'tools/testing/selftests/bpf'

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1677235015-21717-2-git-send-email-yangtiezhu@loongson.cn
2023-02-28 09:17:25 -08:00
Rob Herring
bb0f8b32a5 perf: Add perf_event_attr::config3
Arm SPEv1.2 adds another 64-bits of event filtering control. As the
existing perf_event_attr::configN fields are all used up for SPE PMU, an
additional field is needed. Add a new 'config3' field.

Tested-by: James Clark <james.clark@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220825-arm-spe-v8-7-v4-7-327f860daf28@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-02-28 09:17:25 -08:00
Andrii Nakryiko
f9106f6bac ci: start using llvm-17 now
LLVM 17 problems were fixed upstream, so switch to using latest v17 in CI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-02-24 14:11:17 -08:00
Yonghong Song
7ef34fa945 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   6c20822fada1b8adb77fa450d03a0d449686a4a9
Checkpoint bpf-next commit: 951bce29c8988209cc359e1fa35a4aaa35542fd5
Baseline bpf commit:        6c20822fada1b8adb77fa450d03a0d449686a4a9
Checkpoint bpf commit:      3a70e0d4c9d74cb00f7c0ec022f5599f9f7ba07d

Ilya Leoshkevich (2):
  libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()

Martin KaFai Lau (1):
  bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup

 include/uapi/linux/bpf.h |  6 ++++++
 src/bpf.c                | 20 ++++++++++++++++++++
 src/bpf.h                |  9 +++++++++
 src/btf.c                |  8 ++++----
 src/libbpf.c             | 14 +++++++-------
 src/libbpf.map           |  5 +++++
 src/netlink.c            |  2 +-
 src/ringbuf.c            |  4 ++--
 8 files changed, 54 insertions(+), 14 deletions(-)

Signed-off-by: Yonghong Song <yhs@fb.com>
2023-02-21 22:27:55 -08:00
Yonghong Song
7cfc12cb41 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Yonghong Song <yhs@fb.com>
2023-02-21 22:27:55 -08:00
Martin KaFai Lau
c16cae9381 bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
The bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.

In the use case that does not manage the neigh table
and requires bpf_fib_lookup() to lookup a fib to
decide if it needs to redirect or not, the bpf prog can
depend only on using bpf_redirect_neigh() to lookup the
neigh. It also keeps the neigh entries fresh and connected.

This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always call
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is skipped together when SKIP_NEIGH is set because
bpf_redirect_neigh() will figure out the smac also.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
2023-02-21 22:27:55 -08:00
Ilya Leoshkevich
768164af0e libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
Use the new type-safe wrappers around bpf_obj_get_info_by_fd().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-3-iii@linux.ibm.com
2023-02-21 22:27:55 -08:00
Ilya Leoshkevich
30f6bc3c0a libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
These are type-safe wrappers around bpf_obj_get_info_by_fd(). They
found one problem in selftests, and are also useful for adding
Memory Sanitizer annotations.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-2-iii@linux.ibm.com
2023-02-21 22:27:55 -08:00
Song Liu
ea28429902 ci: Remove xdp_info from ALLOWLIST-5.5.0
xdp_info now depends on newer functionalities. Let's skip it for 5.5.0
kernel.

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Song Liu
34212c94a6 ci: regenerate vmlinux.h
Regenerate latest vmlinux.h for old kernel CI tests.

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Song Liu
6f1c8eddb2 sync: Add netdev.h from kernel tree
Add netdev.h to include/uapi/linux to make build success.

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Song Liu
4b492df97e sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   a5f6b9d577eba18601c14bba2dbff4a9b76af962
Checkpoint bpf-next commit: 6c20822fada1b8adb77fa450d03a0d449686a4a9
Baseline bpf commit:        e8c8fd9b8393d7064152c8806f5ac446d760a23e
Checkpoint bpf commit:      6c20822fada1b8adb77fa450d03a0d449686a4a9

Dave Marchevsky (1):
  bpf: Add basic bpf_rb_{root,node} support

Florian Lehner (1):
  bpf: fix typo in header for bpf_perf_prog_read_value

Grant Seltzer (2):
  libbpf: Fix malformed documentation formatting
  libbpf: Add documentation to map pinning API functions

Hao Xiang (1):
  libbpf: Correctly set the kernel code version in Debian kernel.

Ilya Leoshkevich (4):
  libbpf: Simplify barrier_var()
  libbpf: Fix unbounded memory access in bpf_usdt_arg()
  libbpf: Fix BPF_PROBE_READ{_STR}_INTO() on s390x
  libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()

Jon Doron (1):
  libbpf: Add sample_period to creation options

Lorenzo Bianconi (3):
  libbpf: add the capability to specify netlink proto in
    libbpf_netlink_send_recv
  libbpf: add API to get XDP/XSK supported features
  libbpf: Always use libbpf_err to return an error in bpf_xdp_query()

Randy Dunlap (1):
  Documentation: bpf: correct spelling

Tiezhu Yang (1):
  tools/bpf: Use tab instead of white spaces to sync bpf.h

 docs/libbpf_naming_convention.rst |   6 +-
 include/uapi/linux/bpf.h          |  17 ++++-
 src/bpf_core_read.h               |   4 +-
 src/bpf_helpers.h                 |   2 +-
 src/libbpf.c                      |  46 ++----------
 src/libbpf.h                      |  97 +++++++++++++++++++++---
 src/libbpf_probes.c               |  83 +++++++++++++++++++++
 src/netlink.c                     | 118 +++++++++++++++++++++++++++---
 src/nlattr.c                      |   2 +-
 src/nlattr.h                      |  12 +++
 src/usdt.bpf.h                    |   5 +-
 11 files changed, 321 insertions(+), 71 deletions(-)

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Song Liu
24476fe699 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Song Liu <song@kernel.org>
2023-02-17 17:17:27 -08:00
Dave Marchevsky
d74065659a bpf: Add basic bpf_rb_{root,node} support
This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to
BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new
types, and adds bpf_rb_root_free function for freeing bpf_rb_root in
map_values.

structs bpf_rb_root and bpf_rb_node are opaque types meant to
obscure structs rb_root_cached rb_node, respectively.

btf_struct_access will prevent BPF programs from touching these special
fields automatically now that they're recognized.

btf_check_and_fixup_fields now groups list_head and rb_root together as
"graph root" fields and {list,rb}_node as "graph node", and does same
ownership cycle checking as before. Note that this function does _not_
prevent ownership type mixups (e.g. rb_root owning list_node) - that's
handled by btf_parse_graph_root.

After this patch, a bpf program can have a struct bpf_rb_root in a
map_value, but not add anything to nor do anything useful with it.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Ilya Leoshkevich
418962b686 libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()
The code assumes that everything that comes after nlmsgerr are nlattrs.
When calculating their size, it does not account for the initial
nlmsghdr. This may lead to accessing uninitialized memory.

Fixes: bbf48c18ee0c ("libbpf: add error reporting in XDP")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-8-iii@linux.ibm.com
2023-02-17 17:17:27 -08:00
Jon Doron
8c8243a409 libbpf: Add sample_period to creation options
Add option to set when the perf buffer should wake up, by default the
perf buffer becomes signaled for every event that is being pushed to it.

In case of a high throughput of events it will be more efficient to wake
up only once you have X events ready to be read.

So your application can wakeup once and drain the entire perf buffer.

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230207081916.3398417-1-arilou@gmail.com
2023-02-17 17:17:27 -08:00
Lorenzo Bianconi
6333ea6a3a libbpf: Always use libbpf_err to return an error in bpf_xdp_query()
In order to properly set errno, rely on libbpf_err utility routine in
bpf_xdp_query() to return an error to the caller.

Fixes: 04d58f1b26a4 ("libbpf: add API to get XDP/XSK supported features")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/827d40181f9f90fb37702f44328e1614df7c0503.1675768112.git.lorenzo@kernel.org
2023-02-17 17:17:27 -08:00
Hao Xiang
855bf91055 libbpf: Correctly set the kernel code version in Debian kernel.
In a previous commit, Ubuntu kernel code version is correctly set
by retrieving the information from /proc/version_signature.

commit<5b3d72987701d51bf31823b39db49d10970f5c2d>
(libbpf: Improve LINUX_VERSION_CODE detection)

The /proc/version_signature file doesn't present in at least the
older versions of Debian distributions (eg, Debian 9, 10). The Debian
kernel has a similar issue where the release information from uname()
syscall doesn't give the kernel code version that matches what the
kernel actually expects. Below is an example content from Debian 10.

release: 4.19.0-23-amd64
version: #1 SMP Debian 4.19.269-1 (2022-12-20) x86_64

Debian reports incorrect kernel version in utsname::release returned
by uname() syscall, which in older kernels (Debian 9, 10) leads to
kprobe BPF programs failing to load due to the version check mismatch.

Fortunately, the correct kernel code version presents in the
utsname::version returned by uname() syscall in Debian kernels. This
change adds another get kernel version function to handle Debian in
addition to the previously added get kernel version function to handle
Ubuntu. Some minor refactoring work is also done to make the code more
readable.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230203234842.2933903-1-hao.xiang@bytedance.com
2023-02-17 17:17:27 -08:00
Florian Lehner
5e0270f66e bpf: fix typo in header for bpf_perf_prog_read_value
Fix a simple typo in the documentation for bpf_perf_prog_read_value.

Signed-off-by: Florian Lehner <dev@der-flo.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230203121439.25884-1-dev@der-flo.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-17 17:17:27 -08:00
Lorenzo Bianconi
547881e04e libbpf: add API to get XDP/XSK supported features
Extend bpf_xdp_query routine in order to get XDP/XSK supported features
of netdev over route netlink interface.
Extend libbpf netlink implementation in order to support netlink_generic
protocol.

Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Marek Majtyka <alardam@gmail.com>
Signed-off-by: Marek Majtyka <alardam@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/a72609ef4f0de7fee5376c40dbf54ad7f13bfb8d.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Lorenzo Bianconi
41b96a8c08 libbpf: add the capability to specify netlink proto in libbpf_netlink_send_recv
This is a preliminary patch in order to introduce netlink_generic
protocol support to libbpf.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/7878a54667e74afeec3ee519999c044bd514b44c.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Tiezhu Yang
700d755151 tools/bpf: Use tab instead of white spaces to sync bpf.h
Just silence the following build warning:

Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1675319486-27744-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Ilya Leoshkevich
981da2b380 libbpf: Fix BPF_PROBE_READ{_STR}_INTO() on s390x
BPF_PROBE_READ_INTO() and BPF_PROBE_READ_STR_INTO() should map to
bpf_probe_read() and bpf_probe_read_str() respectively in order to work
correctly on architectures with !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-24-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Ilya Leoshkevich
23898cf858 libbpf: Fix unbounded memory access in bpf_usdt_arg()
Loading programs that use bpf_usdt_arg() on s390x fails with:

    ; if (arg_num >= BPF_USDT_MAX_ARG_CNT || arg_num >= spec->arg_cnt)
    128: (79) r1 = *(u64 *)(r10 -24)      ; frame1: R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
    129: (25) if r1 > 0xb goto pc+83      ; frame1: R1_w=scalar(umax=11,var_off=(0x0; 0xf))
    ...
    ; arg_spec = &spec->args[arg_num];
    135: (79) r1 = *(u64 *)(r10 -24)      ; frame1: R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
    ...
    ; switch (arg_spec->arg_type) {
    139: (61) r1 = *(u32 *)(r2 +8)
    R2 unbounded memory access, make sure to bounds check any such access

The reason is that, even though the C code enforces that
arg_num < BPF_USDT_MAX_ARG_CNT, the verifier cannot propagate this
constraint to the arg_spec assignment yet. Help it by forcing r1 back
to stack after comparison.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-23-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Ilya Leoshkevich
7285d529cf libbpf: Simplify barrier_var()
Use a single "+r" constraint instead of the separate "=r" and "0".

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-22-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Randy Dunlap
dd460a52bc Documentation: bpf: correct spelling
Correct spelling problems for Documentation/bpf/ as reported
by codespell.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230128195046.13327-1-rdunlap@infradead.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-17 17:17:27 -08:00
Grant Seltzer
44c1d381ff libbpf: Add documentation to map pinning API functions
This adds documentation for the following API functions:

- bpf_map__set_pin_path()
- bpf_map__pin_path()
- bpf_map__is_pinned()
- bpf_map__pin()
- bpf_map__unpin()
- bpf_object__pin_maps()
- bpf_object__unpin_maps()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024225.520685-1-grantseltzer@gmail.com
2023-02-17 17:17:27 -08:00
Grant Seltzer
522fe6f721 libbpf: Fix malformed documentation formatting
This fixes the doxygen format documentation above the
user_ring_buffer__* APIs. There has to be a newline
before the @brief, otherwise doxygen won't render them
for libbpf.readthedocs.org.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024749.522278-1-grantseltzer@gmail.com
2023-02-17 17:17:27 -08:00
Andrii Nakryiko
04aafdf9c9 ci: replicate BPF CI changes for clang installation
Add ability to install specified version of Clang. This replicates what
was done in https://github.com/libbpf/ci/pull/86.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-02-09 16:49:06 -08:00
Andrii Nakryiko
416620416f sync: sync include/uapi/linux/openat2.h
As reported in [0], we are missing openat2.h in libbpf-local UAPI
headers. Sync it and adjust sync script to keep syncing it going
forward.

  [0] https://github.com/libbpf/libbpf/issues/649

Closes: https://github.com/libbpf/libbpf/issues/649
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-31 16:46:18 -08:00
Joanne Koong
6b4a3f3131 ci: Update default llvm version to 17
Currently, CI is unable to locate llvm-16 on
aarch64/gcc, aarch64/llvm-16, and s390x/gcc [0].

This change upgrades the default llvm version to 17.

[0] https://github.com/kernel-patches/bpf/actions/runs/4040302668

Signed-off-by: Joanne Koong <joannekoong@gmail.com>
2023-01-30 17:20:12 -08:00
Dave Marchevsky
d73ecc91e1 Add patch fixing s390 issues
Signed-off-by: Dave Marchevsky <davemarchevsky@gmail.com>
2023-01-26 11:23:52 -08:00
Andrii Nakryiko
c2e797c8de ci: temporarily denylist decap_sanity test
It is mysteriously fails in CI, for now don't run it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
f99818dd1a libbpf: regenerate vmlinux.h
Regenerate latest vmlinux.h for old kernel CI tests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
b2e29a1026 libbpf: dump version to v1.2 in Makefile
Bump LIBBPF_MINOR_VERSION to 2 for v1.2 dev cycle.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
e398e7eaf4 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   7b43df6c6ec38c9097420902a1c8165c4b25bf70
Checkpoint bpf-next commit: a5f6b9d577eba18601c14bba2dbff4a9b76af962
Baseline bpf commit:        54c3f1a81421f85e60ae2eaae7be3727a09916ee
Checkpoint bpf commit:      e8c8fd9b8393d7064152c8806f5ac446d760a23e

Alexei Starovoitov (1):
  libbpf: Restore errno after pr_warn.

Andrii Nakryiko (24):
  libbpf: start v1.2 development cycle
  libbpf: Add support for fetching up to 8 arguments in kprobes
  libbpf: Add 6th argument support for x86-64 in bpf_tracing.h
  libbpf: Fix arm and arm64 specs in bpf_tracing.h
  libbpf: Complete mips spec in bpf_tracing.h
  libbpf: Complete powerpc spec in bpf_tracing.h
  libbpf: Complete sparc spec in bpf_tracing.h
  libbpf: Complete riscv arch spec in bpf_tracing.h
  libbpf: Fix and complete ARC spec in bpf_tracing.h
  libbpf: Complete LoongArch (loongarch) spec in bpf_tracing.h
  libbpf: Add BPF_UPROBE and BPF_URETPROBE macro aliases
  libbpf: Improve syscall tracing support in bpf_tracing.h
  libbpf: Define x86-64 syscall regs spec in bpf_tracing.h
  libbpf: Define i386 syscall regs spec in bpf_tracing.h
  libbpf: Define s390x syscall regs spec in bpf_tracing.h
  libbpf: Define arm syscall regs spec in bpf_tracing.h
  libbpf: Define arm64 syscall regs spec in bpf_tracing.h
  libbpf: Define mips syscall regs spec in bpf_tracing.h
  libbpf: Define powerpc syscall regs spec in bpf_tracing.h
  libbpf: Define sparc syscall regs spec in bpf_tracing.h
  libbpf: Define riscv syscall regs spec in bpf_tracing.h
  libbpf: Define arc syscall regs spec in bpf_tracing.h
  libbpf: Define loongarch syscall regs spec in bpf_tracing.h
  libbpf: Clean up now not needed __PT_PARM{1-6}_SYSCALL_REG defaults

Changbin Du (1):
  libbpf: Return -ENODATA for missing btf section

Daniel T. Lee (1):
  libbpf: Fix invalid return address register in s390

David Vernet (1):
  libbpf: Support sleepable struct_ops.s section

Hengqi Chen (1):
  libbpf: Add LoongArch support to bpf_tracing.h

Ludovic L'Hours (1):
  libbpf: Fix map creation flags sanitization

Menglong Dong (1):
  libbpf: Replace '.' with '_' in legacy kprobe event name

Rong Tao (1):
  libbpf: Poison strlcpy()

Stanislav Fomichev (1):
  bpf: Introduce device-bound XDP programs

Xin Liu (2):
  libbpf: fix errno is overwritten after being closed.
  libbpf: Added the description of some API functions

Ziyang Xuan (1):
  bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()

 include/uapi/linux/bpf.h |  12 ++
 src/bpf_tracing.h        | 320 ++++++++++++++++++++++++++++++++++-----
 src/btf.c                |   2 +-
 src/libbpf.c             |  10 +-
 src/libbpf.h             |  29 +++-
 src/libbpf.map           |   3 +
 src/libbpf_internal.h    |   5 +-
 src/libbpf_version.h     |   2 +-
 8 files changed, 341 insertions(+), 42 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
c93ba3907f sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-01-25 20:44:09 -08:00
David Vernet
112479afb7 libbpf: Support sleepable struct_ops.s section
In a prior change, the verifier was updated to support sleepable
BPF_PROG_TYPE_STRUCT_OPS programs. A caller could set the program as
sleepable with bpf_program__set_flags(), but it would be more ergonomic
and more in-line with other sleepable program types if we supported
suffixing a struct_ops section name with .s to indicate that it's
sleepable.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
004ed7120b libbpf: Clean up now not needed __PT_PARM{1-6}_SYSCALL_REG defaults
Each architecture supports at least 6 syscall argument registers, so now
that specs for each architecture is defined in bpf_tracing.h, remove
unnecessary macro overrides, which previously were required to keep
existing BPF_KSYSCALL() uses compiling and working.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-26-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
97740e5103 libbpf: Define loongarch syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-24-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
ef191974b3 libbpf: Define arc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-23-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
ed66fb297d libbpf: Define riscv syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com> # RISC-V
Link: https://lore.kernel.org/bpf/20230120200914.3008030-22-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
2c58ba33fb libbpf: Define sparc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-21-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
9a6f8da473 libbpf: Define powerpc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Note that 7th arg is supported on 32-bit powerpc architecture, by not on
powerpc64.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-20-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
a005bb2ff8 libbpf: Define mips syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-19-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
7f627a6202 libbpf: Define arm64 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
We need PT_REGS_PARM1_[CORE_]SYSCALL macros overrides, similarly to
s390x, due to orig_x0 not being present in UAPI's pt_regs, so we need to
utilize BPF CO-RE and custom pt_regs___arm64 definition.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-18-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
a095b4f04d libbpf: Define arm syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-17-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
bd6e1ec311 libbpf: Define s390x syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Note that we need custom overrides for PT_REGS_PARM1_[CORE_]SYSCALL
macros due to the need to use BPF CO-RE and custom local pt_regs
definitions to fetch orig_gpr2, storing 1st argument.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-16-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
b2d8a8d269 libbpf: Define i386 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-15-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
df16188dc2 libbpf: Define x86-64 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Remove now unnecessary overrides of PT_REGS_PARM5_[CORE_]SYSCALL macros.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-14-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
672401ae09 libbpf: Improve syscall tracing support in bpf_tracing.h
Set up generic support in bpf_tracing.h for up to 7 syscall arguments
tracing with BPF_KSYSCALL, which seems to be the limit according to
syscall(2) manpage. Also change the way that syscall convention is
specified to be more explicit. Subsequent patches will adjust and define
proper per-architecture syscall conventions.

__PT_PARM1_SYSCALL_REG through __PT_PARM6_SYSCALL_REG is added
temporarily to keep everything working before each architecture has
syscall reg tables defined. They will be removed afterwards.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-13-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
ed8b4c90ea libbpf: Add BPF_UPROBE and BPF_URETPROBE macro aliases
Add BPF_UPROBE and BPF_URETPROBE macros, aliased to BPF_KPROBE and
BPF_KRETPROBE, respectively. This makes uprobe-based BPF program code
much less confusing, especially to people new to tracing, at no cost in
terms of maintainability. We'll use this macro in selftests in
subsequent patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-11-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
2094e1b37e libbpf: Complete LoongArch (loongarch) spec in bpf_tracing.h
Add PARM6 through PARM8 definitions. Add kernel docs link describing ABI
for LoongArch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-10-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
c978366c38 libbpf: Fix and complete ARC spec in bpf_tracing.h
Add PARM6 through PARM8 definitions. Also fix frame pointer (FP)
register definition. Also leave a link to where to find ABI spec.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-9-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
9db84de5f0 libbpf: Complete riscv arch spec in bpf_tracing.h
Add PARM6 through PARM8 definitions for RISC V (riscv) arch. Leave the
link for ABI doc for future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com> # RISC-V
Link: https://lore.kernel.org/bpf/20230120200914.3008030-8-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
ffbc84cf6f libbpf: Complete sparc spec in bpf_tracing.h
Add PARM6 definition for sparc architecture. Leave a link to calling
convention documentation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-7-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
7b86294a90 libbpf: Complete powerpc spec in bpf_tracing.h
Add definitions of PARM6 through PARM8 for powerpc architecture. Add
also a link to a functiona call sequence documentation for future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-6-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
31e29d9346 libbpf: Complete mips spec in bpf_tracing.h
Add registers for PARM6 through PARM8. Add a link to an ABI. We don't
distinguish between O32, N32, and N64, so document that we assume N64
right now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-5-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
1b48e879a4 libbpf: Fix arm and arm64 specs in bpf_tracing.h
Remove invalid support for PARM5 on 32-bit arm, as per ABI. Add three
more argument registers for arm64. Also leave links to ABI specs for
future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-4-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
4759a83309 libbpf: Add 6th argument support for x86-64 in bpf_tracing.h
Add r9 as register containing 6th argument on x86-64 architecture, as
per its ABI. Add also a link to a page describing ABI for easier future
reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-3-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
c94a3fd806 libbpf: Add support for fetching up to 8 arguments in kprobes
Add BPF_KPROBE() and PT_REGS_PARMx() support for up to 8 arguments, if
target architecture supports this. Currently all architectures are
limited to only 5 register-placed arguments, which is limiting even on
x86-64.

This patch adds generic macro machinery to support up to 8 arguments
both when explicitly fetching it from pt_regs through PT_REGS_PARMx()
macros, as well as more ergonomic access in BPF_KPROBE().

Also, for i386 architecture we now don't have to define fake PARM4 and
PARM5 definitions, they will be generically substituted, just like for
PARM6 through PARM8.

Subsequent patches will fill out architecture-specific definitions,
where appropriate.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-2-andrii@kernel.org
2023-01-25 20:44:09 -08:00
Stanislav Fomichev
7d68fca99c bpf: Introduce device-bound XDP programs
New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way
to associate a netdev with a BPF program at load time.

netdevsim checks are dropped in favor of generic check in dev_xdp_attach.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 20:44:09 -08:00
Ziyang Xuan
ed09f7e65b bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()
Add ipip6 and ip6ip decap support for bpf_skb_adjust_room().
Main use case is for using cls_bpf on ingress hook to decapsulate
IPv4 over IPv6 and IPv6 over IPv4 tunnel packets.

Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the
new IP header version after decapsulating the outer IP header.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 20:44:09 -08:00
Menglong Dong
49e950dcfa libbpf: Replace '.' with '_' in legacy kprobe event name
'.' is not allowed in the event name of kprobe. Therefore, we will get a
EINVAL if the kernel function name has a '.' in legacy kprobe attach
case, such as 'icmp_reply.constprop.0'.

In order to adapt this case, we need to replace the '.' with other char
in gen_kprobe_legacy_event_name(). And I use '_' for this propose.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com
2023-01-25 20:44:09 -08:00
Ludovic L'Hours
ce8d078ac7 libbpf: Fix map creation flags sanitization
As BPF_F_MMAPABLE flag is now conditionnaly set (by map_is_mmapable),
it should not be toggled but disabled if not supported by kernel.

Fixes: 4fcac46c7e10 ("libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars")
Signed-off-by: Ludovic L'Hours <ludovic.lhours@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230108182018.24433-1-ludovic.lhours@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 20:44:09 -08:00
Rong Tao
42d77b062c libbpf: Poison strlcpy()
Since commit 9fc205b413b3("libbpf: Add sane strncpy alternative and use
it internally") introduce libbpf_strlcpy(), thus add strlcpy() to a poison
list to prevent accidental use of it.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/tencent_5695A257C4D16B4413036BA1DAACDECB0B07@qq.com
2023-01-25 20:44:09 -08:00
Changbin Du
d572b6359e libbpf: Return -ENODATA for missing btf section
As discussed before, return -ENODATA (No data available) would be more
meaningful than ENOENT (No such file or directory).

Suggested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221231151436.6541-1-changbin.du@gmail.com
2023-01-25 20:44:09 -08:00
Hengqi Chen
f758104b07 libbpf: Add LoongArch support to bpf_tracing.h
Add PT_REGS macros for LoongArch ([0]).

  [0]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/bpf/20221231100757.3177034-1-hengqi.chen@gmail.com
2023-01-25 20:44:09 -08:00
Alexei Starovoitov
b92963bbe2 libbpf: Restore errno after pr_warn.
pr_warn calls into user-provided callback, which can clobber errno, so
`errno = saved_errno` should happen after pr_warn.

Fixes: 07453245620c ("libbpf: fix errno is overwritten after being closed.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 20:44:09 -08:00
Xin Liu
34fadd0fbe libbpf: Added the description of some API functions
Currently, many API functions are not described in the document.
Add add API description of the following four API functions:
  - libbpf_set_print;
  - bpf_object__open;
  - bpf_object__load;
  - bpf_object__close.

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224112058.12038-1-liuxin350@huawei.com
2023-01-25 20:44:09 -08:00
Daniel T. Lee
09f1324bd7 libbpf: Fix invalid return address register in s390
There is currently an invalid register mapping in the s390 return
address register. As the manual[1] states, the return address can be
found at r14. In bpf_tracing.h, the s390 registers were named
gprs(general purpose registers). This commit fixes the problem by
correcting the mistyped mapping.

[1]: https://uclibc.org/docs/psABI-s390x.pdf#page=14

Fixes: 3cc31d794097 ("libbpf: Normalize PT_REGS_xxx() macro definitions")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224071527.2292-7-danieltimlee@gmail.com
2023-01-25 20:44:09 -08:00
Xin Liu
8cd371816b libbpf: fix errno is overwritten after being closed.
In the ensure_good_fd function, if the fcntl function succeeds but
the close function fails, ensure_good_fd returns a normal fd and
sets errno, which may cause users to misunderstand. The close
failure is not a serious problem, and the correct FD has been
handed over to the upper-layer application. Let's restore errno here.

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Link: https://lore.kernel.org/r/20221223133618.10323-1-liuxin350@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 20:44:09 -08:00
Andrii Nakryiko
7d075a739e libbpf: start v1.2 development cycle
Bump current version for new development cycle to v1.2.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221221180049.853365-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 20:44:09 -08:00
Quentin Monnet
3423d5e7cd sync: Remove "git format-patch" signature (version) from cover letter
When syncing with the kernel, the script generates a cover letter for
the latest changes using "git format-patch". Unless specified otherwise,
it uses a signature (as in, email footer signature) which defaults to
the Git version in use, and ends up in the commit logs. This doesn't
bring any useful information in there: let's get rid of this version
number.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
2023-01-04 09:23:49 -08:00
Daniel Müller
e3a40329bb ci: Add patch setting CONFIG_FUNCTION_ERROR_INJECTION in CI
Similar to what we did for vmtest [0], libbpf needs the patch setting
CONFIG_FUNCTION_ERROR_INJECTION in CI. Add it.

[0] https://github.com/kernel-patches/vmtest/pull/181

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-12-21 12:11:32 -08:00
thiagoftsm
a16e904d6c Merge branch 'libbpf:master' into master 2022-12-21 16:15:11 +00:00
Andrii Nakryiko
6597330c45 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   0e43662e61f2569500ab83b8188c065603530785
Checkpoint bpf-next commit: 7b43df6c6ec38c9097420902a1c8165c4b25bf70
Baseline bpf commit:        f506439ec3dee11e0e77b0a1f3fb3eec22c97873
Checkpoint bpf commit:      54c3f1a81421f85e60ae2eaae7be3727a09916ee

Changbin Du (1):
  libbpf: Show error info about missing ".BTF" section

Christian Ehrig (1):
  bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key()

Khem Raj (1):
  libbpf: Fix build warning on ref_ctr_off for 32-bit architectures

 include/uapi/linux/bpf.h | 4 ++++
 src/btf.c                | 1 +
 src/libbpf.c             | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

--
2.30.2

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-20 22:23:18 -08:00
Andrii Nakryiko
2e287cd201 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-20 22:23:18 -08:00
Changbin Du
49bd40e869 libbpf: Show error info about missing ".BTF" section
Show the real problem instead of just saying "No such file or directory".

Now will print below info:
libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux
Error: failed to load BTF from /home/changbin/work/linux/vmlinux: No such file or directory

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221217223509.88254-2-changbin.du@gmail.com
2022-12-20 22:23:18 -08:00
Khem Raj
f7dba2c313 libbpf: Fix build warning on ref_ctr_off for 32-bit architectures
Clang warns on 32-bit ARM on this comparision:

libbpf.c:10497:18: error: result of comparison of constant 4294967296 with expression of type 'size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
        if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS))
            ~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Typecast ref_ctr_off to __u64 in the check conditional, it is false on
32bit anyways.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221219191526.296264-1-raj.khem@gmail.com
2022-12-20 22:23:18 -08:00
Christian Ehrig
41ac436073 bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key()
This patch allows to remove TUNNEL_KEY from the tunnel flags bitmap
when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY
flag. On egress, the resulting tunnel header will not contain a tunnel
key if the protocol and implementation supports it.

At the moment bpf_tunnel_key wants a user to specify a numeric tunnel
key. This will wrap the inner packet into a tunnel header with the key
bit and value set accordingly. This is problematic when using a tunnel
protocol that supports optional tunnel keys and a receiving tunnel
device that is not expecting packets with the key bit set. The receiver
won't decapsulate and drop the packet.

RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is
useful. It allows for generating packets, that can be decapsulated by
a GRE tunnel device not operating in collect metadata mode or not
expecting the key bit set.

Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221218051734.31411-1-cehrig@cloudflare.com
2022-12-20 22:23:18 -08:00
Andrii Nakryiko
75987cc295 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b148c8b9b926e257a59c8eb2cd6fa3adfd443254
Checkpoint bpf-next commit: 0e43662e61f2569500ab83b8188c065603530785
Baseline bpf commit:        4121d4481b72501aa4d22680be4ea1096d69d133
Checkpoint bpf commit:      f506439ec3dee11e0e77b0a1f3fb3eec22c97873

Andrii Nakryiko (1):
  libbpf: Fix btf_dump's packed struct determination

 src/btf_dump.c | 33 ++++++---------------------------
 1 file changed, 6 insertions(+), 27 deletions(-)

--
2.30.2

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-15 14:34:52 -08:00
Andrii Nakryiko
b9f1a06c70 libbpf: Fix btf_dump's packed struct determination
Fix bug in btf_dump's logic of determining if a given struct type is
packed or not. The notion of "natural alignment" is not needed and is
even harmful in this case, so drop it altogether. The biggest difference
in btf_is_struct_packed() compared to its original implementation is
that we don't really use btf__align_of() to determine overall alignment
of a struct type (because it could be 1 for both packed and non-packed
struct, depending on specifci field definitions), and just use field's
actual alignment to calculate whether any field is requiring packing or
struct's size overall necessitates packing.

Add two simple test cases that demonstrate the difference this change
would make.

Fixes: ea2ce1ba99aa ("libbpf: Fix BTF-to-C converter's padding logic")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20221215183605.4149488-1-andrii@kernel.org
2022-12-15 14:34:52 -08:00
Andrii Nakryiko
30554b08fe sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   706819495921ddad6b3780140b9d9e9293b6dedc
Checkpoint bpf-next commit: b148c8b9b926e257a59c8eb2cd6fa3adfd443254
Baseline bpf commit:        e931a173a685fe213127ae5aa6b7f2196c1d875d
Checkpoint bpf commit:      4121d4481b72501aa4d22680be4ea1096d69d133

Andrii Nakryiko (4):
  libbpf: Fix single-line struct definition output in btf_dump
  libbpf: Handle non-standardly sized enums better in BTF-to-C dumper
  libbpf: Fix btf__align_of() by taking into account field offsets
  libbpf: Fix BTF-to-C converter's padding logic

Eyal Birger (1):
  tools: add IFLA_XFRM_COLLECT_METADATA to uapi/linux/if_link.h

Kumar Kartikeya Dwivedi (1):
  bpf: Rework process_dynptr_func

Timo Hunziker (1):
  libbpf: Parse usdt args without offset on x86 (e.g. 8@(%rsp))

Xin Liu (1):
  libbpf: Optimized return value in libbpf_strerror when errno is libbpf
    errno

 include/uapi/linux/bpf.h     |   8 +-
 include/uapi/linux/if_link.h |   1 +
 src/btf.c                    |  13 +++
 src/btf_dump.c               | 214 +++++++++++++++++++++++++++--------
 src/libbpf_errno.c           |  16 ++-
 src/usdt.c                   |   8 ++
 6 files changed, 204 insertions(+), 56 deletions(-)

--
2.30.2

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
b0ff8e90f7 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
0b80970cb6 libbpf: Fix BTF-to-C converter's padding logic
Turns out that btf_dump API doesn't handle a bunch of tricky corner
cases, as reported by Per, and further discovered using his testing
Python script ([0]).

This patch revamps btf_dump's padding logic significantly, making it
more correct and also avoiding unnecessary explicit padding, where
compiler would pad naturally. This overall topic turned out to be very
tricky and subtle, there are lots of subtle corner cases. The comments
in the code tries to give some clues, but comments themselves are
supposed to be paired with good understanding of C alignment and padding
rules. Plus some experimentation to figure out subtle things like
whether `long :0;` means that struct is now forced to be long-aligned
(no, it's not, turns out).

Anyways, Per's script, while not completely correct in some known
situations, doesn't show any obvious cases where this logic breaks, so
this is a nice improvement over the previous state of this logic.

Some selftests had to be adjusted to accommodate better use of natural
alignment rules, eliminating some unnecessary padding, or changing it to
`type: 0;` alignment markers.

Note also that for when we are in between bitfields, we emit explicit
bit size, while otherwise we use `: 0`, this feels much more natural in
practice.

Next patch will add few more test cases, found through randomized Per's
script.

  [0] https://lore.kernel.org/bpf/85f83c333f5355c8ac026f835b18d15060725fcb.camel@ericsson.com/

Reported-by: Per Sundström XP <per.xp.sundstrom@ericsson.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-6-andrii@kernel.org
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
58b164237a libbpf: Fix btf__align_of() by taking into account field offsets
btf__align_of() is supposed to be return alignment requirement of
a requested BTF type. For STRUCT/UNION it doesn't always return correct
value, because it calculates alignment only based on field types. But
for packed structs this is not enough, we need to also check field
offsets and struct size. If field offset isn't aligned according to
field type's natural alignment, then struct must be packed. Similarly,
if struct size is not a multiple of struct's natural alignment, then
struct must be packed as well.

This patch fixes this issue precisely by additionally checking these
conditions.

Fixes: 3d208f4ca111 ("libbpf: Expose btf__align_of() API")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-5-andrii@kernel.org
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
e6e0e3fd85 libbpf: Handle non-standardly sized enums better in BTF-to-C dumper
Turns out C allows to force enum to be 1-byte or 8-byte explicitly using
mode(byte) or mode(word), respecticely. Linux sources are using this in
some cases. This is imporant to handle correctly, as enum size
determines corresponding fields in a struct that use that enum type. And
if enum size is incorrect, this will lead to invalid struct layout. So
add mode(byte) and mode(word) attribute support to btf_dump APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-3-andrii@kernel.org
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
db11704944 libbpf: Fix single-line struct definition output in btf_dump
btf_dump APIs emit unnecessary tabs when emitting struct/union
definition that fits on the single line. Before this patch we'd get:

struct blah {<tab>};

This patch fixes this and makes sure that we get more natural:

struct blah {};

Fixes: 44a726c3f23c ("bpftool: Print newline before '}' for struct with padding only fields")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-2-andrii@kernel.org
2022-12-14 22:09:00 -08:00
Xin Liu
8d719b0c08 libbpf: Optimized return value in libbpf_strerror when errno is libbpf errno
This is a small improvement in libbpf_strerror. When libbpf_strerror
is used to obtain the system error description, if the length of the
buf is insufficient, libbpf_sterror returns ERANGE and sets errno to
ERANGE.

However, this processing is not performed when the error code
customized by libbpf is obtained. Make some minor improvements here,
return -ERANGE and set errno to ERANGE when buf is not enough for
custom description.

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221210082045.233697-1-liuxin350@huawei.com
2022-12-14 22:09:00 -08:00
Kumar Kartikeya Dwivedi
6b90604fa7 bpf: Rework process_dynptr_func
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type
for use in callback state, because in case of user ringbuf helpers,
there is no dynptr on the stack that is passed into the callback. To
reflect such a state, a special register type was created.

However, some checks have been bypassed incorrectly during the addition
of this feature. First, for arg_type with MEM_UNINIT flag which
initialize a dynptr, they must be rejected for such register type.
Secondly, in the future, there are plans to add dynptr helpers that
operate on the dynptr itself and may change its offset and other
properties.

In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed
to such helpers, however the current code simply returns 0.

The rejection for helpers that release the dynptr is already handled.

For fixing this, we take a step back and rework existing code in a way
that will allow fitting in all classes of helpers and have a coherent
model for dealing with the variety of use cases in which dynptr is used.

First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together
with a DYNPTR_TYPE_* constant that denotes the only type it accepts.

Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this
fact. To make the distinction clear, use MEM_RDONLY flag to indicate
that the helper only operates on the memory pointed to by the dynptr,
not the dynptr itself. In C parlance, it would be equivalent to taking
the dynptr as a point to const argument.

When either of these flags are not present, the helper is allowed to
mutate both the dynptr itself and also the memory it points to.
Currently, the read only status of the memory is not tracked in the
dynptr, but it would be trivial to add this support inside dynptr state
of the register.

With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
better reflect its usage, it can no longer be passed to helpers that
initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.

A note to reviewers is that in code that does mark_stack_slots_dynptr,
and unmark_stack_slots_dynptr, we implicitly rely on the fact that
PTR_TO_STACK reg is the only case that can reach that code path, as one
cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In
both cases such helpers won't be setting that flag.

The next patch will add a couple of selftest cases to make sure this
doesn't break.

Fixes: 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper")
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-14 22:09:00 -08:00
Timo Hunziker
74244c5bd7 libbpf: Parse usdt args without offset on x86 (e.g. 8@(%rsp))
Parse USDT arguments like "8@(%rsp)" on x86. These are emmited by
SystemTap. The argument syntax is similar to the existing "memory
dereference case" but the offset left out as it's zero (i.e. read
the value from the address in the register). We treat it the same
as the the "memory dereference case", but set the offset to 0.

I've tested that this fixes the "unrecognized arg #N spec: 8@(%rsp).."
error I've run into when attaching to a probe with such an argument.
Attaching and reading the correct argument values works.

Something similar might be needed for the other supported
architectures.

  [0] Closes: https://github.com/libbpf/libbpf/issues/559

Signed-off-by: Timo Hunziker <timo.hunziker@gmx.ch>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221203123746.2160-1-timo.hunziker@eclipso.ch
2022-12-14 22:09:00 -08:00
Eyal Birger
da08611c65 tools: add IFLA_XFRM_COLLECT_METADATA to uapi/linux/if_link.h
Needed for XFRM metadata tests.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20221203084659.1837829-4-eyal.birger@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-14 22:09:00 -08:00
Andrii Nakryiko
1e479aec4f ci: don't run test_maps in libbpf CI
It crashes often, it doesn't really test libbpf much.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-07 09:28:07 -08:00
Andrii Nakryiko
8846dc7a20 ci: fix Ubuntu version for kernel tests and pahole workflows
Having too new build environment in workflows that build selftests on
the host, but run them in a separate QEMU image can lead to problems
with runtime linker complaining about missing new enough version of
glibc and other dependencies.

Until we update images, fix used Ubuntu version to ubuntu-20.04 to
mitigate.

Suggested-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-05 11:52:11 -08:00
Andrii Nakryiko
eb9b5c567d sync: regenerate vmlinux.h
Update checked in vmlinux.h for 5.5 and 4.9 kernels.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
be8f15bb93 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   5b1d640800de7fe02d68bf592d9d101de24c87f2
Checkpoint bpf-next commit: 706819495921ddad6b3780140b9d9e9293b6dedc
Baseline bpf commit:        47df8a2f78bc34ff170d147d05b121f84e252b85
Checkpoint bpf commit:      e931a173a685fe213127ae5aa6b7f2196c1d875d

Alexei Starovoitov (1):
  selftests/bpf: Workaround for llvm nop-4 bug

Andrii Nakryiko (2):
  libbpf: Ignore hashmap__find() result explicitly in btf_dump
  libbpf: Avoid enum forward-declarations in public API in C++ mode

Donald Hunter (1):
  docs/bpf: Add table of BPF program types to libbpf docs

Hou Tao (4):
  libbpf: Use page size as max_entries when probing ring buffer map
  libbpf: Handle size overflow for ringbuf mmap
  libbpf: Handle size overflow for user ringbuf mmap
  libbpf: Check the validity of size in user_ring_buffer__reserve()

Ji Rongfeng (1):
  bpf: Update bpf_{g,s}etsockopt() documentation

 docs/index.rst           |   3 +
 docs/program_types.rst   | 203 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h |  23 +++--
 src/bpf.h                |   7 ++
 src/btf_dump.c           |   2 +-
 src/libbpf.c             |   3 +-
 src/libbpf_probes.c      |   2 +-
 src/ringbuf.c            |  26 +++--
 8 files changed, 250 insertions(+), 19 deletions(-)
 create mode 100644 docs/program_types.rst

--
2.30.2

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
2bf5ed3a48 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
0fbf777e0b libbpf: Avoid enum forward-declarations in public API in C++ mode
C++ enum forward declarations are fundamentally not compatible with pure
C enum definitions, and so libbpf's use of `enum bpf_stats_type;`
forward declaration in libbpf/bpf.h public API header is causing C++
compilation issues.

More details can be found in [0], but it comes down to C++ supporting
enum forward declaration only with explicitly specified backing type:

  enum bpf_stats_type: int;

In C (and I believe it's a GCC extension also), such forward declaration
is simply:

  enum bpf_stats_type;

Further, in Linux UAPI this enum is defined in pure C way:

enum bpf_stats_type { BPF_STATS_RUN_TIME = 0; }

And even though in both cases backing type is int, which can be
confirmed by looking at DWARF information, for C++ compiler actual enum
definition and forward declaration are incompatible.

To eliminate this problem, for C++ mode define input argument as int,
which makes enum unnecessary in libbpf public header. This solves the
issue and as demonstrated by next patch doesn't cause any unwanted
compiler warnings, at least with default warnings setting.

  [0] https://stackoverflow.com/questions/42766839/c11-enum-forward-causes-underlying-type-mismatch
  [1] Closes: https://github.com/libbpf/libbpf/issues/249

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221130200013.2997831-1-andrii@kernel.org
2022-12-02 22:12:29 -08:00
Hou Tao
4d21c979ce libbpf: Check the validity of size in user_ring_buffer__reserve()
The top two bits of size are used as busy and discard flags, so reject
the reservation that has any of these special bits in the size. With the
addition of validity check, these is also no need to check whether or
not total_size is overflowed.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221116072351.1168938-5-houtao@huaweicloud.com
2022-12-02 22:12:29 -08:00
Hou Tao
11ad834557 libbpf: Handle size overflow for user ringbuf mmap
Similar with the overflow problem on ringbuf mmap, in user_ringbuf_map()
2 * max_entries may overflow u32 when mapping writeable region.

Fixing it by casting the size of writable mmap region into a __u64 and
checking whether or not there will be overflow during mmap.

Fixes: b66ccae01f1d ("bpf: Add libbpf logic for user-space ring buffer")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221116072351.1168938-4-houtao@huaweicloud.com
2022-12-02 22:12:29 -08:00
Hou Tao
f056d1bd54 libbpf: Handle size overflow for ringbuf mmap
The maximum size of ringbuf is 2GB on x86-64 host, so 2 * max_entries
will overflow u32 when mapping producer page and data pages. Only
casting max_entries to size_t is not enough, because for 32-bits
application on 64-bits kernel the size of read-only mmap region
also could overflow size_t.

So fixing it by casting the size of read-only mmap region into a __u64
and checking whether or not there will be overflow during mmap.

Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221116072351.1168938-3-houtao@huaweicloud.com
2022-12-02 22:12:29 -08:00
Hou Tao
b822a139e3 libbpf: Use page size as max_entries when probing ring buffer map
Using page size as max_entries when probing ring buffer map, else the
probe may fail on host with 64KB page size (e.g., an ARM64 host).

After the fix, the output of "bpftool feature" on above host will be
correct.

Before :
    eBPF map_type ringbuf is NOT available
    eBPF map_type user_ringbuf is NOT available

After :
    eBPF map_type ringbuf is available
    eBPF map_type user_ringbuf is available

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221116072351.1168938-2-houtao@huaweicloud.com
2022-12-02 22:12:29 -08:00
Ji Rongfeng
a5b4a53781 bpf: Update bpf_{g,s}etsockopt() documentation
* append missing optnames to the end
* simplify bpf_getsockopt()'s doc

Signed-off-by: Ji Rongfeng <SikoJobs@outlook.com>
Link: https://lore.kernel.org/r/DU0P192MB15479B86200B1216EC90E162D6099@DU0P192MB1547.EURP192.PROD.OUTLOOK.COM
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-02 22:12:29 -08:00
Donald Hunter
e84419ff5a docs/bpf: Add table of BPF program types to libbpf docs
Extend the libbpf documentation with a table of program types,
attach points and ELF section names.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221121121734.98329-1-donald.hunter@gmail.com
2022-12-02 22:12:29 -08:00
Alexei Starovoitov
ca515c0dda selftests/bpf: Workaround for llvm nop-4 bug
Currently LLVM fails to recognize .data.* as data section and defaults to .text
section. Later BPF backend tries to emit 4-byte NOP instruction which doesn't
exist in BPF ISA and aborts.
The fix for LLVM is pending:
https://reviews.llvm.org/D138477

While waiting for the fix lets workaround the linked_list test case
by using .bss.* prefix which is properly recognized by LLVM as BSS section.

Fix libbpf to support .bss. prefix and adjust tests.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
95959419a7 libbpf: Ignore hashmap__find() result explicitly in btf_dump
Coverity is reporting that btf_dump_name_dups() doesn't check return
result of hashmap__find() call. This is intentional, so make it explicit
with (void) cast.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221117192824.4093553-1-andrii@kernel.org
2022-12-02 22:12:29 -08:00
Andrii Nakryiko
3c659715ec sync: fix sync scripts commit_signature function
After recent lint changes, commit_signature() function now gets optional
array of paths as multiple arguments, instead of entire array as second
argument. So adjust commit_signature() to handle this correctly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 21:04:03 -08:00
Andrii Nakryiko
f46b17ef0e sync: add Signed-off-by for auto-generated sync commits
Now that we enforce Signed-off-by on every commit, make sure that
auto-generatd sync commits also get corrected Signed-off-by tags.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-02 20:51:21 -08:00
Evgeny Vereshchagin
1596a09b5d oss-fuzz: bump elfutils
to make it less likely for the libbpf fuzz target to run into
elfutils bugs that have been fixed upstream since two new fuzz
targets were added there back in April.

Signed-off-by: Evgeny Vereshchagin <evvers@ya.ru>
2022-11-18 13:54:40 -08:00
Kui-Feng Lee
5322b8e76c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b548b17a93fd18357a5a6f535c10c1e68719ad32
Checkpoint bpf-next commit: 5b1d640800de7fe02d68bf592d9d101de24c87f2
Baseline bpf commit:        9cbd48d5fa14e4c65f8580de16686077f7cea02b
Checkpoint bpf commit:      47df8a2f78bc34ff170d147d05b121f84e252b85

David Michael (1):
  libbpf: Fix uninitialized warning in btf_dump_dump_type_data

Jiri Olsa (1):
  libbpf: Use correct return pointer in attach_raw_tp

Kang Minchul (3):
  libbpf: checkpatch: Fixed code alignments in btf.c
  libbpf: Fixed various checkpatch issues in libbpf.c
  libbpf: checkpatch: Fixed code alignments in ringbuf.c

Kumar Kartikeya Dwivedi (1):
  bpf: Support bpf_list_head in map values

 include/uapi/linux/bpf.h | 10 +++++++++
 src/btf.c                |  5 +++--
 src/btf_dump.c           |  2 +-
 src/libbpf.c             | 47 +++++++++++++++++++++++++---------------
 src/ringbuf.c            |  4 ++--
 5 files changed, 45 insertions(+), 23 deletions(-)

--
2.30.2
2022-11-18 13:53:39 -08:00
Kui-Feng Lee
15bbaabed8 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-11-18 13:53:39 -08:00
Jiri Olsa
eb77c7210b libbpf: Use correct return pointer in attach_raw_tp
We need to pass '*link' to final libbpf_get_error,
because that one holds the return value, not 'link'.

Fixes: 4fa5bcfe07f7 ("libbpf: Allow BPF program auto-attach handlers to bail out")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221114145257.882322-1-jolsa@kernel.org
2022-11-18 13:53:39 -08:00
Kumar Kartikeya Dwivedi
2557efc8e1 bpf: Support bpf_list_head in map values
Add the support on the map side to parse, recognize, verify, and build
metadata table for a new special field of the type struct bpf_list_head.
To parameterize the bpf_list_head for a certain value type and the
list_node member it will accept in that value type, we use BTF
declaration tags.

The definition of bpf_list_head in a map value will be done as follows:

struct foo {
	struct bpf_list_node node;
	int data;
};

struct map_value {
	struct bpf_list_head head __contains(foo, node);
};

Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.

The 'contains' annotation is a BTF declaration tag composed of four
parts, "contains:name:node" where the name is then used to look up the
type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during
the lookup. The node defines name of the member in this type that has
the type struct bpf_list_node, which is actually used for linking into
the linked list. For now, 'kind' part is hardcoded as struct.

This allows building intrusive linked lists in BPF, using container_of
to obtain pointer to entry, while being completely type safe from the
perspective of the verifier. The verifier knows exactly the type of the
nodes, and knows that list helpers return that type at some fixed offset
where the bpf_list_node member used for this list exists. The verifier
also uses this information to disallow adding types that are not
accepted by a certain list.

For now, no elements can be added to such lists. Support for that is
coming in future patches, hence draining and freeing items is done with
a TODO that will be resolved in a future patch.

Note that the bpf_list_head_free function moves the list out to a local
variable under the lock and releases it, doing the actual draining of
the list items outside the lock. While this helps with not holding the
lock for too long pessimizing other concurrent list operations, it is
also necessary for deadlock prevention: unless every function called in
the critical section would be notrace, a fentry/fexit program could
attach and call bpf_map_update_elem again on the map, leading to the
same lock being acquired if the key matches and lead to a deadlock.
While this requires some special effort on part of the BPF programmer to
trigger and is highly unlikely to occur in practice, it is always better
if we can avoid such a condition.

While notrace would prevent this, doing the draining outside the lock
has advantages of its own, hence it is used to also fix the deadlock
related problem.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-18 13:53:39 -08:00
Kang Minchul
9781b9eced libbpf: checkpatch: Fixed code alignments in ringbuf.c
Fixed some checkpatch issues in ringbuf.c

Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-4-tegongkang@gmail.com
2022-11-18 13:53:39 -08:00
Kang Minchul
4c3b53d09c libbpf: Fixed various checkpatch issues in libbpf.c
Fixed following checkpatch issues:

WARNING: Block comments use a trailing */ on a separate line
+        * other BPF program's BTF object */

WARNING: Possible repeated word: 'be'
+        * name. This is important to be be able to find corresponding BTF

ERROR: switch and case should be at the same indent
+       switch (ext->kcfg.sz) {
+               case 1: *(__u8 *)ext_val = value; break;
+               case 2: *(__u16 *)ext_val = value; break;
+               case 4: *(__u32 *)ext_val = value; break;
+               case 8: *(__u64 *)ext_val = value; break;
+               default:

ERROR: trailing statements should be on next line
+               case 1: *(__u8 *)ext_val = value; break;

ERROR: trailing statements should be on next line
+               case 2: *(__u16 *)ext_val = value; break;

ERROR: trailing statements should be on next line
+               case 4: *(__u32 *)ext_val = value; break;

ERROR: trailing statements should be on next line
+               case 8: *(__u64 *)ext_val = value; break;

ERROR: code indent should use tabs where possible
+                }$

WARNING: please, no spaces at the start of a line
+                }$

WARNING: Block comments use a trailing */ on a separate line
+        * for faster search */

ERROR: code indent should use tabs where possible
+^I^I^I^I^I^I        &ext->kcfg.is_signed);$

WARNING: braces {} are not necessary for single statement blocks
+       if (err) {
+               return err;
+       }

ERROR: code indent should use tabs where possible
+^I^I^I^I        sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);$

Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-3-tegongkang@gmail.com
2022-11-18 13:53:39 -08:00
Kang Minchul
7b18ff1212 libbpf: checkpatch: Fixed code alignments in btf.c
Fixed some checkpatch issues in btf.c

Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-2-tegongkang@gmail.com
2022-11-18 13:53:39 -08:00
David Michael
c975797ebe libbpf: Fix uninitialized warning in btf_dump_dump_type_data
GCC 11.3.0 fails to compile btf_dump.c due to the following error,
which seems to originate in btf_dump_struct_data where the returned
value would be uninitialized if btf_vlen returns zero.

btf_dump.c: In function ‘btf_dump_dump_type_data’:
btf_dump.c:2363:12: error: ‘err’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 2363 |         if (err < 0)
      |            ^

Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: David Michael <fedora.dm0@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/87zgcu60hq.fsf@gmail.com
2022-11-18 13:53:39 -08:00
Manu Bretelle
9167308b4a ci: remove s390x-self-hosted-builder from libbpf/libbpf
Those were moved to libbpf/ci: https://github.com/libbpf/ci/tree/master/rootfs/s390x-self-hosted-builder

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-11-16 13:58:37 -08:00
Manu Bretelle
7049d3a2ea ci: Use s390x label to schedule workflows on s390x
The runners are having their labels uniformized across architecture.
z15 is being removed in favor of s390x.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-11-16 13:55:31 -08:00
Andrii Nakryiko
ea931ec6c5 ci: drop LGTM integration
LGTM is deprecated, remove it. We have CodeQL now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 12:17:40 -08:00
Andrii Nakryiko
3a73d6f865 readme: replace LGTM badge with CodeQL badge
LGTM is going to be removed, CodeQL is supposed to be a replacement.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 12:17:40 -08:00
Andrii Nakryiko
7b0891ac6b ci: build libbpf with more versions of clang and gcc
Add few more versions of clang and gcc used to compile-test libbpf.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 12:16:17 -08:00
Andrii Nakryiko
c80f12f7f6 ci: fix Debian builds due to pkg-config dependency change
Seems like we need pkgconfig dependency instead of pkg-config.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 11:25:17 -08:00
Andrii Nakryiko
3b6093fd43 sync: start syncing include/uapi/linux/fcntl.h UAPI header
Libbpf relies on F_DUPFD_CLOEXEC constant coming from fcntl.h UAPI
header, so we need to sync it along other UAPI headers. Also update sync
script to keep doing this automatically going forward.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-16 10:56:59 -08:00
Andrii Nakryiko
8d358ab948 sync: make LIBBPF_PATHS and LIBBPF_VIEW_PATHS into real array variables
Use correct Bash syntax to define these two variables as arrays.
Drop shellcheck opt-out for unquoted use of array.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-14 21:42:37 -08:00
Andrii Nakryiko
971ad8f8d0 sync: fix sync script's use of bash array variables
Don't wrap LIBBPF_PATHS[@] and LIBBPF_VIEW_PATHS[@] in quotes when
passing it to git commands. Not clear how it worked before, but
something recently broke. Either git commands became stricter or
something.

But either way, we do want to pass each element of LIBBPF_PATHS or
LIBBPF_VIEW_PATHS as separate command line arguments, so putting them in
quotes doesn't make sense, as that makes them look like a single
argument to git.

So drop all the quotes around these arrays. The only place where it's
still needed is in commit_signature call, as we do want to pass array as
single arg ($2) and then internally we unfold it into multiple command
line arguments.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
2ed27f9e63 ci: update vmlinux.h
Update vmlinux.h to get latest enums for some of selftests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
4bdbb7ea28 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   62c69e89e81bfbdb9a87ae3e0599dcc6aacf786b
Checkpoint bpf-next commit: b548b17a93fd18357a5a6f535c10c1e68719ad32
Baseline bpf commit:        e7b09357453a99e6f9e74c39e9ca1363c22c0b96
Checkpoint bpf commit:      9cbd48d5fa14e4c65f8580de16686077f7cea02b

Alan Maguire (1):
  libbpf: Btf dedup identical struct test needs check for nested
    structs/arrays

Andrii Nakryiko (2):
  libbpf: clean up and refactor BTF fixup step
  libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars

Anshuman Khandual (4):
  perf: Add system error and not in transaction branch types
  perf: Extend branch type classification
  perf: Capture branch privilege information
  perf: Add PERF_BR_NEW_ARCH_[N] map for BRBE on arm64 platform

Eduard Zingerman (4):
  libbpf: Resolve enum fwd as full enum64 and vice versa
  libbpf: Hashmap interface update to allow both long and void*
    keys/values
  libbpf: Resolve unambigous forward declarations
  libbpf: Hashmap.h update to fix build issues using LLVM14

Martin KaFai Lau (1):
  bpf: Add hwtstamp field for the sockops prog

Namhyung Kim (1):
  perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY

Ravi Bangoria (3):
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL

Sandipan Das (1):
  perf/core: Add speculation info to branch entries

Xu Kuohai (1):
  libbpf: Avoid allocating reg_name with sscanf in parse_usdt_arg()

Yonghong Song (2):
  bpf: Implement cgroup storage available to non-cgroup-attached bpf
    progs
  libbpf: Support new cgroup local storage

 include/uapi/linux/bpf.h        |  51 +++++-
 include/uapi/linux/perf_event.h |  57 ++++++-
 src/btf.c                       | 267 ++++++++++++++++++++++----------
 src/btf_dump.c                  |  15 +-
 src/hashmap.c                   |  18 +--
 src/hashmap.h                   |  91 +++++++----
 src/libbpf.c                    | 196 ++++++++++++++---------
 src/libbpf_probes.c             |   1 +
 src/strset.c                    |  18 +--
 src/usdt.c                      |  44 +++---
 10 files changed, 511 insertions(+), 247 deletions(-)

--
2.30.2
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
4978cf9cd8 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-11-12 18:24:12 -08:00
Martin KaFai Lau
00fc9f407c bpf: Add hwtstamp field for the sockops prog
The bpf-tc prog has already been able to access the
skb_hwtstamps(skb)->hwtstamp.  This patch extends the same hwtstamp
access to the sockops prog.

In sockops, the skb is also available to the bpf prog during
the BPF_SOCK_OPS_PARSE_HDR_OPT_CB event.  There is a use case
that the hwtstamp will be useful to the sockops prog to better
measure the one-way-delay when the sender has put the tx
timestamp in the tcp header option.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev
2022-11-12 18:24:12 -08:00
Eduard Zingerman
e1b34c589d libbpf: Hashmap.h update to fix build issues using LLVM14
A fix for the LLVM compilation error while building bpftool.
Replaces the expression:

  _Static_assert((p) == NULL || ...)

by expression:

  _Static_assert((__builtin_constant_p((p)) ? (p) == NULL : 0) || ...)

When "p" is not a constant the former is not considered to be a
constant expression by LLVM 14.

The error was introduced in the following patch-set: [1].
The error was reported here: [2].

  [1] https://lore.kernel.org/bpf/20221109142611.879983-1-eddyz87@gmail.com/
  [2] https://lore.kernel.org/all/202211110355.BcGcbZxP-lkp@intel.com/

Reported-by: kernel test robot <lkp@intel.com>
Fixes: c302378bc157 ("libbpf: Hashmap interface update to allow both long and void* keys/values")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221110223240.1350810-1-eddyz87@gmail.com
2022-11-12 18:24:12 -08:00
Eduard Zingerman
7583310911 libbpf: Resolve unambigous forward declarations
Resolve forward declarations that don't take part in type graphs
comparisons if declaration name is unambiguous. Example:

CU #1:

struct foo;              // standalone forward declaration
struct foo *some_global;

CU #2:

struct foo { int x; };
struct foo *another_global;

The `struct foo` from CU #1 is not a part of any definition that is
compared against another definition while `btf_dedup_struct_types`
processes structural types. The the BTF after `btf_dedup_struct_types`
the BTF looks as follows:

[1] STRUCT 'foo' size=4 vlen=1 ...
[2] INT 'int' size=4 ...
[3] PTR '(anon)' type_id=1
[4] FWD 'foo' fwd_kind=struct
[5] PTR '(anon)' type_id=4

This commit adds a new pass `btf_dedup_resolve_fwds`, that maps such
forward declarations to structs or unions with identical name in case
if the name is not ambiguous.

The pass is positioned before `btf_dedup_ref_types` so that types
[3] and [5] could be merged as a same type after [1] and [4] are merged.
The final result for the example above looks as follows:

[1] STRUCT 'foo' size=4 vlen=1
	'x' type_id=2 bits_offset=0
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] PTR '(anon)' type_id=1

For defconfig kernel with BTF enabled this removes 63 forward
declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
The running time of `btf__dedup` function is increased by about 3%.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221109142611.879983-3-eddyz87@gmail.com
2022-11-12 18:24:12 -08:00
Eduard Zingerman
4a65c5d888 libbpf: Hashmap interface update to allow both long and void* keys/values
An update for libbpf's hashmap interface from void* -> void* to a
polymorphic one, allowing both long and void* keys and values.

This simplifies many use cases in libbpf as hashmaps there are mostly
integer to integer.

Perf copies hashmap implementation from libbpf and has to be
updated as well.

Changes to libbpf, selftests/bpf and perf are packed as a single
commit to avoid compilation issues with any future bisect.

Polymorphic interface is acheived by hiding hashmap interface
functions behind auxiliary macros that take care of necessary
type casts, for example:

    #define hashmap_cast_ptr(p)						\
	({								\
		_Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\
			       #p " pointee should be a long-sized integer or a pointer"); \
		(long *)(p);						\
	})

    bool hashmap_find(const struct hashmap *map, long key, long *value);

    #define hashmap__find(map, key, value) \
		hashmap_find((map), (long)(key), hashmap_cast_ptr(value))

- hashmap__find macro casts key and value parameters to long
  and long* respectively
- hashmap_cast_ptr ensures that value pointer points to a memory
  of appropriate size.

This hack was suggested by Andrii Nakryiko in [1].
This is a follow up for [2].

[1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/
[2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com
2022-11-12 18:24:12 -08:00
Eduard Zingerman
3a387f5a8f libbpf: Resolve enum fwd as full enum64 and vice versa
Changes de-duplication logic for enums in the following way:
- update btf_hash_enum to ignore size and kind fields to get
  ENUM and ENUM64 types in a same hash bucket;
- update btf_compat_enum to consider enum fwd to be compatible with
  full enum64 (and vice versa);

This allows BTF de-duplication in the following case:

    // CU #1
    enum foo;

    struct s {
      enum foo *a;
    } *x;

    // CU #2
    enum foo {
      x = 0xfffffffff // big enough to force enum64
    };

    struct s {
      enum foo *a;
    } *y;

De-duplicated BTF prior to this commit:

    [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
    	'x' val=68719476735ULL
    [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
        encoding=(none)
    [3] STRUCT 's' size=8 vlen=1
    	'a' type_id=4 bits_offset=0
    [4] PTR '(anon)' type_id=1
    [5] PTR '(anon)' type_id=3
    [6] STRUCT 's' size=8 vlen=1
    	'a' type_id=8 bits_offset=0
    [7] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
    [8] PTR '(anon)' type_id=7
    [9] PTR '(anon)' type_id=6

De-duplicated BTF after this commit:

    [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
    	'x' val=68719476735ULL
    [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
        encoding=(none)
    [3] STRUCT 's' size=8 vlen=1
    	'a' type_id=4 bits_offset=0
    [4] PTR '(anon)' type_id=1
    [5] PTR '(anon)' type_id=3

Enum forward declarations in C do not provide information about
enumeration values range. Thus the `btf_type->size` field is
meaningless for forward enum declarations. In fact, GCC does not
encode size in DWARF for forward enum declarations
(but dwarves sets enumeration size to a default value of `sizeof(int) * 8`
when size is not specified see dwarf_loader.c:die__create_new_enumeration).

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221101235413.1824260-1-eddyz87@gmail.com
2022-11-12 18:24:12 -08:00
Ravi Bangoria
a2eba90326 perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's
bit ambiguous name and also not generic enough to cover cxl.cache and
cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/f6268268-b4e9-9ed6-0453-65792644d953@amd.com
2022-11-12 18:24:12 -08:00
Yonghong Song
7106ebe768 libbpf: Support new cgroup local storage
Add support for new cgroup local storage.

Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042856.673989-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-12 18:24:12 -08:00
Yonghong Song
3c6d127e50 bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Similar to sk/inode/task storage, implement similar cgroup local storage.

There already exists a local storage implementation for cgroup-attached
bpf programs.  See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
bpf_get_local_storage(). But there are use cases such that non-cgroup
attached bpf progs wants to access cgroup local storage data. For example,
tc egress prog has access to sk and cgroup. It is possible to use
sk local storage to emulate cgroup local storage by storing data in socket.
But this is a waste as it could be lots of sockets belonging to a particular
cgroup. Alternatively, a separate map can be created with cgroup id as the key.
But this will introduce additional overhead to manipulate the new map.
A cgroup local storage, similar to existing sk/inode/task storage,
should help for this use case.

The life-cycle of storage is managed with the life-cycle of the
cgroup struct.  i.e. the storage is destroyed along with the owning cgroup
with a call to bpf_cgrp_storage_free() when cgroup itself
is deleted.

The userspace map operations can be done by using a cgroup fd as a key
passed to the lookup, update and delete operations.

Typically, the following code is used to get the current cgroup:
    struct task_struct *task = bpf_get_current_task_btf();
    ... task->cgroups->dfl_cgrp ...
and in structure task_struct definition:
    struct task_struct {
        ....
        struct css_set __rcu            *cgroups;
        ....
    }
With sleepable program, accessing task->cgroups is not protected by rcu_read_lock.
So the current implementation only supports non-sleepable program and supporting
sleepable program will be the next step together with adding rcu_read_lock
protection for rcu tagged structures.

Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local
storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
for cgroup storage available to non-cgroup-attached bpf programs. The old
cgroup storage supports bpf_get_local_storage() helper to get the cgroup data.
The new cgroup storage helper bpf_cgrp_storage_get() can provide similar
functionality. While old cgroup storage pre-allocates storage memory, the new
mechanism can also pre-allocate with a user space bpf_map_update_elem() call
to avoid potential run-time memory allocation failure.
Therefore, the new cgroup storage can provide all functionality w.r.t.
the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can
be deprecated since the new one can provide the same functionality.

Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-12 18:24:12 -08:00
Alan Maguire
6ebbbacb5c libbpf: Btf dedup identical struct test needs check for nested structs/arrays
When examining module BTF, it is common to see core kernel structures
such as sk_buff, net_device duplicated in the module.  After adding
debug messaging to BTF it turned out that much of the problem
was down to the identical struct test failing during deduplication;
sometimes the compiler adds identical structs.  However
it turns out sometimes that type ids of identical struct members
can also differ, even when the containing structs are still identical.

To take an example, for struct sk_buff, debug messaging revealed
that the identical struct matching was failing for the anon
struct "headers"; specifically for the first field:

__u8       __pkt_type_offset[0]; /*   128     0 */

Looking at the code in BTF deduplication, we have code that guards
against the possibility of identical struct definitions, down to
type ids, and identical array definitions.  However in this case
we have a struct which is being defined twice but does not have
identical type ids since each duplicate struct has separate type
ids for the above array member.   A similar problem (though not
observed) could occur for struct-in-struct.

The solution is to make the "identical struct" test check members
not just for matching ids, but to also check if they in turn are
identical structs or arrays.

The results of doing this are quite dramatic (for some modules
at least); I see the number of type ids drop from around 10000
to just over 1000 in one module for example.

For testing use latest pahole or apply [1], otherwise dedups
can fail for the reasons described there.

Also fix return type of btf_dedup_identical_arrays() as
suggested by Andrii to match boolean return type used
elsewhere.

Fixes: efdd3eb8015e ("libbpf: Accommodate DWARF/compiler bug with duplicated structs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@oracle.com

[1] https://lore.kernel.org/bpf/1666364523-9648-1-git-send-email-alan.maguire
2022-11-12 18:24:12 -08:00
Xu Kuohai
1bb7a8349a libbpf: Avoid allocating reg_name with sscanf in parse_usdt_arg()
The reg_name in parse_usdt_arg() is used to hold register name, which
is short enough to be held in a 16-byte array, so we could define
reg_name as char reg_name[16] to avoid dynamically allocating reg_name
with sscanf.

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221018145538.2046842-1-xukuohai@huaweicloud.com
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
3cd45b660c libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars
Teach libbpf to not add BPF_F_MMAPABLE flag unnecessarily for ARRAY maps
that are backing data sections, if such data sections don't expose any
variables to user-space. Exposed variables are those that have
STB_GLOBAL or STB_WEAK ELF binding and correspond to BTF VAR's
BTF_VAR_GLOBAL_ALLOCATED linkage.

The overall idea is that if some data section doesn't have any variable that
is exposed through BPF skeleton, then there is no reason to make such
BPF array mmapable. Making BPF array mmapable is not a free no-op
action, because BPF verifier doesn't allow users to put special objects
(such as BPF spin locks, RB tree nodes, linked list nodes, kptrs, etc;
anything that has a sensitive internal state that should not be modified
arbitrarily from user space) into mmapable arrays, as there is no way to
prevent user space from corrupting such sensitive state through direct
memory access through memory-mapped region.

By making sure that libbpf doesn't add BPF_F_MMAPABLE flag to BPF array
maps corresponding to data sections that only have static variables
(which are not supposed to be visible to user space according to libbpf
and BPF skeleton rules), users now can have spinlocks, kptrs, etc in
either default .bss/.data sections or custom .data.* sections (assuming
there are no global variables in such sections).

The only possible hiccup with this approach is the need to use global
variables during BPF static linking, even if it's not intended to be
shared with user space through BPF skeleton. To allow such scenarios,
extend libbpf's STV_HIDDEN ELF visibility attribute handling to
variables. Libbpf is already treating global hidden BPF subprograms as
static subprograms and adjusts BTF accordingly to make BPF verifier
verify such subprograms as static subprograms with preserving entire BPF
verifier state between subprog calls. This patch teaches libbpf to treat
global hidden variables as static ones and adjust BTF information
accordingly as well. This allows to share variables between multiple
object files during static linking, but still keep them internal to BPF
program and not get them exposed through BPF skeleton.

Note, that if the user has some advanced scenario where they absolutely
need BPF_F_MMAPABLE flag on .data/.bss/.rodata BPF array map despite
only having static variables, they still can achieve this by forcing it
through explicit bpf_map__set_map_flags() API.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20221019002816.359650-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-12 18:24:12 -08:00
Andrii Nakryiko
0e195e4597 libbpf: clean up and refactor BTF fixup step
Refactor libbpf's BTF fixup step during BPF object open phase. The only
functional change is that we now ignore BTF_VAR_GLOBAL_EXTERN variables
during fix up, not just BTF_VAR_STATIC ones, which shouldn't cause any
change in behavior as there shouldn't be any extern variable in data
sections for valid BPF object anyways.

Otherwise it's just collapsing two functions that have no reason to be
separate, and switching find_elf_var_offset() helper to return entire
symbol pointer, not just its offset. This will be used by next patch to
get ELF symbol visibility.

While refactoring, also "normalize" debug messages inside
btf_fixup_datasec() to follow general libbpf style and print out data
section name consistently, where it's available.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221019002816.359650-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-12 18:24:12 -08:00
Ravi Bangoria
08830e9d2f perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
PERF_MEM_SNOOPX_PEER is defined only in tools uapi header. Although
it's used only by perf tool, not defining it in kernel header can
create problems in future.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-8-ravi.bangoria@amd.com
2022-11-12 18:24:12 -08:00
Ravi Bangoria
1022f26d04 perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to
extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO
accesses but it can not distinguish between local and remote IO.
Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with
PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-2-ravi.bangoria@amd.com
2022-11-12 18:24:12 -08:00
Namhyung Kim
b4ca1f6407 perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY
There's no in-tree user anymore.  Let's get rid of it.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908214104.3851807-3-namhyung@kernel.org
2022-11-12 18:24:12 -08:00
Anshuman Khandual
fd71ca941b perf: Add PERF_BR_NEW_ARCH_[N] map for BRBE on arm64 platform
BRBE captured branch types will overflow perf_branch_entry.type and generic
branch types in perf_branch_entry.new_type. So override each available arch
specific branch type in the following manner to comprehensively process all
reported branch types in BRBE.

  PERF_BR_ARM64_FIQ            PERF_BR_NEW_ARCH_1
  PERF_BR_ARM64_DEBUG_HALT     PERF_BR_NEW_ARCH_2
  PERF_BR_ARM64_DEBUG_EXIT     PERF_BR_NEW_ARCH_3
  PERF_BR_ARM64_DEBUG_INST     PERF_BR_NEW_ARCH_4
  PERF_BR_ARM64_DEBUG_DATA     PERF_BR_NEW_ARCH_5

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-5-anshuman.khandual@arm.com
2022-11-12 18:24:12 -08:00
Anshuman Khandual
a14b39bd31 perf: Capture branch privilege information
Platforms like arm64 could capture privilege level information for all the
branch records. Hence this adds a new element in the struct branch_entry to
record the privilege level information, which could be requested through a
new event.attr.branch_sample_type based flag PERF_SAMPLE_BRANCH_PRIV_SAVE.
This flag helps user choose whether privilege information is captured.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-4-anshuman.khandual@arm.com
2022-11-12 18:24:12 -08:00
Anshuman Khandual
ade228b8f0 perf: Extend branch type classification
branch_entry.type now has ran out of space to accommodate more branch types
classification. This will prevent perf branch stack implementation on arm64
(via BRBE) to capture all available branch types. Extending this bit field
i.e branch_entry.type [4 bits] is not an option as it will break user space
ABI both for little and big endian perf tools.

Extend branch classification with a new field branch_entry.new_type via a
new branch type PERF_BR_EXTEND_ABI in branch_entry.type. Perf tools which
could decode PERF_BR_EXTEND_ABI, will then parse branch_entry.new_type as
well.

branch_entry.new_type is a 4 bit field which can hold upto 16 branch types.
The first three branch types will hold various generic page faults followed
by five architecture specific branch types, which can be overridden by the
platform for specific use cases. These architecture specific branch types
gets overridden on arm64 platform for BRBE implementation.

New generic branch types

 - PERF_BR_NEW_FAULT_ALGN
 - PERF_BR_NEW_FAULT_DATA
 - PERF_BR_NEW_FAULT_INST

New arch specific branch types

 - PERF_BR_NEW_ARCH_1
 - PERF_BR_NEW_ARCH_2
 - PERF_BR_NEW_ARCH_3
 - PERF_BR_NEW_ARCH_4
 - PERF_BR_NEW_ARCH_5

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-3-anshuman.khandual@arm.com
2022-11-12 18:24:12 -08:00
Anshuman Khandual
41ab246bdf perf: Add system error and not in transaction branch types
This expands generic branch type classification by adding two more entries
there in i.e system error and not in transaction. This also updates the x86
implementation to process X86_BR_NO_TX records as appropriate. This changes
branch types reported to user space on x86 platform but it should not be a
problem. The possible scenarios and impacts are enumerated here.

 --------------------------------------------------------------------------
 | kernel | perf tool |                     Impact                        |
 --------------------------------------------------------------------------
 |   old  |    old    |  Works as before                                  |
 --------------------------------------------------------------------------
 |   old  |    new    |  PERF_BR_UNKNOWN is processed                     |
 --------------------------------------------------------------------------
 |   new  |    old    |  PERF_BR_NO_TX is blocked via old PERF_BR_MAX     |
 --------------------------------------------------------------------------
 |   new  |    new    |  PERF_BR_NO_TX is recognized                      |
 --------------------------------------------------------------------------

When PERF_BR_NO_TX is blocked via old PERF_BR_MAX (new kernel with old perf
tool) the user space might throw up an warning complaining about an
unrecognized branch types being reported, but it's expected. PERF_BR_SERROR
& PERF_BR_NO_TX branch types will be used for BRBE implementation on arm64
platform.

PERF_BR_NO_TX complements 'abort' and 'in_tx' elements in perf_branch_entry
which represent other transaction states for a given branch record. Because
this completes the transaction state classification.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-2-anshuman.khandual@arm.com
2022-11-12 18:24:12 -08:00
Sandipan Das
d918025bc8 perf/core: Add speculation info to branch entries
Add a new "spec" bitfield to branch entries for providing speculation
information. This will be populated using hints provided by branch sampling
features on supported hardware. The following cases are covered:

  * No branch speculation information is available
  * Branch is speculative but taken on the wrong path
  * Branch is non-speculative but taken on the correct path
  * Branch is speculative and taken on the correct path

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/834088c302faf21c7b665031dd111f424e509a64.1660211399.git.sandipan.das@amd.com
2022-11-12 18:24:12 -08:00
Daniel Müller
918d7712c0 ci: Make sure to keep ci/diffs/ directory around
Commit 837664758d ("ci: Allow usage of .patch patches") removed the
ci/diffs/.do_not_use_dot_patch_here marker file. Given that we currently
have no CI patches present and that git does not track (empty)
directories, ci/diffs/ got removed. That's fine functionality-wise, but
it makes for a bit of a discoverability hurdle. Add back a marker file
to keep the directory around.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-11-08 08:33:47 -08:00
Daniel Müller
4a84a7619f ci: Provide KBUILD_OUTPUT to actions asking for it
As of https://github.com/libbpf/ci/pull/67 a bunch of actions honor
KBUILD_OUTPUT. Doing so will make it possible to separate source code
from build artifacts, which in turn may allow us to support incremental
kernel compilation in CI down the line.
Irrespective of these future changes, actions pertaining the kernel
build now ask for an additional input defining where to store or expect
build artifacts. Provide it.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-11-07 11:02:01 -08:00
Daniel Müller
837664758d ci: Allow usage of .patch patches
With https://github.com/libbpf/ci/pull/68 merged we can now keep the
.patch extension for patches and don't have to worry about forgetting
the rename to .diff.
Remove the marker file reminding us of that need.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-11-07 11:00:56 -08:00
Daniel Müller
11bf829873 ci: Remove no longer needed patches
Patch "selftests/bpf: Fix OOB write in test_verifier" has made it to the
bpf branch (after originally landing on bpf-next). Remove it from CI, as
it is no longer necessary.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-11-07 11:00:56 -08:00
Matteo Croce
c97b16d96c ci: enable shellcheck linter
Run shellckeck linter in a github action,
as in https://github.com/libbpf/ci/pull/61

Signed-off-by: Matteo Croce <teknoraver@meta.com>
2022-10-27 16:46:38 -07:00
Matteo Croce
1c17672353 shellcheck: fix errors
Signed-off-by: Matteo Croce <teknoraver@meta.com>
2022-10-27 16:46:38 -07:00
Tobias Waldekranz
68e6f83f22 Makefile: Fix cross-compilation for 32-bit targets
Determining the correct library installation path (lib vs. lib64)
using uname(1) breaks in cross compilation scenarios where word widths
differ between the host and target system.

Instead, source the information from the compilers '-dumpmachine'
option (supported by both GCC and Clang).

We call this the "host" architecture, using the same nomenclature as
Autotools (--host configure option).

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
2022-10-18 17:33:04 -07:00
grantseltzer
383ffb79a6 Add documentation badge to README
This adds a documentation badge that links to libbpf.readthedocs.org
When rendered on github it will display the status of the docs build

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
2022-10-17 13:55:59 -07:00
David Vernet
50315fd763 README: Fix Arch packaging link
libbpf is now packaged as part of the core repository, not the extra
repository. Fix the current link which gets a 404.

Signed-off-by: David Vernet <void@manifault.com>
2022-10-17 13:17:40 -07:00
Andrii Nakryiko
534a2c6f53 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   87dbdc230d162bf9ee1ac77c8ade178b6b1e199e
Checkpoint bpf-next commit: 62c69e89e81bfbdb9a87ae3e0599dcc6aacf786b
Baseline bpf commit:        60240bc26114543fcbfcd8a28466e67e77b20388
Checkpoint bpf commit:      e7b09357453a99e6f9e74c39e9ca1363c22c0b96

Andrii Nakryiko (1):
  bpf: explicitly define BPF_FUNC_xxx integer values

Eduard Zingerman (1):
  bpftool: Print newline before '}' for struct with padding only fields

Kui-Feng Lee (2):
  bpf: Parameterize task iterators.
  bpf: Handle bpf_link_info for the parameterized task BPF iterators.

Roberto Sassu (5):
  libbpf: Fix LIBBPF_1.0.0 declaration in libbpf.map
  libbpf: Introduce bpf_get_fd_by_id_opts and
    bpf_map_get_fd_by_id_opts()
  libbpf: Introduce bpf_prog_get_fd_by_id_opts()
  libbpf: Introduce bpf_btf_get_fd_by_id_opts()
  libbpf: Introduce bpf_link_get_fd_by_id_opts()

Shung-Hsi Yu (3):
  libbpf: Use elf_getshdrnum() instead of e_shnum
  libbpf: Deal with section with no data gracefully
  libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()

Xin Liu (1):
  libbpf: Fix overrun in netlink attribute iteration

Xu Kuohai (2):
  libbpf: Fix use-after-free in btf_dump_name_dups
  libbpf: Fix memory leak in parse_usdt_arg()

 include/uapi/linux/bpf.h | 442 ++++++++++++++++++++-------------------
 src/bpf.c                |  48 ++++-
 src/bpf.h                |  16 ++
 src/btf_dump.c           |  35 +++-
 src/libbpf.c             |  22 +-
 src/libbpf.map           |   6 +-
 src/nlattr.c             |   2 +-
 src/usdt.c               |  11 +-
 8 files changed, 347 insertions(+), 235 deletions(-)

--
2.30.2
2022-10-17 13:13:02 -07:00
Shung-Hsi Yu
3a3ef0c1d0 libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()
When there are no program sections, obj->programs is left unallocated,
and find_prog_by_sec_insn()'s search lands on &obj->programs[0] == NULL,
and will cause null-pointer dereference in the following access to
prog->sec_idx.

Guard the search with obj->nr_programs similar to what's being done in
__bpf_program__iter() to prevent null-pointer access from happening.

Fixes: db2b8b06423c ("libbpf: Support CO-RE relocations for multi-prog sections")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221012022353.7350-4-shung-hsi.yu@suse.com
2022-10-17 13:13:02 -07:00
Shung-Hsi Yu
3ee4823fcb libbpf: Deal with section with no data gracefully
ELF section data pointer returned by libelf may be NULL (if section has
SHT_NOBITS), so null check section data pointer before attempting to
copy license and kversion section.

Fixes: cb1e5e961991 ("bpf tools: Collect version and license from ELF sections")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221012022353.7350-3-shung-hsi.yu@suse.com
2022-10-17 13:13:02 -07:00
Shung-Hsi Yu
7412775110 libbpf: Use elf_getshdrnum() instead of e_shnum
This commit replace e_shnum with the elf_getshdrnum() helper to fix two
oss-fuzz-reported heap-buffer overflow in __bpf_object__open. Both
reports are incorrectly marked as fixed and while still being
reproducible in the latest libbpf.

  # clusterfuzz-testcase-minimized-bpf-object-fuzzer-5747922482888704
  libbpf: loading object 'fuzz-object' from buffer
  libbpf: sec_cnt is 0
  libbpf: elf: section(1) .data, size 0, link 538976288, flags 2020202020202020, type=2
  libbpf: elf: section(2) .data, size 32, link 538976288, flags 202020202020ff20, type=1
  =================================================================
  ==13==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000c0 at pc 0x0000005a7b46 bp 0x7ffd12214af0 sp 0x7ffd12214ae8
  WRITE of size 4 at 0x6020000000c0 thread T0
  SCARINESS: 46 (4-byte-write-heap-buffer-overflow-far-from-bounds)
      #0 0x5a7b45 in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3414:24
      #1 0x5733c0 in bpf_object_open /src/libbpf/src/libbpf.c:7223:16
      #2 0x5739fd in bpf_object__open_mem /src/libbpf/src/libbpf.c:7263:20
      ...

The issue lie in libbpf's direct use of e_shnum field in ELF header as
the section header count. Where as libelf implemented an extra logic
that, when e_shnum == 0 && e_shoff != 0, will use sh_size member of the
initial section header as the real section header count (part of ELF
spec to accommodate situation where section header counter is larger
than SHN_LORESERVE).

The above inconsistency lead to libbpf writing into a zero-entry calloc
area. So intead of using e_shnum directly, use the elf_getshdrnum()
helper provided by libelf to retrieve the section header counter into
sec_cnt.

Fixes: 0d6988e16a12 ("libbpf: Fix section counting logic")
Fixes: 25bbbd7a444b ("libbpf: Remove assumptions about uniqueness of .rodata/.data/.bss maps")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40868
Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40957
Link: https://lore.kernel.org/bpf/20221012022353.7350-2-shung-hsi.yu@suse.com
2022-10-17 13:13:02 -07:00
Xu Kuohai
881a10980b libbpf: Fix memory leak in parse_usdt_arg()
In the arm64 version of parse_usdt_arg(), when sscanf returns 2, reg_name
is allocated but not freed. Fix it.

Fixes: 0f8619929c57 ("libbpf: Usdt aarch64 arg parsing support")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-3-xukuohai@huaweicloud.com
2022-10-17 13:13:02 -07:00
Xu Kuohai
54caf920db libbpf: Fix use-after-free in btf_dump_name_dups
ASAN reports an use-after-free in btf_dump_name_dups:

ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928
READ of size 2 at 0xffff927006db thread T0
    #0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614)
    #1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127
    #2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143
    #3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212
    #4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525
    #5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552
    #6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567
    #7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912
    #8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798
    #9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282
    #10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236
    #11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #15 0xaaaab5d65990  (test_progs+0x185990)

0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0)
freed by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    #6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032
    #7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232
    #8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #12 0xaaaab5d65990  (test_progs+0x185990)

previously allocated by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    #6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070
    #7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102
    #8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162
    #9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #13 0xaaaab5d65990  (test_progs+0x185990)

The reason is that the key stored in hash table name_map is a string
address, and the string memory is allocated by realloc() function, when
the memory is resized by realloc() later, the old memory may be freed,
so the address stored in name_map references to a freed memory, causing
use-after-free.

Fix it by storing duplicated string address in name_map.

Fixes: 919d2b1dbb07 ("libbpf: Allow modification of BTF and add btf__add_str API")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-2-xukuohai@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
0d6c47523c libbpf: Introduce bpf_link_get_fd_by_id_opts()
Introduce bpf_link_get_fd_by_id_opts(), for symmetry with
bpf_map_get_fd_by_id_opts(), to let the caller pass the newly introduced
data structure bpf_get_fd_by_id_opts. Keep the existing
bpf_link_get_fd_by_id(), and call bpf_link_get_fd_by_id_opts() with NULL as
opts argument, to prevent setting open_flags.

Currently, the kernel does not support non-zero open_flags for
bpf_link_get_fd_by_id_opts(), and a call with them will result in an error
returned by the bpf() system call. The caller should always pass zero
open_flags.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-6-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
998282f179 libbpf: Introduce bpf_btf_get_fd_by_id_opts()
Introduce bpf_btf_get_fd_by_id_opts(), for symmetry with
bpf_map_get_fd_by_id_opts(), to let the caller pass the newly introduced
data structure bpf_get_fd_by_id_opts. Keep the existing
bpf_btf_get_fd_by_id(), and call bpf_btf_get_fd_by_id_opts() with NULL as
opts argument, to prevent setting open_flags.

Currently, the kernel does not support non-zero open_flags for
bpf_btf_get_fd_by_id_opts(), and a call with them will result in an error
returned by the bpf() system call. The caller should always pass zero
open_flags.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-5-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
d6d1ec5b25 libbpf: Introduce bpf_prog_get_fd_by_id_opts()
Introduce bpf_prog_get_fd_by_id_opts(), for symmetry with
bpf_map_get_fd_by_id_opts(), to let the caller pass the newly introduced
data structure bpf_get_fd_by_id_opts. Keep the existing
bpf_prog_get_fd_by_id(), and call bpf_prog_get_fd_by_id_opts() with NULL as
opts argument, to prevent setting open_flags.

Currently, the kernel does not support non-zero open_flags for
bpf_prog_get_fd_by_id_opts(), and a call with them will result in an error
returned by the bpf() system call. The caller should always pass zero
open_flags.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-4-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
a719cae6aa libbpf: Introduce bpf_get_fd_by_id_opts and bpf_map_get_fd_by_id_opts()
Define a new data structure called bpf_get_fd_by_id_opts, with the member
open_flags, to be used by callers of the _opts variants of
bpf_*_get_fd_by_id() to specify the permissions needed for the file
descriptor to be obtained.

Also, introduce bpf_map_get_fd_by_id_opts(), to let the caller pass a
bpf_get_fd_by_id_opts structure.

Finally, keep the existing bpf_map_get_fd_by_id(), and call
bpf_map_get_fd_by_id_opts() with NULL as opts argument, to request
read-write permissions (current behavior).

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-3-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Roberto Sassu
07024c87de libbpf: Fix LIBBPF_1.0.0 declaration in libbpf.map
Add the missing LIBBPF_0.8.0 at the end of the LIBBPF_1.0.0 declaration,
similarly to other version declarations.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-2-roberto.sassu@huaweicloud.com
2022-10-17 13:13:02 -07:00
Andrii Nakryiko
19ef40cee6 bpf: explicitly define BPF_FUNC_xxx integer values
Historically enum bpf_func_id's BPF_FUNC_xxx enumerators relied on
implicit sequential values being assigned by compiler. This is
convenient, as new BPF helpers are always added at the very end, but it
also has its downsides, some of them being:

  - with over 200 helpers now it's very hard to know what's each helper's ID,
    which is often important to know when working with BPF assembly (e.g.,
    by dumping raw bpf assembly instructions with llvm-objdump -d
    command). it's possible to work around this by looking into vmlinux.h,
    dumping /sys/btf/kernel/vmlinux, looking at libbpf-provided
    bpf_helper_defs.h, etc. But it always feels like an unnecessary step
    and one should be able to quickly figure this out from UAPI header.

  - when backporting and cherry-picking only some BPF helpers onto older
    kernels it's important to be able to skip some enum values for helpers
    that weren't backported, but preserve absolute integer IDs to keep BPF
    helper IDs stable so that BPF programs stay portable across upstream
    and backported kernels.

While neither problem is insurmountable, they come up frequently enough
and are annoying enough to warrant improving the situation. And for the
backporting the problem can easily go unnoticed for a while, especially
if backport is done with people not very familiar with BPF subsystem overall.

Anyways, it's easy to fix this by making sure that __BPF_FUNC_MAPPER
macro provides explicit helper IDs. Unfortunately that would potentially
break existing users that use UAPI-exposed __BPF_FUNC_MAPPER and are
expected to pass macro that accepts only symbolic helper identifier
(e.g., map_lookup_elem for bpf_map_lookup_elem() helper).

As such, we need to introduce a new macro (___BPF_FUNC_MAPPER) which
would specify both identifier and integer ID, but in such a way as to
allow existing __BPF_FUNC_MAPPER be expressed in terms of new
___BPF_FUNC_MAPPER macro. And that's what this patch is doing. To avoid
duplication and allow __BPF_FUNC_MAPPER stay *exactly* the same,
___BPF_FUNC_MAPPER accepts arbitrary "context" arguments, which can be
used to pass any extra macros, arguments, and whatnot. In our case we
use this to pass original user-provided macro that expects single
argument and __BPF_FUNC_MAPPER is using it's own three-argument
__BPF_FUNC_MAPPER_APPLY intermediate macro to impedance-match new and
old "callback" macros.

Once we resolve this, we use new ___BPF_FUNC_MAPPER to define enum
bpf_func_id with explicit values. The other users of __BPF_FUNC_MAPPER
in kernel (namely in kernel/bpf/disasm.c) are kept exactly the same both
as demonstration that backwards compat works, but also to avoid
unnecessary code churn.

Note that new ___BPF_FUNC_MAPPER() doesn't forcefully insert comma
between values, as that might not be appropriate in all possible cases
where ___BPF_FUNC_MAPPER might be used by users. This doesn't reduce
usability, as it's trivial to insert that comma inside "callback" macro.

To validate all the manually specified IDs are exactly right, we used
BTF to compare before and after values:

  $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > after.txt
  $ git stash # stach UAPI changes
  $ make -j90
  ... re-building kernel without UAPI changes ...
  $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > before.txt
  $ diff -u before.txt after.txt
  --- before.txt  2022-10-05 10:48:18.119195916 -0700
  +++ after.txt   2022-10-05 10:46:49.446615025 -0700
  @@ -1,4 +1,4 @@
  -[14576] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
  +[9560] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
          'BPF_FUNC_unspec' val=0
          'BPF_FUNC_map_lookup_elem' val=1
          'BPF_FUNC_map_update_elem' val=2

As can be seen from diff above, the only thing that changed was resulting BTF
type ID of ENUM bpf_func_id, not any of the enumerators, their names or integer
values.

The only other place that needed fixing was scripts/bpf_doc.py used to generate
man pages and bpf_helper_defs.h header for libbpf and selftests. That script is
tightly-coupled to exact shape of ___BPF_FUNC_MAPPER macro definition, so had
to be trivially adapted.

Cc: Quentin Monnet <quentin@isovalent.com>
Reported-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221006042452.2089843-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-17 13:13:02 -07:00
Eduard Zingerman
3d3ff49213 bpftool: Print newline before '}' for struct with padding only fields
btf_dump_emit_struct_def attempts to print empty structures at a
single line, e.g. `struct empty {}`. However, it has to account for a
case when there are no regular but some padding fields in the struct.
In such case `vlen` would be zero, but size would be non-zero.

E.g. here is struct bpf_timer from vmlinux.h before this patch:

 struct bpf_timer {
 	long: 64;
	long: 64;};

And after this patch:

 struct bpf_dynptr {
 	long: 64;
	long: 64;
 };

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221001104425.415768-1-eddyz87@gmail.com
2022-10-17 13:13:02 -07:00
Xin Liu
3745a20b28 libbpf: Fix overrun in netlink attribute iteration
I accidentally found that a change in commit 1045b03e07d8 ("netlink: fix
overrun in attribute iteration") was not synchronized to the function
`nla_ok` in tools/lib/bpf/nlattr.c, I think it is necessary to modify,
this patch will do it.

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220930090708.62394-1-liuxin350@huawei.com
2022-10-17 13:13:02 -07:00
Kui-Feng Lee
b9e909dd41 bpf: Handle bpf_link_info for the parameterized task BPF iterators.
Add new fields to bpf_link_info that users can query it through
bpf_obj_get_info_by_fd().

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-3-kuifeng@fb.com
2022-10-17 13:13:02 -07:00
Kui-Feng Lee
73c0c44b67 bpf: Parameterize task iterators.
Allow creating an iterator that loops through resources of one
thread/process.

People could only create iterators to loop through all resources of
files, vma, and tasks in the system, even though they were interested
in only the resources of a specific task or process.  Passing the
additional parameters, people can now create an iterator to go
through all resources or only the resources of a task.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-2-kuifeng@fb.com
2022-10-17 13:13:02 -07:00
Daniel Müller
abde7fb314 Remove lru_bug from DENYLIST-latest.s390x
The comment associated with the entry is a bit confusing. It stemmed
from the test being denylisted on bpf, but not bpf-next in the past.
Regardless, by now said change has propagated to both trees, so we no
longer need to carry around this deny list entry here.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-10-12 09:29:08 -07:00
Manu Bretelle
63389d32f6 ci: remove mkrootfs from libbpf/libbpf
This is being moved to libbpf/ci instead https://github.com/libbpf/ci/pull/44

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-10-11 09:14:31 -07:00
Frantisek Sumsal
59080bd06c ci: use CodeQL instead of LGTM
As LGTM is going to be shut down by EOY[0], let's move the code scanning to
CodeQL as recommended. Thanks to GH integration the results from such
scans will be shown both in the respective PR and in the Security ->
Code Scanning tab[1].

[0] https://github.blog/2022-08-15-the-next-step-for-lgtm-com-github-code-scanning/
[1] https://github.com/libbpf/libbpf/security/code-scanning
2022-10-10 16:31:14 -07:00
Daniel Müller
8b0b41f812 Remove travis-ci symlink
With https://github.com/libbpf/ci/pull/41 merged we no longer require
the travis-ci symlink in this repository. Remove it. Also, it turns out
we still have a few locations referencing travis-ci/ instead of ci/.
Convert those.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-10-07 11:54:39 -07:00
chantra
6bd5b40bcd ci: install wget package on s390x runners
`wget` is installed by default in GH runners.
It is used in
[`get-linux-source`](79c799d6fb/get-linux-source/checkout_latest_kernel.sh (L32))
to download source faster than through a git fetch.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-10-06 14:24:50 -07:00
chantra
6cd8907a4a ci: update actions-runner to 2.298.2 on s390x
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-10-06 14:24:50 -07:00
chantra
fa2875be8a ci: install zstd on s390x runners
zstd is installed by [default in GH
runners](https://github.com/actions/runner-images/blob/main/images/linux/Ubuntu2004-Readme.md).

Having it by default, we can start leveraging it when uploading
artifacts. It has a better compression ratio and is multithreaded.

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
2022-10-06 14:24:50 -07:00
chantra
27a93eae7c [s390x][ci] Force replacing workers when a worker already exist with
same name.

This is essentially aligning whith what is done in
0f2883e196/entrypoint.sh (L90-L91)

The issue at hand did manifest on s390x host when restarting a runner
and GH having an existing runner with the same name.
The logic was to default to not replace it and the runner would be
started with somne defaults, which mean the name would change, and the
labels would be lost, making the runner unusable (while still running): https://gist.github.com/chantra/ef0bd3e0c9e35bb82619636acf2f7c98

By replacing the existing runner, we will not get into that state.
2022-10-04 10:52:07 -07:00
thiagoftsm
dac1c4b6a8 Merge branch 'libbpf:master' into master 2022-09-29 21:37:39 +00:00
Andrii Nakryiko
1714037104 vmtest: regenerate latest vmlinux.h
Update checked in vmlinux.h for 5.5 kernel tests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
d598cb20c7 libbpf: bump version to 1.1.0
Bump LIBBPF_MINOR_VERSION to 1 for v1.1.0.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
ce321d6fd4 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   e34cfee65ec891a319ce79797dda18083af33a76
Checkpoint bpf-next commit: 87dbdc230d162bf9ee1ac77c8ade178b6b1e199e
Baseline bpf commit:        14b20b784f59bdd95f6f1cfb112c9818bcec4d84
Checkpoint bpf commit:      60240bc26114543fcbfcd8a28466e67e77b20388

Andrii Nakryiko (3):
  libbpf: Fix crash if SEC("freplace") programs don't have
    attach_prog_fd set
  libbpf: restore memory layout of bpf_object_open_opts
  libbpf: Don't require full struct enum64 in UAPI headers

Benjamin Tissoires (1):
  libbpf: add map_get_fd_by_id and map_delete_elem in light skeleton

Daniel Borkmann (1):
  libbpf: Remove gcc support for bpf_tail_call_static for now

David Vernet (3):
  bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type
  bpf: Add bpf_user_ringbuf_drain() helper
  bpf: Add libbpf logic for user-space ring buffer

Hao Luo (2):
  bpf: Introduce cgroup iter
  bpf: Add CGROUP prefix to cgroup_iter_order

James Hilliard (1):
  libbpf: Add GCC support for bpf_tail_call_static

Jiri Olsa (1):
  bpf: Return value in kprobe get_func_ip only for entry address

Jon Doron (1):
  libbpf: Fix the case of running as non-root with capabilities

Pu Lehui (1):
  bpf, cgroup: Reject prog_attach_flags array when effective query

Quentin Monnet (1):
  bpf: Fix a few typos in BPF helpers documentation

Shmulik Ladkani (2):
  bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for
    bpf progs
  bpf: Support getting tunnel flags

Stanislav Fomichev (1):
  bpf: update bpf_{g,s}et_retval documentation

Tao Chen (1):
  libbpf: Support raw BTF placed in the default search path

Wang Yufen (1):
  libbpf: Add pathname_concat() helper

Xin Liu (2):
  libbpf: Clean up legacy bpf maps declaration in bpf_helpers
  libbpf: Fix NULL pointer exception in API btf_dump__dump_type_data

Yonghong Song (3):
  bpf: Update descriptions for helpers bpf_get_func_arg[_cnt]()
  libbpf: Add new BPF_PROG2 macro
  libbpf: Improve BPF_PROG2 macro code quality and description

 include/uapi/linux/bpf.h | 139 +++++++++++++++++---
 src/bpf_helpers.h        |  12 --
 src/bpf_tracing.h        | 107 ++++++++++++++++
 src/btf.c                |  32 ++---
 src/btf.h                |  25 +++-
 src/btf_dump.c           |   2 +-
 src/libbpf.c             | 106 ++++++++-------
 src/libbpf.h             | 111 +++++++++++++++-
 src/libbpf.map           |  10 ++
 src/libbpf_probes.c      |   1 +
 src/libbpf_version.h     |   2 +-
 src/ringbuf.c            | 271 +++++++++++++++++++++++++++++++++++++++
 src/skel_internal.h      |  23 ++++
 src/usdt.c               |   2 +-
 14 files changed, 731 insertions(+), 112 deletions(-)

--
2.30.2
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
0f5b3a10ae sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-09-27 15:23:45 -07:00
Pu Lehui
5859c59e50 bpf, cgroup: Reject prog_attach_flags array when effective query
Attach flags is only valid for attached progs of this layer cgroup,
but not for effective progs. For querying with EFFECTIVE flags,
exporting attach flags does not make sense. So when effective query,
we reject prog_attach_flags array and don't need to populate it.
Also we limit attach_flags to output 0 during effective query.

Fixes: b79c9fc9551b ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20220921104604.2340580-2-pulehui@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
85f8b7c4dc libbpf: Don't require full struct enum64 in UAPI headers
Drop the requirement for system-wide kernel UAPI headers to provide full
struct btf_enum64 definition. This is an unexpected requirement that
slipped in libbpf 1.0 and put unnecessary pressure ([0]) on users to have
a bleeding-edge kernel UAPI header from unreleased Linux 6.0.

To achieve this, we forward declare struct btf_enum64. But that's not
enough as there is btf_enum64_value() helper that expects to know the
layout of struct btf_enum64. So we get a bit creative with
reinterpreting memory layout as array of __u32 and accesing lo32/hi32
fields as array elements. Alternative way would be to have a local
pointer variable for anonymous struct with exactly the same layout as
struct btf_enum64, but that gets us into C++ compiler errors complaining
about invalid type casts. So play it safe, if ugly.

  [0] Closes: https://github.com/libbpf/libbpf/issues/562

Fixes: d90ec262b35b ("libbpf: Add enum64 support for btf_dump")
Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/bpf/20220927042940.147185-1-andrii@kernel.org
2022-09-27 15:23:45 -07:00
Jon Doron
9da0dcb621 libbpf: Fix the case of running as non-root with capabilities
When running rootless with special capabilities like:
FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH

The "access" API will not make the proper check if there is really
access to a file or not.

>From the access man page:
"
The check is done using the calling process's real UID and GID, rather
than the effective IDs as is done when actually attempting an operation
(e.g., open(2)) on the file.  Similarly, for the root user, the check
uses the set of permitted capabilities  rather than the set of effective
capabilities; ***and for non-root users, the check uses an empty set of
capabilities.***
"

What that means is that for non-root user the access API will not do the
proper validation if the process really has permission to a file or not.

To resolve this this patch replaces all the access API calls with
faccessat with AT_EACCESS flag.

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com
2022-09-27 15:23:45 -07:00
Jiri Olsa
82c4054376 bpf: Return value in kprobe get_func_ip only for entry address
Changing return value of kprobe's version of bpf_get_func_ip
to return zero if the attach address is not on the function's
entry point.

For kprobes attached in the middle of the function we can't easily
get to the function address especially now with the CONFIG_X86_KERNEL_IBT
support.

If user cares about current IP for kprobes attached within the
function body, they can get it with PT_REGS_IP(ctx).

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
b3a117773d libbpf: restore memory layout of bpf_object_open_opts
When attach_prog_fd field was removed in libbpf 1.0 and replaced with
`long: 0` placeholder, it actually shifted all the subsequent fields by
8 byte. This is due to `long: 0` promising to adjust next field's offset
to long-aligned offset. But in this case we were already long-aligned
as pin_root_path is a pointer. So `long: 0` had no effect, and thus
didn't feel the gap created by removed attach_prog_fd.

Non-zero bitfield should have been used instead. I validated using
pahole. Originally kconfig field was at offset 40. With `long: 0` it's
at offset 32, which is wrong. With this change it's back at offset 40.

While technically libbpf 1.0 is allowed to break backwards
compatibility and applications should have been recompiled against
libbpf 1.0 headers, but given how trivial it is to preserve memory
layout, let's fix this.

Reported-by: Grant Seltzer Richman <grantseltzer@gmail.com>
Fixes: 146bf811f5ac ("libbpf: remove most other deprecated high-level APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220923230559.666608-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-09-27 15:23:45 -07:00
Wang Yufen
fc2577c54c libbpf: Add pathname_concat() helper
Move snprintf and len check to common helper pathname_concat() to make the
code simpler.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1663828124-10437-1-git-send-email-wangyufen@huawei.com
2022-09-27 15:23:45 -07:00
Tao Chen
0420f75dbc libbpf: Support raw BTF placed in the default search path
Currently, the default vmlinux files at '/boot/vmlinux-*',
'/lib/modules/*/vmlinux-*' etc. are parsed with 'btf__parse_elf()' to
extract BTF. It is possible that these files are actually raw BTF files
similar to /sys/kernel/btf/vmlinux. So parse these files with
'btf__parse' which tries both raw format and ELF format.

This might be useful in some scenarios where users put their custom BTF
into known locations and don't want to specify btf_custom_path option.

Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/3f59fb5a345d2e4f10e16fe9e35fbc4c03ecaa3e.1662999860.git.chentao.kernel@linux.alibaba.com
2022-09-27 15:23:45 -07:00
Yonghong Song
aa25f218b4 libbpf: Improve BPF_PROG2 macro code quality and description
Commit 34586d29f8df ("libbpf: Add new BPF_PROG2 macro") added BPF_PROG2
macro for trampoline based programs with struct arguments. Andrii
made a few suggestions to improve code quality and description.
This patch implemented these suggestions including better internal
macro name, consistent usage pattern for __builtin_choose_expr(),
simpler macro definition for always-inline func arguments and
better macro description.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220910025214.1536510-1-yhs@fb.com
2022-09-27 15:23:45 -07:00
David Vernet
9e9bf46c92 bpf: Add libbpf logic for user-space ring buffer
Now that all of the logic is in place in the kernel to support user-space
produced ring buffers, we can add the user-space logic to libbpf. This
patch therefore adds the following public symbols to libbpf:

struct user_ring_buffer *
user_ring_buffer__new(int map_fd,
		      const struct user_ring_buffer_opts *opts);
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);
void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
void user_ring_buffer__discard(struct user_ring_buffer *rb,
void user_ring_buffer__free(struct user_ring_buffer *rb);

A user-space producer must first create a struct user_ring_buffer * object
with user_ring_buffer__new(), and can then reserve samples in the
ring buffer using one of the following two symbols:

void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);

With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring
buffer will be returned if sufficient space is available in the buffer.
user_ring_buffer__reserve_blocking() provides similar semantics, but will
block for up to 'timeout_ms' in epoll_wait if there is insufficient space
in the buffer. This function has the guarantee from the kernel that it will
receive at least one event-notification per invocation to
bpf_ringbuf_drain(), provided that at least one sample is drained, and the
BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain().

Once a sample is reserved, it must either be committed to the ring buffer
with user_ring_buffer__submit(), or discarded with
user_ring_buffer__discard().

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-27 15:23:45 -07:00
David Vernet
28903eb40e bpf: Add bpf_user_ringbuf_drain() helper
In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which
will allow user-space applications to publish messages to a ring buffer
that is consumed by a BPF program in kernel-space. In order for this
map-type to be useful, it will require a BPF helper function that BPF
programs can invoke to drain samples from the ring buffer, and invoke
callbacks on those samples. This change adds that capability via a new BPF
helper function:

bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx,
                       u64 flags)

BPF programs may invoke this function to run callback_fn() on a series of
samples in the ring buffer. callback_fn() has the following signature:

long callback_fn(struct bpf_dynptr *dynptr, void *context);

Samples are provided to the callback in the form of struct bpf_dynptr *'s,
which the program can read using BPF helper functions for querying
struct bpf_dynptr's.

In order to support bpf_ringbuf_drain(), a new PTR_TO_DYNPTR register
type is added to the verifier to reflect a dynptr that was allocated by
a helper function and passed to a BPF program. Unlike PTR_TO_STACK
dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR
dynptrs need not use reference tracking, as the BPF helper is trusted to
properly free the dynptr before returning. The verifier currently only
supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL.

Note that while the corresponding user-space libbpf logic will be added
in a subsequent patch, this patch does contain an implementation of the
.map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This
.map_poll() callback guarantees that an epoll-waiting user-space
producer will receive at least one event notification whenever at least
one sample is drained in an invocation of bpf_user_ringbuf_drain(),
provided that the function is not invoked with the BPF_RB_NO_WAKEUP
flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup
notification is sent even if no sample was drained.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com
2022-09-27 15:23:45 -07:00
David Vernet
8138aa78bd bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type
We want to support a ringbuf map type where samples are published from
user-space, to be consumed by BPF programs. BPF currently supports a
kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF
map type.  We'll need to define a new map type for user-space -> kernel,
as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply
to a user-space producer ring buffer, and we'll want to add one or
more helper functions that would not apply for a kernel-producer
ring buffer.

This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type
definition. The map type is useless in its current form, as there is no
way to access or use it for anything until we one or more BPF helpers. A
follow-on patch will therefore add a new helper function that allows BPF
programs to run callbacks on samples that are published to the ring
buffer.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
2022-09-27 15:23:45 -07:00
Xin Liu
8ac9773f52 libbpf: Fix NULL pointer exception in API btf_dump__dump_type_data
We found that function btf_dump__dump_type_data can be called by the
user as an API, but in this function, the `opts` parameter may be used
as a null pointer.This causes `opts->indent_str` to trigger a NULL
pointer exception.

Fixes: 2ce8450ef5a3 ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts")
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Weibin Kong <kongweibin2@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220917084809.30770-1-liuxin350@huawei.com
2022-09-27 15:23:45 -07:00
Xin Liu
b63791cbde libbpf: Clean up legacy bpf maps declaration in bpf_helpers
Legacy BPF map declarations are no longer supported in libbpf v1.0 [0].
Only BTF-defined maps are supported starting from v1.0, so it is time to
remove the definition of bpf_map_def in bpf_helpers.h.

  [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220913073643.19960-1-liuxin350@huawei.com
2022-09-27 15:23:45 -07:00
Andrii Nakryiko
0ff6d28aec libbpf: Fix crash if SEC("freplace") programs don't have attach_prog_fd set
Fix SIGSEGV caused by libbpf trying to find attach type in vmlinux BTF
for freplace programs. It's wrong to search in vmlinux BTF and libbpf
doesn't even mark vmlinux BTF as required for freplace programs. So
trying to search anything in obj->vmlinux_btf might cause NULL
dereference if nothing else in BPF object requires vmlinux BTF.

Instead, error out if freplace (EXT) program doesn't specify
attach_prog_fd during at the load time.

Fixes: 91abb4a6d79d ("libbpf: Support attachment of BPF tracing programs to kernel modules")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-3-andrii@kernel.org
2022-09-27 15:23:45 -07:00
Daniel Borkmann
861364fa45 libbpf: Remove gcc support for bpf_tail_call_static for now
This reverts commit 14e5ce79943a ("libbpf: Add GCC support for
bpf_tail_call_static"). Reason is that gcc invented their own BPF asm
which is not conform with LLVM one, and going forward this would be
more painful to maintain here and in other areas of the library. Thus
remove it; ask to gcc folks is to align with LLVM one to use exact
same syntax.

Fixes: 14e5ce79943a ("libbpf: Add GCC support for bpf_tail_call_static")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Hilliard <james.hilliard1@gmail.com>
Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
2022-09-27 15:23:45 -07:00
Yonghong Song
21ec5ca723 libbpf: Add new BPF_PROG2 macro
To support struct arguments in trampoline based programs,
existing BPF_PROG doesn't work any more since
the type size is needed to find whether a parameter
takes one or two registers. So this patch added a new
BPF_PROG2 macro to support such trampoline programs.

The idea is suggested by Andrii. For example, if the
to-be-traced function has signature like
  typedef struct {
       void *x;
       int t;
  } sockptr;
  int blah(sockptr x, char y);

In the new BPF_PROG2 macro, the argument can be
represented as
  __bpf_prog_call(
     ({ union {
          struct { __u64 x, y; } ___z;
          sockptr x;
        } ___tmp = { .___z = { ctx[0], ctx[1] }};
        ___tmp.x;
     }),
     ({ union {
          struct { __u8 x; } ___z;
          char y;
        } ___tmp = { .___z = { ctx[2] }};
        ___tmp.y;
     }));
In the above, the values stored on the stack are properly
assigned to the actual argument type value by using 'union'
magic. Note that the macro also works even if no arguments
are with struct types.

Note that new BPF_PROG2 works for both llvm16 and pre-llvm16
compilers where llvm16 supports bpf target passing value
with struct up to 16 byte size and pre-llvm16 will pass
by reference by storing values on the stack. With static functions
with struct argument as always inline, the compiler is able
to optimize and remove additional stack saving of struct values.

Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152707.2079473-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Yonghong Song
255690da57 bpf: Update descriptions for helpers bpf_get_func_arg[_cnt]()
Now instead of the number of arguments, the number of registers
holding argument values are stored in trampoline. Update
the description of bpf_get_func_arg[_cnt]() helpers. Previous
programs without struct arguments should continue to work
as usual.

Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152657.2078805-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Shmulik Ladkani
b1753eaf3b bpf: Support getting tunnel flags
Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters
(id, ttl, tos, local and remote) but does not expose ip_tunnel_info's
tun_flags to the BPF program.

It makes sense to expose tun_flags to the BPF program.

Assume for example multiple GRE tunnels maintained on a single GRE
interface in collect_md mode. The program expects origins to initiate
over GRE, however different origins use different GRE characteristics
(e.g. some prefer to use GRE checksum, some do not; some pass a GRE key,
some do not, etc..).

A BPF program getting tun_flags can therefore remember the relevant
flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote. In
the reply path, the program can use 'bpf_skb_set_tunnel_key' in order
to correctly reply to the remote, using similar characteristics, based
on the stored tunnel flags.

Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key. If
specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags.

Decided to use the existing unused 'tunnel_ext' as the storage for the
'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout.

Also, the following has been considered during the design:

  1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy
     and place into the new 'tunnel_flags' field. This has 2 drawbacks:

     - The BPF_F_yyy flags are from *set_tunnel_key* enumeration space,
       e.g. BPF_F_ZERO_CSUM_TX. It is awkward that it is "returned" into
       tunnel_flags from a *get_tunnel_key* call.
     - Not all "interesting" TUNNEL_xxx flags can be mapped to existing
       BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy
       flags just for purposes of the returned tunnel_flags.

  2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only
     "interesting" flags. That's ok, but the drawback is that what's
     "interesting" for my usecase might be limiting for other usecases.

Therefore I decided to expose what's in key.tun_flags *as is*, which seems
most flexible. The BPF user can just choose to ignore bits he's not
interested in. The TUNNEL_xxx are also UAPI, so no harm exposing them
back in the get_tunnel_key call.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831144010.174110-1-shmulik.ladkani@gmail.com
2022-09-27 15:23:45 -07:00
James Hilliard
eeb2bc4061 libbpf: Add GCC support for bpf_tail_call_static
The bpf_tail_call_static function is currently not defined unless
using clang >= 8.

To support bpf_tail_call_static on GCC we can check if __clang__ is
not defined to enable bpf_tail_call_static.

We need to use GCC assembly syntax when the compiler does not define
__clang__ as LLVM inline assembly is not fully compatible with GCC.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220829210546.755377-1-james.hilliard1@gmail.com
2022-09-27 15:23:45 -07:00
Quentin Monnet
a11587cc01 bpf: Fix a few typos in BPF helpers documentation
Address a few typos in the documentation for the BPF helper functions.
They were reported by Jakub [0], who ran spell checkers on the generated
man page [1].

[0] https://lore.kernel.org/linux-man/d22dcd47-023c-8f52-d369-7b5308e6c842@gmail.com/T/#mb02e7d4b7fb61d98fa914c77b581184e9a9537af
[1] https://lore.kernel.org/linux-man/eb6a1e41-c48e-ac45-5154-ac57a2c76108@gmail.com/T/#m4a8d1b003616928013ffcd1450437309ab652f9f

v3: Do not copy unrelated (and breaking) elements to tools/ header
v2: Turn a ',' into a ';'

Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220825220806.107143-1-quentin@isovalent.com
2022-09-27 15:23:45 -07:00
Benjamin Tissoires
7fb6138fae libbpf: add map_get_fd_by_id and map_delete_elem in light skeleton
This allows to have a better control over maps from the kernel when
preloading eBPF programs.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220824134055.1328882-8-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Hao Luo
c918b3e724 bpf: Add CGROUP prefix to cgroup_iter_order
bpf_cgroup_iter_order is globally visible but the entries do not have
CGROUP prefix. As requested by Andrii, put a CGROUP in the names
in bpf_cgroup_iter_order.

This patch fixes two previous commits: one introduced the API and
the other uses the API in bpf selftest (that is, the selftest
cgroup_hierarchical_stats).

I tested this patch via the following command:

  test_progs -t cgroup,iter,btf_dump

Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
Fixes: 88886309d2e8 ("selftests/bpf: add a selftest for cgroup hierarchical stats collection")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220825223936.1865810-1-haoluo@google.com
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
2022-09-27 15:23:45 -07:00
Hao Luo
981001bf46 bpf: Introduce cgroup iter
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:

 - walking a cgroup's descendants in pre-order.
 - walking a cgroup's descendants in post-order.
 - walking a cgroup's ancestors.
 - process only the given cgroup.

When attaching cgroup_iter, one can set a cgroup to the iter_link
created from attaching. This cgroup is passed as a file descriptor
or cgroup id and serves as the starting point of the walk. If no
cgroup is specified, the starting point will be the root cgroup v2.

For walking descendants, one can specify the order: either pre-order or
post-order. For walking ancestors, the walk starts at the specified
cgroup and ends at the root.

One can also terminate the walk early by returning 1 from the iter
program.

Note that because walking cgroup hierarchy holds cgroup_mutex, the iter
program is called with cgroup_mutex held.

Currently only one session is supported, which means, depending on the
volume of data bpf program intends to send to user space, the number
of cgroups that can be walked is limited. For example, given the current
buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each
cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can
be walked is 512. This is a limitation of cgroup_iter. If the output
data is larger than the kernel buffer size, after all data in the
kernel buffer is consumed by user space, the subsequent read() syscall
will signal EOPNOTSUPP. In order to work around, the user may have to
update their program to reduce the volume of data sent to output. For
example, skip some uninteresting cgroups. In future, we may extend
bpf_iter flags to allow customizing buffer size.

Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Stanislav Fomichev
ee7d295f83 bpf: update bpf_{g,s}et_retval documentation
* replace 'syscall' with 'upper layers', still mention that it's being
  exported via syscall errno
* describe what happens in set_retval(-EPERM) + return 1
* describe what happens with bind's 'return 3'

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220823222555.523590-5-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-27 15:23:45 -07:00
Shmulik Ladkani
94d69cc07f bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progs
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely
replaces the flow-dissector logic with custom dissection logic. This
forces implementors to write programs that handle dissection for any
flows expected in the namespace.

It makes sense for flow-dissector BPF programs to just augment the
dissector with custom logic (e.g. dissecting certain flows or custom
protocols), while enjoying the broad capabilities of the standard
dissector for any other traffic.

Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF
programs may return this to indicate no dissection was made, and
fallback to the standard dissector is requested.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com
2022-09-27 15:23:45 -07:00
Mikhail Tuzikov
12a41a80c5 Adding network diag utils into actions-runner-libbpf container 2022-09-27 11:06:30 -07:00
Daniel Müller
10a32130e7 Clean up local allow/deny lists
Now that we are including the upstream allow/deny lists we can remove
any duplicates from our local lists. While at it, we also add some usdt
tests to the denylist, which are currently failing. This is the same
step we took in the vmtest repository [0].

[0] https://github.com/kernel-patches/vmtest/pull/133

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-06 15:01:05 -07:00
Daniel Müller
fad270918d Use deny/allow lists from upstream
So far we have relied on allow/deny lists maintained in this repository
to decide which tests to explicitly include/exclude from running in CI.
With recent changes [0] this information is now available in upstream
Linux.
As such, this change switches us over to using the upstream allow/deny
lists in addition to the local ones. We unconditionally honor the
upstream lists for all kernel versions.

[0] https://lore.kernel.org/bpf/165893461358.29339.11641967418379627671.git-patchwork-notify@kernel.org/T/#m2a97b0ea9ef0ddee7a53bbf7919e3f324b233937

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-06 15:01:05 -07:00
Daniel Müller
c091b07808 Fix comment: WHITELIST -> ALLOWLIST
Commit 693de729d0 ("Rename blacklists and whitelists") renamed the
black and white lists but missed the adjustment of a comment,
referencing a file name. Update it accordingly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-06 14:07:51 -07:00
Daniel Müller
efd33720cd Set KERNEL and REPO_ROOT environment variable for run-qemu action
With an upcoming change we would like to invoke bpftool checks from the
run-qemu action (https://github.com/libbpf/ci/pull/37). This action
requires two environment variables, KERNEL and REPO_ROOT, set in order
to function.
Make sure to set them now. Long term we should probably make them
explicit input arguments instead of implicit global state, but there are
many more such instances that we need to clean up.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-01 11:00:13 -07:00
Daniel Müller
9aedff8d03 Provide kernel-root argument to run-qemu action
With https://github.com/libbpf/ci/pull/36 merged the run-qemu action now
accepts an additional argument, `kernel-root`.
Provide it to the action with the value appropriate for this repository.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-09-01 10:36:35 -07:00
Daniel Müller
51e63f7229 Explicitly provide kernel-root argument to prepare-rootfs action
Let's make the "kernel-root" explicit when using the prepare-rootfs
action, instead of relying on the default, .kernel.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-29 11:14:39 -07:00
chantra
c53af98d1a [s390x][runner] update action runner to 2.296.0 (latest) 2022-08-27 17:14:28 -07:00
chantra
2c44349e09 [s390x][runners] Use consistent runner name across restarts
Currently, the runner name is taken from the docker container's
hostname.
This changes across restarts, causing the runner name to change across
restarts too.

This uses the host name to keep a consistent name.
2022-08-27 17:14:28 -07:00
Daniel Müller
58361243ec Fix sourcing of helpers.sh in coverity workflow
The path to the helpers.sh script to source was put one level too deep
by cfbd763ef8 ("Use foldable helpers where applicable") and the
GITHUB_ACTION_PATH variable is not actually defined in a workflow.

Fix up both issues.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-26 11:30:12 -07:00
Andrii Nakryiko
c32e1cf948 README: add dark background logo image
Add auto-selectable libbpf logo for light and dark themes.

Suggested-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-08-24 22:09:09 -07:00
Andrii Nakryiko
c4f44c7c11 assets: add libbpf logo images
Add three layouts of libbpf logos (sparse, compact, sideways) with three
color variants (light bg, dark bg, monochrome).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-08-24 21:51:42 -07:00
Daniel Müller
a7a525d47a Rename test_progs_noalu function to test_progs_no_alu32
As a follow up to 66b788c1a4 ("Factor out test_progs_noalu function")
and taking into account feedback [0], this change renames the
test_progs_noalu function to test_progs_no_alu32, to stay closer to the
name of the binary being invoked.

[0] https://github.com/kernel-patches/vmtest/pull/124#discussion_r953175641

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-24 08:08:21 -07:00
Daniel Müller
cfbd763ef8 Use foldable helpers where applicable
As discussed at some earlier point in time, some of the actions/workflow
logic does not use our foldable helpers despite being able to. Switch
them over.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-23 12:04:38 -07:00
thiagoftsm
862b60f205 Merge branch 'libbpf:master' into master 2022-08-22 19:29:03 +00:00
Andrii Nakryiko
a0325403af readme: add logo and clarify initial section
Add libbpf logo to the header and restructure and rewrite a bit
intro part about libbpf, it's bpf-next origins, etc.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-08-22 12:10:03 -07:00
Andrii Nakryiko
7436656dbf README: add link to readthedocs doc site
Add link to https://libbpf.readthedocs.io/en/latest/api.html for API documentation.
2022-08-19 10:37:43 -07:00
Daniel Müller
7984737fbf Support running of individual tests
This change adjusts the run_selftests.sh script to accept an optional
list of arguments specifying the tests to run. We will make use of it
once we run selftests in parallel.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-18 15:31:52 -07:00
Andrii Nakryiko
a0d1e22c77 ci: blacklist lru_bug selftest on s390x
Make sure we don't fail on lru_bug selftests as it relies of BPF
trampoline, not supported by s390x.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-08-18 15:29:04 -07:00
Andrii Nakryiko
e58c615210 ci: update vmlinux.h to latest config
Some selftests require conn->mark, regenerate vmlinux.h.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-08-18 15:29:04 -07:00
Andrii Nakryiko
aec0b1cd7d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   73cf09a36bf7bfb3e5a3ff23755c36d49137c44d
Checkpoint bpf-next commit: e34cfee65ec891a319ce79797dda18083af33a76
Baseline bpf commit:        e7c677bdd03d54e9a1bafcaf1faf5c573a506bba
Checkpoint bpf commit:      14b20b784f59bdd95f6f1cfb112c9818bcec4d84

Andrii Nakryiko (3):
  libbpf: Fix potential NULL dereference when parsing ELF
  libbpf: Streamline bpf_attr and perf_event_attr initialization
  libbpf: Clean up deprecated and legacy aliases

Hangbin Liu (2):
  libbpf: Add names for auxiliary maps
  libbpf: Making bpf_prog_load() ignore name if kernel doesn't support

Hao Luo (1):
  libbpf: Allows disabling auto attach

Quentin Monnet (1):
  bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation

 include/uapi/linux/bpf.h |   6 +-
 src/bpf.c                | 186 ++++++++++++++++++++++-----------------
 src/btf.c                |   2 -
 src/btf.h                |   1 -
 src/libbpf.c             |  81 ++++++++++++-----
 src/libbpf.h             |   2 +
 src/libbpf.map           |   2 +
 src/libbpf_internal.h    |   3 +
 src/libbpf_legacy.h      |   2 +
 src/netlink.c            |   3 +-
 src/skel_internal.h      |  10 ++-
 11 files changed, 183 insertions(+), 115 deletions(-)

--
2.30.2
2022-08-18 15:29:04 -07:00
Andrii Nakryiko
a202bd7433 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-08-18 15:29:04 -07:00
Andrii Nakryiko
ba81a5b778 libbpf: Clean up deprecated and legacy aliases
Remove three missed deprecated APIs that were aliased to new APIs:
bpf_object__unload, bpf_prog_attach_xattr and btf__load.

Also move legacy API libbpf_find_kernel_btf (aliased to
btf__load_vmlinux_btf) into libbpf_legacy.h.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-4-andrii@kernel.org
2022-08-18 15:29:04 -07:00
Andrii Nakryiko
f7cee4152f libbpf: Streamline bpf_attr and perf_event_attr initialization
Make sure that entire libbpf code base is initializing bpf_attr and
perf_event_attr with memset(0). Also for bpf_attr make sure we
clear and pass to kernel only relevant parts of bpf_attr. bpf_attr is
a huge union of independent sub-command attributes, so there is no need
to clear and pass entire union bpf_attr, which over time grows quite
a lot and for most commands this growth is completely irrelevant.

Few cases where we were relying on compiler initialization of BPF UAPI
structs (like bpf_prog_info, bpf_map_info, etc) with `= {};` were
switched to memset(0) pattern for future-proofing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-3-andrii@kernel.org
2022-08-18 15:29:04 -07:00
Andrii Nakryiko
06c4624c8c libbpf: Fix potential NULL dereference when parsing ELF
Fix if condition filtering empty ELF sections to prevent NULL
dereference.

Fixes: 47ea7417b074 ("libbpf: Skip empty sections in bpf_object__init_global_data_maps")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-2-andrii@kernel.org
2022-08-18 15:29:04 -07:00
Hao Luo
c8f4b9c878 libbpf: Allows disabling auto attach
Adds libbpf APIs for disabling auto-attach for individual functions.
This is motivated by the use case of cgroup iter [1]. Some iter
types require their parameters to be non-zero, therefore applying
auto-attach on them will fail. With these two new APIs, users who
want to use auto-attach and these types of iters can disable
auto-attach on the program and perform manual attach.

[1] https://lore.kernel.org/bpf/CAEf4BzZ+a2uDo_t6kGBziqdz--m2gh2_EUwkGLDtMd65uwxUjA@mail.gmail.com/

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220816234012.910255-1-haoluo@google.com
2022-08-18 15:29:04 -07:00
Hangbin Liu
079bc8536d libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
Similar with commit 10b62d6a38f7 ("libbpf: Add names for auxiliary maps"),
let's make bpf_prog_load() also ignore name if kernel doesn't support
program name.

To achieve this, we need to call sys_bpf_prog_load() directly in
probe_kern_prog_name() to avoid circular dependency. sys_bpf_prog_load()
also need to be exported in the libbpf_internal.h file.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220813000936.6464-1-liuhangbin@gmail.com
2022-08-18 15:29:04 -07:00
Quentin Monnet
8be13ee80b bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
Adding or removing room space _below_ layers 2 or 3, as the description
mentions, is ambiguous. This was written with a mental image of the
packet with layer 2 at the top, layer 3 under it, and so on. But it has
led users to believe that it was on lower layers (before the beginning
of the L2 and L3 headers respectively).

Let's make it more explicit, and specify between which layers the room
space is adjusted.

Reported-by: Rumen Telbizov <rumen.telbizov@menlosecurity.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220812153727.224500-3-quentin@isovalent.com
2022-08-18 15:29:04 -07:00
Hangbin Liu
3db7585378 libbpf: Add names for auxiliary maps
The bpftool self-created maps can appear in final map show output due to
deferred removal in kernel. These maps don't have a name, which would make
users confused about where it comes from.

With a libbpf_ prefix name, users could know who created these maps.
It also could make some tests (like test_offload.py, which skip base maps
without names as a workaround) filter them out.

Kernel adds bpf prog/map name support in the same merge
commit fadad670a8ab ("Merge branch 'bpf-extend-info'"). So we can also use
kernel_supports(NULL, FEAT_PROG_NAME) to check if kernel supports map name.

As discussed [1], Let's make bpf_map_create accept non-null
name string, and silently ignore the name if kernel doesn't support.

  [1] https://lore.kernel.org/bpf/CAEf4BzYL1TQwo1231s83pjTdFPk9XWWhfZC5=KzkU-VO0k=0Ug@mail.gmail.com/

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220811034020.529685-1-liuhangbin@gmail.com
2022-08-18 15:29:04 -07:00
Daniel Müller
69938da6d7 Explicitly specify Qemu image path to use
The path to the file system image used by our invocation of Qemu is
currently hard coded to /tmp/root.img somewhere in a different
repository. With
da44c0b6ee
landed we have the option of specifying it explicitly from here. Let's
do just that, so that we can remove the default value from libbpf/ci
altogether.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-18 14:38:23 -07:00
Daniel Müller
bfdf7653e0 Rename travis-ci/ directory to ci/
We are no longer using Travis. As such, we should move away from a lot
of CI functionality located in a folder called travis-ci/. This change
renames the travis-ci/ directory to the more generic ci/.
To preserve backwards compatibility until all "consumers" have
transitioned, we add a symbolic link called travis-ci back. It will be
removed in the near term future.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-18 09:02:13 -07:00
Daniel Müller
d700dcf162 Print allow and denylists
We should include the deny and allow lists used somewhere in the output
of our CI runs in order to improve debuggability in general. With this
change we print out these lists once assembled.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-17 11:41:22 -07:00
Daniel Müller
c03b9f6d0b Move kernel version check inwards
The run_selftests.sh script defines functions for running individual
tests. However, not all tests are run in all configurations. E.g.,
test_progs is not run on 4.9.0 kernels and test_maps is only run when
testing on the "latest" kernel version. The checks for these conditions,
however, are applied inconsistently: some are in the functions
themselves and others on the call site.
This change unifies all checks to happen within the test function
itself.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-17 11:41:22 -07:00
Daniel Müller
66b788c1a4 Factor out test_progs_noalu function
This change factors out a new function, test_progs_noalu, in the
run_selftests.sh script. Having this function available will make it
easier for us to run tests conditionally later on, but it's also a
matter of having one function for one binary.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-17 11:41:22 -07:00
Daniel Müller
e3c2b8a48d Re-enable test_maps selftest
Back in 2020, we disabled the test_maps selftest with e05f9be4f4
("vmtests: temporarily disable test_maps") for reasons not closely
elaborated.
It appears that by now the test is succeeding again, so let's enable it
back.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-15 15:50:55 -07:00
Andrii Nakryiko
13a26d78f3 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   71930846b36f8e4e68267f8a3d47e33435c3657a
Checkpoint bpf-next commit: 73cf09a36bf7bfb3e5a3ff23755c36d49137c44d
Baseline bpf commit:        f946964a9f79f8dcb5a6329265281eebfc23aee5
Checkpoint bpf commit:      e7c677bdd03d54e9a1bafcaf1faf5c573a506bba

Alexei Starovoitov (1):
  bpf: Disallow bpf programs call prog_run command.

Andrii Nakryiko (2):
  libbpf: Reject legacy 'maps' ELF section
  libbpf: preserve errno across pr_warn/pr_info/pr_debug

Dave Marchevsky (1):
  bpf: Improve docstring for BPF_F_USER_BUILD_ID flag

Florian Fainelli (1):
  libbpf: Initialize err in probe_map_create

Gustavo A. R. Silva (1):
  treewide: uapi: Replace zero-length arrays with flexible-array members

Hengqi Chen (1):
  libbpf: Do not require executable permission for shared libraries

James Hilliard (2):
  libbpf: Skip empty sections in bpf_object__init_global_data_maps
  libbpf: Ensure functions with always_inline attribute are inline

Jesper Dangaard Brouer (1):
  bpf: Add BPF-helper for accessing CLOCK_TAI

Namhyung Kim (1):
  perf/core: Add a new read format to get a number of lost samples

 include/uapi/linux/bpf.h        | 27 +++++++++++++++++++++++++--
 include/uapi/linux/perf_event.h |  7 +++++--
 include/uapi/linux/pkt_cls.h    |  4 ++--
 src/bpf_tracing.h               | 14 +++++++-------
 src/libbpf.c                    | 25 +++++++++++++++++--------
 src/libbpf_probes.c             |  2 +-
 src/skel_internal.h             |  4 ++--
 src/usdt.bpf.h                  |  4 ++--
 8 files changed, 61 insertions(+), 26 deletions(-)

--
2.30.2
2022-08-10 14:07:19 -07:00
Andrii Nakryiko
6b92311c3a sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-08-10 14:07:19 -07:00
Alexei Starovoitov
6fdbfb00f1 bpf: Disallow bpf programs call prog_run command.
The verifier cannot perform sufficient validation of bpf_attr->test.ctx_in
pointer, therefore bpf programs should not be allowed to call BPF_PROG_RUN
command from within the program.
To fix this issue split bpf_sys_bpf() bpf helper into normal kern_sys_bpf()
kernel function that can only be used by the kernel light skeleton directly.

Reported-by: YiFei Zhu <zhuyifei@google.com>
Fixes: b1d18a7574d0 ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 14:07:19 -07:00
Andrii Nakryiko
45dca19bd2 libbpf: preserve errno across pr_warn/pr_info/pr_debug
As suggested in [0], make sure that libbpf_print saves and restored
errno and as such guaranteed that no matter what actual print callback
user installs, macros like pr_warn/pr_info/pr_debug are completely
transparent as far as errno goes.

While libbpf code is pretty careful about not clobbering important errno
values accidentally with pr_warn(), it's a trivial change to make sure
that pr_warn can be used anywhere without a risk of clobbering errno.

No functional changes, just future proofing.

  [0] https://github.com/libbpf/libbpf/pull/536

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/r/20220810183425.1998735-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 14:07:19 -07:00
Jesper Dangaard Brouer
2fe1958ec8 bpf: Add BPF-helper for accessing CLOCK_TAI
Commit 3dc6ffae2da2 ("timekeeping: Introduce fast accessor to clock tai")
introduced a fast and NMI-safe accessor for CLOCK_TAI. Especially in time
sensitive networks (TSN), where all nodes are synchronized by Precision Time
Protocol (PTP), it's helpful to have the possibility to generate timestamps
based on CLOCK_TAI instead of CLOCK_MONOTONIC. With a BPF helper for TAI in
place, it becomes very convenient to correlate activity across different
machines in the network.

Use cases for such a BPF helper include functionalities such as Tx launch
time (e.g. ETF and TAPRIO Qdiscs) and timestamping.

Note: CLOCK_TAI is nothing new per se, only the NMI-safe variant of it is.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
[Kurt: Wrote changelog and renamed helper]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220809060803.5773-2-kurt@linutronix.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 14:07:19 -07:00
Dave Marchevsky
cbd9b7e5d8 bpf: Improve docstring for BPF_F_USER_BUILD_ID flag
Most tools which use bpf_get_stack or bpf_get_stackid symbolicate the
stack - meaning the stack of addresses in the target process' address
space is transformed into meaningful symbol names. The
BPF_F_USER_BUILD_ID flag eases this process by finding the build_id of
the file-backed vma which the address falls in and translating the
address to an offset within the backing file.

To be more specific, the offset is a "file offset" from the beginning of
the backing file. The symbols in ET_DYN ELF objects have a st_value
which is also described as an "offset" - but an offset in the process
address space, relative to the base address of the object.

It's necessary to translate between the "file offset" and "virtual
address offset" during symbolication before they can be directly
compared. Failure to do so can lead to confusing bugs, so this patch
clarifies language in the documentation in an attempt to keep this from
happening.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220808164723.3107500-1-davemarchevsky@fb.com
2022-08-10 14:07:19 -07:00
Hengqi Chen
0cc6bfab39 libbpf: Do not require executable permission for shared libraries
Currently, resolve_full_path() requires executable permission for both
programs and shared libraries. This causes failures on distos like Debian
since the shared libraries are not installed executable and Linux is not
requiring shared libraries to have executable permissions. Let's remove
executable permission check for shared libraries.

Reported-by: Goro Fuji <goro@fastly.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220806102021.3867130-1-hengqi.chen@gmail.com
2022-08-10 14:07:19 -07:00
Andrii Nakryiko
41c612167e libbpf: Reject legacy 'maps' ELF section
Add explicit error message if BPF object file is still using legacy BPF
map definitions in SEC("maps"). Before this change, if BPF object file
is still using legacy map definition user will see a bit confusing:

  libbpf: elf: skipping unrecognized data section(4) maps
  libbpf: prog 'handler': bad map relo against 'server_map' in section 'maps'

Now libbpf will be explicit about rejecting "maps" ELF section:

  libbpf: elf: legacy map definitions in 'maps' section are not supported by libbpf v1.0+

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220803214202.23750-1-andrii@kernel.org
2022-08-10 14:07:19 -07:00
James Hilliard
69d537ba0b libbpf: Ensure functions with always_inline attribute are inline
GCC expects the always_inline attribute to only be set on inline
functions, as such we should make all functions with this attribute
use the __always_inline macro which makes the function inline and
sets the attribute.

Fixes errors like:
/home/buildroot/bpf-next/tools/testing/selftests/bpf/tools/include/bpf/bpf_tracing.h:439:1: error: ‘always_inline’ function might not be inlinable [-Werror=attributes]
  439 | ____##name(unsigned long long *ctx, ##args)
      | ^~~~

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220803151403.793024-1-james.hilliard1@gmail.com
2022-08-10 14:07:19 -07:00
Florian Fainelli
bd1e5cff31 libbpf: Initialize err in probe_map_create
GCC-11 warns about the possibly unitialized err variable in
probe_map_create:

libbpf_probes.c: In function 'probe_map_create':
libbpf_probes.c:361:38: error: 'err' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  361 |                 return fd < 0 && err == exp_err ? 1 : 0;
      |                                  ~~~~^~~~~~~~~~

Fixes: 878d8def0603 ("libbpf: Rework feature-probing APIs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220801025109.1206633-1-f.fainelli@gmail.com
2022-08-10 14:07:19 -07:00
James Hilliard
3d484ca473 libbpf: Skip empty sections in bpf_object__init_global_data_maps
The GNU assembler generates an empty .bss section. This is a well
established behavior in GAS that happens in all supported targets.

The LLVM assembler doesn't generate an empty .bss section.

bpftool chokes on the empty .bss section.

Additionally in bpf_object__elf_collect the sec_desc->data is not
initialized when a section is not recognized. In this case, this
happens with .comment.

So we must check that sec_desc->data is initialized before checking
if the size is 0.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220731232649.4668-1-james.hilliard1@gmail.com
2022-08-10 14:07:19 -07:00
Gustavo A. R. Silva
c25544735b treewide: uapi: Replace zero-length arrays with flexible-array members
There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].

This code was transformed with the help of Coccinelle:
(linux-5.19-rc2$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)

@@
identifier S, member, array;
type T1, T2;
@@

struct S {
  ...
  T1 member;
  T2 array[
- 0
  ];
};

-fstrict-flex-arrays=3 is coming and we need to land these changes
to prevent issues like these in the short future:

../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0,
but the source string has length 2 (including NUL byte) [-Wfortify-source]
		strcpy(de3->name, ".");
		^

Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If
this breaks anything, we can use a union with a new member name.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/78
Build-tested-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/62b675ec.wKX6AOZ6cbE71vtF%25lkp@intel.com/
Acked-by: Dan Williams <dan.j.williams@intel.com> # For ndctl.h
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2022-08-10 14:07:19 -07:00
Namhyung Kim
179c7940eb perf/core: Add a new read format to get a number of lost samples
Sometimes we want to know an accurate number of samples even if it's
lost.  Currenlty PERF_RECORD_LOST is generated for a ring-buffer which
might be shared with other events.  So it's hard to know per-event
lost count.

Add event->lost_samples field and PERF_FORMAT_LOST to retrieve it from
userspace.

Original-patch-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220616180623.1358843-1-namhyung@kernel.org
2022-08-10 14:07:19 -07:00
Daniel Müller
f6692dc4e8 Remove checked-in configuration
Both the bpf and bpf-next tree have suitable BPF selftest configurations
available for usage with the latest kernel now upstream. While we do
test on 4.9 and 5.5 kernels as well, there we just download prebuilt
binaries. The configuration we use for building selftests is always the
upstream one.
With this change we remove the checked-in configuration, as it is now no
longer needed.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-10 10:24:28 -07:00
Daniel Müller
693de729d0 Rename blacklists and whitelists
Upstream uses denylist and allowlist terminology instead of blacklist
and whitelist. It also has established a less deeply nested directory
structure.
This change renames the blacklist & whitelist files accordingly and
moves them one level up out of their containing directory to mirror the
layout we have upstream as well as in kernel-patches/vmtest.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-10 08:31:17 -07:00
Daniel Müller
0667206913 Use checkout action in version v3
The current version of actions/checkout is v3. That means that v2, which
we currently use, has been superseded. Update the version we use
accordingly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-09 14:02:50 -07:00
Daniel Müller
a2ebd9ceff Rely on upstream kernel configuration
So far we have relied on the kernel configuration as checked into the
this repository. However, a suitable configuration is now included in
upstream Linux [0].
With this change we add support for using the configuration from there.

[0] https://lore.kernel.org/bpf/165893461358.29339.11641967418379627671.git-patchwork-notify@kernel.org/T/#m2a97b0ea9ef0ddee7a53bbf7919e3f324b233937

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-09 09:23:59 -07:00
Daniel Müller
0e43565ad8 ci: Bump LLVM version we use to 16
Development on LLVM 16 has started and version 15 is no longer available
in the repository we install it from. Bump the version we use
accordingly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-08-01 13:10:42 -07:00
Andrii Nakryiko
5b795f7b30 ci: blacklist skeleton selftest
Selftest relies on new 5.19+ kernel support for big ARRAY maps.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
3fa2c28d2c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b0d93b44641a83c28014ca38001e85bf6dc8501e
Checkpoint bpf-next commit: 71930846b36f8e4e68267f8a3d47e33435c3657a
Baseline bpf commit:        d28b25a62a47a8c8aa19bd543863aab6717e68c9
Checkpoint bpf commit:      f946964a9f79f8dcb5a6329265281eebfc23aee5

Andrii Nakryiko (7):
  libbpf: add bpf_core_type_matches() helper macro
  libbpf: Remove unnecessary usdt_rel_ip assignments
  libbpf: generalize virtual __kconfig externs and use it for USDT
  libbpf: improve BPF_KPROBE_SYSCALL macro and rename it to BPF_KSYSCALL
  libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  libbpf: fallback to tracefs mount point if debugfs is not mounted
  libbpf: make RINGBUF map size adjustments more eagerly

Anquan Wu (1):
  libbpf: Fix the name of a reused map

Chuang Wang (3):
  libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
  libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
  libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()

Dan Carpenter (3):
  libbpf: fix an snprintf() overflow check
  libbpf: Fix sign expansion bug in btf_dump_get_enum_value()
  libbpf: Fix str_has_sfx()'s return value

Daniel Müller (4):
  bpf: Introduce TYPE_MATCH related constants/macros
  bpf, libbpf: Add type match support
  bpf: Correctly propagate errors up from bpf_core_composites_match
  libbpf: Support PPC in arch_specific_syscall_pfx

Hangbin Liu (1):
  Bonding: add per-port priority for failover re-selection

Hengqi Chen (1):
  libbpf: Error out when binary_path is NULL for uprobe and USDT

Ilya Leoshkevich (1):
  libbpf: Extend BPF_KSYSCALL documentation

James Hilliard (1):
  libbpf: Disable SEC pragma macro on GCC

Joanne Koong (2):
  bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs
  bpf: fix bpf_skb_pull_data documentation

Joe Burton (1):
  libbpf: Add bpf_obj_get_opts()

Jon Doron (1):
  libbpf: perfbuf: Add API to get the ring buffer

Pu Lehui (1):
  bpf, docs: Remove deprecated xsk libbpf APIs description

Yixun Lan (1):
  libbpf, riscv: Use a0 for RC register

 docs/libbpf_naming_convention.rst |  13 +-
 include/uapi/linux/bpf.h          |  15 +-
 include/uapi/linux/if_link.h      |   1 +
 src/bpf.c                         |   9 +
 src/bpf.h                         |  11 +
 src/bpf_core_read.h               |  11 +
 src/bpf_helpers.h                 |  13 +
 src/bpf_tracing.h                 |  60 +++-
 src/btf_dump.c                    |   2 +-
 src/gen_loader.c                  |   2 +-
 src/libbpf.c                      | 440 ++++++++++++++++++++++--------
 src/libbpf.h                      |  62 +++++
 src/libbpf.map                    |   3 +
 src/libbpf_internal.h             |   8 +-
 src/relo_core.c                   | 286 ++++++++++++++++++-
 src/relo_core.h                   |   4 +
 src/usdt.bpf.h                    |  16 +-
 src/usdt.c                        |   6 +-
 18 files changed, 793 insertions(+), 169 deletions(-)

--
2.30.2
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
0fa013e705 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-07-31 16:45:48 -07:00
Joe Burton
d8e2c9d965 libbpf: Add bpf_obj_get_opts()
Add an extensible variant of bpf_obj_get() capable of setting the
`file_flags` parameter.

This parameter is needed to enable unprivileged access to BPF maps.
Without a method like this, users must manually make the syscall.

Signed-off-by: Joe Burton <jevburton@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220729202727.3311806-1-jevburton.kernel@gmail.com
2022-07-31 16:45:48 -07:00
Daniel Müller
b2d7228d7c libbpf: Support PPC in arch_specific_syscall_pfx
Commit 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support
for syscall kprobes") added the arch_specific_syscall_pfx() function,
which returns a string representing the architecture in use. As it turns
out this function is currently not aware of Power PC, where NULL is
returned. That's being flagged by the libbpf CI system, which builds for
ppc64le and the compiler sees a NULL pointer being passed in to a %s
format string.
With this change we add representations for two more architectures, for
Power PC and Power PC 64, and also adjust the string format logic to
handle NULL pointers gracefully, in an attempt to prevent similar issues
with other architectures in the future.

Fixes: 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220728222345.3125975-1-deso@posteo.net
2022-07-31 16:45:48 -07:00
Ilya Leoshkevich
427f2a0c83 libbpf: Extend BPF_KSYSCALL documentation
Explicitly list known quirks. Mention that socket-related syscalls can be
invoked via socketcall().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220726134008.256968-2-iii@linux.ibm.com
2022-07-31 16:45:48 -07:00
Dan Carpenter
8663289b51 libbpf: Fix str_has_sfx()'s return value
The return from strcmp() is inverted so it wrongly returns true instead
of false and vice versa.

Fixes: a1c9d61b19cb ("libbpf: Improve library identification for uprobe binary path resolution")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/YtZ+/dAA195d99ak@kili
2022-07-31 16:45:48 -07:00
Dan Carpenter
77e514d626 libbpf: Fix sign expansion bug in btf_dump_get_enum_value()
The code here is supposed to take a signed int and store it in a signed
long long. Unfortunately, the way that the type promotion works with
this conditional statement is that it takes a signed int, type promotes
it to a __u32, and then stores that as a signed long long. The result is
never negative.

This is from static analysis, but I made a little test program just to
test it before I sent the patch:

  #include <stdio.h>

  int main(void)
  {
        unsigned long long src = -1ULL;
        signed long long dst1, dst2;
        int is_signed = 1;

        dst1 = is_signed ? *(int *)&src : *(unsigned int *)0;
        dst2 = is_signed ? (signed long long)*(int *)&src : *(unsigned int *)0;

        printf("%lld\n", dst1);
        printf("%lld\n", dst2);

        return 0;
  }

Fixes: d90ec262b35b ("libbpf: Add enum64 support for btf_dump")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/YtZ+LpgPADm7BeEd@kili
2022-07-31 16:45:48 -07:00
Dan Carpenter
b44b214118 libbpf: fix an snprintf() overflow check
The snprintf() function returns the number of bytes it *would* have
copied if there were enough space.  So it can return > the
sizeof(gen->attach_target).

Fixes: 67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/YtZ+oAySqIhFl6/J@kili
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
610707057a libbpf: make RINGBUF map size adjustments more eagerly
Make libbpf adjust RINGBUF map size (rounding it up to closest power-of-2
of page_size) more eagerly: during open phase when initializing the map
and on explicit calls to bpf_map__set_max_entries().

Such approach allows user to check actual size of BPF ringbuf even
before it's created in the kernel, but also it prevents various edge
case scenarios where BPF ringbuf size can get out of sync with what it
would be in kernel. One of them (reported in [0]) is during an attempt
to pin/reuse BPF ringbuf.

Move adjust_ringbuf_sz() helper closer to its first actual use. The
implementation of the helper is unchanged.

Also make detection of whether bpf_object is already loaded more robust
by checking obj->loaded explicitly, given that map->fd can be < 0 even
if bpf_object is already loaded due to ability to disable map creation
with bpf_map__set_autocreate(map, false).

  [0] Closes: https://github.com/libbpf/libbpf/pull/530

Fixes: 0087a681fa8c ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220715230952.2219271-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-31 16:45:48 -07:00
Joanne Koong
7e567b8761 bpf: fix bpf_skb_pull_data documentation
Fix documentation for bpf_skb_pull_data() helper for
when len == 0.

Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220715193800.3940070-1-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
1fe0248c61 libbpf: fallback to tracefs mount point if debugfs is not mounted
Teach libbpf to fallback to tracefs mount point (/sys/kernel/tracing) if
debugfs (/sys/kernel/debug/tracing) isn't mounted.

Acked-by: Yonghong Song <yhs@fb.com>
Suggested-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715185736.898848-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
0862e4e54d libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
kretsyscall variants (for return kprobes) to allow users to kprobe
syscall functions in kernel. These special sections allow to ignore
complexities and differences between kernel versions and host
architectures when it comes to syscall wrapper and corresponding
__<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on
whether host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER (though libbpf
itself doesn't rely on /proc/config.gz for detecting this, see
BPF_KSYSCALL patch for how it's done internally).

Combined with the use of BPF_KSYSCALL() macro, this allows to just
specify intended syscall name and expected input arguments and leave
dealing with all the variations to libbpf.

In addition to SEC("ksyscall+") and SEC("kretsyscall+") add
bpf_program__attach_ksyscall() API which allows to specify syscall name
at runtime and provide associated BPF cookie value.

At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do not
handle all the calling convention quirks for mmap(), clone() and compat
syscalls. It also only attaches to "native" syscall interfaces. If host
system supports compat syscalls or defines 32-bit syscalls in 64-bit
kernel, such syscall interfaces won't be attached to by libbpf.

These limitations may or may not change in the future. Therefore it is
recommended to use SEC("kprobe") for these syscalls or if working with
compat and 32-bit interfaces is required.

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
fd6c9d906a libbpf: improve BPF_KPROBE_SYSCALL macro and rename it to BPF_KSYSCALL
Improve BPF_KPROBE_SYSCALL (and rename it to shorter BPF_KSYSCALL to
match libbpf's SEC("ksyscall") section name, added in next patch) to use
__kconfig variable to determine how to properly fetch syscall arguments.

Instead of relying on hard-coded knowledge of whether kernel's
architecture uses syscall wrapper or not (which only reflects the latest
kernel versions, but is not necessarily true for older kernels and won't
necessarily hold for later kernel versions on some particular host
architecture), determine this at runtime by attempting to create
perf_event (with fallback to kprobe event creation through tracefs on
legacy kernels, just like kprobe attachment code is doing) for kernel
function that would correspond to bpf() syscall on a system that has
CONFIG_ARCH_HAS_SYSCALL_WRAPPER set (e.g., for x86-64 it would try
'__x64_sys_bpf').

If host kernel uses syscall wrapper, syscall kernel function's first
argument is a pointer to struct pt_regs that then contains syscall
arguments. In such case we need to use bpf_probe_read_kernel() to fetch
actual arguments (which we do through BPF_CORE_READ() macro) from inner
pt_regs.

But if the kernel doesn't use syscall wrapper approach, input
arguments can be read from struct pt_regs directly with no probe reading.

All this feature detection is done without requiring /proc/config.gz
existence and parsing, and BPF-side helper code uses newly added
LINUX_HAS_SYSCALL_WRAPPER virtual __kconfig extern to keep in sync with
user-side feature detection of libbpf.

BPF_KSYSCALL() macro can be used both with SEC("kprobe") programs that
define syscall function explicitly (e.g., SEC("kprobe/__x64_sys_bpf"))
and SEC("ksyscall") program added in the next patch (which are the same
kprobe program with added benefit of libbpf determining correct kernel
function name automatically).

Kretprobe and kretsyscall (added in next patch) programs don't need
BPF_KSYSCALL as they don't provide access to input arguments. Normal
BPF_KRETPROBE is completely sufficient and is recommended.

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
d56d93baff libbpf: generalize virtual __kconfig externs and use it for USDT
Libbpf supports single virtual __kconfig extern currently: LINUX_KERNEL_VERSION.
LINUX_KERNEL_VERSION isn't coming from /proc/kconfig.gz and is intead
customly filled out by libbpf.

This patch generalizes this approach to support more such virtual
__kconfig externs. One such extern added in this patch is
LINUX_HAS_BPF_COOKIE which is used for BPF-side USDT supporting code in
usdt.bpf.h instead of using CO-RE-based enum detection approach for
detecting bpf_get_attach_cookie() BPF helper. This allows to remove
otherwise not needed CO-RE dependency and keeps user-space and BPF-side
parts of libbpf's USDT support strictly in sync in terms of their
feature detection.

We'll use similar approach for syscall wrapper detection for
BPF_KSYSCALL() BPF-side macro in follow up patch.

Generally, currently libbpf reserves CONFIG_ prefix for Kconfig values
and LINUX_ for virtual libbpf-backed externs. In the future we might
extend the set of prefixes that are supported. This can be done without
any breaking changes, as currently any __kconfig extern with
unrecognized name is rejected.

For LINUX_xxx externs we support the normal "weak rule": if libbpf
doesn't recognize given LINUX_xxx extern but such extern is marked as
__weak, it is not rejected and defaults to zero.  This follows
CONFIG_xxx handling logic and will allow BPF applications to
opportunistically use newer libbpf virtual externs without breaking on
older libbpf versions unnecessarily.

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-31 16:45:48 -07:00
Jon Doron
1648fa16b5 libbpf: perfbuf: Add API to get the ring buffer
Add support for writing a custom event reader, by exposing the ring
buffer.

With the new API perf_buffer__buffer() you will get access to the
raw mmaped()'ed per-cpu underlying memory of the ring buffer.

This region contains both the perf buffer data and header
(struct perf_event_mmap_page), which manages the ring buffer
state (head/tail positions, when accessing the head/tail position
it's important to take into consideration SMP).
With this type of low level access one can implement different types of
consumers here are few simple examples where this API helps with:

1. perf_event_read_simple is allocating using malloc, perhaps you want
   to handle the wrap-around in some other way.
2. Since perf buf is per-cpu then the order of the events is not
   guarnteed, for example:
   Given 3 events where each event has a timestamp t0 < t1 < t2,
   and the events are spread on more than 1 CPU, then we can end
   up with the following state in the ring buf:
   CPU[0] => [t0, t2]
   CPU[1] => [t1]
   When you consume the events from CPU[0], you could know there is
   a t1 missing, (assuming there are no drops, and your event data
   contains a sequential index).
   So now one can simply do the following, for CPU[0], you can store
   the address of t0 and t2 in an array (without moving the tail, so
   there data is not perished) then move on the CPU[1] and set the
   address of t1 in the same array.
   So you end up with something like:
   void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
   and move the tails as you process in order.
3. Assuming there are multiple CPUs and we want to start draining the
   messages from them, then we can "pick" with which one to start with
   according to the remaining free space in the ring buffer.

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220715181122.149224-1-arilou@gmail.com
2022-07-31 16:45:48 -07:00
Anquan Wu
9b6f4eb157 libbpf: Fix the name of a reused map
BPF map name is limited to BPF_OBJ_NAME_LEN.
A map name is defined as being longer than BPF_OBJ_NAME_LEN,
it will be truncated to BPF_OBJ_NAME_LEN when a userspace program
calls libbpf to create the map. A pinned map also generates a path
in the /sys. If the previous program wanted to reuse the map,
it can not get bpf_map by name, because the name of the map is only
partially the same as the name which get from pinned path.

The syscall information below show that map name "process_pinned_map"
is truncated to "process_pinned_".

    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/process_pinned_map",
    bpf_fd=0, file_flags=0}, 144) = -1 ENOENT (No such file or directory)

    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4,
    value_size=4,max_entries=1024, map_flags=0, inner_map_fd=0,
    map_name="process_pinned_",map_ifindex=0, btf_fd=3, btf_key_type_id=6,
    btf_value_type_id=10,btf_vmlinux_value_type_id=0}, 72) = 4

This patch check that if the name of pinned map are the same as the
actual name for the first (BPF_OBJ_NAME_LEN - 1),
bpf map still uses the name which is included in bpf object.

Fixes: 26736eb9a483 ("tools: libbpf: allow map reuse")
Signed-off-by: Anquan Wu <leiqi96@hotmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/OSZP286MB1725CEA1C95C5CB8E7CCC53FB8869@OSZP286MB1725.JPNP286.PROD.OUTLOOK.COM
2022-07-31 16:45:48 -07:00
Hengqi Chen
b3fe4be0b3 libbpf: Error out when binary_path is NULL for uprobe and USDT
binary_path is a required non-null parameter for bpf_program__attach_usdt
and bpf_program__attach_uprobe_opts. Check it against NULL to prevent
coredump on strchr.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220712025745.2703995-1-hengqi.chen@gmail.com
2022-07-31 16:45:48 -07:00
Joanne Koong
6d5026e434 bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs
Commit 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write")
added the bpf_dynptr_write() and bpf_dynptr_read() APIs.

However, it will be needed for some dynptr types to pass in flags as
well (e.g. when writing to a skb, the user may like to invalidate the
hash or recompute the checksum).

This patch adds a "u64 flags" arg to the bpf_dynptr_read() and
bpf_dynptr_write() APIs before their UAPI signature freezes where
we then cannot change them anymore with a 5.19.x released kernel.

Fixes: 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220706232547.4016651-1-joannelkoong@gmail.com
2022-07-31 16:45:48 -07:00
Daniel Müller
ca60209447 bpf: Correctly propagate errors up from bpf_core_composites_match
This change addresses a comment made earlier [0] about a missing return
of an error when __bpf_core_types_match is invoked from
bpf_core_composites_match, which could have let to us erroneously
ignoring errors.

Regarding the typedef name check pointed out in the same context, it is
not actually an issue, because callers of the function perform a name
check for the root type anyway. To make that more obvious, let's add
comments to the function (similar to what we have for
bpf_core_types_are_compat, which is called in pretty much the same
context).

[0]: https://lore.kernel.org/bpf/165708121449.4919.13204634393477172905.git-patchwork-notify@kernel.org/T/#m55141e8f8cfd2e8d97e65328fa04852870d01af6

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220707211931.3415440-1-deso@posteo.net
2022-07-31 16:45:48 -07:00
James Hilliard
b31ca3fa0e libbpf: Disable SEC pragma macro on GCC
It seems the gcc preprocessor breaks with pragmas when surrounding
__attribute__.

Disable these pragmas on GCC due to upstream bugs see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90400

Fixes errors like:
error: expected identifier or '(' before '#pragma'
  106 | SEC("cgroup/bind6")
      | ^~~

error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma'
  114 | char _license[] SEC("license") = "GPL";
      | ^~~

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220706111839.1247911-1-james.hilliard1@gmail.com
2022-07-31 16:45:48 -07:00
Pu Lehui
295a4aae35 bpf, docs: Remove deprecated xsk libbpf APIs description
Since xsk APIs has been removed from libbpf, let's clean up the
BPF docs simutaneously.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220708042736.669132-1-pulehui@huawei.com
2022-07-31 16:45:48 -07:00
Yixun Lan
8498996f9f libbpf, riscv: Use a0 for RC register
According to the RISC-V calling convention register usage here [0], a0
is used as return value register, so rename it to make it consistent
with the spec.

  [0] section 18.2, table 18.2
      https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf

Fixes: 589fed479ba1 ("riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h")
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Amjad OULED-AMEUR <ouledameur.amjad@gmail.com>
Link: https://lore.kernel.org/bpf/20220706140204.47926-1-dlan@gentoo.org
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
aa13a6ff58 libbpf: Remove unnecessary usdt_rel_ip assignments
Coverity detected that usdt_rel_ip is unconditionally overwritten
anyways, so there is no need to unnecessarily initialize it with unused
value. Clean this up.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220705224818.4026623-4-andrii@kernel.org
2022-07-31 16:45:48 -07:00
Chuang Wang
bace4782cd libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
A potential scenario, when an error is returned after
add_uprobe_event_legacy() in perf_event_uprobe_open_legacy(), or
bpf_program__attach_perf_event_opts() in
bpf_program__attach_uprobe_opts() returns an error, the uprobe_event
that was previously created is not cleaned.

So, with this patch, when an error is returned, fix this by adding
remove_uprobe_event_legacy()

Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220629151848.65587-4-nashuiliang@gmail.com
2022-07-31 16:45:48 -07:00
Chuang Wang
ab2221de84 libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
Use "type" as opposed to "err" in pr_warn() after
determine_uprobe_perf_type_legacy() returns an error.

Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220629151848.65587-3-nashuiliang@gmail.com
2022-07-31 16:45:48 -07:00
Chuang Wang
d8a50bfe35 libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
Before the 0bc11ed5ab60 commit ("kprobes: Allow kprobes coexist with
livepatch"), in a scenario where livepatch and kprobe coexist on the
same function entry, the creation of kprobe_event using
add_kprobe_event_legacy() will be successful, at the same time as a
trace event (e.g. /debugfs/tracing/events/kprobe/XXX) will exist, but
perf_event_open() will return an error because both livepatch and kprobe
use FTRACE_OPS_FL_IPMODIFY. As follows:

1) add a livepatch

$ insmod livepatch-XXX.ko

2) add a kprobe using tracefs API (i.e. add_kprobe_event_legacy)

$ echo 'p:mykprobe XXX' > /sys/kernel/debug/tracing/kprobe_events

3) enable this kprobe (i.e. sys_perf_event_open)

This will return an error, -EBUSY.

On Andrii Nakryiko's comment, few error paths in
bpf_program__attach_kprobe_opts() that should need to call
remove_kprobe_event_legacy().

With this patch, whenever an error is returned after
add_kprobe_event_legacy() or bpf_program__attach_perf_event_opts(), this
ensures that the created kprobe_event is cleaned.

Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Jingren Zhou <zhoujingren@didiglobal.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220629151848.65587-2-nashuiliang@gmail.com
2022-07-31 16:45:48 -07:00
Andrii Nakryiko
95971ddd48 libbpf: add bpf_core_type_matches() helper macro
This patch finalizes support for the proposed type match relation in libbpf by
adding bpf_core_type_matches() macro which emits TYPE_MATCH relocation.

Clang support for this relocation was added in [0].

  [0] https://reviews.llvm.org/D126838

Signed-off-by: Daniel Müller <deso@posteo.net>¬
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>¬
Link: https://lore.kernel.org/bpf/20220628160127.607834-7-deso@posteo.net¬
2022-07-31 16:45:48 -07:00
Daniel Müller
7410ddc0f4 bpf, libbpf: Add type match support
This patch adds support for the proposed type match relation to
relo_core where it is shared between userspace and kernel. It plumbs
through both kernel-side and libbpf-side support.

The matching relation is defined as follows (copy from source):
- modifiers and typedefs are stripped (and, hence, effectively ignored)
- generally speaking types need to be of same kind (struct vs. struct, union
  vs. union, etc.)
  - exceptions are struct/union behind a pointer which could also match a
    forward declaration of a struct or union, respectively, and enum vs.
    enum64 (see below)
Then, depending on type:
- integers:
  - match if size and signedness match
- arrays & pointers:
  - target types are recursively matched
- structs & unions:
  - local members need to exist in target with the same name
  - for each member we recursively check match unless it is already behind a
    pointer, in which case we only check matching names and compatible kind
- enums:
  - local variants have to have a match in target by symbolic name (but not
    numeric value)
  - size has to match (but enum may match enum64 and vice versa)
- function pointers:
  - number and position of arguments in local type has to match target
  - for each argument and the return value we recursively check match

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220628160127.607834-5-deso@posteo.net
2022-07-31 16:45:48 -07:00
Daniel Müller
1b80b97a30 bpf: Introduce TYPE_MATCH related constants/macros
In order to provide type match support we require a new type of
relocation which, in turn, requires toolchain support. Recent LLVM/Clang
versions support a new value for the last argument to the
__builtin_preserve_type_info builtin, for example.
With this change we introduce the necessary constants into relevant
header files, mirroring what the compiler may support.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220628160127.607834-2-deso@posteo.net
2022-07-31 16:45:48 -07:00
Hangbin Liu
434b56c497 Bonding: add per-port priority for failover re-selection
Add per port priority support for bonding active slave re-selection during
failover. A higher number means higher priority in selection. The primary
slave still has the highest priority. This option also follows the
primary_reselect rules.

This option could only be configured via netlink.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-31 16:45:48 -07:00
Daniel Müller
d060a88aa5 Remove Travis specific folding logic
The foldable function from the CI helper infrastructure conceptually
support emitting both GitHub and Travis fold markers. However, given
that we no longer run anything on Travis, let's remove its special case,
as it's effectively dead code.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-07-25 11:45:46 -07:00
Daniel Müller
9340d9b650 Rename travis_fold function to foldable
We are no longer using Travis. As such, it is confusing to anyone
reading the code to see a function prefixed 'travis_' in GitHub actions
code.
This change renames the travis_fold function to 'foldable', as a first
step towards eliminating such confusing constructs from the repository
where possible.

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-07-25 11:45:46 -07:00
thiagoftsm
70599f3a1e Merge branch 'libbpf:master' into master 2022-07-15 16:43:22 +00:00
Andrii Nakryiko
b78c75fcb3 Makefile: remove xsk.c and xsk.h
xsk.{c,h} are not part of libbpf anymore, remove them from Makefile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
f42d136c1c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   bb7a4257892717caf82fe6da45b259b35f73445c
Checkpoint bpf-next commit: b0d93b44641a83c28014ca38001e85bf6dc8501e
Baseline bpf commit:        a2b1a5d40bd12b44322c2ccd40bb0ec1699708b6
Checkpoint bpf commit:      d28b25a62a47a8c8aa19bd543863aab6717e68c9

Andrii Nakryiko (14):
  libbpf: move xsk.{c,h} into selftests/bpf
  libbpf: remove deprecated low-level APIs
  libbpf: remove deprecated XDP APIs
  libbpf: remove deprecated probing APIs
  libbpf: remove deprecated BTF APIs
  libbpf: clean up perfbuf APIs
  libbpf: remove prog_info_linear APIs
  libbpf: remove most other deprecated high-level APIs
  libbpf: remove multi-instance and custom private data APIs
  libbpf: cleanup LIBBPF_DEPRECATED_SINCE supporting macros for v0.x
  libbpf: remove internal multi-instance prog support
  libbpf: clean up SEC() handling
  libbpf: enforce strict libbpf 1.0 behaviors
  libbpf: fix up few libbpf.map problems

Daniel Müller (1):
  bpf: Merge "types_are_compat" logic into relo_core.c

Stanislav Fomichev (4):
  bpf: per-cgroup lsm flavor
  tools/bpf: Sync btf_ids.h to tools
  libbpf: add lsm_cgoup_sock type
  libbpf: implement bpf_prog_query_opts

 include/uapi/linux/bpf.h |    4 +
 src/bpf.c                |  200 +----
 src/bpf.h                |   98 +--
 src/btf.c                |  183 +----
 src/btf.h                |   86 +--
 src/btf_dump.c           |   23 +-
 src/libbpf.c             | 1500 ++++----------------------------------
 src/libbpf.h             |  469 +-----------
 src/libbpf.map           |  114 +--
 src/libbpf_common.h      |   16 +-
 src/libbpf_internal.h    |   24 +-
 src/libbpf_legacy.h      |   28 +-
 src/libbpf_probes.c      |  125 +---
 src/netlink.c            |   62 +-
 src/relo_core.c          |   80 ++
 src/relo_core.h          |    2 +
 src/xsk.c                | 1260 --------------------------------
 src/xsk.h                |  336 ---------
 18 files changed, 339 insertions(+), 4271 deletions(-)
 delete mode 100644 src/xsk.c
 delete mode 100644 src/xsk.h

--
2.30.2
2022-07-03 20:23:34 -07:00
Stanislav Fomichev
812a95fdf7 libbpf: implement bpf_prog_query_opts
Implement bpf_prog_query_opts as a more expendable version of
bpf_prog_query. Expose new prog_attach_flags and attach_btf_func_id as
well:

* prog_attach_flags is a per-program attach_type; relevant only for
  lsm cgroup program which might have different attach_flags
  per attach_btf_id
* attach_btf_func_id is a new field expose for prog_query which
  specifies real btf function id for lsm cgroup attachments

Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220628174314.1216643-10-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Stanislav Fomichev
f9f7f2d30a libbpf: add lsm_cgoup_sock type
lsm_cgroup/ is the prefix for BPF_LSM_CGROUP.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220628174314.1216643-9-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Stanislav Fomichev
25ba007681 tools/bpf: Sync btf_ids.h to tools
Has been slowly getting out of sync, let's update it.

resolve_btfids usage has been updated to match the header changes.

Also bring new parts of tools/include/uapi/linux/bpf.h.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220628174314.1216643-8-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Stanislav Fomichev
9bdb296ec6 bpf: per-cgroup lsm flavor
Allow attaching to lsm hooks in the cgroup context.

Attaching to per-cgroup LSM works exactly like attaching
to other per-cgroup hooks. New BPF_LSM_CGROUP is added
to trigger new mode; the actual lsm hook we attach to is
signaled via existing attach_btf_id.

For the hooks that have 'struct socket' or 'struct sock' as its first
argument, we use the cgroup associated with that socket. For the rest,
we use 'current' cgroup (this is all on default hierarchy == v2 only).
Note that for some hooks that work on 'struct sock' we still
take the cgroup from 'current' because some of them work on the socket
that hasn't been properly initialized yet.

Behind the scenes, we allocate a shim program that is attached
to the trampoline and runs cgroup effective BPF programs array.
This shim has some rudimentary ref counting and can be shared
between several programs attaching to the same lsm hook from
different cgroups.

Note that this patch bloats cgroup size because we add 211
cgroup_bpf_attach_type(s) for simplicity sake. This will be
addressed in the subsequent patch.

Also note that we only add non-sleepable flavor for now. To enable
sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu,
shim programs have to be freed via trace rcu, cgroup_bpf.effective
should be also trace-rcu-managed + maybe some other changes that
I'm not aware of.

Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220628174314.1216643-4-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
f009af7889 libbpf: fix up few libbpf.map problems
Seems like we missed to add 2 APIs to libbpf.map and another API was
misspelled. Fix it in libbpf.map.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-16-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
62e8af46d2 libbpf: enforce strict libbpf 1.0 behaviors
Remove support for legacy features and behaviors that previously had to
be disabled by calling libbpf_set_strict_mode():
  - legacy BPF map definitions are not supported now;
  - RLIMIT_MEMLOCK auto-setting, if necessary, is always on (but see
    libbpf_set_memlock_rlim());
  - program name is used for program pinning (instead of section name);
  - cleaned up error returning logic;
  - entry BPF programs should have SEC() always.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-15-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
fcd1b668c6 libbpf: clean up SEC() handling
Get rid of sloppy prefix logic and remove deprecated xdp_{devmap,cpumap}
sections.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-13-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
0eb12dca7e libbpf: remove internal multi-instance prog support
Clean up internals that had to deal with the possibility of
multi-instance bpf_programs. Libbpf 1.0 doesn't support this, so all
this is not necessary now and can be simplified.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-12-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
fedeba74b7 libbpf: cleanup LIBBPF_DEPRECATED_SINCE supporting macros for v0.x
Keep the LIBBPF_DEPRECATED_SINCE macro "framework" for future
deprecations, but clean up 0.x related helper macros.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
bf51e3c336 libbpf: remove multi-instance and custom private data APIs
Remove all the public APIs that are related to creating multi-instance
bpf_programs through custom preprocessing callback and generally working
with them.

Also remove all the bpf_{object,map,program}__[set_]priv() APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
d8454ba8ad libbpf: remove most other deprecated high-level APIs
Remove a bunch of high-level bpf_object/bpf_map/bpf_program related
APIs. All the APIs related to private per-object/map/prog state,
program preprocessing callback, and generally everything multi-instance
related is removed in a separate patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
ec3bbc05c0 libbpf: remove prog_info_linear APIs
Remove prog_info_linear-related APIs previously used by perf.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
d32e7ea952 libbpf: clean up perfbuf APIs
Remove deprecated perfbuf APIs and clean up opts structs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
6abeb4203d libbpf: remove deprecated BTF APIs
Get rid of deprecated BTF-related APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
e28a540c59 libbpf: remove deprecated probing APIs
Get rid of deprecated feature-probing APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
e8802d6319 libbpf: remove deprecated XDP APIs
Get rid of deprecated bpf_set_link*() and bpf_get_link*() APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
9476dce6fe libbpf: remove deprecated low-level APIs
Drop low-level APIs as well as high-level (and very confusingly named)
BPF object loading bpf_prog_load_xattr() and bpf_prog_load_deprecated()
APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Andrii Nakryiko
8ee1202ff4 libbpf: move xsk.{c,h} into selftests/bpf
Remove deprecated xsk APIs from libbpf. But given we have selftests
relying on this, move those files (with minimal adjustments to make them
compilable) under selftests/bpf.

We also remove all the removed APIs from libbpf.map, while overall
keeping version inheritance chain, as most APIs are backwards
compatible so there is no need to reassign them as LIBBPF_1.0.0 versions.

Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-03 20:23:34 -07:00
Daniel Müller
7013b92fef bpf: Merge "types_are_compat" logic into relo_core.c
BPF type compatibility checks (bpf_core_types_are_compat()) are
currently duplicated between kernel and user space. That's a historical
artifact more than intentional doing and can lead to subtle bugs where
one implementation is adjusted but another is forgotten.

That happened with the enum64 work, for example, where the libbpf side
was changed (commit 23b2a3a8f63a ("libbpf: Add enum64 relocation
support")) to use the btf_kind_core_compat() helper function but the
kernel side was not (commit 6089fb325cf7 ("bpf: Add btf enum64
support")).

This patch addresses both the duplication issue, by merging both
implementations and moving them into relo_core.c, and fixes the alluded
to kind check (by giving preference to libbpf's already adjusted logic).

For discussion of the topic, please refer to:
https://lore.kernel.org/bpf/CAADnVQKbWR7oarBdewgOBZUPzryhRYvEbkhyPJQHHuxq=0K1gw@mail.gmail.com/T/#mcc99f4a33ad9a322afaf1b9276fb1f0b7add9665

Changelog:
v1 -> v2:
- limited libbpf recursion limit to 32
- changed name to __bpf_core_types_are_compat
- included warning previously present in libbpf version
- merged kernel and user space changes into a single patch

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220623182934.2582827-1-deso@posteo.net
2022-07-03 20:23:34 -07:00
Daniel Müller
20f0330235 Remove unused .travis.yml configuration
Checking earlier pull requests, to the best of my understanding nothing
is using Travis anymore -- all CI checks are GitHub Actions based.
Further checking the Travis repository [0] the last CI run there was 2
years ago.
Hence, let's remove stale configuration for Travis, as it's seemingly
only bitrotting and causing confusion.

[0]: https://travis-ci.org/github/libbpf/libbpf/builds

Signed-off-by: Daniel Müller <deso@posteo.net>
2022-06-28 18:26:00 -07:00
Andrii Nakryiko
29869d6ef0 ci: disable attach_probe test on 5.5
It's assuming kprobe w/ sleepable flag is loadable, which is failing on
5.5 kernel.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-06-24 13:32:31 -07:00
Andrii Nakryiko
72dbaf2ac3 ci: update vmlinux.h for 5.5 and 4.9 kernels
Update vmlinux.h to fix selftests build on 5.5 and 4.9 kernels.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-06-24 13:32:31 -07:00
Andrii Nakryiko
bc3673cdd5 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   3e6fe5ce4d4860c3a111c246fddc6f31492f4fb0
Checkpoint bpf-next commit: bb7a4257892717caf82fe6da45b259b35f73445c
Baseline bpf commit:        5e0b0a4c52d30bb09659446f40b77a692361600d
Checkpoint bpf commit:      a2b1a5d40bd12b44322c2ccd40bb0ec1699708b6

Delyan Kratunov (1):
  libbpf: add support for sleepable uprobe programs

Maxim Mikityanskiy (2):
  bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
  bpf: Add helpers to issue and check SYN cookies in XDP

 include/uapi/linux/bpf.h | 88 ++++++++++++++++++++++++++++++++++++++--
 src/libbpf.c             |  5 ++-
 2 files changed, 88 insertions(+), 5 deletions(-)

--
2.30.2
2022-06-24 13:32:31 -07:00
Andrii Nakryiko
78909b8caf sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-06-24 13:32:31 -07:00
Maxim Mikityanskiy
ec718073b0 bpf: Add helpers to issue and check SYN cookies in XDP
The new helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} allow an XDP
program to generate SYN cookies in response to TCP SYN packets and to
check those cookies upon receiving the first ACK packet (the final
packet of the TCP handshake).

Unlike bpf_tcp_{gen,check}_syncookie these new helpers don't need a
listening socket on the local machine, which allows to use them together
with synproxy to accelerate SYN cookie generation.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-4-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-24 13:32:31 -07:00
Maxim Mikityanskiy
9c73b6d422 bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
bpf_tcp_gen_syncookie expects the full length of the TCP header (with
all options), and bpf_tcp_check_syncookie accepts lengths bigger than
sizeof(struct tcphdr). Fix the documentation that says these lengths
should be exactly sizeof(struct tcphdr).

While at it, fix a typo in the name of struct ipv6hdr.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-2-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-24 13:32:31 -07:00
Delyan Kratunov
0c84902331 libbpf: add support for sleepable uprobe programs
Add section mappings for u(ret)probe.s programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Link: https://lore.kernel.org/r/aedbc3b74f3523f00010a7b0df8f3388cca59f16.1655248076.git.delyank@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-24 13:32:31 -07:00
Roberto Sassu
4cb682229d configs: Enable CONFIG_MODULE_SIG
Enable CONFIG_MODULE_SIG to test the new helper
bpf_verify_pkcs7_signature().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
2022-06-17 22:05:28 -07:00
Eyal Birger
0304a3c027 ci: enable vrf configs for x86_64
Set CONFIG_NET_L3_MASTER_DEV=y, CONFIG_NET_VRF=y for x86_64.

These options are needed for performing LWT BPF tests in test_progs.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
2022-06-17 09:58:20 -07:00
Mykola Lysenko
a459010926 ci: temporarily disable varlen test 2022-06-17 09:47:40 -07:00
Andrii Nakryiko
e5ff285a44 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   fe92833524e368e59bba9c57e00f7359f133667f
Checkpoint bpf-next commit: 3e6fe5ce4d4860c3a111c246fddc6f31492f4fb0
Baseline bpf commit:        825464e79db4aac936e0fdae62cdfb7546d0028f
Checkpoint bpf commit:      5e0b0a4c52d30bb09659446f40b77a692361600d

Andrii Nakryiko (1):
  libbpf: Fix internal USDT address translation logic for shared
    libraries

Yonghong Song (1):
  libbpf: Fix an unsigned < 0 bug

 src/libbpf.c |   2 +-
 src/usdt.c   | 123 ++++++++++++++++++++++++++-------------------------
 2 files changed, 64 insertions(+), 61 deletions(-)

--
2.30.2
2022-06-16 16:58:52 -07:00
Andrii Nakryiko
2d91c46d1a libbpf: Fix internal USDT address translation logic for shared libraries
Perform the same virtual address to file offset translation that libbpf
is doing for executable ELF binaries also for shared libraries.
Currently libbpf is making a simplifying and sometimes wrong assumption
that for shared libraries relative virtual addresses inside ELF are
always equal to file offsets.

Unfortunately, this is not always the case with LLVM's lld linker, which
now by default generates quite more complicated ELF segments layout.
E.g., for liburandom_read.so from selftests/bpf, here's an excerpt from
readelf output listing ELF segments (a.k.a. program headers):

  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0001f8 0x0001f8 R   0x8
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x0005e4 0x0005e4 R   0x1000
  LOAD           0x0005f0 0x00000000000015f0 0x00000000000015f0 0x000160 0x000160 R E 0x1000
  LOAD           0x000750 0x0000000000002750 0x0000000000002750 0x000210 0x000210 RW  0x1000
  LOAD           0x000960 0x0000000000003960 0x0000000000003960 0x000028 0x000029 RW  0x1000

Compare that to what is generated by GNU ld (or LLVM lld's with extra
-znoseparate-code argument which disables this cleverness in the name of
file size reduction):

  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000550 0x000550 R   0x1000
  LOAD           0x001000 0x0000000000001000 0x0000000000001000 0x000131 0x000131 R E 0x1000
  LOAD           0x002000 0x0000000000002000 0x0000000000002000 0x0000ac 0x0000ac R   0x1000
  LOAD           0x002dc0 0x0000000000003dc0 0x0000000000003dc0 0x000262 0x000268 RW  0x1000

You can see from the first example above that for executable (Flg == "R E")
PT_LOAD segment (LOAD #2), Offset doesn't match VirtAddr columns.
And it does in the second case (GNU ld output).

This is important because all the addresses, including USDT specs,
operate in a virtual address space, while kernel is expecting file
offsets when performing uprobe attach. So such mismatches have to be
properly taken care of and compensated by libbpf, which is what this
patch is fixing.

Also patch clarifies few function and variable names, as well as updates
comments to reflect this important distinction (virtaddr vs file offset)
and to ephasize that shared libraries are not all that different from
executables in this regard.

This patch also changes selftests/bpf Makefile to force urand_read and
liburand_read.so to be built with Clang and LLVM's lld (and explicitly
request this ELF file size optimization through -znoseparate-code linker
parameter) to validate libbpf logic and ensure regressions don't happen
in the future. I've bundled these selftests changes together with libbpf
changes to keep the above description tied with both libbpf and
selftests changes.

Fixes: 74cc6311cec9 ("libbpf: Add USDT notes parsing and resolution logic")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616055543.3285835-1-andrii@kernel.org
2022-06-16 16:58:52 -07:00
Yonghong Song
d3e41fc1aa libbpf: Fix an unsigned < 0 bug
Andrii reported a bug with the following information:

  2859 	if (enum64_placeholder_id == 0) {
  2860 		enum64_placeholder_id = btf__add_int(btf, "enum64_placeholder", 1, 0);
  >>>     CID 394804:  Control flow issues  (NO_EFFECT)
  >>>     This less-than-zero comparison of an unsigned value is never true. "enum64_placeholder_id < 0U".
  2861 		if (enum64_placeholder_id < 0)
  2862 			return enum64_placeholder_id;
  2863    	...

Here enum64_placeholder_id declared as '__u32' so enum64_placeholder_id < 0
is always false. Declare enum64_placeholder_id as 'int' in order to capture
the potential error properly.

Fixes: f2a625889bb8 ("libbpf: Add enum64 sanitization")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220613054314.1251905-1-yhs@fb.com
2022-06-16 16:58:52 -07:00
Andrii Nakryiko
645500dd7d ci: blacklist mptcp test on s390x
It is also blacklisted in kernel-patches CI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-06-10 14:13:02 -07:00
Andrii Nakryiko
5497411f48 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   02f4afebf8a54ba16f99f4f6ca10df3efeac6229
Checkpoint bpf-next commit: fe92833524e368e59bba9c57e00f7359f133667f
Baseline bpf commit:        d08af2c46881b62f4efad8ebb7eae381fa1f1033
Checkpoint bpf commit:      825464e79db4aac936e0fdae62cdfb7546d0028f

Andrii Nakryiko (1):
  libbpf: Fix uprobe symbol file offset calculation logic

Yonghong Song (10):
  bpf: Add btf enum64 support
  libbpf: Permit 64bit relocation value
  libbpf: Fix an error in 64bit relocation value computation
  libbpf: Refactor btf__add_enum() for future code sharing
  libbpf: Add enum64 parsing and new enum64 public API
  libbpf: Add enum64 deduplication support
  libbpf: Add enum64 support for btf_dump
  libbpf: Add enum64 sanitization
  libbpf: Add enum64 support for bpf linking
  libbpf: Add enum64 relocation support

 include/uapi/linux/btf.h |  17 +++-
 src/btf.c                | 201 +++++++++++++++++++++++++++++++++++----
 src/btf.h                |  32 ++++++-
 src/btf_dump.c           | 137 +++++++++++++++++++-------
 src/libbpf.c             | 126 ++++++++++++++----------
 src/libbpf.map           |   2 +
 src/libbpf_internal.h    |   2 +
 src/linker.c             |   2 +
 src/relo_core.c          | 105 ++++++++++++--------
 src/relo_core.h          |   4 +-
 10 files changed, 483 insertions(+), 145 deletions(-)

--
2.30.2
2022-06-10 14:13:02 -07:00
Andrii Nakryiko
74b22b6c8a libbpf: Fix uprobe symbol file offset calculation logic
Fix libbpf's bpf_program__attach_uprobe() logic of determining
function's *file offset* (which is what kernel is actually expecting)
when attaching uprobe/uretprobe by function name. Previously calculation
was determining virtual address offset relative to base load address,
which (offset) is not always the same as file offset (though very
frequently it is which is why this went unnoticed for a while).

Fixes: 433966e3ae04 ("libbpf: Support function name-based attach uprobes")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Riham Selim <rihams@fb.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220606220143.3796908-1-andrii@kernel.org
2022-06-10 14:13:02 -07:00
Yonghong Song
416351822c libbpf: Add enum64 relocation support
The enum64 relocation support is added. The bpf local type
could be either enum or enum64 and the remote type could be
either enum or enum64 too. The all combinations of local enum/enum64
and remote enum/enum64 are supported.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062647.3721719-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
3f9d041e19 libbpf: Add enum64 support for bpf linking
Add BTF_KIND_ENUM64 support for bpf linking, which is
very similar to BTF_KIND_ENUM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062642.3721494-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
a945df2439 libbpf: Add enum64 sanitization
When old kernel does not support enum64 but user space btf
contains non-zero enum kflag or enum64, libbpf needs to
do proper sanitization so modified btf can be accepted
by the kernel.

Sanitization for enum kflag can be achieved by clearing
the kflag bit. For enum64, the type is replaced with an
union of integer member types and the integer member size
must be smaller than enum64 size. If such an integer
type cannot be found, a new type is created and used
for union members.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062636.3721375-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
f429a582bf libbpf: Add enum64 support for btf_dump
Add enum64 btf dumping support. For long long and unsigned long long
dump, suffixes 'LL' and 'ULL' are added to avoid compilation errors
in some cases.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062631.3720526-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
25238de149 libbpf: Add enum64 deduplication support
Add enum64 deduplication support. BTF_KIND_ENUM64 handling
is very similar to BTF_KIND_ENUM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062626.3720166-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
c3f8eecb16 libbpf: Add enum64 parsing and new enum64 public API
Add enum64 parsing support and two new enum64 public APIs:
  btf__add_enum64
  btf__add_enum64_value

Also add support of signedness for BTF_KIND_ENUM. The
BTF_KIND_ENUM API signatures are not changed. The signedness
will be changed from unsigned to signed if btf__add_enum_value()
finds any negative values.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062621.3719391-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
25fd7a1cf5 libbpf: Refactor btf__add_enum() for future code sharing
Refactor btf__add_enum() function to create a separate
function btf_add_enum_common() so later the common function
can be used to add enum64 btf type. There is no functionality
change for this patch.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062615.3718063-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
0167a88355 libbpf: Fix an error in 64bit relocation value computation
Currently, the 64bit relocation value in the instruction
is computed as follows:
  __u64 imm = insn[0].imm + ((__u64)insn[1].imm << 32)

Suppose insn[0].imm = -1 (0xffffffff) and insn[1].imm = 1.
With the above computation, insn[0].imm will first sign-extend
to 64bit -1 (0xffffffffFFFFFFFF) and then add 0x1FFFFFFFF,
producing incorrect value 0xFFFFFFFF. The correct value
should be 0x1FFFFFFFF.

Changing insn[0].imm to __u32 first will prevent 64bit sign
extension and fix the issue. Merging high and low 32bit values
also changed from '+' to '|' to be consistent with other
similar occurences in kernel and libbpf.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062610.3717378-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
23e3d8cf31 libbpf: Permit 64bit relocation value
Currently, the libbpf limits the relocation value to be 32bit
since all current relocations have such a limit. But with
BTF_KIND_ENUM64 support, the enum value could be 64bit.
So let us permit 64bit relocation value in libbpf.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062605.3716779-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Yonghong Song
9a976c6b98 bpf: Add btf enum64 support
Currently, BTF only supports upto 32bit enum value with BTF_KIND_ENUM.
But in kernel, some enum indeed has 64bit values, e.g.,
in uapi bpf.h, we have
  enum {
        BPF_F_INDEX_MASK                = 0xffffffffULL,
        BPF_F_CURRENT_CPU               = BPF_F_INDEX_MASK,
        BPF_F_CTXLEN_MASK               = (0xfffffULL << 32),
  };
In this case, BTF_KIND_ENUM will encode the value of BPF_F_CTXLEN_MASK
as 0, which certainly is incorrect.

This patch added a new btf kind, BTF_KIND_ENUM64, which permits
64bit value to cover the above use case. The BTF_KIND_ENUM64 has
the following three fields followed by the common type:
  struct bpf_enum64 {
    __u32 nume_off;
    __u32 val_lo32;
    __u32 val_hi32;
  };
Currently, btf type section has an alignment of 4 as all element types
are u32. Representing the value with __u64 will introduce a pad
for bpf_enum64 and may also introduce misalignment for the 64bit value.
Hence, two members of val_hi32 and val_lo32 are chosen to avoid these issues.

The kflag is also introduced for BTF_KIND_ENUM and BTF_KIND_ENUM64
to indicate whether the value is signed or unsigned. The kflag intends
to provide consistent output of BTF C fortmat with the original
source code. For example, the original BTF_KIND_ENUM bit value is 0xffffffff.
The format C has two choices, printing out 0xffffffff or -1 and current libbpf
prints out as unsigned value. But if the signedness is preserved in btf,
the value can be printed the same as the original source code.
The kflag value 0 means unsigned values, which is consistent to the default
by libbpf and should also cover most cases as well.

The new BTF_KIND_ENUM64 is intended to support the enum value represented as
64bit value. But it can represent all BTF_KIND_ENUM values as well.
The compiler ([1]) and pahole will generate BTF_KIND_ENUM64 only if the value has
to be represented with 64 bits.

In addition, a static inline function btf_kind_core_compat() is introduced which
will be used later when libbpf relo_core.c changed. Here the kernel shares the
same relo_core.c with libbpf.

  [1] https://reviews.llvm.org/D124641

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062600.3716578-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10 14:13:02 -07:00
Andrii Nakryiko
e93b1010f3 ci: disable unpriv_bpf_disabled test on s390x
Seems like it's relying on fentry which is not supported on s390x.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-06-07 17:39:28 -07:00
Andrii Nakryiko
76fc1ad6d5 ci: make sure to not override CFLAGS
Use EXTRA_CFLAGS instead of overriding CFLAGS.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-06-07 17:39:28 -07:00
Andrii Nakryiko
33c5f2bec3 libbpf: bump Makefile version to 1.0.0 to match libbpf.map
We are now in v1.0 dev cycle.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-06-07 17:39:28 -07:00
Andrii Nakryiko
d4998cbb6c ci: update Kconfigs to make all selftests working
Also disable fexit_stress which is using test_run's support for TRACING
progs now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-06-07 17:39:28 -07:00
Andrii Nakryiko
eb1d1ad83f sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   ac6a65868a5a45db49d5ee8524df3b701110d844
Checkpoint bpf-next commit: 02f4afebf8a54ba16f99f4f6ca10df3efeac6229
Baseline bpf commit:        f3f19f939c11925dadd3f4776f99f8c278a7017b
Checkpoint bpf commit:      d08af2c46881b62f4efad8ebb7eae381fa1f1033

Andrii Nakryiko (2):
  libbpf: start 1.0 development cycle
  libbpf: remove bpf_create_map*() APIs

Daniel Müller (5):
  libbpf: Introduce libbpf_bpf_prog_type_str
  libbpf: Introduce libbpf_bpf_map_type_str
  libbpf: Introduce libbpf_bpf_attach_type_str
  libbpf: Introduce libbpf_bpf_link_type_str
  libbpf: Fix a couple of typos

Douglas Raillard (1):
  libbpf: Fix determine_ptr_size() guessing

Eric Dumazet (1):
  net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes

Geliang Tang (1):
  bpf: Add bpf_skc_to_mptcp_sock_proto

Joanne Koong (5):
  bpf: Add verifier support for dynptrs
  bpf: Add bpf_dynptr_from_mem for local dynptrs
  bpf: Dynptr support for ring buffers
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Add dynptr data slices

Julia Lawall (1):
  libbpf: Fix typo in comment

Yuze Chi (1):
  libbpf: Fix is_pow_of_2

 include/uapi/linux/bpf.h     |  90 +++++++++++++++++++
 include/uapi/linux/if_link.h |   2 +
 src/bpf.c                    |  80 -----------------
 src/bpf.h                    |  42 ---------
 src/btf.c                    |  28 ++++--
 src/libbpf.c                 | 167 +++++++++++++++++++++++++++++++++--
 src/libbpf.h                 |  38 +++++++-
 src/libbpf.map               |  10 +++
 src/libbpf_internal.h        |   5 ++
 src/libbpf_version.h         |   4 +-
 src/linker.c                 |   5 --
 src/relo_core.c              |   8 +-
 12 files changed, 332 insertions(+), 147 deletions(-)

--
2.30.2
2022-06-07 17:39:28 -07:00
Andrii Nakryiko
8aa946389d sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-06-07 17:39:28 -07:00
Yuze Chi
ad0783c430 libbpf: Fix is_pow_of_2
Move the correct definition from linker.c into libbpf_internal.h.

Fixes: 0087a681fa8c ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
Reported-by: Yuze Chi <chiyuze@google.com>
Signed-off-by: Yuze Chi <chiyuze@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220603055156.2830463-1-irogers@google.com
2022-06-07 17:39:28 -07:00
Daniel Müller
55638904af libbpf: Fix a couple of typos
This change fixes a couple of typos that were encountered while studying
the source code.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220601154025.3295035-1-deso@posteo.net
2022-06-07 17:39:28 -07:00
Douglas Raillard
a5d75daa8c libbpf: Fix determine_ptr_size() guessing
One strategy employed by libbpf to guess the pointer size is by finding
the size of "unsigned long" type. This is achieved by looking for a type
of with the expected name and checking its size.

Unfortunately, the C syntax is friendlier to humans than to computers
as there is some variety in how such a type can be named. Specifically,
gcc and clang do not use the same names for integer types in debug info:

    - clang uses "unsigned long"
    - gcc uses "long unsigned int"

Lookup all the names for such a type so that libbpf can hope to find the
information it wants.

Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220524094447.332186-1-douglas.raillard@arm.com
2022-06-07 17:39:28 -07:00
Daniel Müller
37218f49fa libbpf: Introduce libbpf_bpf_link_type_str
This change introduces a new function, libbpf_bpf_link_type_str, to the
public libbpf API. The function allows users to get a string
representation for a bpf_link_type enum variant.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220523230428.3077108-11-deso@posteo.net
2022-06-07 17:39:28 -07:00
Daniel Müller
bdbce77631 libbpf: Introduce libbpf_bpf_attach_type_str
This change introduces a new function, libbpf_bpf_attach_type_str, to
the public libbpf API. The function allows users to get a string
representation for a bpf_attach_type variant.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220523230428.3077108-8-deso@posteo.net
2022-06-07 17:39:28 -07:00
Daniel Müller
242c116f04 libbpf: Introduce libbpf_bpf_map_type_str
This change introduces a new function, libbpf_bpf_map_type_str, to the
public libbpf API. The function allows users to get a string
representation for a bpf_map_type enum variant.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220523230428.3077108-5-deso@posteo.net
2022-06-07 17:39:28 -07:00
Daniel Müller
4d9cd51e7e libbpf: Introduce libbpf_bpf_prog_type_str
This change introduces a new function, libbpf_bpf_prog_type_str, to the
public libbpf API. The function allows users to get a string
representation for a bpf_prog_type variant.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220523230428.3077108-2-deso@posteo.net
2022-06-07 17:39:28 -07:00
Joanne Koong
f035838503 bpf: Add dynptr data slices
This patch adds a new helper function

void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);

which returns a pointer to the underlying data of a dynptr. *len*
must be a statically known value. The bpf program may access the returned
data slice as a normal buffer (eg can do direct reads and writes), since
the verifier associates the length with the returned pointer, and
enforces that no out of bounds accesses occur.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-6-joannelkoong@gmail.com
2022-06-07 17:39:28 -07:00
Joanne Koong
7ed5bf8f4c bpf: Add bpf_dynptr_read and bpf_dynptr_write
This patch adds two helper functions, bpf_dynptr_read and
bpf_dynptr_write:

long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);

long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);

The dynptr passed into these functions must be valid dynptrs that have
been initialized.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-5-joannelkoong@gmail.com
2022-06-07 17:39:28 -07:00
Joanne Koong
1a0f5d1c87 bpf: Dynptr support for ring buffers
Currently, our only way of writing dynamically-sized data into a ring
buffer is through bpf_ringbuf_output but this incurs an extra memcpy
cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
memcpy, but it can only safely support reservation sizes that are
statically known since the verifier cannot guarantee that the bpf
program won’t access memory outside the reserved space.

The bpf_dynptr abstraction allows for dynamically-sized ring buffer
reservations without the extra memcpy.

There are 3 new APIs:

long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);

These closely follow the functionalities of the original ringbuf APIs.
For example, all ringbuffer dynptrs that have been reserved must be
either submitted or discarded before the program exits.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-4-joannelkoong@gmail.com
2022-06-07 17:39:28 -07:00
Joanne Koong
c68a2738fd bpf: Add bpf_dynptr_from_mem for local dynptrs
This patch adds a new api bpf_dynptr_from_mem:

long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr);

which initializes a dynptr to point to a bpf program's local memory. For now
only local memory that is of reg type PTR_TO_MAP_VALUE is supported.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-3-joannelkoong@gmail.com
2022-06-07 17:39:28 -07:00
Joanne Koong
97009215cb bpf: Add verifier support for dynptrs
This patch adds the bulk of the verifier work for supporting dynamic
pointers (dynptrs) in bpf.

A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure
defined internally as:

struct bpf_dynptr_kern {
    void *data;
    u32 size;
    u32 offset;
} __aligned(8);

The upper 8 bits of *size* is reserved (it contains extra metadata about
read-only status and dynptr type). Consequently, a dynptr only supports
memory less than 16 MB.

There are different types of dynptrs (eg malloc, ringbuf, ...). In this
patchset, the most basic one, dynptrs to a bpf program's local memory,
is added. For now only local memory that is of reg type PTR_TO_MAP_VALUE
is supported.

In the verifier, dynptr state information will be tracked in stack
slots. When the program passes in an uninitialized dynptr
(ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding
to the frame pointer where the dynptr resides at are marked
STACK_DYNPTR. For helper functions that take in initialized dynptrs (eg
bpf_dynptr_read + bpf_dynptr_write which are added later in this
patchset), the verifier enforces that the dynptr has been initialized
properly by checking that their corresponding stack slots have been
marked as STACK_DYNPTR.

The 6th patch in this patchset adds test cases that the verifier should
successfully reject, such as for example attempting to use a dynptr
after doing a direct write into it inside the bpf program.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-2-joannelkoong@gmail.com
2022-06-07 17:39:28 -07:00
Julia Lawall
4c39a3e1aa libbpf: Fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/20220521111145.81697-71-Julia.Lawall@inria.fr
2022-06-07 17:39:28 -07:00
Geliang Tang
cb11988cf4 bpf: Add bpf_skc_to_mptcp_sock_proto
This patch implements a new struct bpf_func_proto, named
bpf_skc_to_mptcp_sock_proto. Define a new bpf_id BTF_SOCK_TYPE_MPTCP,
and a new helper bpf_skc_to_mptcp_sock(), which invokes another new
helper bpf_mptcp_sock_from_subflow() in net/mptcp/bpf.c to get struct
mptcp_sock from a given subflow socket.

v2: Emit BTF type, add func_id checks in verifier.c and bpf_trace.c,
remove build check for CONFIG_BPF_JIT
v5: Drop EXPORT_SYMBOL (Martin)

Co-developed-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220519233016.105670-2-mathew.j.martineau@linux.intel.com
2022-06-07 17:39:28 -07:00
Andrii Nakryiko
7e8d4234ac libbpf: remove bpf_create_map*() APIs
To test API removal, get rid of bpf_create_map*() APIs. Perf defines
__weak implementation of bpf_map_create() that redirects to old
bpf_create_map() and that seems to compile and run fine.

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220518185915.3529475-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 17:39:28 -07:00
Andrii Nakryiko
00f40c01fb libbpf: start 1.0 development cycle
Start libbpf 1.0 development cycle by adding LIBBPF_1.0.0 section to
libbpf.map file and marking all current symbols as local. As we remove
all the deprecated APIs we'll populate global list before the final 1.0
release.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220518185915.3529475-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 17:39:28 -07:00
Eric Dumazet
881eba7ef5 net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes
New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS
are used to report to user-space the device TSO limits.

ip -d link sh dev eth1
...
   tso_max_size 65536 tso_max_segs 65535

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-07 17:39:28 -07:00
wangjie
4eb6485c08 Makefile: add support for cross compilation
Support CROSS_COMPILE and EXTRA_CFLAGS/EXTRA_LDFLAGS environments,
to make cross compiling more flexible.

Signed-off-by: Jie Wang <wangjie22@lixiang.com>
2022-05-24 23:24:54 -07:00
Ilya Leoshkevich
eaf9123419 vmtest: add netfilter to s390x config
This is required for the new synproxy test.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-05-23 17:39:33 -07:00
Ilya Leoshkevich
cc904c1a74 vmtest: keep coreutils
Kernel's vmtest.sh uses stdbuf, which is unfortunately not present in
busybox. Do not delete coreutils, which has it. As a result, the
compressed image grows by 1M (~5%).

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-05-23 17:39:33 -07:00
Ilya Leoshkevich
f3b96c873d vmtest: add iptables
iptables is required by the new selftests for raw syncookie helpers.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-05-23 17:39:33 -07:00
Maxim Mikityanskiy
47595c2f08 ci: blacklist xdp_syncookie on s390x
The xdp_syncookie test uses kfunc, and BPF JIT doesn't support kfunc on
s390x.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
2022-05-20 17:14:41 -07:00
thiagoftsm
e4f2e6e865 Merge branch 'libbpf:master' into master 2022-05-17 02:16:02 +00:00
Andrii Nakryiko
86eb09863c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b2531d4bdce19f28364b45aac9132e153b1f23a4
Checkpoint bpf-next commit: ac6a65868a5a45db49d5ee8524df3b701110d844
Baseline bpf commit:        f3f19f939c11925dadd3f4776f99f8c278a7017b
Checkpoint bpf commit:      f3f19f939c11925dadd3f4776f99f8c278a7017b

Andrii Nakryiko (1):
  libbpf: fix memory leak in attach_tp for target-less tracepoint
    program

 src/libbpf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--
2.30.2
2022-05-16 13:46:05 -07:00
Andrii Nakryiko
d43fc5a42f libbpf: fix memory leak in attach_tp for target-less tracepoint program
Fix sec_name memory leak if user defines target-less SEC("tp").

Fixes: 9af8efc45eb1 ("libbpf: Allow "incomplete" basic tracing SEC() definitions")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20220516184547.3204674-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-16 13:46:05 -07:00
Andrii Nakryiko
12e932ac0e ci: whitelist 'usdt' test on 5.5 and update vmlinux.h
Update vmlinux.h for latest selftests. Also whitelist usdt test on 5.5,
as it should work.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
75452cd290 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   d54d06a4c4bc5d76815d02e4b041b31d9dbb3fef
Checkpoint bpf-next commit: b2531d4bdce19f28364b45aac9132e153b1f23a4
Baseline bpf commit:        ba3beec2ec1d3b4fd8672ca6e781dac4b3267f6e
Checkpoint bpf commit:      f3f19f939c11925dadd3f4776f99f8c278a7017b

Andrii Nakryiko (12):
  libbpf: Allow "incomplete" basic tracing SEC() definitions
  libbpf: Support target-less SEC() definitions for BTF-backed programs
  libbpf: Append "..." in fixed up log if CO-RE spec is truncated
  libbpf: Use libbpf_mem_ensure() when allocating new map
  libbpf: Allow to opt-out from creating BPF maps
  libbpf: Make __kptr and __kptr_ref unconditionally use btf_type_tag()
    attr
  libbpf: Improve usability of field-based CO-RE helpers
  libbpf: Complete field-based CO-RE helpers with field offset helper
  libbpf: Provide barrier() and barrier_var() in bpf_helpers.h
  libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary
  libbpf: Clean up ringbuf size adjustment implementation
  libbpf: Add safer high-level wrappers for map operations

Feng Zhou (1):
  bpf: add bpf_map_lookup_percpu_elem for percpu map

Jiri Olsa (1):
  libbpf: Add bpf_program__set_insns function

Kaixi Fan (1):
  bpf: Add source ip in "struct bpf_tunnel_key"

Kui-Feng Lee (3):
  bpf, x86: Generate trampolines from bpf_tramp_links
  bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm.
  libbpf: Assign cookies to links in libbpf.

 include/uapi/linux/bpf.h |  23 ++
 src/bpf.c                |  22 ++
 src/bpf.h                |   4 +
 src/bpf_core_read.h      |  37 ++-
 src/bpf_helpers.h        |  29 ++-
 src/libbpf.c             | 473 ++++++++++++++++++++++++++++++++-------
 src/libbpf.h             | 156 +++++++++++++
 src/libbpf.map           |  12 +-
 8 files changed, 659 insertions(+), 97 deletions(-)

--
2.30.2
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
ae67bfbae3 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
650adc5118 libbpf: Add safer high-level wrappers for map operations
Add high-level API wrappers for most common and typical BPF map
operations that works directly on instances of struct bpf_map * (so
you don't have to call bpf_map__fd()) and validate key/value size
expectations.

These helpers require users to specify key (and value, where
appropriate) sizes when performing lookup/update/delete/etc. This forces
user to actually think and validate (for themselves) those. This is
a good thing as user is expected by kernel to implicitly provide correct
key/value buffer sizes and kernel will just read/write necessary amount
of data. If it so happens that user doesn't set up buffers correctly
(which bit people for per-CPU maps especially) kernel either randomly
overwrites stack data or return -EFAULT, depending on user's luck and
circumstances. These high-level APIs are meant to prevent such
unpleasant and hard to debug bugs.

This patch also adds bpf_map_delete_elem_flags() low-level API and
requires passing flags to bpf_map__delete_elem() API for consistency
across all similar APIs, even though currently kernel doesn't expect
any extra flags for BPF_MAP_DELETE_ELEM operation.

List of map operations that get these high-level APIs:

  - bpf_map_lookup_elem;
  - bpf_map_update_elem;
  - bpf_map_delete_elem;
  - bpf_map_lookup_and_delete_elem;
  - bpf_map_get_next_key.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220512220713.2617964-1-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Feng Zhou
babc92b9f1 bpf: add bpf_map_lookup_percpu_elem for percpu map
Add new ebpf helpers bpf_map_lookup_percpu_elem.

The implementation method is relatively simple, refer to the implementation
method of map_lookup_elem of percpu map, increase the parameters of cpu, and
obtain it according to the specified cpu.

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-13 16:13:31 -07:00
Jiri Olsa
e335f3fa5f libbpf: Add bpf_program__set_insns function
Adding bpf_program__set_insns that allows to set new instructions
for a BPF program.

This is a very advanced libbpf API and users need to know what
they are doing. This should be used from prog_prepare_load_fn
callback only.

We can have changed instructions after calling prog_prepare_load_fn
callback, reloading them.

One of the users of this new API will be perf's internal BPF prologue
generation.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510074659.2557731-2-jolsa@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
7062757357 libbpf: Clean up ringbuf size adjustment implementation
Drop unused iteration variable, move overflow prevention check into the
for loop.

Fixes: 0087a681fa8c ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220510185159.754299-1-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Kui-Feng Lee
aec48fffee libbpf: Assign cookies to links in libbpf.
Add a cookie field to the attributes of bpf_link_create().
Add bpf_program__attach_trace_opts() to attach a cookie to a link.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-5-kuifeng@fb.com
2022-05-13 16:13:31 -07:00
Kui-Feng Lee
c116ae6130 bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm.
Pass a cookie along with BPF_LINK_CREATE requests.

Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie.
The cookie of a bpf_tracing_link is available by calling
bpf_get_attach_cookie when running the BPF program of the attached
link.

The value of a cookie will be set at bpf_tramp_run_ctx by the
trampoline of the link.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-4-kuifeng@fb.com
2022-05-13 16:13:31 -07:00
Kui-Feng Lee
99b21d41e3 bpf, x86: Generate trampolines from bpf_tramp_links
Replace struct bpf_tramp_progs with struct bpf_tramp_links to collect
struct bpf_tramp_link(s) for a trampoline.  struct bpf_tramp_link
extends bpf_link to act as a linked list node.

arch_prepare_bpf_trampoline() accepts a struct bpf_tramp_links to
collects all bpf_tramp_link(s) that a trampoline should call.

Change BPF trampoline and bpf_struct_ops to pass bpf_tramp_links
instead of bpf_tramp_progs.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-2-kuifeng@fb.com
2022-05-13 16:13:31 -07:00
Kaixi Fan
7a443259de bpf: Add source ip in "struct bpf_tunnel_key"
Add tunnel source ip field in "struct bpf_tunnel_key". Add related code
to set and get tunnel source field.

Signed-off-by: Kaixi Fan <fankaixi.li@bytedance.com>
Link: https://lore.kernel.org/r/20220430074844.69214-2-fankaixi.li@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
b3197662ba libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary
Kernel imposes a pretty particular restriction on ringbuf map size. It
has to be a power-of-2 multiple of page size. While generally this isn't
hard for user to satisfy, sometimes it's impossible to do this
declaratively in BPF source code or just plain inconvenient to do at
runtime.

One such example might be BPF libraries that are supposed to work on
different architectures, which might not agree on what the common page
size is.

Let libbpf find the right size for user instead, if it turns out to not
satisfy kernel requirements. If user didn't set size at all, that's most
probably a mistake so don't upsize such zero size to one full page,
though. Also we need to be careful about not overflowing __u32
max_entries.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220509004148.1801791-9-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
486b1a080b libbpf: Provide barrier() and barrier_var() in bpf_helpers.h
Add barrier() and barrier_var() macros into bpf_helpers.h to be used by
end users. While a bit advanced and specialized instruments, they are
sometimes indispensable. Instead of requiring each user to figure out
exact asm volatile incantations for themselves, provide them from
bpf_helpers.h.

Also remove conflicting definitions from selftests. Some tests rely on
barrier_var() definition being nothing, those will still work as libbpf
does the #ifndef/#endif guarding for barrier() and barrier_var(),
allowing users to redefine them, if necessary.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220509004148.1801791-8-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
ba9850c048 libbpf: Complete field-based CO-RE helpers with field offset helper
Add bpf_core_field_offset() helper to complete field-based CO-RE
helpers. This helper can be useful for feature-detection and for some
more advanced cases of field reading (e.g., reading flexible array members).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220509004148.1801791-6-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
5c1d6799df libbpf: Improve usability of field-based CO-RE helpers
Allow to specify field reference in two ways:

  - if user has variable of necessary type, they can use variable-based
    reference (my_var.my_field or my_var_ptr->my_field). This was the
    only supported syntax up till now.
  - now, bpf_core_field_exists() and bpf_core_field_size() support also
    specifying field in a fashion similar to offsetof() macro, by
    specifying type of the containing struct/union separately and field
    name separately: bpf_core_field_exists(struct my_type, my_field).
    This forms is quite often more convenient in practice and it matches
    type-based CO-RE helpers that support specifying type by its name
    without requiring any variables.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220509004148.1801791-4-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
1f30788b41 libbpf: Make __kptr and __kptr_ref unconditionally use btf_type_tag() attr
It will be annoying and surprising for users of __kptr and __kptr_ref if
libbpf silently ignores them just because Clang used for compilation
didn't support btf_type_tag(). It's much better to get clear compiler
error than debug BPF verifier failures later on.

Fixes: ef89654f2bc7 ("libbpf: Add kptr type tag macros to bpf_helpers.h")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220509004148.1801791-3-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
a8bc578af9 libbpf: Allow to opt-out from creating BPF maps
Add bpf_map__set_autocreate() API that allows user to opt-out from
libbpf automatically creating BPF map during BPF object load.

This is a useful feature when building CO-RE-enabled BPF application
that takes advantage of some new-ish BPF map type (e.g., socket-local
storage) if kernel supports it, but otherwise uses some alternative way
(e.g., extra HASH map). In such case, being able to disable the creation
of a map that kernel doesn't support allows to successfully create and
load BPF object file with all its other maps and programs.

It's still up to user to make sure that no "live" code in any of their BPF
programs are referencing such map instance, which can be achieved by
guarding such code with CO-RE relocation check or by using .rodata
global variables.

If user fails to properly guard such code to turn it into "dead code",
libbpf will helpfully post-process BPF verifier log and will provide
more meaningful error and map name that needs to be guarded properly. As
such, instead of:

  ; value = bpf_map_lookup_elem(&missing_map, &zero);
  4: (85) call unknown#2001000000
  invalid func unknown#2001000000

... user will see:

  ; value = bpf_map_lookup_elem(&missing_map, &zero);
  4: <invalid BPF map reference>
  BPF map 'missing_map' is referenced but wasn't created

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220428041523.4089853-4-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
d46f1aaa7c libbpf: Use libbpf_mem_ensure() when allocating new map
Reuse libbpf_mem_ensure() when adding a new map to the list of maps
inside bpf_object. It takes care of proper resizing and reallocating of
map array and zeroing out newly allocated memory.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220428041523.4089853-3-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
1a18c6f051 libbpf: Append "..." in fixed up log if CO-RE spec is truncated
Detect CO-RE spec truncation and append "..." to make user aware that
there was supposed to be more of the spec there.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220428041523.4089853-2-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
97ab064bc0 libbpf: Support target-less SEC() definitions for BTF-backed programs
Similar to previous patch, support target-less definitions like
SEC("fentry"), SEC("freplace"), etc. For such BTF-backed program types
it is expected that user will specify BTF target programmatically at
runtime using bpf_program__set_attach_target() *before* load phase. If
not, libbpf will report this as an error.

Aslo use SEC_ATTACH_BTF flag instead of explicitly listing a set of
types that are expected to require attach_btf_id. This was an accidental
omission during custom SEC() support refactoring.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220428185349.3799599-3-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Andrii Nakryiko
eee09dc704 libbpf: Allow "incomplete" basic tracing SEC() definitions
In a lot of cases the target of kprobe/kretprobe, tracepoint, raw
tracepoint, etc BPF program might not be known at the compilation time
and will be discovered at runtime. This was always a supported case by
libbpf, with APIs like bpf_program__attach_{kprobe,tracepoint,etc}()
accepting full target definition, regardless of what was defined in
SEC() definition in BPF source code.

Unfortunately, up till now libbpf still enforced users to specify at
least something for the fake target, e.g., SEC("kprobe/whatever"), which
is cumbersome and somewhat misleading.

This patch allows target-less SEC() definitions for basic tracing BPF
program types:

  - kprobe/kretprobe;
  - multi-kprobe/multi-kretprobe;
  - tracepoints;
  - raw tracepoints.

Such target-less SEC() definitions are meant to specify declaratively
proper BPF program type only. Attachment of them will have to be handled
programmatically using correct APIs. As such, skeleton's auto-attachment
of such BPF programs is skipped and generic bpf_program__attach() will
fail, if attempted, due to the lack of enough target information.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220428185349.3799599-2-andrii@kernel.org
2022-05-13 16:13:31 -07:00
Ilya Leoshkevich
87dff0a2c7 vmtest: allow building foreign debian rootfs
This would allow building s390x images without access to an IBM Z.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-05-09 12:17:29 -07:00
Ilya Leoshkevich
14777c3784 vmtest: use debian bookworm
A newer iproute2 version is required for MPTCP tests. Use a newer
distro version, which has it.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-05-09 12:17:29 -07:00
Andrii Nakryiko
3a4e26307d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   34ba23b44c664792a4308ec37b5788a3162944ec
Checkpoint bpf-next commit: d54d06a4c4bc5d76815d02e4b041b31d9dbb3fef
Baseline bpf commit:        8de8b71b787f38983d414d2dba169a3bfefa668a
Checkpoint bpf commit:      ba3beec2ec1d3b4fd8672ca6e781dac4b3267f6e

Alan Maguire (1):
  libbpf: Usdt aarch64 arg parsing support

Andrii Nakryiko (10):
  libbpf: Support opting out from autoloading BPF programs declaratively
  libbpf: Teach bpf_link_create() to fallback to
    bpf_raw_tracepoint_open()
  libbpf: Fix anonymous type check in CO-RE logic
  libbpf: Drop unhelpful "program too large" guess
  libbpf: Fix logic for finding matching program for CO-RE relocation
  libbpf: Avoid joining .BTF.ext data with BPF programs by section name
  libbpf: Record subprog-resolved CO-RE relocations unconditionally
  libbpf: Refactor CO-RE relo human description formatting routine
  libbpf: Simplify bpf_core_parse_spec() signature
  libbpf: Fix up verifier log for unguarded failed CO-RE relos

Gaosheng Cui (1):
  libbpf: Remove redundant non-null checks on obj_elf

Grant Seltzer (4):
  libbpf: Add error returns to two API functions
  libbpf: Update API functions usage to check error
  libbpf: Add documentation to API functions
  libbpf: Improve libbpf API documentation link position

Kumar Kartikeya Dwivedi (2):
  bpf: Allow storing referenced kptr in map
  libbpf: Add kptr type tag macros to bpf_helpers.h

Pu Lehui (2):
  libbpf: Fix usdt_cookie being cast to 32 bits
  libbpf: Support riscv USDT argument parsing logic

Runqing Yang (1):
  libbpf: Fix a bug with checking bpf_probe_read_kernel() support in old
    kernels

Vladimir Isaev (1):
  libbpf: Add ARC support to bpf_tracing.h

Yuntao Wang (1):
  libbpf: Remove unnecessary type cast

 docs/index.rst           |   3 +-
 include/uapi/linux/bpf.h |  12 ++
 src/bpf.c                |  34 ++++-
 src/bpf_helpers.h        |   7 +
 src/bpf_tracing.h        |  23 +++
 src/btf.c                |   9 +-
 src/libbpf.c             | 322 ++++++++++++++++++++++++++++++---------
 src/libbpf.h             |  82 +++++++++-
 src/libbpf_internal.h    |   9 +-
 src/relo_core.c          | 104 +++++++------
 src/relo_core.h          |   6 +
 src/usdt.c               | 191 ++++++++++++++++++++++-
 12 files changed, 668 insertions(+), 134 deletions(-)

--
2.30.2
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
ef6f1fdfff sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
c3f58eb6cf libbpf: Fix up verifier log for unguarded failed CO-RE relos
Teach libbpf to post-process BPF verifier log on BPF program load
failure and detect known error patterns to provide user with more
context.

Currently there is one such common situation: an "unguarded" failed BPF
CO-RE relocation. While failing CO-RE relocation is expected, it is
expected to be property guarded in BPF code such that BPF verifier
always eliminates BPF instructions corresponding to such failed CO-RE
relos as dead code. In cases when user failed to take such precautions,
BPF verifier provides the best log it can:

  123: (85) call unknown#195896080
  invalid func unknown#195896080

Such incomprehensible log error is due to libbpf "poisoning" BPF
instruction that corresponds to failed CO-RE relocation by replacing it
with invalid `call 0xbad2310` instruction (195896080 == 0xbad2310 reads
"bad relo" if you squint hard enough).

Luckily, libbpf has all the necessary information to look up CO-RE
relocation that failed and provide more human-readable description of
what's going on:

  5: <invalid CO-RE relocation>
  failed to resolve CO-RE relocation <byte_off> [6] struct task_struct___bad.fake_field_subprog (0:2 @ offset 8)

This hopefully makes it much easier to understand what's wrong with
user's BPF program without googling magic constants.

This BPF verifier log fixup is setup to be extensible and is going to be
used for at least one other upcoming feature of libbpf in follow up patches.
Libbpf is parsing lines of BPF verifier log starting from the very end.
Currently it processes up to 10 lines of code looking for familiar
patterns. This avoids wasting lots of CPU processing huge verifier logs
(especially for log_level=2 verbosity level). Actual verification error
should normally be found in last few lines, so this should work
reliably.

If libbpf needs to expand log beyond available log_buf_size, it
truncates the end of the verifier log. Given verifier log normally ends
with something like:

  processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

... truncating this on program load error isn't too bad (end user can
always increase log size, if it needs to get complete log).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220426004511.2691730-10-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
2c3a55bfe7 libbpf: Simplify bpf_core_parse_spec() signature
Simplify bpf_core_parse_spec() signature to take struct bpf_core_relo as
an input instead of requiring callers to decompose them into type_id,
relo, spec_str, etc. This makes using and reusing this helper easier.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220426004511.2691730-9-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
e2d8a820cb libbpf: Refactor CO-RE relo human description formatting routine
Refactor how CO-RE relocation is formatted. Now it dumps human-readable
representation, currently used by libbpf in either debug or error
message output during CO-RE relocation resolution process, into provided
buffer. This approach allows for better reuse of this functionality
outside of CO-RE relocation resolution, which we'll use in next patch
for providing better error message for BPF verifier rejecting BPF
program due to unguarded failed CO-RE relocation.

It also gets rid of annoying "stitching" of libbpf_print() calls, which
was the only place where we did this.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220426004511.2691730-8-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
aaaeea6499 libbpf: Record subprog-resolved CO-RE relocations unconditionally
Previously, libbpf recorded CO-RE relocations with insns_idx resolved
according to finalized subprog locations (which are appended at the end
of entry BPF program) to simplify the job of light skeleton generator.

This is necessary because once subprogs' instructions are appended to
main entry BPF program all the subprog instruction indices are shifted
and that shift is different for each entry (main) BPF program, so it's
generally impossible to map final absolute insn_idx of the finalized BPF
program to their original locations inside subprograms.

This information is now going to be used not only during light skeleton
generation, but also to map absolute instruction index to subprog's
instruction and its corresponding CO-RE relocation. So start recording
these relocations always, not just when obj->gen_loader is set.

This information is going to be freed at the end of bpf_object__load()
step, as before (but this can change in the future if there will be
a need for this information post load step).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220426004511.2691730-7-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
f2e994e0b7 libbpf: Avoid joining .BTF.ext data with BPF programs by section name
Instead of using ELF section names as a joining key between .BTF.ext and
corresponding BPF programs, pre-build .BTF.ext section number to ELF
section index mapping during bpf_object__open() and use it later for
matching .BTF.ext information (func/line info or CO-RE relocations) to
their respective BPF programs and subprograms.

This simplifies corresponding joining logic and let's libbpf do
manipulations with BPF program's ELF sections like dropping leading '?'
character for non-autoloaded programs. Original joining logic in
bpf_object__relocate_core() (see relevant comment that's now removed)
was never elegant, so it's a good improvement regardless. But it also
avoids unnecessary internal assumptions about preserving original ELF
section name as BPF program's section name (which was broken when
SEC("?abc") support was added).

Fixes: a3820c481112 ("libbpf: Support opting out from autoloading BPF programs declaratively")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220426004511.2691730-5-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
eb22de1f7d libbpf: Fix logic for finding matching program for CO-RE relocation
Fix the bug in bpf_object__relocate_core() which can lead to finding
invalid matching BPF program when processing CO-RE relocation. IF
matching program is not found, last encountered program will be assumed
to be correct program and thus error detection won't detect the problem.

Fixes: 9c82a63cf370 ("libbpf: Fix CO-RE relocs against .text section")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220426004511.2691730-4-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
0a901dd1cd libbpf: Drop unhelpful "program too large" guess
libbpf pretends it knows actual limit of BPF program instructions based
on UAPI headers it compiled with. There is neither any guarantee that
UAPI headers match host kernel, nor BPF verifier actually uses
BPF_MAXINSNS constant anymore. Just drop unhelpful "guess", BPF verifier
will emit actual reason for failure in its logs anyways.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220426004511.2691730-3-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
36582ee432 libbpf: Fix anonymous type check in CO-RE logic
Use type name for checking whether CO-RE relocation is referring to
anonymous type. Using spec string makes no sense.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220426004511.2691730-2-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Kumar Kartikeya Dwivedi
e7f46e2cae libbpf: Add kptr type tag macros to bpf_helpers.h
Include convenience definitions:
__kptr:	Unreferenced kptr
__kptr_ref: Referenced kptr

Users can use them to tag the pointer type meant to be used with the new
support directly in the map value definition.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-11-memxor@gmail.com
2022-04-27 15:19:08 -07:00
Kumar Kartikeya Dwivedi
179ca056b0 bpf: Allow storing referenced kptr in map
Extending the code in previous commits, introduce referenced kptr
support, which needs to be tagged using 'kptr_ref' tag instead. Unlike
unreferenced kptr, referenced kptr have a lot more restrictions. In
addition to the type matching, only a newly introduced bpf_kptr_xchg
helper is allowed to modify the map value at that offset. This transfers
the referenced pointer being stored into the map, releasing the
references state for the program, and returning the old value and
creating new reference state for the returned pointer.

Similar to unreferenced pointer case, return value for this case will
also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer
must either be eventually released by calling the corresponding release
function, otherwise it must be transferred into another map.

It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
the value, and obtain the old value if any.

BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future
commit will permit using BPF_LDX for such pointers, but attempt at
making it safe, since the lifetime of object won't be guaranteed.

There are valid reasons to enforce the restriction of permitting only
bpf_kptr_xchg to operate on referenced kptr. The pointer value must be
consistent in face of concurrent modification, and any prior values
contained in the map must also be released before a new one is moved
into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg
returns the old value, which the verifier would require the user to
either free or move into another map, and releases the reference held
for the pointer being moved in.

In the future, direct BPF_XCHG instruction may also be permitted to work
like bpf_kptr_xchg helper.

Note that process_kptr_func doesn't have to call
check_helper_mem_access, since we already disallow rdonly/wronly flags
for map, which is what check_map_access_type checks, and we already
ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc,
so check_map_access is also not required.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-4-memxor@gmail.com
2022-04-27 15:19:08 -07:00
Yuntao Wang
56dff81d46 libbpf: Remove unnecessary type cast
The link variable is already of type 'struct bpf_link *', casting it to
'struct bpf_link *' is redundant, drop it.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220424143420.457082-1-ytcoode@gmail.com
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
0d4cefc4fc libbpf: Teach bpf_link_create() to fallback to bpf_raw_tracepoint_open()
Teach bpf_link_create() to fallback to bpf_raw_tracepoint_open() on
older kernels for programs that are attachable through
BPF_RAW_TRACEPOINT_OPEN. This makes bpf_link_create() more unified and
convenient interface for creating bpf_link-based attachments.

With this approach end users can just use bpf_link_create() for
tp_btf/fentry/fexit/fmod_ret/lsm program attachments without needing to
care about kernel support, as libbpf will handle this transparently. On
the other hand, as newer features (like BPF cookie) are added to
LINK_CREATE interface, they will be readily usable though the same
bpf_link_create() API without any major refactoring from user's
standpoint.

bpf_program__attach_btf_id() is now using bpf_link_create() internally
as well and will take advantaged of this unified interface when BPF
cookie is added for fentry/fexit.

Doing proactive feature detection of LINK_CREATE support for
fentry/tp_btf/etc is quite involved. It requires parsing vmlinux BTF,
determining some stable and guaranteed to be in all kernels versions
target BTF type (either raw tracepoint or fentry target function),
actually attaching this program and thus potentially affecting the
performance of the host kernel briefly, etc. So instead we are taking
much simpler "lazy" approach of falling back to
bpf_raw_tracepoint_open() call only if initial LINK_CREATE command
fails. For modern kernels this will mean zero added overhead, while
older kernels will incur minimal overhead with a single fast-failing
LINK_CREATE call.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Kui-Feng Lee <kuifeng@fb.com>
Link: https://lore.kernel.org/bpf/20220421033945.3602803-3-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Grant Seltzer
5954a6c4aa libbpf: Improve libbpf API documentation link position
This puts the link for libbpf API documentation into the sidebar
for much easier navigation.

You can preview this change at:

  https://libbpf-test.readthedocs.io/en/latest/

Note that the link is hardcoded to the production version, so you
can see that it self references itself here for now:

  https://libbpf-test.readthedocs.io/en/latest/api.html

This will need to make its way into the libbpf mirror, before being
deployed to libbpf.readthedocs.org

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220422031050.303984-1-grantseltzer@gmail.com
2022-04-27 15:19:08 -07:00
Gaosheng Cui
38be0379c9 libbpf: Remove redundant non-null checks on obj_elf
Obj_elf is already non-null checked at the function entry, so remove
redundant non-null checks on obj_elf.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220421031803.2283974-1-cuigaosheng1@huawei.com
2022-04-27 15:19:08 -07:00
Grant Seltzer
5fa8bb6b42 libbpf: Add documentation to API functions
This adds documentation for the following API functions:

- bpf_program__set_expected_attach_type()
- bpf_program__set_type()
- bpf_program__set_attach_target()
- bpf_program__attach()
- bpf_program__pin()
- bpf_program__unpin()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220420161226.86803-3-grantseltzer@gmail.com
2022-04-27 15:19:08 -07:00
Grant Seltzer
c5b91a333e libbpf: Update API functions usage to check error
This updates usage of the following API functions within
libbpf so their newly added error return is checked:

- bpf_program__set_expected_attach_type()
- bpf_program__set_type()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220420161226.86803-2-grantseltzer@gmail.com
2022-04-27 15:19:08 -07:00
Grant Seltzer
8073e03491 libbpf: Add error returns to two API functions
This adds an error return to the following API functions:

- bpf_program__set_expected_attach_type()
- bpf_program__set_type()

In both cases, the error occurs when the BPF object has
already been loaded when the function is called. In this
case -EBUSY is returned.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220420161226.86803-1-grantseltzer@gmail.com
2022-04-27 15:19:08 -07:00
Pu Lehui
eb2b216081 libbpf: Support riscv USDT argument parsing logic
Add riscv-specific USDT argument specification parsing logic.
riscv USDT argument format is shown below:
- Memory dereference case:
  "size@off(reg)", e.g. "-8@-88(s0)"
- Constant value case:
  "size@val", e.g. "4@5"
- Register read case:
  "size@reg", e.g. "-8@a1"

s8 will be marked as poison while it's a reg of riscv, we need
to alias it in advance. Both RV32 and RV64 have been tested.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220419145238.482134-3-pulehui@huawei.com
2022-04-27 15:19:08 -07:00
Pu Lehui
bddd106e80 libbpf: Fix usdt_cookie being cast to 32 bits
The usdt_cookie is defined as __u64, which should not be
used as a long type because it will be cast to 32 bits
in 32-bit platforms.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220419145238.482134-2-pulehui@huawei.com
2022-04-27 15:19:08 -07:00
Andrii Nakryiko
e205664ddb libbpf: Support opting out from autoloading BPF programs declaratively
Establish SEC("?abc") naming convention (i.e., adding question mark in
front of otherwise normal section name) that allows to set corresponding
program's autoload property to false. This is effectively just
a declarative way to do bpf_program__set_autoload(prog, false).

Having a way to do this declaratively in BPF code itself is useful and
convenient for various scenarios. E.g., for testing, when BPF object
consists of multiple independent BPF programs that each needs to be
tested separately. Opting out all of them by default and then setting
autoload to true for just one of them at a time simplifies testing code
(see next patch for few conversions in BPF selftests taking advantage of
this new feature).

Another real-world use case is in libbpf-tools for cases when different
BPF programs have to be picked depending on particulars of the host
kernel due to various incompatible changes (like kernel function renames
or signature change, or to pick kprobe vs fentry depending on
corresponding kernel support for the latter). Marking all the different
BPF program candidates as non-autoloaded declaratively makes this more
obvious in BPF source code and allows simpler code in user-space code.

When BPF program marked as SEC("?abc") it is otherwise treated just like
SEC("abc") and bpf_program__section_name() reported will be "abc".

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220419002452.632125-1-andrii@kernel.org
2022-04-27 15:19:08 -07:00
Alan Maguire
557499a13e libbpf: Usdt aarch64 arg parsing support
Parsing of USDT arguments is architecture-specific. On aarch64 it is
relatively easy since registers used are x[0-31] and sp. Format is
slightly different compared to x86_64. Possible forms are:

- "size@[reg[,offset]]" for dereferences, e.g. "-8@[sp,76]" and "-4@[sp]";
- "size@reg" for register values, e.g. "-4@x0";
- "size@value" for raw values, e.g. "-8@1".

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649690496-1902-2-git-send-email-alan.maguire@oracle.com
2022-04-27 15:19:08 -07:00
Runqing Yang
ffd4015f3b libbpf: Fix a bug with checking bpf_probe_read_kernel() support in old kernels
Background:
Libbpf automatically replaces calls to BPF bpf_probe_read_{kernel,user}
[_str]() helpers with bpf_probe_read[_str](), if libbpf detects that
kernel doesn't support new APIs. Specifically, libbpf invokes the
probe_kern_probe_read_kernel function to load a small eBPF program into
the kernel in which bpf_probe_read_kernel API is invoked and lets the
kernel checks whether the new API is valid. If the loading fails, libbpf
considers the new API invalid and replaces it with the old API.

static int probe_kern_probe_read_kernel(void)
{
	struct bpf_insn insns[] = {
		BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),	/* r1 = r10 (fp) */
		BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),	/* r1 += -8 */
		BPF_MOV64_IMM(BPF_REG_2, 8),		/* r2 = 8 */
		BPF_MOV64_IMM(BPF_REG_3, 0),		/* r3 = 0 */
		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
		BPF_EXIT_INSN(),
	};
	int fd, insn_cnt = ARRAY_SIZE(insns);

	fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL,
                           "GPL", insns, insn_cnt, NULL);
	return probe_fd(fd);
}

Bug:
On older kernel versions [0], the kernel checks whether the version
number provided in the bpf syscall, matches the LINUX_VERSION_CODE.
If not matched, the bpf syscall fails. eBPF However, the
probe_kern_probe_read_kernel code does not set the kernel version
number provided to the bpf syscall, which causes the loading process
alwasys fails for old versions. It means that libbpf will replace the
new API with the old one even the kernel supports the new one.

Solution:
After a discussion in [1], the solution is using BPF_PROG_TYPE_TRACEPOINT
program type instead of BPF_PROG_TYPE_KPROBE because kernel does not
enfoce version check for tracepoint programs. I test the patch in old
kernels (4.18 and 4.19) and it works well.

  [0] https://elixir.bootlin.com/linux/v4.19/source/kernel/bpf/syscall.c#L1360
  [1] Closes: https://github.com/libbpf/libbpf/issues/473

Signed-off-by: Runqing Yang <rainkin1993@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409144928.27499-1-rainkin1993@gmail.com
2022-04-27 15:19:08 -07:00
Vladimir Isaev
68e7624e9f libbpf: Add ARC support to bpf_tracing.h
Add PT_REGS macros suitable for ARCompact and ARCv2.

Signed-off-by: Vladimir Isaev <isaev@synopsys.com>
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220408224442.599566-1-geomatsi@gmail.com
2022-04-27 15:19:08 -07:00
Maxim Mikityanskiy
b221db664f ci: enable synproxy config for all architectures
Enable the following options in Kconfig for x86-64 and s390x:

CONFIG_NETFILTER_SYNPROXY=y
CONFIG_NETFILTER_XT_TARGET_CT=y
CONFIG_NETFILTER_XT_MATCH_STATE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_SYNPROXY=y
CONFIG_IP_NF_RAW=y

These options are needed to run the selftests for the new BPF SYN cookie
helpers.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
2022-04-27 15:18:22 -07:00
chantra
7bf9ee2dba [rootfs] update rootfs to ship with ethtool
Add `ethtool` as a dependency to the rootfs image.

Tested by running and building the rootfs images with both
`sudo ./mkrootfs_arch.sh`
and
`sudo ./mkrootfs_debian.sh`

and running in qemu with:
```
wget https://libbpf-ci.s3-us-west-1.amazonaws.com/x86_64/vmlinuz-5.5.0
rootfs_img=rootfs.img kernel_bzimage=vmlinuz-5.5.0

mkdir rootfs
touch rootfs.img
truncate -s 2G rootfs.img
sudo mount -o loop rootfs.img rootfs
cat ~/Downloads/libbpf-vmtest-rootfs-2022.04.25.tar.zst | sudo tar -C rootfs -I zstd -xvf -
sudo install  -m 755 -o root -g root  /dev/stdin rootfs/etc/rcS.d/S50-startup <<'EOF'
ethtool -h
cat /etc/issue
EOF

qemu-system-x86_64 -nodefaults  -display none -serial mon:stdio -enable-kvm -m 4G -drive file="${rootfs_img}",format=raw,index=1,media=disk,if=virtio,cache=none -kernel "${kernel_bzimage}" -append "root=/dev/vda rw console=ttyS0,115200"
```

The last block printed ethtool's help, confirming the presence of
ethtool in the rootfs.

`libbpf-vmtest-rootfs-2022.04.25.tar.zst` was generated and uploaded to S3. INDEX in libbpf/ci needs to be changed to make the CI pick it up.
2022-04-25 16:30:32 -07:00
grantseltzer
533c7666eb Fix downloads formats
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
2022-04-22 14:30:27 -07:00
grantseltzer
dea5ae9fc9 Enable downloads feature
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
2022-04-19 16:08:19 -07:00
Evgeny Vereshchagin
8bc3e510fc ci: turn off _FORTIFY_SOURCE explicitly
libelf is compiled with _FORTIFY_SOURCE by default and it
isn't compatible with MSan. It was borrowed
from https://github.com/google/oss-fuzz/pull/7422
2022-04-10 18:57:38 -07:00
Evgeny Vereshchagin
14414c6ea5 ci: turn on the alignment check
to catch issues like https://github.com/libbpf/libbpf/issues/391
2022-04-10 18:57:38 -07:00
Evgeny Vereshchagin
ea10235072 ci: point elfutils to a commit where a couple bugs are fixed
Fixes
```
./out/bpf-object-fuzzer: Running 1 inputs 1 time(s) each.
Running: CORPUS/036ff286c13e4590646c7ef59435ec642432da8e
elf_begin.c:232:20: runtime error: member access within misaligned address 0x000001655e71 for type 'Elf64_Shdr', which requires 8 byte alignment
0x000001655e71: note: pointer points here
 00 00 00  7f 45 4c 46 02 02 01 00  00 00 07 fb 00 1d 00 00  6c 69 63 65 42 fb 00 41  00 57 03 00 20
              ^
    #0 0x574d51 in get_shnum /home/libbpf/elfutils/libelf/elf_begin.c:232:20
    #1 0x574d51 in file_read_elf /home/libbpf/elfutils/libelf/elf_begin.c:296:19
    #2 0x569c2c in __libelf_read_mmaped_file /home/libbpf/elfutils/libelf/elf_begin.c:559:14
    #3 0x58e812 in elf_memory /home/libbpf/elfutils/libelf/elf_memory.c:49:10
    #4 0x4905b4 in bpf_object__elf_init /home/libbpf/src/libbpf.c:1255:9
    #5 0x4905b4 in bpf_object_open /home/libbpf/src/libbpf.c:7104:8
    #6 0x49144e in bpf_object__open_mem /home/libbpf/src/libbpf.c:7171:20
    #7 0x483018 in LLVMFuzzerTestOneInput /home/libbpf/fuzz/bpf-object-fuzzer.c:16:8
    #8 0x439389 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/libbpf/out/bpf-object-fuzzer+0x439389)
    #9 0x419e2f in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) (/home/libbpf/out/bpf-object-fuzzer+0x419e2f)
    #10 0x421aee in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/libbpf/out/bpf-object-fuzzer+0x421aee)
    #11 0x410f96 in main (/home/libbpf/out/bpf-object-fuzzer+0x410f96)
    #12 0x7f153e21255f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
    #13 0x7f153e21260b in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2d60b)
    #14 0x410fe4 in _start (/home/libbpf/out/bpf-object-fuzzer+0x410fe4)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior elf_begin.c:232:20 in
```
and
```
./out/bpf-object-fuzzer: Running 1 inputs 1 time(s) each.
Running: CORPUS/446b578d82c47fe177de6fd675f4cb6bae8d1ea9
elf_begin.c:485:40: runtime error: addition of unsigned offset to 0x000002277e70 overflowed to 0x0000021d7e6f
    #0 0x5748f1 in file_read_elf /home/libbpf/elfutils/libelf/elf_begin.c:485:40
    #1 0x569c2c in __libelf_read_mmaped_file /home/libbpf/elfutils/libelf/elf_begin.c:559:14
    #2 0x58e812 in elf_memory /home/libbpf/elfutils/libelf/elf_memory.c:49:10
    #3 0x4905b4 in bpf_object__elf_init /home/libbpf/src/libbpf.c:1255:9
    #4 0x4905b4 in bpf_object_open /home/libbpf/src/libbpf.c:7104:8
    #5 0x49144e in bpf_object__open_mem /home/libbpf/src/libbpf.c:7171:20
    #6 0x483018 in LLVMFuzzerTestOneInput /home/libbpf/fuzz/bpf-object-fuzzer.c:16:8
    #7 0x439389 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/libbpf/out/bpf-object-fuzzer+0x439389)
    #8 0x419e2f in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) (/home/libbpf/out/bpf-object-fuzzer+0x419e2f)
    #9 0x421aee in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/libbpf/out/bpf-object-fuzzer+0x421aee)
    #10 0x410f96 in main (/home/libbpf/out/bpf-object-fuzzer+0x410f96)
    #11 0x7f753e38255f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
    #12 0x7f753e38260b in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2d60b)
    #13 0x410fe4 in _start (/home/libbpf/out/bpf-object-fuzzer+0x410fe4)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior elf_begin.c:485:40 in
```
2022-04-10 18:57:38 -07:00
Evgeny Vereshchagin
f3cc144922 ci: turn off unaligned access in libelf explicitly 2022-04-10 18:57:38 -07:00
Andrii Nakryiko
b69f8ee93e ci: allow usdt selftest on s390x
libbpf now has s390x support for USDT, so enable corresponding selftest.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-04-09 09:17:51 -07:00
Andrii Nakryiko
bbfb018473 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2d0df01974ce2b59b6f7d5bd3ea58d74f12ddf85
Checkpoint bpf-next commit: 34ba23b44c664792a4308ec37b5788a3162944ec
Baseline bpf commit:        0a210af6d0a0595fef566e7eeb072f10f37774be
Checkpoint bpf commit:      8de8b71b787f38983d414d2dba169a3bfefa668a

Alan Maguire (2):
  libbpf: Improve library identification for uprobe binary path
    resolution
  libbpf: Improve string parsing for uprobe auto-attach

Andrii Nakryiko (5):
  libbpf: Fix use #ifdef instead of #if to avoid compiler warning
  libbpf: Use strlcpy() in path resolution fallback logic
  libbpf: Allow WEAK and GLOBAL bindings during BTF fixup
  libbpf: Don't error out on CO-RE relos for overriden weak subprogs
  libbpf: Use weak hidden modifier for USDT BPF-side API functions

Colin Ian King (1):
  libbpf: Fix spelling mistake "libaries" -> "libraries"

Haowen Bai (1):
  libbpf: Potential NULL dereference in usdt_manager_attach_usdt()

Ilya Leoshkevich (3):
  libbpf: Minor style improvements in USDT code
  libbpf: Make BPF-side of USDT support work on big-endian machines
  libbpf: Add s390-specific USDT arg spec parsing logic

 src/libbpf.c          | 105 ++++++++++++++++++++----------------------
 src/libbpf_internal.h |  11 +++++
 src/usdt.bpf.h        |  13 ++++--
 src/usdt.c            |  79 ++++++++++++++++++++++++++-----
 4 files changed, 136 insertions(+), 72 deletions(-)

--
2.30.2
2022-04-09 09:17:51 -07:00
Andrii Nakryiko
1ce956ab3a libbpf: Use weak hidden modifier for USDT BPF-side API functions
Use __weak __hidden for bpf_usdt_xxx() APIs instead of much more
confusing `static inline __noinline`. This was previously impossible due
to libbpf erroring out on CO-RE relocations pointing to eliminated weak
subprogs. Now that previous patch fixed this issue, switch back to
__weak __hidden as it's a more direct way of specifying the desired
behavior.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408181425.2287230-3-andrii@kernel.org
2022-04-09 09:17:51 -07:00
Andrii Nakryiko
5016f30a24 libbpf: Don't error out on CO-RE relos for overriden weak subprogs
During BPF static linking, all the ELF relocations and .BTF.ext
information (including CO-RE relocations) are preserved for __weak
subprograms that were logically overriden by either previous weak
subprogram instance or by corresponding "strong" (non-weak) subprogram.
This is just how native user-space linkers work, nothing new.

But libbpf is over-zealous when processing CO-RE relocation to error out
when CO-RE relocation belonging to such eliminated weak subprogram is
encountered. Instead of erroring out on this expected situation, log
debug-level message and skip the relocation.

Fixes: db2b8b06423c ("libbpf: Support CO-RE relocations for multi-prog sections")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408181425.2287230-2-andrii@kernel.org
2022-04-09 09:17:51 -07:00
Andrii Nakryiko
075c96c298 libbpf: Allow WEAK and GLOBAL bindings during BTF fixup
During BTF fix up for global variables, global variable can be global
weak and will have STB_WEAK binding in ELF. Support such global
variables in addition to non-weak ones.

This is not the problem when using BPF static linking, as BPF static
linker "fixes up" BTF during generation so that libbpf doesn't have to
do it anymore during bpf_object__open(), which led to this not being
noticed for a while, along with a pretty rare (currently) use of __weak
variables and maps.

Reported-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220407230446.3980075-2-andrii@kernel.org
2022-04-09 09:17:51 -07:00
Andrii Nakryiko
f044607934 libbpf: Use strlcpy() in path resolution fallback logic
Coverity static analyzer complains that strcpy() can cause buffer
overflow. Use libbpf_strlcpy() instead to be 100% sure this doesn't
happen.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220407230446.3980075-1-andrii@kernel.org
2022-04-09 09:17:51 -07:00
Ilya Leoshkevich
4fd682d358 libbpf: Add s390-specific USDT arg spec parsing logic
The logic is superficially similar to that of x86, but the small
differences (no need for register table and dynamic allocation of
register names, no $ sign before constants) make maintaining a common
implementation too burdensome. Therefore simply add a s390x-specific
version of parse_usdt_arg().

Note that while bcc supports index registers, this patch does not. This
should not be a problem in most cases, since s390 uses a default value
"nor" for STAP_SDT_ARG_CONSTRAINT.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-4-iii@linux.ibm.com
2022-04-09 09:17:51 -07:00
Ilya Leoshkevich
3663820dda libbpf: Make BPF-side of USDT support work on big-endian machines
BPF_USDT_ARG_REG_DEREF handling always reads 8 bytes, regardless of
the actual argument size. On little-endian the relevant argument bits
end up in the lower bits of val, and later on the code that handles
all the argument types expects them to be there.

On big-endian they end up in the upper bits of val, breaking that
expectation. Fix by right-shifting val on big-endian.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-3-iii@linux.ibm.com
2022-04-09 09:17:51 -07:00
Ilya Leoshkevich
fcb67a3e70 libbpf: Minor style improvements in USDT code
Fix several typos and references to non-existing headers.
Also use __BYTE_ORDER__ instead of __BYTE_ORDER for consistency with
the rest of the bpf code - see commit 45f2bebc8079 ("libbpf: Fix
endianness detection in BPF_CORE_READ_BITFIELD_PROBED()") for
rationale).

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-2-iii@linux.ibm.com
2022-04-09 09:17:51 -07:00
Andrii Nakryiko
73b8386f2e libbpf: Fix use #ifdef instead of #if to avoid compiler warning
As reported by Naresh:

  perf build errors on i386 [1] on Linux next-20220407 [2]

  usdt.c:1181:5: error: "__x86_64__" is not defined, evaluates to 0
  [-Werror=undef]
   1181 | #if __x86_64__
        |     ^~~~~~~~~~
  usdt.c:1196:5: error: "__x86_64__" is not defined, evaluates to 0
  [-Werror=undef]
   1196 | #if __x86_64__
        |     ^~~~~~~~~~
  cc1: all warnings being treated as errors

Use #ifdef instead of #if to avoid this.

Fixes: 4c59e584d158 ("libbpf: Add x86-specific USDT arg spec parsing logic")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220407203842.3019904-1-andrii@kernel.org
2022-04-09 09:17:51 -07:00
Haowen Bai
462e3f600a libbpf: Potential NULL dereference in usdt_manager_attach_usdt()
link could be null but still dereference bpf_link__destroy(&link->link)
and it will lead to a null pointer access.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649299098-2069-1-git-send-email-baihaowen@meizu.com
2022-04-09 09:17:51 -07:00
Alan Maguire
13fe7fedfa libbpf: Improve string parsing for uprobe auto-attach
For uprobe auto-attach, the parsing can be simplified for the SEC()
name to a single sscanf(); the return value of the sscanf can then
be used to distinguish between sections that simply specify
"u[ret]probe" (and thus cannot auto-attach), those that specify
"u[ret]probe/binary_path:function+offset" etc.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649245431-29956-3-git-send-email-alan.maguire@oracle.com
2022-04-09 09:17:51 -07:00
Alan Maguire
b974879969 libbpf: Improve library identification for uprobe binary path resolution
In the process of doing path resolution for uprobe attach, libraries are
identified by matching a ".so" substring in the binary_path.
This matches a lot of patterns that do not conform to library.so[.version]
format, so instead match a ".so" _suffix_, and if that fails match a
".so." substring for the versioned library case.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649245431-29956-2-git-send-email-alan.maguire@oracle.com
2022-04-09 09:17:51 -07:00
Colin Ian King
2b674f2b21 libbpf: Fix spelling mistake "libaries" -> "libraries"
There is a spelling mistake in a pr_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406080835.14879-1-colin.i.king@gmail.com
2022-04-09 09:17:51 -07:00
Hengqi Chen
5810af7446 Makefile: Add usdt.bpf.h to list of HEADERS
Add usdt.bpf.h to HEADERS so that it can be installed
and included by users.

Signed-off-by: Hengqi Chen <chenhengqi@outlook.com>
2022-04-06 20:28:15 -07:00
Andrii Nakryiko
042471d356 ci: blacklist usdt selftest on s390x
libbpf doesn't support USDTs on s390x yet, blacklist corresponding
selftest.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
f7833c0819 ci: ensure CONFIG_DEBUG_INFO_BTF=y by choosing DWARF debug info
With recent upstream changes, the default for debug info is
CONFIG_DEBUG_INFO_NONE=y, which prevents BTF from being generated.
Choose CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y to make sure we do
get DWARF generated.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
c562444fb0 Makefile: add usdt.o to list of OBJS
Compile user-space parts of USDT support.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
750c9fb595 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   9492450fd28736262dea9143ebb3afc2c131ace1
Checkpoint bpf-next commit: 2d0df01974ce2b59b6f7d5bd3ea58d74f12ddf85
Baseline bpf commit:        6bd0c76bd70447aedfeafa9e1fcc249991d6c678
Checkpoint bpf commit:      0a210af6d0a0595fef566e7eeb072f10f37774be

Alan Maguire (3):
  libbpf: auto-resolve programs/libraries when necessary for uprobes
  libbpf: Support function name-based attach uprobes
  libbpf: Add auto-attach for uprobes based on section name

Andrii Nakryiko (6):
  libbpf: Avoid NULL deref when initializing map BTF info
  libbpf: Add BPF-side of USDT support
  libbpf: Wire up USDT API and bpf_link integration
  libbpf: Add USDT notes parsing and resolution logic
  libbpf: Wire up spec management and other arch-independent USDT logic
  libbpf: Add x86-specific USDT arg spec parsing logic

Anshuman Khandual (1):
  perf: Add irq and exception return branch types

Geliang Tang (1):
  bpf: Sync comments for bpf_get_stack

Haiyue Wang (1):
  bpf: Correct the comment for BTF kind bitfield

Hengqi Chen (1):
  libbpf: Close fd in bpf_object__reuse_map

Ilya Leoshkevich (1):
  libbpf: Support Debian in resolve_full_path()

Yuntao Wang (1):
  libbpf: Don't return -EINVAL if hdr_len < offsetofend(core_relo_len)

 include/uapi/linux/bpf.h        |    8 +-
 include/uapi/linux/btf.h        |    4 +-
 include/uapi/linux/perf_event.h |    2 +
 src/btf.c                       |    6 +-
 src/libbpf.c                    |  486 +++++++++++-
 src/libbpf.h                    |   41 +-
 src/libbpf.map                  |    1 +
 src/libbpf_internal.h           |   19 +
 src/usdt.bpf.h                  |  256 +++++++
 src/usdt.c                      | 1280 +++++++++++++++++++++++++++++++
 10 files changed, 2080 insertions(+), 23 deletions(-)
 create mode 100644 src/usdt.bpf.h
 create mode 100644 src/usdt.c

--
2.30.2
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
08cc701fae sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
fa323673c5 libbpf: Add x86-specific USDT arg spec parsing logic
Add x86/x86_64-specific USDT argument specification parsing. Each
architecture will require their own logic, as all this is arch-specific
assembly-based notation. Architectures that libbpf doesn't support for
USDTs will pr_warn() with specific error and return -ENOTSUP.

We use sscanf() as a very powerful and easy to use string parser. Those
spaces in sscanf's format string mean "skip any whitespaces", which is
pretty nifty (and somewhat little known) feature.

All this was tested on little-endian architecture, so bit shifts are
probably off on big-endian, which our CI will hopefully prove.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-6-andrii@kernel.org
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
876b933999 libbpf: Wire up spec management and other arch-independent USDT logic
Last part of architecture-agnostic user-space USDT handling logic is to
set up BPF spec and, optionally, IP-to-ID maps from user-space.
usdt_manager performs a compact spec ID allocation to utilize
fixed-sized BPF maps as efficiently as possible. We also use hashmap to
deduplicate USDT arg spec strings and map identical strings to single
USDT spec, minimizing the necessary BPF map size. usdt_manager supports
arbitrary sequences of attachment and detachment, both of the same USDT
and multiple different USDTs and internally maintains a free list of
unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
teardown of this spec ID free list and supporting BPF maps.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-5-andrii@kernel.org
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
406386b441 libbpf: Add USDT notes parsing and resolution logic
Implement architecture-agnostic parts of USDT parsing logic. The code is
the documentation in this case, it's futile to try to succinctly
describe how USDT parsing is done in any sort of concreteness. But
still, USDTs are recorded in special ELF notes section (.note.stapsdt),
where each USDT call site is described separately. Along with USDT
provider and USDT name, each such note contains USDT argument
specification, which uses assembly-like syntax to describe how to fetch
value of USDT argument. USDT arg spec could be just a constant, or
a register, or a register dereference (most common cases in x86_64), but
it technically can be much more complicated cases, like offset relative
to global symbol and stuff like that. One of the later patches will
implement most common subset of this for x86 and x86-64 architectures,
which seems to handle a lot of real-world production application.

USDT arg spec contains a compact encoding allowing usdt.bpf.h from
previous patch to handle the above 3 cases. Instead of recording which
register might be needed, we encode register's offset within struct
pt_regs to simplify BPF-side implementation. USDT argument can be of
different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle
this, libbpf pre-calculates necessary bit shifts to do proper casting
and sign-extension in a short sequences of left and right shifts.

The rest is in the code with sometimes extensive comments and references
to external "documentation" for USDTs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-4-andrii@kernel.org
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
1b4b798916 libbpf: Wire up USDT API and bpf_link integration
Wire up libbpf USDT support APIs without yet implementing all the
nitty-gritty details of USDT discovery, spec parsing, and BPF map
initialization.

User-visible user-space API is simple and is conceptually very similar
to uprobe API.

bpf_program__attach_usdt() API allows to programmatically attach given
BPF program to a USDT, specified through binary path (executable or
shared lib), USDT provider and name. Also, just like in uprobe case, PID
filter is specified (0 - self, -1 - any process, or specific PID).
Optionally, USDT cookie value can be specified. Such single API
invocation will try to discover given USDT in specified binary and will
use (potentially many) BPF uprobes to attach this program in correct
locations.

Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that
represents this attachment. It is a virtual BPF link that doesn't have
direct kernel object, as it can consist of multiple underlying BPF
uprobe links. As such, attachment is not atomic operation and there can
be brief moment when some USDT call sites are attached while others are
still in the process of attaching. This should be taken into
consideration by user. But bpf_program__attach_usdt() guarantees that
in the case of success all USDT call sites are successfully attached, or
all the successfuly attachments will be detached as soon as some USDT
call sites failed to be attached. So, in theory, there could be cases of
failed bpf_program__attach_usdt() call which did trigger few USDT
program invocations. This is unavoidable due to multi-uprobe nature of
USDT and has to be handled by user, if it's important to create an
illusion of atomicity.

USDT BPF programs themselves are marked in BPF source code as either
SEC("usdt"), in which case they won't be auto-attached through
skeleton's <skel>__attach() method, or it can have a full definition,
which follows the spirit of fully-specified uprobes:
SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
attach method will attempt auto-attachment. Similarly, generic
bpf_program__attach() will have enought information to go off of for
parameterless attachment.

USDT BPF programs are actually uprobes, and as such for kernel they are
marked as BPF_PROG_TYPE_KPROBE.

Another part of this patch is USDT-related feature probing:
  - BPF cookie support detection from user-space;
  - detection of kernel support for auto-refcounting of USDT semaphore.

The latter is optional. If kernel doesn't support such feature and USDT
doesn't rely on USDT semaphores, no error is returned. But if libbpf
detects that USDT requires setting semaphores and kernel doesn't support
this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't
support poking process's memory directly to increment semaphore value,
like BCC does on legacy kernels, due to inherent raciness and danger of
such process memory manipulation. Libbpf let's kernel take care of this
properly or gives up.

Logistically, all the extra USDT-related infrastructure of libbpf is put
into a separate usdt.c file and abstracted behind struct usdt_manager.
Each bpf_object has lazily-initialized usdt_manager pointer, which is
only instantiated if USDT programs are attempted to be attached. Closing
BPF object frees up usdt_manager resources. usdt_manager keeps track of
USDT spec ID assignment and few other small things.

Subsequent patches will fill out remaining missing pieces of USDT
initialization and setup logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-3-andrii@kernel.org
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
f5390e4f07 libbpf: Add BPF-side of USDT support
Add BPF-side implementation of libbpf-provided USDT support. This
consists of single header library, usdt.bpf.h, which is meant to be used
from user's BPF-side source code. This header is added to the list of
installed libbpf header, along bpf_helpers.h and others.

BPF-side implementation consists of two BPF maps:
  - spec map, which contains "a USDT spec" which encodes information
    necessary to be able to fetch USDT arguments and other information
    (argument count, user-provided cookie value, etc) at runtime;
  - IP-to-spec-ID map, which is only used on kernels that don't support
    BPF cookie feature. It allows to lookup spec ID based on the place
    in user application that triggers USDT program.

These maps have default sizes, 256 and 1024, which are chosen
conservatively to not waste a lot of space, but handling a lot of common
cases. But there could be cases when user application needs to either
trace a lot of different USDTs, or USDTs are heavily inlined and their
arguments are located in a lot of differing locations. For such cases it
might be necessary to size those maps up, which libbpf allows to do by
overriding BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.

It is an important aspect to keep in mind. Single USDT (user-space
equivalent of kernel tracepoint) can have multiple USDT "call sites".
That is, single logical USDT is triggered from multiple places in user
application. This can happen due to function inlining. Each such inlined
instance of USDT invocation can have its own unique USDT argument
specification (instructions about the location of the value of each of
USDT arguments). So while USDT looks very similar to usual uprobe or
kernel tracepoint, under the hood it's actually a collection of uprobes,
each potentially needing different spec to know how to fetch arguments.

User-visible API consists of three helper functions:
  - bpf_usdt_arg_cnt(), which returns number of arguments of current USDT;
  - bpf_usdt_arg(), which reads value of specified USDT argument (by
    it's zero-indexed position) and returns it as 64-bit value;
  - bpf_usdt_cookie(), which functions like BPF cookie for USDT
    programs; this is necessary as libbpf doesn't allow specifying actual
    BPF cookie and utilizes it internally for USDT support implementation.

Each bpf_usdt_xxx() APIs expect struct pt_regs * context, passed into
BPF program. On kernels that don't support BPF cookie it is used to
fetch absolute IP address of the underlying uprobe.

usdt.bpf.h also provides BPF_USDT() macro, which functions like
BPF_PROG() and BPF_KPROBE() and allows much more user-friendly way to
get access to USDT arguments, if USDT definition is static and known to
the user. It is expected that majority of use cases won't have to use
bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover
all their needs.

Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to
detect kernel support for BPF cookie. If BPF CO-RE dependency is
undesirable, user application can redefine BPF_USDT_HAS_BPF_COOKIE to
either a boolean constant (or equivalently zero and non-zero), or even
point it to its own .rodata variable that can be specified from user's
application user-space code. It is important that
BPF_USDT_HAS_BPF_COOKIE is known to BPF verifier as static value (thus
.rodata and not just .data), as otherwise BPF code will still contain
bpf_get_attach_cookie() BPF helper call and will fail validation at
runtime, if not dead-code eliminated.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-2-andrii@kernel.org
2022-04-06 07:34:58 -07:00
Ilya Leoshkevich
00cd090f81 libbpf: Support Debian in resolve_full_path()
attach_probe selftest fails on Debian-based distros with `failed to
resolve full path for 'libc.so.6'`. The reason is that these distros
embraced multiarch to the point where even for the "main" architecture
they store libc in /lib/<triple>.

This is configured in /etc/ld.so.conf and in theory it's possible to
replicate the loader's parsing and processing logic in libbpf, however
a much simpler solution is to just enumerate the known library paths.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404225020.51029-1-iii@linux.ibm.com
2022-04-06 07:34:58 -07:00
Yuntao Wang
0167a314e7 libbpf: Don't return -EINVAL if hdr_len < offsetofend(core_relo_len)
Since core relos is an optional part of the .BTF.ext ELF section, we should
skip parsing it instead of returning -EINVAL if header size is less than
offsetofend(struct btf_ext_header, core_relo_len).

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404005320.1723055-1-ytcoode@gmail.com
2022-04-06 07:34:58 -07:00
Alan Maguire
8dcb95d509 libbpf: Add auto-attach for uprobes based on section name
Now that u[ret]probes can use name-based specification, it makes
sense to add support for auto-attach based on SEC() definition.
The format proposed is

        SEC("u[ret]probe/binary:[raw_offset|[function_name[+offset]]")

For example, to trace malloc() in libc:

        SEC("uprobe/libc.so.6:malloc")

...or to trace function foo2 in /usr/bin/foo:

        SEC("uprobe//usr/bin/foo:foo2")

Auto-attach is done for all tasks (pid -1).  prog can be an absolute
path or simply a program/library name; in the latter case, we use
PATH/LD_LIBRARY_PATH to resolve the full path, falling back to
standard locations (/usr/bin:/usr/sbin or /usr/lib64:/usr/lib) if
the file is not found via environment-variable specified locations.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-4-git-send-email-alan.maguire@oracle.com
2022-04-06 07:34:58 -07:00
Alan Maguire
d112c9ce24 libbpf: Support function name-based attach uprobes
kprobe attach is name-based, using lookups of kallsyms to translate
a function name to an address.  Currently uprobe attach is done
via an offset value as described in [1].  Extend uprobe opts
for attach to include a function name which can then be converted
into a uprobe-friendly offset.  The calcualation is done in
several steps:

1. First, determine the symbol address using libelf; this gives us
   the offset as reported by objdump
2. If the function is a shared library function - and the binary
   provided is a shared library - no further work is required;
   the address found is the required address
3. Finally, if the function is local, subtract the base address
   associated with the object, retrieved from ELF program headers.

The resultant value is then added to the func_offset value passed
in to specify the uprobe attach address.  So specifying a func_offset
of 0 along with a function name "printf" will attach to printf entry.

The modes of operation supported are then

1. to attach to a local function in a binary; function "foo1" in
   "/usr/bin/foo"
2. to attach to a shared library function in a shared library -
   function "malloc" in libc.

[1] https://www.kernel.org/doc/html/latest/trace/uprobetracer.html

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-3-git-send-email-alan.maguire@oracle.com
2022-04-06 07:34:58 -07:00
Alan Maguire
4a7fa5b2bc libbpf: auto-resolve programs/libraries when necessary for uprobes
bpf_program__attach_uprobe_opts() requires a binary_path argument
specifying binary to instrument.  Supporting simply specifying
"libc.so.6" or "foo" should be possible too.

Library search checks LD_LIBRARY_PATH, then /usr/lib64, /usr/lib.
This allows users to run BPF programs prefixed with
LD_LIBRARY_PATH=/path2/lib while still searching standard locations.
Similarly for non .so files, we check PATH and /usr/bin, /usr/sbin.

Path determination will be useful for auto-attach of BPF uprobe programs
using SEC() definition.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-2-git-send-email-alan.maguire@oracle.com
2022-04-06 07:34:58 -07:00
Haiyue Wang
ff845b85e8 bpf: Correct the comment for BTF kind bitfield
The commit 8fd886911a6a ("bpf: Add BTF_KIND_FLOAT to uapi") has extended
the BTF kind bitfield from 4 to 5 bits, correct the comment.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220403115327.205964-1-haiyue.wang@intel.com
2022-04-06 07:34:58 -07:00
Geliang Tang
fee7b9400a bpf: Sync comments for bpf_get_stack
Commit ee2a098851bf missed updating the comments for helper bpf_get_stack
in tools/include/uapi/linux/bpf.h. Sync it.

Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/ce54617746b7ed5e9ba3b844e55e74cb8a60e0b5.1648110794.git.geliang.tang@suse.com
2022-04-06 07:34:58 -07:00
Hengqi Chen
360ed84faa libbpf: Close fd in bpf_object__reuse_map
pin_fd is dup-ed and assigned in bpf_map__reuse_fd. Close it
in bpf_object__reuse_map after reuse.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220319030533.3132250-1-hengqi.chen@gmail.com
2022-04-06 07:34:58 -07:00
Anshuman Khandual
3fbed0f1b2 perf: Add irq and exception return branch types
This expands generic branch type classification by adding two more entries
there in i.e irq and exception return. Also updates the x86 implementation
to process X86_BR_IRET and X86_BR_IRQ records as appropriate. This changes
branch types reported to user space on x86 platform but it should not be a
problem. The possible scenarios and impacts are enumerated here.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1645681014-3346-1-git-send-email-anshuman.khandual@arm.com
2022-04-06 07:34:58 -07:00
Andrii Nakryiko
67a4b14643 ci: remove subprogs from 5.5 whitelist
It seems like it started to cause kernel panic in CI, so drop it from
whitelist.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-19 23:08:50 -07:00
Andrii Nakryiko
7db9ce5fda libbpf: avoid NULL deref when initializing map BTF info
If BPF object doesn't have an BTF info, don't attempt to search for BTF
types describing BPF map key or value layout.

Fixes: 262cfb74ffda ("libbpf: Init btf_{key,value}_type_id on internal map open")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-19 23:08:50 -07:00
Andrii Nakryiko
f1b6bc31a5 ci: update s390x blacklist
Sync s390x blacklist with the one currently used for kernel-patches CI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-19 23:08:50 -07:00
Andrii Nakryiko
3ef1813702 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c344b9fc2108eeaa347c387219886cf87e520e93
Checkpoint bpf-next commit: 9492450fd28736262dea9143ebb3afc2c131ace1
Baseline bpf commit:        18b1ab7aa76bde181bdb1ab19a87fa9523c32f21
Checkpoint bpf commit:      6bd0c76bd70447aedfeafa9e1fcc249991d6c678

Delyan Kratunov (3):
  libbpf: .text routines are subprograms in strict mode
  libbpf: Init btf_{key,value}_type_id on internal map open
  libbpf: Add subskeleton scaffolding

Guo Zhengkui (1):
  libbpf: Fix array_size.cocci warning

Hengqi Chen (1):
  bpf: Fix comment for helper bpf_current_task_under_cgroup()

Jiri Olsa (5):
  bpf: Add multi kprobe link
  bpf: Add cookie support to programs attached with kprobe multi link
  libbpf: Add libbpf_kallsyms_parse function
  libbpf: Add bpf_link_create support for multi kprobes
  libbpf: Add bpf_program__attach_kprobe_multi_opts function

Martin KaFai Lau (1):
  bpf: Remove BPF_SKB_DELIVERY_TIME_NONE and rename
    s/delivery_time_/tstamp_/

Roberto Sassu (1):
  bpf-lsm: Introduce new helper bpf_ima_file_hash()

Toke Høiland-Jørgensen (2):
  bpf: Add "live packet" mode for XDP in BPF_PROG_RUN
  libbpf: Support batch_size option to bpf_prog_test_run

lic121 (1):
  libbpf: Unmap rings when umem deleted

 include/uapi/linux/bpf.h |  72 +++++---
 src/bpf.c                |  13 +-
 src/bpf.h                |  12 +-
 src/libbpf.c             | 383 ++++++++++++++++++++++++++++++++++-----
 src/libbpf.h             |  52 ++++++
 src/libbpf.map           |   3 +
 src/libbpf_internal.h    |   5 +
 src/libbpf_legacy.h      |   4 +
 src/xsk.c                |  15 +-
 9 files changed, 487 insertions(+), 72 deletions(-)

--
2.30.2
2022-03-19 23:08:50 -07:00
Andrii Nakryiko
d580bc49d1 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-03-19 23:08:50 -07:00
Delyan Kratunov
cc4ef17c78 libbpf: Add subskeleton scaffolding
In symmetry with bpf_object__open_skeleton(),
bpf_object__open_subskeleton() performs the actual walking and linking
of maps, progs, and globals described by bpf_*_skeleton objects.

Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/6942a46fbe20e7ebf970affcca307ba616985b15.1647473511.git.delyank@fb.com
2022-03-19 23:08:50 -07:00
Delyan Kratunov
e7084d4363 libbpf: Init btf_{key,value}_type_id on internal map open
For internal and user maps, look up the key and value btf
types on open() and not load(), so that `bpf_map_btf_value_type_id`
is usable in `bpftool gen`.

Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/78dbe4e457b4a05e098fc6c8f50014b680c86e4e.1647473511.git.delyank@fb.com
2022-03-19 23:08:50 -07:00
Delyan Kratunov
c2ec92f0ee libbpf: .text routines are subprograms in strict mode
Currently, libbpf considers a single routine in .text to be a program. This
is particularly confusing when it comes to library objects - a single routine
meant to be used as an extern will instead be considered a bpf_program.

This patch hides this compatibility behavior behind the pre-existing
SEC_NAME strict mode flag.

Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/018de8d0d67c04bf436055270d35d394ba393505.1647473511.git.delyank@fb.com
2022-03-19 23:08:50 -07:00
Jiri Olsa
05acce9e03 libbpf: Add bpf_program__attach_kprobe_multi_opts function
Adding bpf_program__attach_kprobe_multi_opts function for attaching
kprobe program to multiple functions.

  struct bpf_link *
  bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
                                        const char *pattern,
                                        const struct bpf_kprobe_multi_opts *opts);

User can specify functions to attach with 'pattern' argument that
allows wildcards (*?' supported) or provide symbols or addresses
directly through opts argument. These 3 options are mutually
exclusive.

When using symbols or addresses, user can also provide cookie value
for each symbol/address that can be retrieved later in bpf program
with bpf_get_attach_cookie helper.

  struct bpf_kprobe_multi_opts {
          size_t sz;
          const char **syms;
          const unsigned long *addrs;
          const __u64 *cookies;
          size_t cnt;
          bool retprobe;
          size_t :0;
  };

Symbols, addresses and cookies are provided through opts object
(syms/addrs/cookies) as array pointers with specified count (cnt).

Each cookie value is paired with provided function address or symbol
with the same array index.

The program can be also attached as return probe if 'retprobe' is set.

For quick usage with NULL opts argument, like:

  bpf_program__attach_kprobe_multi_opts(prog, "ksys_*", NULL)

the 'prog' will be attached as kprobe to 'ksys_*' functions.

Also adding new program sections for automatic attachment:

  kprobe.multi/<symbol_pattern>
  kretprobe.multi/<symbol_pattern>

The symbol_pattern is used as 'pattern' argument in
bpf_program__attach_kprobe_multi_opts function.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-10-jolsa@kernel.org
2022-03-19 23:08:50 -07:00
Jiri Olsa
2e6e39ef80 libbpf: Add bpf_link_create support for multi kprobes
Adding new kprobe_multi struct to bpf_link_create_opts object
to pass multiple kprobe data to link_create attr uapi.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-9-jolsa@kernel.org
2022-03-19 23:08:50 -07:00
Jiri Olsa
42f78dd5ac libbpf: Add libbpf_kallsyms_parse function
Move the kallsyms parsing in internal libbpf_kallsyms_parse
function, so it can be used from other places.

It will be used in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-8-jolsa@kernel.org
2022-03-19 23:08:50 -07:00
Jiri Olsa
50ae8c25d2 bpf: Add cookie support to programs attached with kprobe multi link
Adding support to call bpf_get_attach_cookie helper from
kprobe programs attached with kprobe multi link.

The cookie is provided by array of u64 values, where each
value is paired with provided function address or symbol
with the same array index.

When cookie array is provided it's sorted together with
addresses (check bpf_kprobe_multi_cookie_swap). This way
we can find cookie based on the address in
bpf_get_attach_cookie helper.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-7-jolsa@kernel.org
2022-03-19 23:08:50 -07:00
Jiri Olsa
e85e26492d bpf: Add multi kprobe link
Adding new link type BPF_LINK_TYPE_KPROBE_MULTI that attaches kprobe
program through fprobe API.

The fprobe API allows to attach probe on multiple functions at once
very fast, because it works on top of ftrace. On the other hand this
limits the probe point to the function entry or return.

The kprobe program gets the same pt_regs input ctx as when it's attached
through the perf API.

Adding new attach type BPF_TRACE_KPROBE_MULTI that allows attachment
kprobe to multiple function with new link.

User provides array of addresses or symbols with count to attach the
kprobe program to. The new link_create uapi interface looks like:

  struct {
          __u32           flags;
          __u32           cnt;
          __aligned_u64   syms;
          __aligned_u64   addrs;
  } kprobe_multi;

The flags field allows single BPF_TRACE_KPROBE_MULTI bit to create
return multi kprobe.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-4-jolsa@kernel.org
2022-03-19 23:08:50 -07:00
Roberto Sassu
9fb154ee77 bpf-lsm: Introduce new helper bpf_ima_file_hash()
ima_file_hash() has been modified to calculate the measurement of a file on
demand, if it has not been already performed by IMA or the measurement is
not fresh. For compatibility reasons, ima_inode_hash() remains unchanged.

Keep the same approach in eBPF and introduce the new helper
bpf_ima_file_hash() to take advantage of the modified behavior of
ima_file_hash().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-4-roberto.sassu@huawei.com
2022-03-19 23:08:50 -07:00
Hengqi Chen
34d57cc0eb bpf: Fix comment for helper bpf_current_task_under_cgroup()
Fix the descriptions of the return values of helper bpf_current_task_under_cgroup().

Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220310155335.1278783-1-hengqi.chen@gmail.com
2022-03-19 23:08:50 -07:00
Martin KaFai Lau
a557610d11 bpf: Remove BPF_SKB_DELIVERY_TIME_NONE and rename s/delivery_time_/tstamp_/
This patch is to simplify the uapi bpf.h regarding to the tstamp type
and use a similar way as the kernel to describe the value stored
in __sk_buff->tstamp.

My earlier thought was to avoid describing the semantic and
clock base for the rcv timestamp until there is more clarity
on the use case, so the __sk_buff->delivery_time_type naming instead
of __sk_buff->tstamp_type.

With some thoughts, it can reuse the UNSPEC naming.  This patch first
removes BPF_SKB_DELIVERY_TIME_NONE and also

rename BPF_SKB_DELIVERY_TIME_UNSPEC to BPF_SKB_TSTAMP_UNSPEC
and    BPF_SKB_DELIVERY_TIME_MONO   to BPF_SKB_TSTAMP_DELIVERY_MONO.

The semantic of BPF_SKB_TSTAMP_DELIVERY_MONO is the same:
__sk_buff->tstamp has delivery time in mono clock base.

BPF_SKB_TSTAMP_UNSPEC means __sk_buff->tstamp has the (rcv)
tstamp at ingress and the delivery time at egress.  At egress,
the clock base could be found from skb->sk->sk_clockid.
__sk_buff->tstamp == 0 naturally means NONE, so NONE is not needed.

With BPF_SKB_TSTAMP_UNSPEC for the rcv tstamp at ingress,
the __sk_buff->delivery_time_type is also renamed to __sk_buff->tstamp_type
which was also suggested in the earlier discussion:
https://lore.kernel.org/bpf/b181acbe-caf8-502d-4b7b-7d96b9fc5d55@iogearbox.net/

The above will then make __sk_buff->tstamp and __sk_buff->tstamp_type
the same as its kernel skb->tstamp and skb->mono_delivery_time
counter part.

The internal kernel function bpf_skb_convert_dtime_type_read() is then
renamed to bpf_skb_convert_tstamp_type_read() and it can be simplified
with the BPF_SKB_DELIVERY_TIME_NONE gone.  A BPF_ALU32_IMM(BPF_AND)
insn is also saved by using BPF_JMP32_IMM(BPF_JSET).

The bpf helper bpf_skb_set_delivery_time() is also renamed to
bpf_skb_set_tstamp().  The arg name is changed from dtime
to tstamp also.  It only allows setting tstamp 0 for
BPF_SKB_TSTAMP_UNSPEC and it could be relaxed later
if there is use case to change mono delivery time to
non mono.

prog->delivery_time_access is also renamed to prog->tstamp_type_access.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090509.3712315-1-kafai@fb.com
2022-03-19 23:08:50 -07:00
Toke Høiland-Jørgensen
5ad674a007 libbpf: Support batch_size option to bpf_prog_test_run
Add support for setting the new batch_size parameter to BPF_PROG_TEST_RUN
to libbpf; just add it as an option and pass it through to the kernel.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-4-toke@redhat.com
2022-03-19 23:08:50 -07:00
Toke Høiland-Jørgensen
d647265e4b bpf: Add "live packet" mode for XDP in BPF_PROG_RUN
This adds support for running XDP programs through BPF_PROG_RUN in a mode
that enables live packet processing of the resulting frames. Previous uses
of BPF_PROG_RUN for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.

The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
ifindex and RXQ number as part of the context object being passed to the
kernel. This patch reuses that code, but adds a new mode with different
semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
flag.

When running BPF_PROG_RUN in this mode, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out that interface. XDP_TX is translated into an
XDP_REDIRECT operation to the same interface, since the real XDP_TX action
is only possible from within the network drivers themselves, not from the
process context where BPF_PROG_RUN is executed.

Internally, this new mode of operation creates a page pool instance while
setting up the test run, and feeds pages from that into the XDP program.
The setup cost of this is amortised over the number of repetitions
specified by userspace.

To support the performance testing use case, we further optimise the setup
step so that all pages in the pool are pre-initialised with the packet
data, and pre-computed context and xdp_frame objects stored at the start of
each page. This makes it possible to entirely avoid touching the page
content on each XDP program invocation, and enables sending up to 9
Mpps/core on my test box.

Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.

Enabling the new flag is only allowed when not setting ctx_out and data_out
in the test specification, since using it means frames will be redirected
somewhere else, so they can't be returned.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
2022-03-19 23:08:50 -07:00
Guo Zhengkui
21cd83a1d1 libbpf: Fix array_size.cocci warning
Fix the following coccicheck warning:
tools/lib/bpf/bpf.c:114:31-32: WARNING: Use ARRAY_SIZE
tools/lib/bpf/xsk.c:484:34-35: WARNING: Use ARRAY_SIZE
tools/lib/bpf/xsk.c:485:35-36: WARNING: Use ARRAY_SIZE

It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220306023426.19324-1-guozhengkui@vivo.com
2022-03-19 23:08:50 -07:00
lic121
6e77ef94f0 libbpf: Unmap rings when umem deleted
xsk_umem__create() does mmap for fill/comp rings, but xsk_umem__delete()
doesn't do the unmap. This works fine for regular cases, because
xsk_socket__delete() does unmap for the rings. But for the case that
xsk_socket__create_shared() fails, umem rings are not unmapped.

fill_save/comp_save are checked to determine if rings have already be
unmapped by xsk. If fill_save and comp_save are NULL, it means that the
rings have already been used by xsk. Then they are supposed to be
unmapped by xsk_socket__delete(). Otherwise, xsk_umem__delete() does the
unmap.

Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220301132623.GA19995@vscode.7~
2022-03-19 23:08:50 -07:00
Andrii Nakryiko
c84815ee37 ci: enable CONFIG_FPROBE=y for multi-attach kprobe tests
Recently landed multi-attach kprobe functionality expects
CONFIG_FPROBE=y.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-18 00:52:43 -07:00
Mykola Lysenko
4282f3cdec ci: Add troubleshooting steps to s390x setup readme
Related to libbpf CI. Added more information on how
to setup and troubleshoot GitHub action runners for
s390x platform.

Signed-off-by: Mykola Lysenko <mykolal@fb.com>
2022-03-17 21:19:03 -07:00
Andrii Nakryiko
3591deb9bc ci: blacklist s390x tests
Blacklist timer_crash_mode as requiring BPF trampoline.

Temporary blacklist sk_lookup due to big-endian problems that haven't
been resolved upstream yet.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
767badc609 Makefile: update libbpf version to 0.8.0
New version cycle, bump LIBBPF_MINOR_VERSION to 8 in Makefile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
8e654d74c4 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b75dacaac4650478ed5a9d33975b91b99016daff
Checkpoint bpf-next commit: c344b9fc2108eeaa347c387219886cf87e520e93
Baseline bpf commit:        75134f16e7dd0007aa474b281935c5f42e79f2c8
Checkpoint bpf commit:      18b1ab7aa76bde181bdb1ab19a87fa9523c32f21

Andrii Nakryiko (2):
  libbpf: Allow BPF program auto-attach handlers to bail out
  libbpf: Support custom SEC() handlers

Hangbin Liu (1):
  bonding: add new option ns_ip6_target

Martin KaFai Lau (1):
  bpf: Add __sk_buff->delivery_time_type and
    bpf_skb_set_skb_delivery_time()

Stijn Tintel (1):
  libbpf: Fix BPF_MAP_TYPE_PERF_EVENT_ARRAY auto-pinning

Xu Kuohai (1):
  libbpf: Skip forward declaration when counting duplicated type names

Yuntao Wang (3):
  libbpf: Remove redundant check in btf_fixup_datasec()
  libbpf: Simplify the find_elf_sec_sz() function
  libbpf: Add a check to ensure that page_cnt is non-zero

 include/uapi/linux/bpf.h     |  41 +++-
 include/uapi/linux/if_link.h |   1 +
 src/btf_dump.c               |   5 +
 src/libbpf.c                 | 388 +++++++++++++++++++++++------------
 src/libbpf.h                 | 109 ++++++++++
 src/libbpf.map               |   6 +
 src/libbpf_version.h         |   2 +-
 7 files changed, 423 insertions(+), 129 deletions(-)

--
2.30.2
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
dac1e23c97 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
dc679587eb libbpf: Support custom SEC() handlers
Allow registering and unregistering custom handlers for BPF program.
This allows user applications and libraries to plug into libbpf's
declarative SEC() definition handling logic. This allows to offload
complex and intricate custom logic into external libraries, but still
provide a great user experience.

One such example is USDT handling library, which has a lot of code and
complexity which doesn't make sense to put into libbpf directly, but it
would be really great for users to be able to specify BPF programs with
something like SEC("usdt/<path-to-binary>:<usdt_provider>:<usdt_name>")
and have correct BPF program type set (BPF_PROGRAM_TYPE_KPROBE, as it is
uprobe) and even support BPF skeleton's auto-attach logic.

In some cases, it might be even good idea to override libbpf's default
handling, like for SEC("perf_event") programs. With custom library, it's
possible to extend logic to support specifying perf event specification
right there in SEC() definition without burdening libbpf with lots of
custom logic or extra library dependecies (e.g., libpfm4). With current
patch it's possible to override libbpf's SEC("perf_event") handling and
specify a completely custom ones.

Further, it's possible to specify a generic fallback handling for any
SEC() that doesn't match any other custom or standard libbpf handlers.
This allows to accommodate whatever legacy use cases there might be, if
necessary.

See doc comments for libbpf_register_prog_handler() and
libbpf_unregister_prog_handler() for detailed semantics.

This patch also bumps libbpf development version to v0.8 and adds new
APIs there.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220305010129.1549719-3-andrii@kernel.org
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
0d834905d8 libbpf: Allow BPF program auto-attach handlers to bail out
Allow some BPF program types to support auto-attach only in subste of
cases. Currently, if some BPF program type specifies attach callback, it
is assumed that during skeleton attach operation all such programs
either successfully attach or entire skeleton attachment fails. If some
program doesn't support auto-attachment from skeleton, such BPF program
types shouldn't have attach callback specified.

This is limiting for cases when, depending on how full the SEC("")
definition is, there could either be enough details to support
auto-attach or there might not be and user has to use some specific API
to provide more details at runtime.

One specific example of such desired behavior might be SEC("uprobe"). If
it's specified as just uprobe auto-attach isn't possible. But if it's
SEC("uprobe/<some_binary>:<some_func>") then there are enough details to
support auto-attach. Note that there is a somewhat subtle difference
between auto-attach behavior of BPF skeleton and using "generic"
bpf_program__attach(prog) (which uses the same attach handlers under the
cover). Skeleton allow some programs within bpf_object to not have
auto-attach implemented and doesn't treat that as an error. Instead such
BPF programs are just skipped during skeleton's (optional) attach step.
bpf_program__attach(), on the other hand, is called when user *expects*
auto-attach to work, so if specified program doesn't implement or
doesn't support auto-attach functionality, that will be treated as an
error.

Another improvement to the way libbpf is handling SEC()s would be to not
require providing dummy kernel function name for kprobe. Currently,
SEC("kprobe/whatever") is necessary even if actual kernel function is
determined by user at runtime and bpf_program__attach_kprobe() is used
to specify it. With changes in this patch, it's possible to support both
SEC("kprobe") and SEC("kprobe/<actual_kernel_function"), while only in
the latter case auto-attach will be performed. In the former one, such
kprobe will be skipped during skeleton attach operation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220305010129.1549719-2-andrii@kernel.org
2022-03-07 22:16:11 -08:00
Yuntao Wang
0a43bc8905 libbpf: Add a check to ensure that page_cnt is non-zero
The page_cnt parameter is used to specify the number of memory pages
allocated for each per-CPU buffer, it must be non-zero and a power of 2.

Currently, the __perf_buffer__new() function attempts to validate that
the page_cnt is a power of 2 but forgets checking for the case where
page_cnt is zero, we can fix it by replacing 'page_cnt & (page_cnt - 1)'
with 'page_cnt == 0 || (page_cnt & (page_cnt - 1))'.

If so, we also don't need to add a check in perf_buffer__new_v0_6_0() to
make sure that page_cnt is non-zero and the check for zero in
perf_buffer__new_raw_v0_6_0() can also be removed.

The code will be cleaner and more readable.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220303005921.53436-1-ytcoode@gmail.com
2022-03-07 22:16:11 -08:00
Xu Kuohai
5d491d5d07 libbpf: Skip forward declaration when counting duplicated type names
Currently if a declaration appears in the BTF before the definition, the
definition is dumped as a conflicting name, e.g.:

    $ bpftool btf dump file vmlinux format raw | grep "'unix_sock'"
    [81287] FWD 'unix_sock' fwd_kind=struct
    [89336] STRUCT 'unix_sock' size=1024 vlen=14

    $ bpftool btf dump file vmlinux format c | grep "struct unix_sock"
    struct unix_sock;
    struct unix_sock___2 {	<--- conflict, the "___2" is unexpected
		    struct unix_sock___2 *unix_sk;

This causes a compilation error if the dump output is used as a header file.

Fix it by skipping declaration when counting duplicated type names.

Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220301053250.1464204-2-xukuohai@huawei.com
2022-03-07 22:16:11 -08:00
Stijn Tintel
9b53decb02 libbpf: Fix BPF_MAP_TYPE_PERF_EVENT_ARRAY auto-pinning
When a BPF map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY doesn't have the
max_entries parameter set, the map will be created with max_entries set
to the number of available CPUs. When we try to reuse such a pinned map,
map_is_reuse_compat will return false, as max_entries in the map
definition differs from max_entries of the existing map, causing the
following error:

  libbpf: couldn't reuse pinned map at '/sys/fs/bpf/m_logging': parameter mismatch

Fix this by overwriting max_entries in the map definition. For this to
work, we need to do this in bpf_object__create_maps, before calling
bpf_object__reuse_map.

Fixes: 57a00f41644f ("libbpf: Add auto-pinning of maps when loading BPF objects")
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220225152355.315204-1-stijn@linux-ipv6.be
2022-03-07 22:16:11 -08:00
Yuntao Wang
426672106e libbpf: Simplify the find_elf_sec_sz() function
The check in the last return statement is unnecessary, we can just return
the ret variable.

But we can simplify the function further by returning 0 immediately if we
find the section size and -ENOENT otherwise.

Thus we can also remove the ret variable.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220223085244.3058118-1-ytcoode@gmail.com
2022-03-07 22:16:11 -08:00
Yuntao Wang
c85a8bbe9c libbpf: Remove redundant check in btf_fixup_datasec()
The check 't->size && t->size != size' is redundant because if t->size
compares unequal to 0, we will just skip straight to sorting variables.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220220072750.209215-1-ytcoode@gmail.com
2022-03-07 22:16:11 -08:00
Martin KaFai Lau
e7997e49ea bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_skb_delivery_time()
* __sk_buff->delivery_time_type:
This patch adds __sk_buff->delivery_time_type.  It tells if the
delivery_time is stored in __sk_buff->tstamp or not.

It will be most useful for ingress to tell if the __sk_buff->tstamp
has the (rcv) timestamp or delivery_time.  If delivery_time_type
is 0 (BPF_SKB_DELIVERY_TIME_NONE), it has the (rcv) timestamp.

Two non-zero types are defined for the delivery_time_type,
BPF_SKB_DELIVERY_TIME_MONO and BPF_SKB_DELIVERY_TIME_UNSPEC.  For UNSPEC,
it can only happen in egress because only mono delivery_time can be
forwarded to ingress now.  The clock of UNSPEC delivery_time
can be deduced from the skb->sk->sk_clockid which is how
the sch_etf doing it also.

* Provide forwarded delivery_time to tc-bpf@ingress:
With the help of the new delivery_time_type, the tc-bpf has a way
to tell if the __sk_buff->tstamp has the (rcv) timestamp or
the delivery_time.  During bpf load time, the verifier will learn if
the bpf prog has accessed the new __sk_buff->delivery_time_type.
If it does, it means the tc-bpf@ingress is expecting the
skb->tstamp could have the delivery_time.  The kernel will then
read the skb->tstamp as-is during bpf insn rewrite without
checking the skb->mono_delivery_time.  This is done by adding a
new prog->delivery_time_access bit.  The same goes for
writing skb->tstamp.

* bpf_skb_set_delivery_time():
The bpf_skb_set_delivery_time() helper is added to allow setting both
delivery_time and the delivery_time_type at the same time.  If the
tc-bpf does not need to change the delivery_time_type, it can directly
write to the __sk_buff->tstamp as the existing tc-bpf has already been
doing.  It will be most useful at ingress to change the
__sk_buff->tstamp from the (rcv) timestamp to
a mono delivery_time and then bpf_redirect_*().

bpf only has mono clock helper (bpf_ktime_get_ns), and
the current known use case is the mono EDT for fq, and
only mono delivery time can be kept during forward now,
so bpf_skb_set_delivery_time() only supports setting
BPF_SKB_DELIVERY_TIME_MONO.  It can be extended later when use cases
come up and the forwarding path also supports other clock bases.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-07 22:16:11 -08:00
Hangbin Liu
4c560383a6 bonding: add new option ns_ip6_target
This patch add a new bonding option ns_ip6_target, which correspond
to the arp_ip_target. With this we set IPv6 targets and send IPv6 NS
request to determine the health of the link.

For other related options like the validation, we still use
arp_validate, and will change to ns_validate later.

Note: the sysfs configuration support was removed based on
https://lore.kernel.org/netdev/8863.1645071997@famine

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-07 22:16:11 -08:00
Andrii Nakryiko
9c44c8a8e0 LICENSE: fix BSD-2-Clause by adding year and authors
Seems like 2015 is the year of the first libbpf commit. So use Lorenz's
suggestion and add "(c) 2015 The Libbpf Authors".

Closes: https://github.com/libbpf/libbpf/issues/461
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-02-23 17:55:21 -08:00
Andrii Nakryiko
1c173e5fc8 libbpf: fix libbpf.pc generation w.r.t. patch versions
Ensure that libbpf.pc gets full libbpf's version, including patch
releases. Also add some mechanism to ensure that official released
version (e.g., 0.7.1) and the one recorded in libbpf.map (which never
bumps patch version, so will be 0.7.0) are in sync up to major and minor
versions. This should ensure that major mistakes are captured. We'll
still need to be very careful with zeroing out patch version on minor
version bumps.

Closes: https://github.com/libbpf/libbpf/issues/455
Reported-by: Michel Salim <michel@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-02-22 20:06:42 -08:00
Andrii Nakryiko
93c570ca4b sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2e3f7bed28376a1a41ce4a58b7163b586e97a546
Checkpoint bpf-next commit: b75dacaac4650478ed5a9d33975b91b99016daff
Baseline bpf commit:        45ce4b4f9009102cd9f581196d480a59208690c1
Checkpoint bpf commit:      75134f16e7dd0007aa474b281935c5f42e79f2c8

Andrii Nakryiko (1):
  libbpf: Fix memleak in libbpf_netlink_recv()

 src/netlink.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--
2.30.2
2022-02-17 11:33:57 -08:00
Andrii Nakryiko
33201b7ebd libbpf: Fix memleak in libbpf_netlink_recv()
Ensure that libbpf_netlink_recv() frees dynamically allocated buffer in
all code paths.

Fixes: 9c3de619e13e ("libbpf: Use dynamically allocated buffer when receiving netlink messages")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20220217073958.276959-1-andrii@kernel.org
2022-02-17 11:33:57 -08:00
Andrii Nakryiko
6edaacad4f sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   8cbf062a250ed52148badf6f3ffd03657dd4a3f0
Checkpoint bpf-next commit: 2e3f7bed28376a1a41ce4a58b7163b586e97a546
Baseline bpf commit:        61d06f01f9710b327a53492e5add9f972eb909b3
Checkpoint bpf commit:      45ce4b4f9009102cd9f581196d480a59208690c1

Mauricio Vásquez (2):
  libbpf: Split bpf_core_apply_relo()
  libbpf: Expose bpf_core_{add,free}_cands() to bpftool

 src/libbpf.c          | 88 ++++++++++++++++++++++++-------------------
 src/libbpf_internal.h |  9 +++++
 src/relo_core.c       | 79 +++++++++++---------------------------
 src/relo_core.h       | 42 ++++++++++++++++++---
 4 files changed, 118 insertions(+), 100 deletions(-)

--
2.30.2
2022-02-16 13:58:30 -08:00
Mauricio Vásquez
af29a83fe2 libbpf: Expose bpf_core_{add,free}_cands() to bpftool
Expose bpf_core_add_cands() and bpf_core_free_cands() to handle
candidates list.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com>
Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220215225856.671072-3-mauricio@kinvolk.io
2022-02-16 13:58:30 -08:00
Mauricio Vásquez
6387d3900f libbpf: Split bpf_core_apply_relo()
BTFGen needs to run the core relocation logic in order to understand
what are the types involved in a given relocation.

Currently bpf_core_apply_relo() calculates and **applies** a relocation
to an instruction. Having both operations in the same function makes it
difficult to only calculate the relocation without patching the
instruction. This commit splits that logic in two different phases: (1)
calculate the relocation and (2) patch the instruction.

For the first phase bpf_core_apply_relo() is renamed to
bpf_core_calc_relo_insn() who is now only on charge of calculating the
relocation, the second phase uses the already existing
bpf_core_patch_insn(). bpf_object__relocate_core() uses both of them and
the BTFGen will use only bpf_core_calc_relo_insn().

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com>
Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220215225856.671072-2-mauricio@kinvolk.io
2022-02-16 13:58:30 -08:00
Andrii Nakryiko
196da61f1d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   dc37dc617fabfb1c3a16d49f5d8cc20e9e3608ca
Checkpoint bpf-next commit: 8cbf062a250ed52148badf6f3ffd03657dd4a3f0
Baseline bpf commit:        fe68195daf34d5dddacd3f93dd3eafc4beca3a0e
Checkpoint bpf commit:      61d06f01f9710b327a53492e5add9f972eb909b3

Alexei Starovoitov (1):
  libbpf: Prepare light skeleton for the kernel.

Jakub Sitnicki (1):
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup

Marco Elver (1):
  perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit
    architectures

Toke Høiland-Jørgensen (1):
  libbpf: Use dynamically allocated buffer when receiving netlink
    messages

 include/uapi/linux/bpf.h        |   3 +-
 include/uapi/linux/perf_event.h |   2 +
 src/gen_loader.c                |  15 ++-
 src/netlink.c                   |  55 +++++++++-
 src/skel_internal.h             | 185 ++++++++++++++++++++++++++++----
 5 files changed, 234 insertions(+), 26 deletions(-)

--
2.30.2
2022-02-15 22:32:04 -08:00
Marco Elver
db8dc47ce8 perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
Due to the alignment requirements of siginfo_t, as described in
3ddb3fd8cdb0 ("signal, perf: Fix siginfo_t by avoiding u64 on 32-bit
architectures"), siginfo_t::si_perf_data is limited to an unsigned long.

However, perf_event_attr::sig_data is an u64, to avoid having to deal
with compat conversions. Due to being an u64, it may not immediately be
clear to users that sig_data is truncated on 32 bit architectures.

Add a comment to explicitly point this out, and hopefully help some
users save time by not having to deduce themselves what's happening.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/20220131103407.1971678-3-elver@google.com
2022-02-15 22:32:04 -08:00
Toke Høiland-Jørgensen
f7d89c3910 libbpf: Use dynamically allocated buffer when receiving netlink messages
When receiving netlink messages, libbpf was using a statically allocated
stack buffer of 4k bytes. This happened to work fine on systems with a 4k
page size, but on systems with larger page sizes it can lead to truncated
messages. The user-visible impact of this was that libbpf would insist no
XDP program was attached to some interfaces because that bit of the netlink
message got chopped off.

Fix this by switching to a dynamically allocated buffer; we borrow the
approach from iproute2 of using recvmsg() with MSG_PEEK|MSG_TRUNC to get
the actual size of the pending message before receiving it, adjusting the
buffer as necessary. While we're at it, also add retries on interrupted
system calls around the recvmsg() call.

v2:
  - Move peek logic to libbpf_netlink_recv(), don't double free on ENOMEM.

Fixes: 8bbb77b7c7a2 ("libbpf: Add various netlink helpers")
Reported-by: Zhiqian Guan <zhguan@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20220211234819.612288-1-toke@redhat.com
2022-02-15 22:32:04 -08:00
Alexei Starovoitov
0d6262ad0a libbpf: Prepare light skeleton for the kernel.
Prepare light skeleton to be used in the kernel module and in the user space.
The look and feel of lskel.h is mostly the same with the difference that for
user space the skel->rodata is the same pointer before and after skel_load
operation, while in the kernel the skel->rodata after skel_open and the
skel->rodata after skel_load are different pointers.
Typical usage of skeleton remains the same for kernel and user space:
skel = my_bpf__open();
skel->rodata->my_global_var = init_val;
err = my_bpf__load(skel);
err = my_bpf__attach(skel);
// access skel->rodata->my_global_var;
// access skel->bss->another_var;

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209232001.27490-3-alexei.starovoitov@gmail.com
2022-02-15 22:32:04 -08:00
Jakub Sitnicki
7593fc7a85 selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
Extend the context access tests for sk_lookup prog to cover the surprising
case of a 4-byte load from the remote_port field, where the expected value
is actually shifted by 16 bits.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220209184333.654927-3-jakub@cloudflare.com
2022-02-15 22:32:04 -08:00
Andrii Nakryiko
67f813c8a8 README: add libbpf distro packaging badge
Add badge displaying libbpf's packaging status across various Linux distros.
2022-02-11 21:21:10 -08:00
Andrii Nakryiko
2cd2d03f63 libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0
Ensure that LIBBPF_0.7.0 inherits everything from LIBBPF_0.6.0.

Fixes: dbdd2c7f8cec ("libbpf: Add API to get/set log_level at per-program level")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220211205235.2089104-1-andrii@kernel.org
2022-02-11 13:01:37 -08:00
thiagoftsm
74c16e9a0c Merge branch 'libbpf:master' into master 2022-02-11 14:34:02 +00:00
Andrii Nakryiko
528094c0c1 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   227a0713b319e7a8605312dee1c97c97a719a9fc
Checkpoint bpf-next commit: dc37dc617fabfb1c3a16d49f5d8cc20e9e3608ca
Baseline bpf commit:        77b1b8b43ec3c060ecf7e926a92b0f8772171046
Checkpoint bpf commit:      fe68195daf34d5dddacd3f93dd3eafc4beca3a0e

Andrii Nakryiko (1):
  libbpf: Fix compilation warning due to mismatched printf format

Dan Carpenter (1):
  libbpf: Fix signedness bug in btf_dump_array_data()

Hengqi Chen (1):
  libbpf: Add BPF_KPROBE_SYSCALL macro

Ilya Leoshkevich (7):
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  libbpf: Fix accessing syscall arguments on powerpc
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Fix accessing the first syscall argument on s390

Mauricio Vásquez (1):
  libbpf: Remove mode check in libbpf_set_strict_mode()

 src/bpf_tracing.h | 85 +++++++++++++++++++++++++++++++++++++++++------
 src/btf_dump.c    |  6 ++--
 src/libbpf.c      |  8 -----
 3 files changed, 79 insertions(+), 20 deletions(-)

--
2.30.2
2022-02-09 09:48:32 -08:00
Andrii Nakryiko
37493e639f libbpf: Fix compilation warning due to mismatched printf format
On ppc64le architecture __s64 is long int and requires %ld. Cast to
ssize_t and use %zd to avoid architecture-specific specifiers.

Fixes: 4172843ed4a3 ("libbpf: Fix signedness bug in btf_dump_array_data()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220209063909.1268319-1-andrii@kernel.org
2022-02-09 09:48:32 -08:00
Hengqi Chen
b0a3b9e8fe libbpf: Add BPF_KPROBE_SYSCALL macro
Add syscall-specific variant of BPF_KPROBE named BPF_KPROBE_SYSCALL ([0]).
The new macro hides the underlying way of getting syscall input arguments.
With the new macro, the following code:

    SEC("kprobe/__x64_sys_close")
    int BPF_KPROBE(do_sys_close, struct pt_regs *regs)
    {
        int fd;

        fd = PT_REGS_PARM1_CORE(regs);
        /* do something with fd */
    }

can be written as:

    SEC("kprobe/__x64_sys_close")
    int BPF_KPROBE_SYSCALL(do_sys_close, int fd)
    {
        /* do something with fd */
    }

  [0] Closes: https://github.com/libbpf/libbpf/issues/425

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220207143134.2977852-2-hengqi.chen@gmail.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
f7e08b4a8f libbpf: Fix accessing the first syscall argument on s390
On s390, the first syscall argument should be accessed via orig_gpr2
(see arch/s390/include/asm/syscall.h). Currently gpr[2] is used
instead, leading to bpf_syscall_macro test failure.

orig_gpr2 cannot be added to user_pt_regs, since its layout is a part
of the ABI. Therefore provide access to it only through
PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-11-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
9f6e3a7a59 libbpf: Fix accessing the first syscall argument on arm64
On arm64, the first syscall argument should be accessed via orig_x0
(see arch/arm64/include/asm/syscall.h). Currently regs[0] is used
instead, leading to bpf_syscall_macro test failure.

orig_x0 cannot be added to struct user_pt_regs, since its layout is a
part of the ABI. Therefore provide access to it only through
PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-10-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
f1a756d793 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
arm64 and s390 need a special way to access the first syscall argument.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-9-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
32c19d8505 libbpf: Fix accessing syscall arguments on riscv
riscv does not select ARCH_HAS_SYSCALL_WRAPPER, so its syscall
handlers take "unpacked" syscall arguments. Indicate this to libbpf
using PT_REGS_SYSCALL_REGS macro.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-7-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
497ec1d35c libbpf: Fix riscv register names
riscv registers are accessed via struct user_regs_struct, not struct
pt_regs. The program counter member in this struct is called pc, not
epc. The frame pointer is called s0, not fp.

Fixes: 3cc31d794097 ("libbpf: Normalize PT_REGS_xxx() macro definitions")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-6-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
8a28842a20 libbpf: Fix accessing syscall arguments on powerpc
powerpc does not select ARCH_HAS_SYSCALL_WRAPPER, so its syscall
handlers take "unpacked" syscall arguments. Indicate this to libbpf
using PT_REGS_SYSCALL_REGS macro.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-5-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Ilya Leoshkevich
50b4d99bbc libbpf: Add PT_REGS_SYSCALL_REGS macro
Architectures that select ARCH_HAS_SYSCALL_WRAPPER pass a pointer to
struct pt_regs to syscall handlers, others unpack it into individual
function parameters. Introduce a macro to describe what a particular
arch does.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-3-iii@linux.ibm.com
2022-02-09 09:48:32 -08:00
Dan Carpenter
f5bd7054f9 libbpf: Fix signedness bug in btf_dump_array_data()
The btf__resolve_size() function returns negative error codes so
"elem_size" must be signed for the error handling to work.

Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220208071552.GB10495@kili
2022-02-09 09:48:32 -08:00
Mauricio Vásquez
1ceec54cb0 libbpf: Remove mode check in libbpf_set_strict_mode()
libbpf_set_strict_mode() checks that the passed mode doesn't contain
extra bits for LIBBPF_STRICT_* flags that don't exist yet.

It makes it difficult for applications to disable some strict flags as
something like "LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS"
is rejected by this check and they have to use a rather complicated
formula to calculate it.[0]

One possibility is to change LIBBPF_STRICT_ALL to only contain the bits
of all existing LIBBPF_STRICT_* flags instead of 0xffffffff. However
it's not possible because the idea is that applications compiled against
older libbpf_legacy.h would still be opting into latest
LIBBPF_STRICT_ALL features.[1]

The other possibility is to remove that check so something like
"LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS" is allowed. It's
what this commit does.

[0]: https://lore.kernel.org/bpf/20220204220435.301896-1-mauricio@kinvolk.io/
[1]: https://lore.kernel.org/bpf/CAEf4BzaTWa9fELJLh+bxnOb0P1EMQmaRbJVG0L+nXZdy0b8G3Q@mail.gmail.com/

Fixes: 93b8952d223a ("libbpf: deprecate legacy BPF map definitions")
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220207145052.124421-2-mauricio@kinvolk.io
2022-02-09 09:48:32 -08:00
Andrii Nakryiko
d6783c28b4 README: update Ubuntu package link
Point to libbpf package in Ubuntu Impish, so it doesn't expire soon.

Closes: https://github.com/libbpf/libbpf/issues/451
2022-02-07 09:37:49 -08:00
Andrii Nakryiko
cdef8257a8 ci: bump LLVM_VER to 15
Another bump of latest clang version to be used during BPF selftests build.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-02-04 15:28:06 -08:00
Andrii Nakryiko
fd181bc349 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b3dddab2ff10853aa3ef70483415d07fee3034ba
Checkpoint bpf-next commit: 227a0713b319e7a8605312dee1c97c97a719a9fc
Baseline bpf commit:        e2bcbd7769ee8f05e1b3d10848aace98973844e4
Checkpoint bpf commit:      77b1b8b43ec3c060ecf7e926a92b0f8772171046

Alexei Starovoitov (2):
  libbpf: Open code low level bpf commands.
  libbpf: Open code raw_tp_open and link_create commands.

Andrii Nakryiko (2):
  libbpf: Stop using deprecated bpf_map__is_offload_neutral()
  libbpf: Deprecate forgotten btf__get_map_kv_tids()

Dave Marchevsky (1):
  libbpf: Deprecate btf_ext rec_size APIs

Delyan Kratunov (2):
  libbpf: Deprecate bpf_prog_test_run_xattr and bpf_prog_test_run
  libbpf: Deprecate priv/set_priv storage

Jakub Sitnicki (1):
  selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads

Lorenzo Bianconi (1):
  libbpf: Deprecate xdp_cpumap, xdp_devmap and classifier sec
    definitions

 include/uapi/linux/bpf.h |  3 +-
 src/bpf.h                |  4 ++-
 src/btf.h                |  7 ++--
 src/libbpf.c             | 19 ++++++++---
 src/libbpf.h             |  7 +++-
 src/skel_internal.h      | 70 ++++++++++++++++++++++++++++++++++++++--
 6 files changed, 99 insertions(+), 11 deletions(-)

--
2.30.2
2022-02-04 10:27:02 -08:00
Andrii Nakryiko
47673bd255 libbpf: Deprecate forgotten btf__get_map_kv_tids()
btf__get_map_kv_tids() is in the same group of APIs as
btf_ext__reloc_func_info()/btf_ext__reloc_line_info() which were only
used by BCC. It was missed to be marked as deprecated in [0]. Fixing
that to complete [1].

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20220201014610.3522985-1-davemarchevsky@fb.com/
  [1] Closes: https://github.com/libbpf/libbpf/issues/277

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220203225017.1795946-1-andrii@kernel.org
2022-02-04 10:27:02 -08:00
Delyan Kratunov
20442822c0 libbpf: Deprecate priv/set_priv storage
Arbitrary storage via bpf_*__set_priv/__priv is being deprecated
without a replacement ([1]). perf uses this capability, but most of
that is going away with the removal of prologue generation ([2]).
perf is already suppressing deprecation warnings, so the remaining
cleanup will happen separately.

  [1]: Closes: https://github.com/libbpf/libbpf/issues/294
  [2]: https://lore.kernel.org/bpf/20220123221932.537060-1-jolsa@kernel.org/

Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220203180032.1921580-1-delyank@fb.com
2022-02-04 10:27:02 -08:00
Andrii Nakryiko
a9cd83ae25 libbpf: Stop using deprecated bpf_map__is_offload_neutral()
Open-code bpf_map__is_offload_neutral() logic in one place in
to-be-deprecated bpf_prog_load_xattr2.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220202225916.3313522-2-andrii@kernel.org
2022-02-04 10:27:02 -08:00
Delyan Kratunov
352c13cdee libbpf: Deprecate bpf_prog_test_run_xattr and bpf_prog_test_run
Deprecate non-extendable bpf_prog_test_run{,_xattr} in favor of
OPTS-based bpf_prog_test_run_opts ([0]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/286

Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220202235423.1097270-5-delyank@fb.com
2022-02-04 10:27:02 -08:00
Alexei Starovoitov
a7a3a8811c libbpf: Open code raw_tp_open and link_create commands.
Open code raw_tracepoint_open and link_create used by light skeleton
to be able to avoid full libbpf eventually.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220131220528.98088-4-alexei.starovoitov@gmail.com
2022-02-04 10:27:02 -08:00
Alexei Starovoitov
bd402dccaf libbpf: Open code low level bpf commands.
Open code low level bpf commands used by light skeleton to
be able to avoid full libbpf eventually.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220131220528.98088-3-alexei.starovoitov@gmail.com
2022-02-04 10:27:02 -08:00
Lorenzo Bianconi
c245b0eeaf libbpf: Deprecate xdp_cpumap, xdp_devmap and classifier sec definitions
Deprecate xdp_cpumap, xdp_devmap and classifier sec definitions.
Introduce xdp/devmap and xdp/cpumap definitions according to the
standard for SEC("") in libbpf:
- prog_type.prog_flags/attach_place

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/5c7bd9426b3ce6a31d9a4b1f97eb299e1467fc52.1643727185.git.lorenzo@kernel.org
2022-02-04 10:27:02 -08:00
Dave Marchevsky
74dcd1bf6a libbpf: Deprecate btf_ext rec_size APIs
btf_ext__{func,line}_info_rec_size functions are used in conjunction
with already-deprecated btf_ext__reloc_{func,line}_info functions. Since
struct btf_ext is opaque to the user it was necessary to expose rec_size
getters in the past.

btf_ext__reloc_{func,line}_info were deprecated in commit 8505e8709b5ee
("libbpf: Implement generalized .BTF.ext func/line info adjustment")
as they're not compatible with support for multiple programs per
section. It was decided[0] that users of these APIs should implement their
own .btf.ext parsing to access this data, in which case the rec_size
getters are unnecessary. So deprecate them from libbpf 0.7.0 onwards.

  [0] Closes: https://github.com/libbpf/libbpf/issues/277

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220201014610.3522985-1-davemarchevsky@fb.com
2022-02-04 10:27:02 -08:00
Jakub Sitnicki
9f276b240b selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads
Add coverage to the verifier tests and tests for reading bpf_sock fields to
ensure that 32-bit, 16-bit, and 8-bit loads from dst_port field are allowed
only at intended offsets and produce expected values.

While 16-bit and 8-bit access to dst_port field is straight-forward, 32-bit
wide loads need be allowed and produce a zero-padded 16-bit value for
backward compatibility.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220130115518.213259-3-jakub@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-02-04 10:27:02 -08:00
Andrii Nakryiko
9a64065733 sync: move bpf-next checkpoint to include selftests fixes
There are no relevant libbpf commits, but new checkpoint commit contains
important BPF selftests commits fixing CI failures from kernel repo
side.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-01-31 23:37:33 -08:00
Evgeny Vereshchagin
ae220adbb2 ci: no longer remove elfutils while building the fuzzer
Without it coverage reports can't be built
```
[2022-01-31 00:05:36,094 DEBUG] Generating file view html index file as: "/out/report/linux/file_view_index.html".
Traceback (most recent call last):
  File "/opt/code_coverage/coverage_utils.py", line 829, in <module>
    sys.exit(Main())
  File "/opt/code_coverage/coverage_utils.py", line 823, in Main
    return _CmdPostProcess(args)
  File "/opt/code_coverage/coverage_utils.py", line 780, in _CmdPostProcess
    processor.PrepareHtmlReport()
  File "/opt/code_coverage/coverage_utils.py", line 577, in PrepareHtmlReport
    self.GenerateFileViewHtmlIndexFile(per_file_coverage_summary,
  File "/opt/code_coverage/coverage_utils.py", line 450, in GenerateFileViewHtmlIndexFile
    self.GetCoverageHtmlReportPathForFile(file_path),
  File "/opt/code_coverage/coverage_utils.py", line 422, in GetCoverageHtmlReportPathForFile
    assert os.path.isfile(
AssertionError: "/tmp/tmp.UYax4l19Gh/lib/system.h" is not a file.
```

It's a follow-up to 393a058d06

Signed-off-by: Evgeny Vereshchagin <evvers@ya.ru>
2022-01-31 15:45:11 -08:00
Andrii Nakryiko
fec0813359 sync: regenerate vmlinux.h to include TASK_COMM_LEN constant
TASK_COMM_LEN is now part of vmlinux.h on latest kernel. Regenerate
vmlinux.h to have it on 5.5 and 4.9 kernels.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-01-25 23:37:04 -08:00
Andrii Nakryiko
1e702e8ffe sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   820e6e227c4053b6b631ae65ef1f65d560cb392b
Checkpoint bpf-next commit: c446fdacb10dcb3b9a9ed3b91d91e72d71d94b03
Baseline bpf commit:        baa59504c1cd0cca7d41954a45ee0b3dc78e41a0
Checkpoint bpf commit:      e2bcbd7769ee8f05e1b3d10848aace98973844e4

Andrii Nakryiko (3):
  libbpf: hide and discourage inconsistently named getters
  libbpf: deprecate bpf_map__resize()
  libbpf: deprecate bpf_program__is_<type>() and
    bpf_program__set_<type>() APIs

Christy Lee (2):
  libbpf: Mark bpf_object__open_buffer() API deprecated
  libbpf: Mark bpf_object__open_xattr() deprecated

Kajol Jain (1):
  perf: Add new macros for mem_hops field

Kenny Yu (2):
  bpf: Add bpf_copy_from_user_task() helper
  libbpf: Add "iter.s" section for sleepable bpf iterator programs

Kenta Tada (1):
  libbpf: Fix the incorrect register read for syscalls on x86_64

Lorenzo Bianconi (4):
  bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf
    program
  bpf: introduce bpf_xdp_get_buff_len helper
  libbpf: Add SEC name for xdp frags programs
  net: xdp: introduce bpf_xdp_pointer utility routine

 include/uapi/linux/bpf.h        | 41 +++++++++++++++++++++++++++++++++
 include/uapi/linux/perf_event.h |  5 +++-
 src/bpf_tracing.h               | 34 +++++++++++++++++++++++++++
 src/btf.h                       |  5 +---
 src/libbpf.c                    | 32 +++++++++++++++++--------
 src/libbpf.h                    | 34 ++++++++++++++++++++++++---
 src/libbpf.map                  |  2 ++
 src/libbpf_internal.h           |  3 +++
 src/libbpf_legacy.h             | 17 ++++++++++++++
 9 files changed, 155 insertions(+), 18 deletions(-)

--
2.30.2
2022-01-25 23:37:04 -08:00
Andrii Nakryiko
bad4fa116c sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-01-25 23:37:04 -08:00
Andrii Nakryiko
b2a7b16287 libbpf: deprecate bpf_program__is_<type>() and bpf_program__set_<type>() APIs
Not sure why these APIs were added in the first place instead of
a completely generic (and not requiring constantly adding new APIs with
each new BPF program type) bpf_program__type() and
bpf_program__set_type() APIs. But as it is right now, there are 13 such
specialized is_type/set_type APIs, while latest kernel is already at 30+
BPF program types.

Instead of completing the set of APIs and keep chasing kernel's
bpf_prog_type enum, deprecate existing subset and recommend generic
bpf_program__type() and bpf_program__set_type() APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220124194254.2051434-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Andrii Nakryiko
638311dcb8 libbpf: deprecate bpf_map__resize()
Deprecated bpf_map__resize() in favor of bpf_map__set_max_entries()
setter. In addition to having a surprising name (users often don't
realize that they need to use bpf_map__resize()), the name also implies
some magic way of resizing BPF map after it is created, which is clearly
not the case.

Another minor annoyance is that bpf_map__resize() disallows 0 value for
max_entries, which in some cases is totally acceptable (e.g., like for
BPF perf buf case to let libbpf auto-create one buffer per each
available CPU core).

  [0] Closes: https://github.com/libbpf/libbpf/issues/304

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220124194254.2051434-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Andrii Nakryiko
e72ac682ae libbpf: hide and discourage inconsistently named getters
Move a bunch of "getters" into libbpf_legacy.h to keep them there in
libbpf 1.0. See [0] for discussion of "Discouraged APIs". These getters
don't add any maintenance burden and are simple alias, but they are
inconsistent in naming. So keep them in libbpf_legacy.h instead of
libbpf.h to "hide" them in favor of preferred getters ([1]). Also add two
missing getters: bpf_program__type() and bpf_program__expected_attach_type().

  [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#handling-deprecation-of-apis-and-functionality
  [1] Closes: https://github.com/libbpf/libbpf/issues/307

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220124194254.2051434-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Kenta Tada
d2818a8f2c libbpf: Fix the incorrect register read for syscalls on x86_64
Currently, rcx is read as the fourth parameter of syscall on x86_64.
But x86_64 Linux System Call convention uses r10 actually.
This commit adds the wrapper for users who want to access to
syscall params to analyze the user space.

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220124141622.4378-3-Kenta.Tada@sony.com
2022-01-25 23:37:04 -08:00
Christy Lee
c0c3f46ca6 libbpf: Mark bpf_object__open_xattr() deprecated
Mark bpf_object__open_xattr() as deprecated, use
bpf_object__open_file() instead.

  [0] Closes: https://github.com/libbpf/libbpf/issues/287

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220125010917.679975-1-christylee@fb.com
2022-01-25 23:37:04 -08:00
Christy Lee
1af0f62fac libbpf: Mark bpf_object__open_buffer() API deprecated
Deprecate bpf_object__open_buffer() API in favor of the unified
opts-based bpf_object__open_mem() API.

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220125005923.418339-2-christylee@fb.com
2022-01-25 23:37:04 -08:00
Kenny Yu
04aef6ce9b libbpf: Add "iter.s" section for sleepable bpf iterator programs
This adds a new bpf section "iter.s" to allow bpf iterator programs to
be sleepable.

Signed-off-by: Kenny Yu <kennyyu@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220124185403.468466-4-kennyyu@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Kenny Yu
a9491bb920 bpf: Add bpf_copy_from_user_task() helper
This adds a helper for bpf programs to read the memory of other
tasks.

As an example use case at Meta, we are using a bpf task iterator program
and this new helper to print C++ async stack traces for all threads of
a given process.

Signed-off-by: Kenny Yu <kennyyu@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220124185403.468466-3-kennyyu@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Lorenzo Bianconi
fb0b8a7cea net: xdp: introduce bpf_xdp_pointer utility routine
Similar to skb_header_pointer, introduce bpf_xdp_pointer utility routine
to return a pointer to a given position in the xdp_buff if the requested
area (offset + len) is contained in a contiguous memory area otherwise it
will be copied in a bounce buffer provided by the caller.
Similar to the tc counterpart, introduce the two following xdp helpers:
- bpf_xdp_load_bytes
- bpf_xdp_store_bytes

Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/ab285c1efdd5b7a9d361348b1e7d3ef49f6382b3.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Lorenzo Bianconi
4ae89237d4 libbpf: Add SEC name for xdp frags programs
Introduce support for the following SEC entries for XDP frags
property:
- SEC("xdp.frags")
- SEC("xdp.frags/devmap")
- SEC("xdp.frags/cpumap")

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/af23b6e4841c171ad1af01917839b77847a4bc27.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Lorenzo Bianconi
9af0e19376 bpf: introduce bpf_xdp_get_buff_len helper
Introduce bpf_xdp_get_buff_len helper in order to return the xdp buffer
total size (linear and paged area)

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/aac9ac3504c84026cf66a3c71b7c5ae89bc991be.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Lorenzo Bianconi
57df70e180 bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program
Introduce BPF_F_XDP_HAS_FRAGS and the related field in bpf_prog_aux
in order to notify the driver the loaded program support xdp frags.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/db2e8075b7032a356003f407d1b0deb99adaa0ed.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 23:37:04 -08:00
Kajol Jain
da06e3efcc perf: Add new macros for mem_hops field
Add new macros for mem_hops field which can be used to
represent remote-node, socket and board level details.

Currently the code had macro for HOPS_0, which corresponds
to data coming from another core but same node.
Add new macros for HOPS_1 to HOPS_3 to represent
remote-node, socket and board level data.

For ex: Encodings for mem_hops fields with L2 cache:

L2			- local L2
L2 | REMOTE | HOPS_0	- remote core, same node L2
L2 | REMOTE | HOPS_1	- remote node, same socket L2
L2 | REMOTE | HOPS_2	- remote socket, same board L2
L2 | REMOTE | HOPS_3	- remote board L2

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-2-kjain@linux.ibm.com
2022-01-25 23:37:04 -08:00
Andrii Nakryiko
b4b6e4dc20 sync: start syncing perf_event.h UAPI header as well
This header is necessary for libbpf-sys to generate perf-related Rust
bindings. It's more convenient to have it available locally with libbpf.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-01-25 23:37:04 -08:00
Ilya Leoshkevich
0ee425cdd7 ci: add Ubuntu instructions for s390x-self-hosted-builder
There is a new Ubuntu-based builder, and setup steps for it are
slightly different from what we had on the old RHEL-based one.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-01-25 17:21:08 -08:00
Ilya Leoshkevich
f1085fe3c3 ci: add s390x-self-hosted-builder's user to kvm group
RHEL's podman sets /dev/kvm permissions to 0666, while Ubuntu's docker
sets them to 0660. Therefore, in order to use KVM from a container,
the user within must belong to the kvm group.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-01-25 17:21:08 -08:00
Evgeny Vereshchagin
393a058d06 tests: move the fuzzer upstream
It should make it easier to start using CFLite or something like that
to fuzz libbpf without getting pointless CVEs :-) More importantly,
now it's possible to build the fuzzer by just cloning the repository,
installing clang and running `./scripts/build-fuzzers.h`:
```
git clone https://github.com/libbpf/libbpf
./scripts/build-fuzzers.h
unzip -d CORPUS fuzz/bpf-object-fuzzer_seed_corpus.zip
./out/bpf-object-fuzzer CORPUS
```

It should make it easier (for me at least) to report some
elfutils bugs because they are much easier to reproduce manually
now.
2022-01-24 15:37:36 -08:00
Andrii Nakryiko
3febb8a165 ci: update s390x blacklist
Add bpf_mod_race and bpf_nf selftests to blacklist for s390x.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-01-21 16:54:32 -08:00
Andrii Nakryiko
5daafdccf9 ci: regenerate and check in latest vmlinux.h
Add a bunch of new kernel types that are needed for successful selftest
compilation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-01-21 16:54:32 -08:00
Andrii Nakryiko
78e816a15d sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   e80f2a0d194605553315de68284fc41969f81f62
Checkpoint bpf-next commit: 820e6e227c4053b6b631ae65ef1f65d560cb392b
Baseline bpf commit:        343e53754b21ae45530623222aa079fecd3cf942
Checkpoint bpf commit:      baa59504c1cd0cca7d41954a45ee0b3dc78e41a0

Andrii Nakryiko (2):
  libbpf: deprecate legacy BPF map definitions
  libbpf: streamline low-level XDP APIs

Kui-Feng Lee (1):
  libbpf: Improve btf__add_btf() with an additional hashmap for strings.

Toke Høiland-Jørgensen (1):
  libbpf: Define BTF_KIND_* constants in btf.h to avoid compilation
    errors

Usama Arif (1):
  uapi/bpf: Add missing description and returns for helper documentation

YiFei Zhu (1):
  bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return
    value

 include/uapi/linux/bpf.h |  33 +++++++++++
 src/bpf_helpers.h        |   2 +-
 src/btf.c                |  31 ++++++++++-
 src/btf.h                |  22 +++++++-
 src/libbpf.c             |   8 +++
 src/libbpf.h             |  29 ++++++++++
 src/libbpf.map           |   4 ++
 src/libbpf_legacy.h      |   5 ++
 src/netlink.c            | 117 ++++++++++++++++++++++++++++-----------
 9 files changed, 215 insertions(+), 36 deletions(-)

--
2.30.2
2022-01-21 16:54:32 -08:00
Andrii Nakryiko
d2ea0e2d03 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2022-01-21 16:54:32 -08:00
Andrii Nakryiko
8fbe7eec3a libbpf: streamline low-level XDP APIs
Introduce 4 new netlink-based XDP APIs for attaching, detaching, and
querying XDP programs:
  - bpf_xdp_attach;
  - bpf_xdp_detach;
  - bpf_xdp_query;
  - bpf_xdp_query_id.

These APIs replace bpf_set_link_xdp_fd, bpf_set_link_xdp_fd_opts,
bpf_get_link_xdp_id, and bpf_get_link_xdp_info APIs ([0]). The latter
don't follow a consistent naming pattern and some of them use
non-extensible approaches (e.g., struct xdp_link_info which can't be
modified without breaking libbpf ABI).

The approach I took with these low-level XDP APIs is similar to what we
did with low-level TC APIs. There is a nice duality of bpf_tc_attach vs
bpf_xdp_attach, and so on. I left bpf_xdp_attach() to support detaching
when -1 is specified for prog_fd for generality and convenience, but
bpf_xdp_detach() is preferred due to clearer naming and associated
semantics. Both bpf_xdp_attach() and bpf_xdp_detach() accept the same
opts struct allowing to specify expected old_prog_fd.

While doing the refactoring, I noticed that old APIs require users to
specify opts with old_fd == -1 to declare "don't care about already
attached XDP prog fd" condition. Otherwise, FD 0 is assumed, which is
essentially never an intended behavior. So I made this behavior
consistent with other kernel and libbpf APIs, in which zero FD means "no
FD". This seems to be more in line with the latest thinking in BPF land
and should cause less user confusion, hopefully.

For querying, I left two APIs, both more generic bpf_xdp_query()
allowing to query multiple IDs and attach mode, but also
a specialization of it, bpf_xdp_query_id(), which returns only requested
prog_id. Uses of prog_id returning bpf_get_link_xdp_id() were so
prevalent across selftests and samples, that it seemed a very common use
case and using bpf_xdp_query() for doing it felt very cumbersome with
a highly branches if/else chain based on flags and attach mode.

Old APIs are scheduled for deprecation in libbpf 0.8 release.

  [0] Closes: https://github.com/libbpf/libbpf/issues/309

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220120061422.2710637-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-21 16:54:32 -08:00
Andrii Nakryiko
d788cd57b5 libbpf: deprecate legacy BPF map definitions
Enact deprecation of legacy BPF map definition in SEC("maps") ([0]). For
the definitions themselves introduce LIBBPF_STRICT_MAP_DEFINITIONS flag
for libbpf strict mode. If it is set, error out on any struct
bpf_map_def-based map definition. If not set, libbpf will print out
a warning for each legacy BPF map to raise awareness that it goes away.

For any use of BPF_ANNOTATE_KV_PAIR() macro providing a legacy way to
associate BTF key/value type information with legacy BPF map definition,
warn through libbpf's pr_warn() error message (but don't fail BPF object
open).

BPF-side struct bpf_map_def is marked as deprecated. User-space struct
bpf_map_def has to be used internally in libbpf, so it is left
untouched. It should be enough for bpf_map__def() to be marked
deprecated to raise awareness that it goes away.

bpftool is an interesting case that utilizes libbpf to open BPF ELF
object to generate skeleton. As such, even though bpftool itself uses
full on strict libbpf mode (LIBBPF_STRICT_ALL), it has to relax it a bit
for BPF map definition handling to minimize unnecessary disruptions. So
opt-out of LIBBPF_STRICT_MAP_DEFINITIONS for bpftool. User's code that
will later use generated skeleton will make its own decision whether to
enforce LIBBPF_STRICT_MAP_DEFINITIONS or not.

There are few tests in selftests/bpf that are consciously using legacy
BPF map definitions to test libbpf functionality. For those, temporary
opt out of LIBBPF_STRICT_MAP_DEFINITIONS mode for the duration of those
tests.

  [0] Closes: https://github.com/libbpf/libbpf/issues/272

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-21 16:54:32 -08:00
YiFei Zhu
ab022c8eb4 bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value
The helpers continue to use int for retval because all the hooks
are int-returning rather than long-returning. The return value of
bpf_set_retval is int for future-proofing, in case in the future
there may be errors trying to set the retval.

After the previous patch, if a program rejects a syscall by
returning 0, an -EPERM will be generated no matter if the retval
is already set to -err. This patch change it being forced only if
retval is not -err. This is because we want to support, for
example, invoking bpf_set_retval(-EINVAL) and return 0, and have
the syscall return value be -EINVAL not -EPERM.

For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
that, if the return value is NET_XMIT_DROP, the packet is silently
dropped. We preserve this behavior for backward compatibility
reasons, so even if an errno is set, the errno does not return to
caller. However, setting a non-err to retval cannot propagate so
this is not allowed and we return a -EFAULT in that case.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/b4013fd5d16bed0b01977c1fafdeae12e1de61fb.1639619851.git.zhuyifei@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-21 16:54:32 -08:00
Kui-Feng Lee
75c7c722f5 libbpf: Improve btf__add_btf() with an additional hashmap for strings.
Add a hashmap to map the string offsets from a source btf to the
string offsets from a target btf to reduce overheads.

btf__add_btf() calls btf__add_str() to add strings from a source to a
target btf.  It causes many string comparisons, and it is a major
hotspot when adding a big btf.  btf__add_str() uses strcmp() to check
if a hash entry is the right one.  The extra hashmap here compares
offsets of strings, that are much cheaper.  It remembers the results
of btf__add_str() for later uses to reduce the cost.

We are parallelizing BTF encoding for pahole by creating separated btf
instances for worker threads.  These per-thread btf instances will be
added to the btf instance of the main thread by calling btf__add_str()
to deduplicate and write out.  With this patch and -j4, the running
time of pahole drops to about 6.0s from 6.6s.

The following lines are the summary of 'perf stat' w/o the change.

       6.668126396 seconds time elapsed

      13.451054000 seconds user
       0.715520000 seconds sys

The following lines are the summary w/ the change.

       5.986973919 seconds time elapsed

      12.939903000 seconds user
       0.724152000 seconds sys

V4 fixes a bug of error checking against the pointer returned by
hashmap__new().

[v3] https://lore.kernel.org/bpf/20220118232053.2113139-1-kuifeng@fb.com/
[v2] https://lore.kernel.org/bpf/20220114193713.461349-1-kuifeng@fb.com/

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220119180214.255634-1-kuifeng@fb.com
2022-01-21 16:54:32 -08:00
Usama Arif
0f99d97a9c uapi/bpf: Add missing description and returns for helper documentation
Both description and returns section will become mandatory
for helpers and syscalls in a later commit to generate man pages.

This commit also adds in the documentation that BPF_PROG_RUN is
an alias for BPF_PROG_TEST_RUN for anyone searching for the
syscall in the generated man pages.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220119114442.1452088-1-usama.arif@bytedance.com
2022-01-21 16:54:32 -08:00
Toke Høiland-Jørgensen
8404d1396c libbpf: Define BTF_KIND_* constants in btf.h to avoid compilation errors
The btf.h header included with libbpf contains inline helper functions to
check for various BTF kinds. These helpers directly reference the
BTF_KIND_* constants defined in the kernel header, and because the header
file is included in user applications, this happens in the user application
compile units.

This presents a problem if a user application is compiled on a system with
older kernel headers because the constants are not available. To avoid
this, add #defines of the constants directly in btf.h before using them.

Since the kernel header moved to an enum for BTF_KIND_*, the #defines can
shadow the enum values without any errors, so we only need #ifndef guards
for the constants that predates the conversion to enum. We group these so
there's only one guard for groups of values that were added together.

  [0] Closes: https://github.com/libbpf/libbpf/issues/436

Fixes: 223f903e9c83 ("bpf: Rename BTF_KIND_TAG to BTF_KIND_DECL_TAG")
Fixes: 5b84bd10363e ("libbpf: Add support for BTF_KIND_TAG")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/bpf/20220118141327.34231-1-toke@redhat.com
2022-01-21 16:54:32 -08:00
Andrii Nakryiko
be89b28f96 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   44bab87d8ca6f0544a9f8fc97bdf33aa5b3c899e
Checkpoint bpf-next commit: e80f2a0d194605553315de68284fc41969f81f62
Baseline bpf commit:        d6d86830705f173fca6087a3e67ceaf68db80523
Checkpoint bpf commit:      343e53754b21ae45530623222aa079fecd3cf942

Christy Lee (2):
  libbpf: Rename bpf_prog_attach_xattr() to bpf_prog_attach_opts()
  libbpf: Deprecate bpf_map__def() API

Coco Li (1):
  gro: add ability to control gro max packet size

Mauricio Vásquez (1):
  libbpf: Use IS_ERR_OR_NULL() in hashmap__free()

Yafang Shao (1):
  libbpf: Fix possible NULL pointer dereference when destroying skeleton

 include/uapi/linux/if_link.h | 1 +
 src/bpf.c                    | 9 +++++++--
 src/bpf.h                    | 4 ++++
 src/hashmap.c                | 3 +--
 src/libbpf.c                 | 3 +++
 src/libbpf.h                 | 3 ++-
 src/libbpf.map               | 1 +
 7 files changed, 19 insertions(+), 5 deletions(-)

--
2.30.2
2022-01-14 22:08:26 -08:00
Christy Lee
7b8e97bffc libbpf: Deprecate bpf_map__def() API
All fields accessed via bpf_map_def can now be accessed via
appropirate getters and setters. Mark bpf_map__def() API as deprecated.

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220108004218.355761-6-christylee@fb.com
2022-01-14 22:08:26 -08:00
Yafang Shao
5b6dfd7f6b libbpf: Fix possible NULL pointer dereference when destroying skeleton
When I checked the code in skeleton header file generated with my own
bpf prog, I found there may be possible NULL pointer dereference when
destroying skeleton. Then I checked the in-tree bpf progs, finding that is
a common issue. Let's take the generated samples/bpf/xdp_redirect_cpu.skel.h
for example. Below is the generated code in
xdp_redirect_cpu__create_skeleton():

	xdp_redirect_cpu__create_skeleton
		struct bpf_object_skeleton *s;
		s = (struct bpf_object_skeleton *)calloc(1, sizeof(*s));
		if (!s)
			goto error;
		...
	error:
		bpf_object__destroy_skeleton(s);
		return  -ENOMEM;

After goto error, the NULL 's' will be deferenced in
bpf_object__destroy_skeleton().

We can simply fix this issue by just adding a NULL check in
bpf_object__destroy_skeleton().

Fixes: d66562fba1ce ("libbpf: Add BPF object skeleton support")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220108134739.32541-1-laoar.shao@gmail.com
2022-01-14 22:08:26 -08:00
Christy Lee
e0de05d1b1 libbpf: Rename bpf_prog_attach_xattr() to bpf_prog_attach_opts()
All xattr APIs are being dropped, let's converge to the convention used in
high-level APIs and rename bpf_prog_attach_xattr to bpf_prog_attach_opts.

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220107184604.3668544-2-christylee@fb.com
2022-01-14 22:08:26 -08:00
Mauricio Vásquez
d5daf275c7 libbpf: Use IS_ERR_OR_NULL() in hashmap__free()
hashmap__new() uses ERR_PTR() to return an error so it's better to
use IS_ERR_OR_NULL() in order to check the pointer before calling
free(). This will prevent freeing an invalid pointer if somebody calls
hashmap__free() with the result of a failed hashmap__new() call.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220107152620.192327-1-mauricio@kinvolk.io
2022-01-14 22:08:26 -08:00
Coco Li
cde5b418dd gro: add ability to control gro max packet size
Eric Dumazet suggested to allow users to modify max GRO packet size.

We have seen GRO being disabled by users of appliances (such as
wifi access points) because of claimed bufferbloat issues,
or some work arounds in sch_cake, to split GRO/GSO packets.

Instead of disabling GRO completely, one can chose to limit
the maximum packet size of GRO packets, depending on their
latency constraints.

This patch adds a per device gro_max_size attribute
that can be changed with ip link command.

ip link set dev eth0 gro_max_size 16000

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-14 22:08:26 -08:00
Adam Jensen
060c8a99c4 include: Include linux/stddef.h
This fixes the build in environments such as Alpine Linux.
See [0] for discussion.

  [0] https://github.com/libbpf/libbpf/pull/41

Signed-off-by: Adam Jensen
2022-01-14 12:21:53 -08:00
Kumar Kartikeya Dwivedi
22411acc4b ci: Add userfaultfd kernel config
Add necessary kernel config values to run BPF mod race test in
conntrack-bpf series.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2022-01-11 12:40:59 -08:00
Andrii Nakryiko
e99f34e144 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   ecf45e60a62dfeb65658abac02f0bdb45b786911
Checkpoint bpf-next commit: 44bab87d8ca6f0544a9f8fc97bdf33aa5b3c899e
Baseline bpf commit:        819d11507f6637731947836e6308f5966d64cf9d
Checkpoint bpf commit:      d6d86830705f173fca6087a3e67ceaf68db80523

Andrii Nakryiko (3):
  libbpf: Normalize PT_REGS_xxx() macro definitions
  libbpf: Use 100-character limit to make bpf_tracing.h easier to read
  libbpf: Improve LINUX_VERSION_CODE detection

Christy Lee (3):
  libbpf: Deprecate bpf_perf_event_read_simple() API
  libbpf 1.0: Deprecate bpf_map__is_offload_neutral()
  libbpf 1.0: Deprecate bpf_object__find_map_by_offset() API

Grant Seltzer (1):
  libbpf: Add documentation for bpf_map batch operations

Qiang Wang (2):
  libbpf: Use probe_name for legacy kprobe
  libbpf: Support repeated legacy kprobes on same function

 src/bpf.c             |   8 +-
 src/bpf.h             | 115 ++++++++++-
 src/bpf_tracing.h     | 431 +++++++++++++++++-------------------------
 src/libbpf.c          |  56 ++++--
 src/libbpf.h          |   5 +-
 src/libbpf_internal.h |   2 +
 src/libbpf_probes.c   |  16 --
 7 files changed, 342 insertions(+), 291 deletions(-)

--
2.30.2
2022-01-06 16:20:54 -08:00
Grant Seltzer
4449d71509 libbpf: Add documentation for bpf_map batch operations
This adds documention for:

- bpf_map_delete_batch()
- bpf_map_lookup_batch()
- bpf_map_lookup_and_delete_batch()
- bpf_map_update_batch()

This also updates the public API for the `keys` parameter
of `bpf_map_delete_batch()`, and both the
`keys` and `values` parameters of `bpf_map_update_batch()`
to be constants.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220106201304.112675-1-grantseltzer@gmail.com
2022-01-06 16:20:54 -08:00
Christy Lee
12932191c6 libbpf 1.0: Deprecate bpf_object__find_map_by_offset() API
API created with simplistic assumptions about BPF map definitions.
It hasn’t worked for a while, deprecate it in preparation for
libbpf 1.0.

  [0] Closes: https://github.com/libbpf/libbpf/issues/302

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220105003120.2222673-1-christylee@fb.com
2022-01-06 16:20:54 -08:00
Christy Lee
8440112546 libbpf 1.0: Deprecate bpf_map__is_offload_neutral()
Deprecate bpf_map__is_offload_neutral(). It’s most probably broken
already. PERF_EVENT_ARRAY isn’t the only map that’s not suitable
for hardware offloading. Applications can directly check map type
instead.

  [0] Closes: https://github.com/libbpf/libbpf/issues/306

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220105000601.2090044-1-christylee@fb.com
2022-01-06 16:20:54 -08:00
Qiang Wang
b0c3d7133f libbpf: Support repeated legacy kprobes on same function
If repeated legacy kprobes on same function in one process,
libbpf will register using the same probe name and got -EBUSY
error. So append index to the probe name format to fix this
problem.

Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211227130713.66933-2-wangqiang.wq.frank@bytedance.com
2022-01-06 16:20:54 -08:00
Qiang Wang
c2f2c26cb2 libbpf: Use probe_name for legacy kprobe
Fix a bug in commit 46ed5fc33db9, which wrongly used the
func_name instead of probe_name to register legacy kprobe.

Fixes: 46ed5fc33db9 ("libbpf: Refactor and simplify legacy kprobe code")
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/20211227130713.66933-1-wangqiang.wq.frank@bytedance.com
2022-01-06 16:20:54 -08:00
Christy Lee
2e52e09bc2 libbpf: Deprecate bpf_perf_event_read_simple() API
With perf_buffer__poll() and perf_buffer__consume() APIs available,
there is no reason to expose bpf_perf_event_read_simple() API to
users. If users need custom perf buffer, they could re-implement
the function.

Mark bpf_perf_event_read_simple() and move the logic to a new
static function so it can still be called by other functions in the
same file.

  [0] Closes: https://github.com/libbpf/libbpf/issues/310

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211229204156.13569-1-christylee@fb.com
2022-01-06 16:20:54 -08:00
Andrii Nakryiko
0171976dc5 libbpf: Improve LINUX_VERSION_CODE detection
Ubuntu reports incorrect kernel version through uname(), which on older
kernels leads to kprobe BPF programs failing to load due to the version
check mismatch.

Accommodate Ubuntu's quirks with LINUX_VERSION_CODE by using
Ubuntu-specific /proc/version_code to fetch major/minor/patch versions
to form LINUX_VERSION_CODE.

While at it, consolide libbpf's kernel version detection code between
libbpf.c and libbpf_probes.c.

  [0] Closes: https://github.com/libbpf/libbpf/issues/421

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211222231003.2334940-1-andrii@kernel.org
2022-01-06 16:20:54 -08:00
Andrii Nakryiko
3f592a59d7 libbpf: Use 100-character limit to make bpf_tracing.h easier to read
Improve bpf_tracing.h's macro definition readability by keeping them
single-line and better aligned. This makes it easier to follow all those
variadic patterns.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211222213924.1869758-2-andrii@kernel.org
2022-01-06 16:20:54 -08:00
Andrii Nakryiko
0557ad0a9c libbpf: Normalize PT_REGS_xxx() macro definitions
Refactor PT_REGS macros definitions in  bpf_tracing.h to avoid excessive
duplication. We currently have classic PT_REGS_xxx() and CO-RE-enabled
PT_REGS_xxx_CORE(). We are about to add also _SYSCALL variants, which
would require excessive copying of all the per-architecture definitions.

Instead, separate architecture-specific field/register names from the
final macro that utilize them. That way for upcoming _SYSCALL variants
we'll be able to just define x86_64 exception and otherwise have one
common set of _SYSCALL macro definitions common for all architectures.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20211222213924.1869758-1-andrii@kernel.org
2022-01-06 16:20:54 -08:00
Kumar Kartikeya Dwivedi
7c382f0df9 ci: Add conntrack kernel config
Add necessary kernel config values to test BPF kfunc functionality for
netfilter's conntrack subsystem.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2022-01-05 12:58:11 -08:00
Chris Tarazi
ceba6a788a travis-ci/rootfs: Fix mount(8) invocation for Arch Linux
Given that the rootfs for Arch Linux uses the busybox variant of
mount(8), the `-l` doesn't exist on that binary and gives the following
error msg with version v1.34.1 of busybox when invoking
mkrootfs_arch.sh:

```
...
[    0.781471] random: fast init done
starting pid 72, tty '': '/etc/init.d/rcS'
+ for path in /etc/rcS.d/S*
+ '[' -x /etc/rcS.d/S10-mount ']'
+ /etc/rcS.d/S10-mount
+ /bin/mount proc /proc -t proc
++ /bin/mount -l -t devtmpfs
/bin/mount: unrecognized option: l
...
```

This prevented me from generating a rootfs. This is fixed by removing
the `-l`, as plainly invoking `mount -t devtmpfs` returns the same
output with `mount -l ...` on the non-busybox variant (on Arch Linux, it
comes from the `utils-linux` package). After this change, I was able to
run `./tools/testing/selftests/bpf/vmtest.sh -i` (from the kernel src;
with modification to the script to pick up my locally-generated .zstd of
the new rootfs) and it worked.

Signed-off-by: Chris Tarazi <tarazichris@gmail.com>
2022-01-05 12:57:27 -08:00
grantseltzer
bf7aacea49 Fix comparison operator in API documentation 2022-01-04 21:17:07 -08:00
Andrii Nakryiko
af2da673d8 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   e967a20a8fabc6442a78e2e2059e63a4bb6aed08
Checkpoint bpf-next commit: ecf45e60a62dfeb65658abac02f0bdb45b786911
Baseline bpf commit:        819d11507f6637731947836e6308f5966d64cf9d
Checkpoint bpf commit:      819d11507f6637731947836e6308f5966d64cf9d

Jiri Olsa (1):
  libbpf: Do not use btf_dump__new() macro in C++ mode

 src/btf.h | 6 ++++++
 1 file changed, 6 insertions(+)

--
2.30.2
2021-12-23 20:00:21 -08:00
Jiri Olsa
1321a8bb49 libbpf: Do not use btf_dump__new() macro in C++ mode
As reported in here [0], C++ compilers don't support
__builtin_types_compatible_p(), so at least don't screw up compilation
for them and let C++ users pick btf_dump__new vs
btf_dump__new_deprecated explicitly.

  [0] https://github.com/libbpf/libbpf/issues/283#issuecomment-986100727

Fixes: 6084f5dc928f ("libbpf: Ensure btf_dump__new() and btf_dump_opts are future-proof")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211223131736.483956-1-jolsa@kernel.org
2021-12-23 14:17:35 +01:00
grantseltzer
287d0d097b Add documentation for error checking in API
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
2021-12-23 19:59:03 -08:00
Yucong Sun
9fab7c81ec ci: Add a step to patch kernel with temporary fixes
Apply a custom set of patches against bpf-next kernel tree before
building vmlinux image.
2021-12-20 20:31:42 -08:00
Andrii Nakryiko
96268bf0c2 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   a34efe503bc55c5732e328e5191ad549eb899f31
Checkpoint bpf-next commit: e967a20a8fabc6442a78e2e2059e63a4bb6aed08
Baseline bpf commit:        f7abc4c8df8c7930d0b9c56d9abee9a1fca635e9
Checkpoint bpf commit:      819d11507f6637731947836e6308f5966d64cf9d

Andrii Nakryiko (2):
  libbpf: Avoid reading past ELF data section end when copying license
  libbpf: Rework feature-probing APIs

 src/libbpf.c        |   5 +-
 src/libbpf.h        |  52 +++++++++-
 src/libbpf.map      |   3 +
 src/libbpf_probes.c | 235 ++++++++++++++++++++++++++++++++++----------
 4 files changed, 240 insertions(+), 55 deletions(-)

--
2.30.2
2021-12-17 17:13:53 -08:00
Andrii Nakryiko
168cf9b8ae libbpf: Rework feature-probing APIs
Create three extensible alternatives to inconsistently named
feature-probing APIs:

  - libbpf_probe_bpf_prog_type() instead of bpf_probe_prog_type();
  - libbpf_probe_bpf_map_type() instead of bpf_probe_map_type();
  - libbpf_probe_bpf_helper() instead of bpf_probe_helper().

Set up return values such that libbpf can report errors (e.g., if some
combination of input arguments isn't possible to validate, etc), in
addition to whether the feature is supported (return value 1) or not
supported (return value 0).

Also schedule deprecation of those three APIs. Also schedule deprecation
of bpf_probe_large_insn_limit().

Also fix all the existing detection logic for various program and map
types that never worked:

  - BPF_PROG_TYPE_LIRC_MODE2;
  - BPF_PROG_TYPE_TRACING;
  - BPF_PROG_TYPE_LSM;
  - BPF_PROG_TYPE_EXT;
  - BPF_PROG_TYPE_SYSCALL;
  - BPF_PROG_TYPE_STRUCT_OPS;
  - BPF_MAP_TYPE_STRUCT_OPS;
  - BPF_MAP_TYPE_BLOOM_FILTER.

Above prog/map types needed special setups and detection logic to work.
Subsequent patch adds selftests that will make sure that all the
detection logic keeps working for all current and future program and map
types, avoiding otherwise inevitable bit rot.

  [0] Closes: https://github.com/libbpf/libbpf/issues/312

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Julia Kartseva <hex@fb.com>
Link: https://lore.kernel.org/bpf/20211217171202.3352835-2-andrii@kernel.org
2021-12-17 17:13:53 -08:00
Andrii Nakryiko
8e706ddc6c libbpf: Avoid reading past ELF data section end when copying license
Fix possible read beyond ELF "license" data section if the license
string is not properly zero-terminated. Use the fact that libbpf_strlcpy
never accesses the (N-1)st byte of the source string because it's
replaced with '\0' anyways.

If this happens, it's a violation of contract between libbpf and a user,
but not handling this more robustly upsets CIFuzz, so given the fix is
trivial, let's fix the potential issue.

Fixes: 9fc205b413b3 ("libbpf: Add sane strncpy alternative and use it internally")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211214232054.3458774-1-andrii@kernel.org
2021-12-17 17:13:53 -08:00
Andrii Nakryiko
dc49f2d07b ci: add LIRC kernel config
Add necessary kernel config values to make BPF_PROG_TYPE_LIRC_MODE2
programs work.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-12-16 16:44:52 -08:00
Andrii Nakryiko
19656636a9 vmtest: blacklist bpf_loop and get_func_args_test for s390x
bpf_loop is using arch-specific sys_nanosleep attach function.
get_func_args_test relies on BPF trampoline.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-12-14 17:06:30 -08:00
Andrii Nakryiko
61acde2308 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   229fae38d0fc0d6ff58d57cbeb1432da55e58d4f
Checkpoint bpf-next commit: a34efe503bc55c5732e328e5191ad549eb899f31
Baseline bpf commit:        0be2516f865f5a876837184a8385163ff64a5889
Checkpoint bpf commit:      f7abc4c8df8c7930d0b9c56d9abee9a1fca635e9

Alexei Starovoitov (1):
  libbpf: Fix gen_loader assumption on number of programs.

Andrii Nakryiko (4):
  libbpf: Don't validate TYPE_ID relo's original imm value
  libbpf: Fix potential uninit memory read
  libbpf: Add sane strncpy alternative and use it internally
  libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF

Grant Seltzer (1):
  libbpf: Add doc comments for bpf_program__(un)pin()

Hangbin Liu (1):
  Bonding: add arp_missed_max option

Hou Tao (1):
  bpf: Add bpf_strncmp helper

Jiri Olsa (1):
  bpf: Add get_func_[arg|ret|arg_cnt] helpers

Kui-Feng Lee (1):
  libbpf: Mark bpf_object__find_program_by_title API deprecated.

 include/uapi/linux/bpf.h     | 39 +++++++++++++++++
 include/uapi/linux/if_link.h |  1 +
 src/bpf.c                    | 85 +++++++++++++++++++++++++++++++++++-
 src/bpf.h                    |  2 +
 src/btf_dump.c               |  4 +-
 src/gen_loader.c             | 12 +++--
 src/libbpf.c                 | 55 +++++------------------
 src/libbpf.h                 | 25 +++++++++++
 src/libbpf.map               |  1 +
 src/libbpf_internal.h        | 58 ++++++++++++++++++++++++
 src/libbpf_legacy.h          | 12 ++++-
 src/relo_core.c              | 20 ++++++---
 src/xsk.c                    |  9 ++--
 13 files changed, 260 insertions(+), 63 deletions(-)

--
2.30.2
2021-12-14 17:06:30 -08:00
Andrii Nakryiko
266e897ad2 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2021-12-14 17:06:30 -08:00
Kui-Feng Lee
7152ecf163 libbpf: Mark bpf_object__find_program_by_title API deprecated.
Deprecate this API since v0.7.  All callers should move to
bpf_object__find_program_by_name if possible, otherwise use
bpf_object__for_each_program to find a program out from a given
section.

[0] Closes: https://github.com/libbpf/libbpf/issues/292

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211214035931.1148209-5-kuifeng@fb.com
2021-12-14 17:06:30 -08:00
Andrii Nakryiko
216eaa760e libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF
The need to increase RLIMIT_MEMLOCK to do anything useful with BPF is
one of the first extremely frustrating gotchas that all new BPF users go
through and in some cases have to learn it a very hard way.

Luckily, starting with upstream Linux kernel version 5.11, BPF subsystem
dropped the dependency on memlock and uses memcg-based memory accounting
instead. Unfortunately, detecting memcg-based BPF memory accounting is
far from trivial (as can be evidenced by this patch), so in practice
most BPF applications still do unconditional RLIMIT_MEMLOCK increase.

As we move towards libbpf 1.0, it would be good to allow users to forget
about RLIMIT_MEMLOCK vs memcg and let libbpf do the sensible adjustment
automatically. This patch paves the way forward in this matter. Libbpf
will do feature detection of memcg-based accounting, and if detected,
will do nothing. But if the kernel is too old, just like BCC, libbpf
will automatically increase RLIMIT_MEMLOCK on behalf of user
application ([0]).

As this is technically a breaking change, during the transition period
applications have to opt into libbpf 1.0 mode by setting
LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK bit when calling
libbpf_set_strict_mode().

Libbpf allows to control the exact amount of set RLIMIT_MEMLOCK limit
with libbpf_set_memlock_rlim_max() API. Passing 0 will make libbpf do
nothing with RLIMIT_MEMLOCK. libbpf_set_memlock_rlim_max() has to be
called before the first bpf_prog_load(), bpf_btf_load(), or
bpf_object__load() call, otherwise it has no effect and will return
-EBUSY.

  [0] Closes: https://github.com/libbpf/libbpf/issues/369

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211214195904.1785155-2-andrii@kernel.org
2021-12-14 17:06:30 -08:00
Andrii Nakryiko
a4e725f8f5 libbpf: Add sane strncpy alternative and use it internally
strncpy() has a notoriously error-prone semantics which makes GCC
complain about it a lot (and quite often completely completely falsely
at that). Instead of pleasing GCC all the time (-Wno-stringop-truncation
is unfortunately only supported by GCC, so it's a bit too messy to just
enable it in Makefile), add libbpf-internal libbpf_strlcpy() helper
which follows what FreeBSD's strlcpy() does and what most people would
expect from strncpy(): copies up to N-1 first bytes from source string
into destination string and ensures zero-termination afterwards.

Replace all the relevant uses of strncpy/strncat/memcpy in libbpf with
libbpf_strlcpy().

This also fixes the issue reported by Emmanuel Deloget in xsk.c where
memcpy() could access source string beyond its end.

Fixes: 2f6324a3937f8 (libbpf: Support shared umems between queues and devices)
Reported-by: Emmanuel Deloget <emmanuel.deloget@eho.link>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211211004043.2374068-1-andrii@kernel.org
2021-12-14 17:06:30 -08:00
Andrii Nakryiko
df5689f1c8 libbpf: Fix potential uninit memory read
In case of BPF_CORE_TYPE_ID_LOCAL we fill out target result explicitly.
But targ_res itself isn't initialized in such a case, and subsequent
call to bpf_core_patch_insn() might read uninitialized field (like
fail_memsz_adjust in this case). So ensure that targ_res is
zero-initialized for BPF_CORE_TYPE_ID_LOCAL case.

This was reported by Coverity static analyzer.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211214010032.3843804-1-andrii@kernel.org
2021-12-14 17:06:30 -08:00
Grant Seltzer
6894f573d2 libbpf: Add doc comments for bpf_program__(un)pin()
This adds doc comments for the two bpf_program pinning functions,
bpf_program__pin() and bpf_program__unpin()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211209232222.541733-1-grantseltzer@gmail.com
2021-12-14 17:06:30 -08:00
Jiri Olsa
2b0d408764 bpf: Add get_func_[arg|ret|arg_cnt] helpers
Adding following helpers for tracing programs:

Get n-th argument of the traced function:
  long bpf_get_func_arg(void *ctx, u32 n, u64 *value)

Get return value of the traced function:
  long bpf_get_func_ret(void *ctx, u64 *value)

Get arguments count of the traced function:
  long bpf_get_func_arg_cnt(void *ctx)

The trampoline now stores number of arguments on ctx-8
address, so it's easy to verify argument index and find
return value argument's position.

Moving function ip address on the trampoline stack behind
the number of functions arguments, so it's now stored on
ctx-16 address if it's needed.

All helpers above are inlined by verifier.

Also bit unrelated small change - using newly added function
bpf_prog_has_trampoline in check_get_func_ip.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211208193245.172141-5-jolsa@kernel.org
2021-12-14 17:06:30 -08:00
Andrii Nakryiko
ac20634cdc libbpf: Don't validate TYPE_ID relo's original imm value
During linking, type IDs in the resulting linked BPF object file can
change, and so ldimm64 instructions corresponding to
BPF_CORE_TYPE_ID_TARGET and BPF_CORE_TYPE_ID_LOCAL CO-RE relos can get
their imm value out of sync with actual CO-RE relocation information
that's updated by BPF linker properly during linking process.

We could teach BPF linker to adjust such instructions, but it feels
a bit too much for linker to re-implement good chunk of
bpf_core_patch_insns logic just for this. This is a redundant safety
check for TYPE_ID relocations, as the real validation is in matching
CO-RE specs, so if that works fine, it's very unlikely that there is
something wrong with the instruction itself.

So, instead, teach libbpf (and kernel) to ignore insn->imm for
BPF_CORE_TYPE_ID_TARGET and BPF_CORE_TYPE_ID_LOCAL relos.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211213010706.100231-1-andrii@kernel.org
2021-12-14 17:06:30 -08:00
Hou Tao
04804b4710 bpf: Add bpf_strncmp helper
The helper compares two strings: one string is a null-terminated
read-only string, and another string has const max storage size
but doesn't need to be null-terminated. It can be used to compare
file name in tracing or LSM program.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211210141652.877186-2-houtao1@huawei.com
2021-12-14 17:06:30 -08:00
Alexei Starovoitov
16bb788578 libbpf: Fix gen_loader assumption on number of programs.
libbpf's obj->nr_programs includes static and global functions. That number
could be higher than the actual number of bpf programs going be loaded by
gen_loader. Passing larger nr_programs to bpf_gen__init() doesn't hurt. Those
exra stack slots will stay as zero. bpf_gen__finish() needs to check that
actual number of progs that gen_loader saw is less than or equal to
obj->nr_programs.

Fixes: ba05fd36b851 ("libbpf: Perform map fd cleanup for gen_loader in case of error")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-14 17:06:30 -08:00
Hangbin Liu
5eb804a2db Bonding: add arp_missed_max option
Currently, we use hard code number to verify if we are in the
arp_interval timeslice. But some user may want to reduce/extend
the verify timeslice. With the similar team option 'missed_max'
the uers could change that number based on their own environment.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14 17:06:30 -08:00
Andrii Nakryiko
bcf58fc7a5 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   43174f0d4597325cb91f1f1f55263eb6e6101036
Checkpoint bpf-next commit: 229fae38d0fc0d6ff58d57cbeb1432da55e58d4f
Baseline bpf commit:        c0d95d3380ee099d735e08618c0d599e72f6c8b0
Checkpoint bpf commit:      0be2516f865f5a876837184a8385163ff64a5889

Alexei Starovoitov (8):
  libbpf: Replace btf__type_by_id() with btf_type_by_id().
  bpf: Prepare relo_core.c for kernel duty.
  bpf: Define enum bpf_core_relo_kind as uapi.
  bpf: Pass a set of bpf_core_relo-s to prog_load command.
  libbpf: Use CO-RE in the kernel in light skeleton.
  libbpf: Support init of inner maps in light skeleton.
  libbpf: Clean gen_loader's attach kind.
  libbpf: Reduce bpf_core_apply_relo_insn() stack usage.

Andrii Nakryiko (12):
  libbpf: Cleanup struct bpf_core_cand.
  libbpf: Use __u32 fields in bpf_map_create_opts
  libbpf: Add API to get/set log_level at per-program level
  libbpf: Deprecate bpf_prog_load_xattr() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Allow passing preallocated log_buf when loading BTF into
    kernel
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Improve logging around BPF program loading
  libbpf: Preserve kernel error code and remove kprobe prog type
    guessing
  libbpf: Add per-program log buffer setter and getter
  libbpf: Deprecate bpf_object__load_xattr()

Eric Dumazet (1):
  tools: sync uapi/linux/if_link.h header

Grant Seltzer (1):
  libbpf: Add doc comments in libbpf.h

Joanne Koong (1):
  bpf: Add bpf_loop helper

Kumar Kartikeya Dwivedi (2):
  libbpf: Avoid double stores for success/failure case of ksym
    relocations
  libbpf: Avoid reload of imm for weak, unresolved, repeating ksym

Mehrdad Arshad Rad (1):
  libbpf: Remove duplicate assignments

Shuyi Cheng (1):
  libbpf: Add "bool skipped" to struct bpf_map

Vincent Minet (1):
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition

huangxuesen (1):
  libbpf: Fix trivial typo

 include/uapi/linux/bpf.h     | 103 +++++++++-
 include/uapi/linux/if_link.h | 293 +++++++++++++++++++++++----
 src/bpf.c                    |  88 ++++++---
 src/bpf.h                    |  30 ++-
 src/bpf_gen_internal.h       |   4 +
 src/btf.c                    |  82 +++++---
 src/gen_loader.c             | 114 ++++++++---
 src/libbpf.c                 | 371 ++++++++++++++++++++++++-----------
 src/libbpf.h                 | 109 +++++++++-
 src/libbpf.map               |   9 +
 src/libbpf_common.h          |   5 +
 src/libbpf_internal.h        |   3 +-
 src/libbpf_probes.c          |   2 +-
 src/libbpf_version.h         |   2 +-
 src/relo_core.c              | 231 ++++++++++++----------
 src/relo_core.h              | 103 +++-------
 16 files changed, 1138 insertions(+), 411 deletions(-)

--
2.30.2
2021-12-10 16:17:33 -08:00
Andrii Nakryiko
1f83414ea4 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2021-12-10 14:03:13 -08:00
Eric Dumazet
a0ddf21c92 tools: sync uapi/linux/if_link.h header
This file has not been updated for a while.

Sync it before BIG TCP patch series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211122184810.769159-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 14:03:13 -08:00
Shuyi Cheng
bb14c6f5b5 libbpf: Add "bool skipped" to struct bpf_map
Fix error: "failed to pin map: Bad file descriptor, path:
/sys/fs/bpf/_rodata_str1_1."

In the old kernel, the global data map will not be created, see [0]. So
we should skip the pinning of the global data map to avoid
bpf_object__pin_maps returning error. Therefore, when the map is not
created, we mark “map->skipped" as true and then check during relocation
and during pinning.

Fixes: 16e0c35c6f7a ("libbpf: Load global data maps lazily on legacy kernels")
Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-12-10 14:03:13 -08:00
Vincent Minet
c7dedfe23f libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
The btf__dedup_deprecated name was misspelled in the definition of the
compat symbol for btf__dedup. This leads it to be missing from the
shared library.

This fixes it.

Fixes: 957d350a8b94 ("libbpf: Turn btf_dedup_opts into OPTS-based struct")
Signed-off-by: Vincent Minet <vincent@vincent-minet.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211210063112.80047-1-vincent@vincent-minet.net
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
f13e766fa4 libbpf: Deprecate bpf_object__load_xattr()
Deprecate non-extensible bpf_object__load_xattr() in v0.8 ([0]).

With log_level control through bpf_object_open_opts or
bpf_program__set_log_level(), we are finally at the point where
bpf_object__load_xattr() doesn't provide any functionality that can't be
accessed through other (better) ways. The other feature,
target_btf_path, is also controllable through bpf_object_open_opts.

  [0] Closes: https://github.com/libbpf/libbpf/issues/289

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-9-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
90910812b5 libbpf: Add per-program log buffer setter and getter
Allow to set user-provided log buffer on a per-program basis ([0]). This
gives great deal of flexibility in terms of which programs are loaded
with logging enabled and where corresponding logs go.

Log buffer set with bpf_program__set_log_buf() overrides kernel_log_buf
and kernel_log_size settings set at bpf_object open time through
bpf_object_open_opts, if any.

Adjust bpf_object_load_prog_instance() logic to not perform own log buf
allocation and load retry if custom log buffer is provided by the user.

  [0] Closes: https://github.com/libbpf/libbpf/issues/418

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-8-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
eb9d74e7ad libbpf: Preserve kernel error code and remove kprobe prog type guessing
Instead of rewriting error code returned by the kernel of prog load with
libbpf-sepcific variants pass through the original error.

There is now also no need to have a backup generic -LIBBPF_ERRNO__LOAD
fallback error as bpf_prog_load() guarantees that errno will be properly
set no matter what.

Also drop a completely outdated and pretty useless BPF_PROG_TYPE_KPROBE
guess logic. It's not necessary and neither it's helpful in modern BPF
applications.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-7-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
d9b3fae391 libbpf: Improve logging around BPF program loading
Add missing "prog '%s': " prefixes in few places and use consistently
markers for beginning and end of program load logs. Here's an example of
log output:

libbpf: prog 'handler': BPF program load failed: Permission denied
libbpf: -- BEGIN PROG LOAD LOG ---
arg#0 reference type('UNKNOWN ') size cannot be determined: -22
; out1 = in1;
0: (18) r1 = 0xffffc9000cdcc000
2: (61) r1 = *(u32 *)(r1 +0)

...

81: (63) *(u32 *)(r4 +0) = r5
 R1_w=map_value(id=0,off=16,ks=4,vs=20,imm=0) R4=map_value(id=0,off=400,ks=4,vs=16,imm=0)
invalid access to map value, value_size=16 off=400 size=4
R4 min value is outside of the allowed memory range
processed 63 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
 -- END PROG LOAD LOG --
libbpf: failed to load program 'handler'
libbpf: failed to load object 'test_skeleton'

The entire verifier log, including BEGIN and END markers are now always
youtput during a single print callback call. This should make it much
easier to post-process or parse it, if necessary. It's not an explicit
API guarantee, but it can be reasonably expected to stay like that.

Also __bpf_object__open is renamed to bpf_object_open() as it's always
an adventure to find the exact function that implements bpf_object's
open phase, so drop the double underscored and use internal libbpf
naming convention.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-6-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
dc1df24314 libbpf: Allow passing user log setting through bpf_object_open_opts
Allow users to provide their own custom log_buf, log_size, and log_level
at bpf_object level through bpf_object_open_opts. This log_buf will be
used during BTF loading. Subsequent patch will use same log_buf during
BPF program loading, unless overriden at per-bpf_program level.

When such custom log_buf is provided, libbpf won't be attempting
retrying loading of BTF to try to provide its own log buffer to capture
kernel's error log output. User is responsible to provide big enough
buffer, otherwise they run a risk of getting -ENOSPC error from the
bpf() syscall.

See also comments in bpf_object_open_opts regarding log_level and
log_buf interactions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-5-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
0504b7ff22 libbpf: Allow passing preallocated log_buf when loading BTF into kernel
Add libbpf-internal btf_load_into_kernel() that allows to pass
preallocated log_buf and custom log_level to be passed into kernel
during BPF_BTF_LOAD call. When custom log_buf is provided,
btf_load_into_kernel() won't attempt an retry with automatically
allocated internal temporary buffer to capture BTF validation log.

It's important to note the relation between log_buf and log_level, which
slightly deviates from stricter kernel logic. From kernel's POV, if
log_buf is specified, log_level has to be > 0, and vice versa. While
kernel has good reasons to request such "sanity, this, in practice, is
a bit unconvenient and restrictive for libbpf's high-level bpf_object APIs.

So libbpf will allow to set non-NULL log_buf and log_level == 0. This is
fine and means to attempt to load BTF without logging requested, but if
it failes, retry the load with custom log_buf and log_level 1. Similar
logic will be implemented for program loading. In practice this means
that users can provide custom log buffer just in case error happens, but
not really request slower verbose logging all the time. This is also
consistent with libbpf behavior when custom log_buf is not set: libbpf
first tries to load everything with log_level=0, and only if error
happens allocates internal log buffer and retries with log_level=1.

Also, while at it, make BTF validation log more obvious and follow the log
pattern libbpf is using for dumping BPF verifier log during
BPF_PROG_LOAD. BTF loading resulting in an error will look like this:

libbpf: BTF loading error: -22
libbpf: -- BEGIN BTF LOAD LOG ---
magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 1040
str_off: 1040
str_len: 2063598257
btf_total_size: 1753
Total section length too long
-- END BTF LOAD LOG --
libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring.

This makes it much easier to find relevant parts in libbpf log output.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-4-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
3c93f7ddb2 libbpf: Add OPTS-based bpf_btf_load() API
Similar to previous bpf_prog_load() and bpf_map_create() APIs, add
bpf_btf_load() API which is taking optional OPTS struct. Schedule
bpf_load_btf() for deprecation in v0.8 ([0]).

This makes naming consistent with BPF_BTF_LOAD command, sets up an API
for extensibility in the future, moves options parameters (log-related
fields) into optional options, and also allows to pass log_level
directly.

It also removes log buffer auto-allocation logic from low-level API
(consistent with bpf_prog_load() behavior), but preserves a special
treatment of log_level == 0 with non-NULL log_buf, which matches
low-level bpf_prog_load() and high-level libbpf APIs for BTF and program
loading behaviors.

  [0] Closes: https://github.com/libbpf/libbpf/issues/419

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-3-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
896a3ae0d0 libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
To unify libbpf APIs behavior w.r.t. log_buf and log_level, fix
bpf_prog_load() to follow the same logic as bpf_btf_load() and
high-level bpf_object__load() API will follow in the subsequent patches:
  - if log_level is 0 and non-NULL log_buf is provided by a user, attempt
    load operation initially with no log_buf and log_level set;
  - if successful, we are done, return new FD;
  - on error, retry the load operation with log_level bumped to 1 and
    log_buf set; this way verbose logging will be requested only when we
    are sure that there is a failure, but will be fast in the
    common/expected success case.

Of course, user can still specify log_level > 0 from the very beginning
to force log collection.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-2-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Grant Seltzer
728b1721e5 libbpf: Add doc comments in libbpf.h
This adds comments above functions in libbpf.h which document
their uses. These comments are of a format that doxygen and sphinx
can pick up and render. These are rendered by libbpf.readthedocs.org

These doc comments are for:

- bpf_object__open_file()
- bpf_object__open_mem()
- bpf_program__attach_uprobe()
- bpf_program__attach_uprobe_opts()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211206203709.332530-1-grantseltzer@gmail.com
2021-12-10 14:03:13 -08:00
huangxuesen
04941813a5 libbpf: Fix trivial typo
Fix typo in comment from 'bpf_skeleton_map' to 'bpf_map_skeleton'
and from 'bpf_skeleton_prog' to 'bpf_prog_skeleton'.

Signed-off-by: huangxuesen <huangxuesen@kuaishou.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1638755236-3851199-1-git-send-email-hxseverything@gmail.com
2021-12-10 14:03:13 -08:00
Alexei Starovoitov
c3c540b402 libbpf: Reduce bpf_core_apply_relo_insn() stack usage.
Reduce bpf_core_apply_relo_insn() stack usage and bump
BPF_CORE_SPEC_MAX_LEN limit back to 64.

Fixes: 29db4bea1d10 ("bpf: Prepare relo_core.c for kernel duty.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211203182836.16646-1-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
b633ace366 libbpf: Deprecate bpf_prog_load_xattr() API
bpf_prog_load_xattr() is high-level API that's named as a low-level
BPF_PROG_LOAD wrapper APIs, but it actually operates on struct
bpf_object. It's badly and confusingly misnamed as it will load all the
progs insige bpf_object, returning prog_fd of the very first BPF
program. It also has a bunch of ad-hoc things like log_level override,
map_ifindex auto-setting, etc. All this can be expressed more explicitly
and cleanly through existing libbpf APIs. This patch marks
bpf_prog_load_xattr() for deprecation in libbpf v0.8 ([0]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/308

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211201232824.3166325-10-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
d761220e33 libbpf: Add API to get/set log_level at per-program level
Add bpf_program__set_log_level() and bpf_program__log_level() to fetch
and adjust log_level sent during BPF_PROG_LOAD command. This allows to
selectively request more or less verbose output in BPF verifier log.

Also bump libbpf version to 0.7 and make these APIs the first in v0.7.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211201232824.3166325-3-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
ac1c007607 libbpf: Use __u32 fields in bpf_map_create_opts
Corresponding Linux UAPI struct uses __u32, not int, so keep it
consistent.

Fixes: 992c4225419a ("libbpf: Unify low-level map creation APIs w/ new bpf_map_create()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211201232824.3166325-2-andrii@kernel.org
2021-12-10 14:03:13 -08:00
Alexei Starovoitov
01b2b45e8d libbpf: Clean gen_loader's attach kind.
The gen_loader has to clear attach_kind otherwise the programs
without attach_btf_id will fail load if they follow programs
with attach_btf_id.

Fixes: 67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-12-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Alexei Starovoitov
a7935b996f libbpf: Support init of inner maps in light skeleton.
Add ability to initialize inner maps in light skeleton.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-11-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Alexei Starovoitov
5f887b332c libbpf: Use CO-RE in the kernel in light skeleton.
Without lskel the CO-RE relocations are processed by libbpf before any other
work is done. Instead, when lskel is needed, remember relocation as RELO_CORE
kind. Then when loader prog is generated for a given bpf program pass CO-RE
relos of that program to gen loader via bpf_gen__record_relo_core(). The gen
loader will remember them as-is and pass it later as-is into the kernel.

The normal libbpf flow is to process CO-RE early before call relos happen. In
case of gen_loader the core relos have to be added to other relos to be copied
together when bpf static function is appended in different places to other main
bpf progs. During the copy the append_subprog_relos() will adjust insn_idx for
normal relos and for RELO_CORE kind too. When that is done each struct
reloc_desc has good relos for specific main prog.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-10-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
20e7ed521a libbpf: Cleanup struct bpf_core_cand.
Remove two redundant fields from struct bpf_core_cand.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-8-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Alexei Starovoitov
d785a21c71 bpf: Pass a set of bpf_core_relo-s to prog_load command.
struct bpf_core_relo is generated by llvm and processed by libbpf.
It's a de-facto uapi.
With CO-RE in the kernel the struct bpf_core_relo becomes uapi de-jure.
Add an ability to pass a set of 'struct bpf_core_relo' to prog_load command
and let the kernel perform CO-RE relocations.

Note the struct bpf_line_info and struct bpf_func_info have the same
layout when passed from LLVM to libbpf and from libbpf to the kernel
except "insn_off" fields means "byte offset" when LLVM generates it.
Then libbpf converts it to "insn index" to pass to the kernel.
The struct bpf_core_relo's "insn_off" field is always "byte offset".

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-6-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Alexei Starovoitov
8e43882e53 bpf: Define enum bpf_core_relo_kind as uapi.
enum bpf_core_relo_kind is generated by llvm and processed by libbpf.
It's a de-facto uapi.
With CO-RE in the kernel the bpf_core_relo_kind values become uapi de-jure.
Also rename them with BPF_CORE_ prefix to distinguish from conflicting names in
bpf_core_read.h. The enums bpf_field_info_kind, bpf_type_id_kind,
bpf_type_info_kind, bpf_enum_value_kind are passing different values from bpf
program into llvm.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-5-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Alexei Starovoitov
2be7e6a830 bpf: Prepare relo_core.c for kernel duty.
Make relo_core.c to be compiled for the kernel and for user space libbpf.

Note the patch is reducing BPF_CORE_SPEC_MAX_LEN from 64 to 32.
This is the maximum number of nested structs and arrays.
For example:
 struct sample {
     int a;
     struct {
         int b[10];
     };
 };

 struct sample *s = ...;
 int *y = &s->b[5];
This field access is encoded as "0:1:0:5" and spec len is 4.

The follow up patch might bump it back to 64.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-4-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Alexei Starovoitov
b9bd1f8682 libbpf: Replace btf__type_by_id() with btf_type_by_id().
To prepare relo_core.c to be compiled in the kernel and the user space
replace btf__type_by_id with btf_type_by_id.

In libbpf btf__type_by_id and btf_type_by_id have different behavior.

bpf_core_apply_relo_insn() needs behavior of uapi btf__type_by_id
vs internal btf_type_by_id, but type_id range check is already done
in bpf_core_apply_relo(), so it's safe to replace it everywhere.
The kernel btf_type_by_id() does the check anyway. It doesn't hurt.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-2-alexei.starovoitov@gmail.com
2021-12-10 14:03:13 -08:00
Kumar Kartikeya Dwivedi
c4da8092cc libbpf: Avoid reload of imm for weak, unresolved, repeating ksym
Alexei pointed out that we can use BPF_REG_0 which already contains imm
from move_blob2blob computation. Note that we now compare the second
insn's imm, but this should not matter, since both will be zeroed out
for the error case for the insn populated earlier.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211122235733.634914-4-memxor@gmail.com
2021-12-10 14:03:13 -08:00
Kumar Kartikeya Dwivedi
b2b45a3131 libbpf: Avoid double stores for success/failure case of ksym relocations
Instead, jump directly to success case stores in case ret >= 0, else do
the default 0 value store and jump over the success case. This is better
in terms of readability. Readjust the code for kfunc relocation as well
to follow a similar pattern, also leads to easier to follow code now.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211122235733.634914-3-memxor@gmail.com
2021-12-10 14:03:13 -08:00
Joanne Koong
73c8768db7 bpf: Add bpf_loop helper
This patch adds the kernel-side and API changes for a new helper
function, bpf_loop:

long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx,
u64 flags);

where long (*callback_fn)(u32 index, void *ctx);

bpf_loop invokes the "callback_fn" **nr_loops** times or until the
callback_fn returns 1. The callback_fn can only return 0 or 1, and
this is enforced by the verifier. The callback_fn index is zero-indexed.

A few things to please note:
~ The "u64 flags" parameter is currently unused but is included in
case a future use case for it arises.
~ In the kernel-side implementation of bpf_loop (kernel/bpf/bpf_iter.c),
bpf_callback_t is used as the callback function cast.
~ A program can have nested bpf_loop calls but the program must
still adhere to the verifier constraint of its stack depth (the stack depth
cannot exceed MAX_BPF_STACK))
~ Recursive callback_fns do not pass the verifier, due to the call stack
for these being too deep.
~ The next patch will include the tests and benchmark

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211130030622.4131246-2-joannekoong@fb.com
2021-12-10 14:03:13 -08:00
Mehrdad Arshad Rad
bafda72319 libbpf: Remove duplicate assignments
There is a same action when load_attr.attach_btf_id is initialized.

Signed-off-by: Mehrdad Arshad Rad <arshad.rad@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211128193337.10628-1-arshad.rad@gmail.com
2021-12-10 14:03:13 -08:00
Andrii Nakryiko
33ec2ca026 sync: improve patch application process by using patch command
git apply -3 doesn't always leave conflicted files in the working
directory. Use patch --merge instead, it seems to work better in more
complicated situations.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-12-09 22:43:20 -08:00
Yucong Sun
7e89be4022 Migrate vmtest to modular actions in libbpf/ci 2021-12-09 14:09:04 -08:00
thiagoftsm
e61e089911 Merge branch 'libbpf:master' into master 2021-12-05 19:50:33 +00:00
Ilya Leoshkevich
93e89b3474 ci: upgrade s390x runner to v2.285.0
This is needed for composite actions with conditional steps.

Fixes #416.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-12-03 10:42:54 -08:00
thiagoftsm
b9d46530c3 Merge branch 'libbpf:master' into master 2021-12-02 19:07:38 +00:00
Quentin Monnet
4884bf3dbd ci: fix test on /exitstatus existence and size
The condition for the test is incorrect, we want to add a default exit
status if the file is empty. But [[ -s file ]] returns true if the file
exists and has a size greater than zero. Let's reverse the condition.

Fixes: 385b2d1738 ("ci: change VM's /exitstatus format to prepare it for several results")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
2021-12-01 15:46:22 -08:00
Andrii Nakryiko
690d0531f9 ci: whitelist legacy_printk tests on 4.9 and 5.5
legacy_printk selftests is specially designed to be runnable on old
kernels and validate libbpf's bpf_trace_printk-related macros. Run them
in CI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-11-30 16:39:31 -08:00
Quentin Monnet
7cda69caeb ci: add folding markers to avoid getting output out of sections
Move commands to existing sections, or add markers for new sections when
no surrounding existing section is relevant. This is to avoid having
logs outside of subsection for the “vmtest” step of the GitHub action.
2021-11-30 14:39:02 -08:00
Quentin Monnet
1f7db672e4 ci: carry on after selftest failure and report test group results
Two changes come with this patch:

- Test groups report their exit status individually for the final
  summary, by appending it to /exitstatus in the VM. “Test groups” are
  test_maps, test_verifier, test_progs, and test_progs-no_alu32.

- For these separate reports to make sense, allow the CI action to carry
  on even after one of the groups fails, by adding "&& true" to the
  commands in order to neutralise the effect of the "set -e".
2021-11-30 14:39:02 -08:00
Quentin Monnet
385b2d1738 ci: change VM's /exitstatus format to prepare it for several results
We recently introduced a summary for the results of the different groups
of tests for the CI, displayed after the machine is shut down. There are
currently two groups, "bpftool" and "vm_tests". We want to split the
latter into different subgroups. For that, we will make each group of
tests that runs in the VM print its exit status to the /exitstatus file.

In preparation for this, let's update the format of this /exitstatus
file. Instead of containing just an integer, it now contains a line with
a group name, a colon, and the integer result. This is easy enough to
parse on the other end.

We also drop the associative array, and iterate on /exitstatus instead
to produce the summary: this way, the order of the checks is preserved.
2021-11-30 14:39:02 -08:00
Quentin Monnet
7f11cd48d6 ci: create helpers for formatting errors and notices
Create helpers for formatting errors and notices for GitHub actions,
instead of directly printing the double colons and attributes.
2021-11-30 14:39:02 -08:00
Andrii Nakryiko
4374bad784 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   8f6f41f39348f25db843f2fcb2f1c166b4bfa2d7
Checkpoint bpf-next commit: 43174f0d4597325cb91f1f1f55263eb6e6101036
Baseline bpf commit:        c0d95d3380ee099d735e08618c0d599e72f6c8b0
Checkpoint bpf commit:      c0d95d3380ee099d735e08618c0d599e72f6c8b0

Alan Maguire (1):
  libbpf: Silence uninitialized warning/error in btf_dump_dump_type_data

Hengqi Chen (1):
  libbpf: Support static initialization of BPF_MAP_TYPE_PROG_ARRAY

Tiezhu Yang (1):
  bpf, mips: Fix build errors about __NR_bpf undeclared

 src/bpf.c           |   6 ++
 src/btf_dump.c      |   2 +-
 src/libbpf.c        | 154 ++++++++++++++++++++++++++++++++++----------
 src/skel_internal.h |  10 +++
 4 files changed, 138 insertions(+), 34 deletions(-)

--
2.30.2
2021-11-29 11:20:00 -08:00
Alan Maguire
55b057565f libbpf: Silence uninitialized warning/error in btf_dump_dump_type_data
When compiling libbpf with gcc 4.8.5, we see:

  CC       staticobjs/btf_dump.o
btf_dump.c: In function ‘btf_dump_dump_type_data.isra.24’:
btf_dump.c:2296:5: error: ‘err’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  if (err < 0)
     ^
cc1: all warnings being treated as errors
make: *** [staticobjs/btf_dump.o] Error 1

While gcc 4.8.5 is too old to build the upstream kernel, it's possible it
could be used to build standalone libbpf which suffers from the same problem.
Silence the error by initializing 'err' to 0.  The warning/error seems to be
a false positive since err is set early in the function.  Regardless we
shouldn't prevent libbpf from building for this.

Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1638180040-8037-1-git-send-email-alan.maguire@oracle.com
2021-11-29 11:20:00 -08:00
Hengqi Chen
472c0726e8 libbpf: Support static initialization of BPF_MAP_TYPE_PROG_ARRAY
Support static initialization of BPF_MAP_TYPE_PROG_ARRAY with a
syntax similar to map-in-map initialization ([0]):

    SEC("socket")
    int tailcall_1(void *ctx)
    {
        return 0;
    }

    struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 2);
        __uint(key_size, sizeof(__u32));
        __array(values, int (void *));
    } prog_array_init SEC(".maps") = {
        .values = {
            [1] = (void *)&tailcall_1,
        },
    };

Here's the relevant part of libbpf debug log showing what's
going on with prog-array initialization:

libbpf: sec '.relsocket': collecting relocation for section(3) 'socket'
libbpf: sec '.relsocket': relo #0: insn #2 against 'prog_array_init'
libbpf: prog 'entry': found map 0 (prog_array_init, sec 4, off 0) for insn #0
libbpf: .maps relo #0: for 3 value 0 rel->r_offset 32 name 53 ('tailcall_1')
libbpf: .maps relo #0: map 'prog_array_init' slot [1] points to prog 'tailcall_1'
libbpf: map 'prog_array_init': created successfully, fd=5
libbpf: map 'prog_array_init': slot [1] set to prog 'tailcall_1' fd=6

  [0] Closes: https://github.com/libbpf/libbpf/issues/354

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211128141633.502339-2-hengqi.chen@gmail.com
2021-11-29 11:20:00 -08:00
Tiezhu Yang
c86cb27d5b bpf, mips: Fix build errors about __NR_bpf undeclared
Add the __NR_bpf definitions to fix the following build errors for mips:

  $ cd tools/bpf/bpftool
  $ make
  [...]
  bpf.c:54:4: error: #error __NR_bpf not defined. libbpf does not support your arch.
   #  error __NR_bpf not defined. libbpf does not support your arch.
      ^~~~~
  bpf.c: In function ‘sys_bpf’:
  bpf.c:66:17: error: ‘__NR_bpf’ undeclared (first use in this function); did you mean ‘__NR_brk’?
    return syscall(__NR_bpf, cmd, attr, size);
                   ^~~~~~~~
                   __NR_brk
  [...]
  In file included from gen_loader.c:15:0:
  skel_internal.h: In function ‘skel_sys_bpf’:
  skel_internal.h:53:17: error: ‘__NR_bpf’ undeclared (first use in this function); did you mean ‘__NR_brk’?
    return syscall(__NR_bpf, cmd, attr, size);
                   ^~~~~~~~
                   __NR_brk

We can see the following generated definitions:

  $ grep -r "#define __NR_bpf" arch/mips
  arch/mips/include/generated/uapi/asm/unistd_o32.h:#define __NR_bpf (__NR_Linux + 355)
  arch/mips/include/generated/uapi/asm/unistd_n64.h:#define __NR_bpf (__NR_Linux + 315)
  arch/mips/include/generated/uapi/asm/unistd_n32.h:#define __NR_bpf (__NR_Linux + 319)

The __NR_Linux is defined in arch/mips/include/uapi/asm/unistd.h:

  $ grep -r "#define __NR_Linux" arch/mips
  arch/mips/include/uapi/asm/unistd.h:#define __NR_Linux	4000
  arch/mips/include/uapi/asm/unistd.h:#define __NR_Linux	5000
  arch/mips/include/uapi/asm/unistd.h:#define __NR_Linux	6000

That is to say, __NR_bpf is:

  4000 + 355 = 4355 for mips o32,
  6000 + 319 = 6319 for mips n32,
  5000 + 315 = 5315 for mips n64.

So use the GCC pre-defined macro _ABIO32, _ABIN32 and _ABI64 [1] to define
the corresponding __NR_bpf.

This patch is similar with commit bad1926dd2f6 ("bpf, s390: fix build for
libbpf and selftest suite").

  [1] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/mips/mips.h#l549

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1637804167-8323-1-git-send-email-yangtiezhu@loongson.cn
2021-11-29 11:20:00 -08:00
Andrii Nakryiko
3ef05a585e sync: try harder when git am -3 fails
`git am -3` will give up frequently even in cases when patch can be
auto-merged with:

```
Applying: libbpf: Unify low-level map creation APIs w/ new bpf_map_create()
error: sha1 information is lacking or useless (src/libbpf.c).
error: could not build fake ancestor
Patch failed at 0001 libbpf: Unify low-level map creation APIs w/ new bpf_map_create()
```

But `git apply -3` in the same situation will succeed with three-way merge just
fine:

```
error: patch failed: src/bpf_gen_internal.h:51
Falling back to three-way merge...
Applied patch to 'src/bpf_gen_internal.h' cleanly.
```

So if git am fails, try git apply and if that succeeds, automatically
`git am --continue`. If not, fallback to user actions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
493bfa8a59 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   fa721d4f0b91f525339996f4faef7bb072d70162
Checkpoint bpf-next commit: 8f6f41f39348f25db843f2fcb2f1c166b4bfa2d7
Baseline bpf commit:        c0d95d3380ee099d735e08618c0d599e72f6c8b0
Checkpoint bpf commit:      c0d95d3380ee099d735e08618c0d599e72f6c8b0

Andrii Nakryiko (8):
  libbpf: Load global data maps lazily on legacy kernels
  libbpf: Unify low-level map creation APIs w/ new bpf_map_create()
  libbpf: Use bpf_map_create() consistently internally
  libbpf: Prevent deprecation warnings in xsk.c
  libbpf: Fix potential misaligned memory access in btf_ext__new()
  libbpf: Don't call libc APIs with NULL pointers
  libbpf: Fix glob_syms memory leak in bpf_linker
  libbpf: Fix using invalidated memory in bpf_linker

 src/bpf.c              | 140 +++++++++++++++++------------------------
 src/bpf.h              |  33 +++++++++-
 src/bpf_gen_internal.h |   5 +-
 src/btf.c              |  10 +--
 src/btf.h              |   2 +-
 src/gen_loader.c       |  46 +++++---------
 src/libbpf.c           | 107 +++++++++++++++++--------------
 src/libbpf.map         |   1 +
 src/libbpf_internal.h  |  21 -------
 src/libbpf_probes.c    |  30 ++++-----
 src/linker.c           |   6 +-
 src/skel_internal.h    |   3 +-
 src/xsk.c              |  18 +++---
 13 files changed, 204 insertions(+), 218 deletions(-)

--
2.30.2
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
9f006f1ed6 libbpf: Fix using invalidated memory in bpf_linker
add_dst_sec() can invalidate bpf_linker's section index making
dst_symtab pointer pointing into unallocated memory. Reinitialize
dst_symtab pointer on each iteration to make sure it's always valid.

Fixes: faf6ed321cf6 ("libbpf: Add BPF static linker APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211124002325.1737739-7-andrii@kernel.org
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
5fc0d66cad libbpf: Fix glob_syms memory leak in bpf_linker
glob_syms array wasn't freed on bpf_link__free(). Fix that.

Fixes: a46349227cd8 ("libbpf: Add linker extern resolution support for functions and global variables")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211124002325.1737739-6-andrii@kernel.org
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
37c3e92657 libbpf: Don't call libc APIs with NULL pointers
Sanitizer complains about qsort(), bsearch(), and memcpy() being called
with NULL pointer. This can only happen when the associated number of
elements is zero, so no harm should be done. But still prevent this from
happening to keep sanitizer runs clean from extra noise.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211124002325.1737739-5-andrii@kernel.org
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
25eb5c4e02 libbpf: Fix potential misaligned memory access in btf_ext__new()
Perform a memory copy before we do the sanity checks of btf_ext_hdr.
This prevents misaligned memory access if raw btf_ext data is not 4-byte
aligned ([0]).

While at it, also add missing const qualifier.

  [0] Closes: https://github.com/libbpf/libbpf/issues/391

Fixes: 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections")
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211124002325.1737739-3-andrii@kernel.org
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
07e4e0cb04 libbpf: Prevent deprecation warnings in xsk.c
xsk.c is using own APIs that are marked for deprecation internally.
Given xsk.c and xsk.h will be gone in libbpf 1.0, there is no reason to
do public vs internal function split just to avoid deprecation warnings.
So just add a pragma to silence deprecation warnings (until the code is
removed completely).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211124193233.3115996-4-andrii@kernel.org
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
316b60fa89 libbpf: Use bpf_map_create() consistently internally
Remove all the remaining uses of to-be-deprecated bpf_create_map*() APIs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211124193233.3115996-3-andrii@kernel.org
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
6cfb97c561 libbpf: Unify low-level map creation APIs w/ new bpf_map_create()
Mark the entire zoo of low-level map creation APIs for deprecation in
libbpf 0.7 ([0]) and introduce a new bpf_map_create() API that is
OPTS-based (and thus future-proof) and matches the BPF_MAP_CREATE
command name.

While at it, ensure that gen_loader sends map_extra field. Also remove
now unneeded btf_key_type_id/btf_value_type_id logic that libbpf is
doing anyways.

  [0] Closes: https://github.com/libbpf/libbpf/issues/282

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211124193233.3115996-2-andrii@kernel.org
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
5c31bcf220 libbpf: Load global data maps lazily on legacy kernels
Load global data maps lazily, if kernel is too old to support global
data. Make sure that programs are still correct by detecting if any of
the to-be-loaded programs have relocation against any of such maps.

This allows to solve the issue ([0]) with bpf_printk() and Clang
generating unnecessary and unreferenced .rodata.strX.Y sections, but it
also goes further along the CO-RE lines, allowing to have a BPF object
in which some code can work on very old kernels and relies only on BPF
maps explicitly, while other BPF programs might enjoy global variable
support. If such programs are correctly set to not load at runtime on
old kernels, bpf_object will load and function correctly now.

  [0] https://lore.kernel.org/bpf/CAK-59YFPU3qO+_pXWOH+c1LSA=8WA1yabJZfREjOEXNHAqgXNg@mail.gmail.com/

Fixes: aed659170a31 ("libbpf: Support multiple .rodata.* and .data.* BPF maps")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211123200105.387855-1-andrii@kernel.org
2021-11-26 13:51:29 -08:00
Andrii Nakryiko
5b4dbd8141 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   d41bc48bfab2076f7db88d079a3a3203dd9c4a54
Checkpoint bpf-next commit: fa721d4f0b91f525339996f4faef7bb072d70162
Baseline bpf commit:        099f896f498a2b26d84f4ddae039b2c542c18b48
Checkpoint bpf commit:      c0d95d3380ee099d735e08618c0d599e72f6c8b0

Andrii Nakryiko (2):
  libbpf: Add runtime APIs to query libbpf version
  libbpf: Accommodate DWARF/compiler bug with duplicated structs

Dave Tucker (1):
  bpf, docs: Fix ordering of bpf documentation

Florent Revest (1):
  libbpf: Change bpf_program__set_extra_flags to bpf_program__set_flags

 docs/index.rst |  4 ++--
 src/btf.c      | 45 +++++++++++++++++++++++++++++++++++++++++----
 src/libbpf.c   | 23 +++++++++++++++++++++--
 src/libbpf.h   |  6 +++++-
 src/libbpf.map |  5 ++++-
 5 files changed, 73 insertions(+), 10 deletions(-)

--
2.30.2
2021-11-23 23:04:18 -08:00
Florent Revest
14e12f4290 libbpf: Change bpf_program__set_extra_flags to bpf_program__set_flags
bpf_program__set_extra_flags has just been introduced so we can still
change it without breaking users.

This new interface is a bit more flexible (for example if someone wants
to clear a flag).

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211119180035.1396139-1-revest@chromium.org
2021-11-23 23:04:18 -08:00
Andrii Nakryiko
60ce9af668 libbpf: Accommodate DWARF/compiler bug with duplicated structs
According to [0], compilers sometimes might produce duplicate DWARF
definitions for exactly the same struct/union within the same
compilation unit (CU). We've had similar issues with identical arrays
and handled them with a similar workaround in 6b6e6b1d09aa ("libbpf:
Accomodate DWARF/compiler bug with duplicated identical arrays"). Do the
same for struct/union by ensuring that two structs/unions are exactly
the same, down to the integer values of field referenced type IDs.

Solving this more generically (allowing referenced types to be
equivalent, but using different type IDs, all within a single CU)
requires a huge complexity increase to handle many-to-many mappings
between canonidal and candidate type graphs. Before we invest in that,
let's see if this approach handles all the instances of this issue in
practice. Thankfully it's pretty rare, it seems.

  [0] https://lore.kernel.org/bpf/YXr2NFlJTAhHdZqq@krava/

Reported-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211117194114.347675-1-andrii@kernel.org
2021-11-23 23:04:18 -08:00
Andrii Nakryiko
d29571725a libbpf: Add runtime APIs to query libbpf version
Libbpf provided LIBBPF_MAJOR_VERSION and LIBBPF_MINOR_VERSION macros to
check libbpf version at compilation time. This doesn't cover all the
needs, though, because version of libbpf that application is compiled
against doesn't necessarily match the version of libbpf at runtime,
especially if libbpf is used as a shared library.

Add libbpf_major_version() and libbpf_minor_version() returning major
and minor versions, respectively, as integers. Also add a convenience
libbpf_version_string() for various tooling using libbpf to print out
libbpf version in a human-readable form. Currently it will return
"v0.6", but in the future it can contains some extra information, so the
format itself is not part of a stable API and shouldn't be relied upon.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20211118174054.2699477-1-andrii@kernel.org
2021-11-23 23:04:18 -08:00
Dave Tucker
842c5b7bff bpf, docs: Fix ordering of bpf documentation
This commit fixes the display of the BPF documentation in the sidebar
when rendered as HTML.

Before this patch, the sidebar would render as follows for some
sections:

| BPF Documentation
  |- BPF Type Format (BTF)
    |- BPF Type Format (BTF)

This was due to creating a heading in index.rst followed by
a sphinx toctree, where the file referenced carries the same
title as the section heading.

To fix this I applied a pattern that has been established in other
subfolders of Documentation:

1. Re-wrote index.rst to have a single toctree
2. Split the sections out in to their own files

Additionally maps.rst and programs.rst make use of a glob pattern to
include map_* or prog_* rst files in their toctree, meaning future map
or program type documentation will be automatically included.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1a1eed800e7b9dc13b458de113a489641519b0cc.1636749493.git.dave@dtucker.co.uk
2021-11-23 23:04:18 -08:00
Quentin Monnet
9109d6a4b4 ci: create summary for tests and account for bpftool checks result
The bpftool checks work as expected when the CI runs, except that they
do not set any error status code for the script on error, which means
that the failures are lost among the logs and not reported in a clear
way to the reviewers.

This commit aims at fixing the issue. We could simply exit with a
non-zero error status when the bpftool checks, but that would prevent
the other tests from running. Instead, we propose to store the result of
the bpftool checks in a bash array. This array can later be reused to
print a summary of the different groups of tests, at the end of the CI
run, to help the reviewers understand where the failure happened without
having to manually unfold all the sections on the GitHub interface.

Currently, there are only two groups: the bpftool checks and the "VM
tests". The latter may later be split into test_maps, test_progs,
test_progs-no_alu32, etc. by teaching each of them to append their exit
status code to the "exitstatus" file.

Fixes: 88649fe655 ("ci: run script to test bpftool types/options sync")
2021-11-18 11:19:36 -08:00
Quentin Monnet
eab19ffead ci: pass shutdown fold description to fold command
For displaying a coloured title for the shutdown section in the logs,
instead of having the colour control codes directly written in run.sh,
we can pass the section title as an argument to "travis_fold()" and have
it format and print it for us.

This is cleaner, and slightly more in-line with what we do in the CI
files of the vmtest repository.
2021-11-18 11:19:36 -08:00
Andrii Nakryiko
94a49850c5 Makefile: enforce gnu89 standard
libbpf conforms to kernel style and uses the same -std=gnu89 standard
for compilation. So enforce it on Github projection as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-11-16 13:17:03 -08:00
Andrii Nakryiko
d71409b508 Makefile: don't hide relevant parts of file path
File path has shared vs static path, it's useful to see. So preserve it
in pretty output.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-11-16 13:17:03 -08:00
Andrii Nakryiko
f0ecdeed3a sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   9faaffbe85edcbdc54096f7f87baa3bc4842a7e2
Checkpoint bpf-next commit: d41bc48bfab2076f7db88d079a3a3203dd9c4a54
Baseline bpf commit:        5833291ab6de9c3e2374336b51c814e515e8f3a5
Checkpoint bpf commit:      099f896f498a2b26d84f4ddae039b2c542c18b48

Kumar Kartikeya Dwivedi (1):
  libbpf: Perform map fd cleanup for gen_loader in case of error

Tiezhu Yang (1):
  bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33

Yonghong Song (1):
  libbpf: Fix a couple of missed btf_type_tag handling in btf.c

 include/uapi/linux/bpf.h |  2 +-
 src/bpf_gen_internal.h   |  4 ++--
 src/btf.c                |  2 ++
 src/gen_loader.c         | 47 +++++++++++++++++++++++++---------------
 src/libbpf.c             |  4 ++--
 5 files changed, 37 insertions(+), 22 deletions(-)

--
2.30.2
2021-11-16 13:16:07 -08:00
Andrii Nakryiko
d924fa62ee sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2021-11-16 13:16:07 -08:00
Kumar Kartikeya Dwivedi
0f5a62b2d8 libbpf: Perform map fd cleanup for gen_loader in case of error
Alexei reported a fd leak issue in gen loader (when invoked from
bpftool) [0]. When adding ksym support, map fd allocation was moved from
stack to loader map, however I missed closing these fds (relevant when
cleanup label is jumped to on error). For the success case, the
allocated fd is returned in loader ctx, hence this problem is not
noticed.

Make three changes, first MAX_USED_MAPS in MAX_FD_ARRAY_SZ instead of
MAX_USED_PROGS, the braino was not a problem until now for this case as
we didn't try to close map fds (otherwise use of it would have tried
closing 32 additional fds in ksym btf fd range). Then, do a cleanup for
all nr_maps fds in cleanup label code, so that in case of error all
temporary map fds from bpf_gen__map_create are closed.

Then, adjust the cleanup label to only generate code for the required
number of program and map fds.  To trim code for remaining program
fds, lay out prog_fd array in stack in the end, so that we can
directly skip the remaining instances.  Still stack size remains same,
since changing that would require changes in a lot of places
(including adjustment of stack_off macro), so nr_progs_sz variable is
only used to track required number of iterations (and jump over
cleanup size calculated from that), stack offset calculation remains
unaffected.

The difference for test_ksyms_module.o is as follows:
libbpf: //prog cleanup iterations: before = 34, after = 5
libbpf: //maps cleanup iterations: before = 64, after = 2

Also, move allocation of gen->fd_array offset to bpf_gen__init. Since
offset can now be 0, and we already continue even if add_data returns 0
in case of failure, we do not need to distinguish between 0 offset and
failure case 0, as we rely on bpf_gen__finish to check errors. We can
also skip check for gen->fd_array in add_*_fd functions, since
bpf_gen__init will take care of it.

  [0]: https://lore.kernel.org/bpf/CAADnVQJ6jSitKSNKyxOrUzwY2qDRX0sPkJ=VLGHuCLVJ=qOt9g@mail.gmail.com

Fixes: 18f4fccbf314 ("libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211112232022.899074-1-memxor@gmail.com
2021-11-16 13:16:07 -08:00
Tiezhu Yang
5ca49d2b32 bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33
In the current code, the actual max tail call count is 33 which is greater
than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent
with the meaning of MAX_TAIL_CALL_CNT and thus confusing at first glance.
We can see the historical evolution from commit 04fd61ab36ec ("bpf: allow
bpf programs to tail-call other bpf programs") and commit f9dabe016b63
("bpf: Undo off-by-one in interpreter tail call count limit"). In order
to avoid changing existing behavior, the actual limit is 33 now, this is
reasonable.

After commit 874be05f525e ("bpf, tests: Add tail call test suite"), we can
see there exists failed testcase.

On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:0 ret 34 != 33 FAIL

On some archs:
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:1 ret 34 != 33 FAIL

Although the above failed testcase has been fixed in commit 18935a72eb25
("bpf/tests: Fix error in tail call limit tests"), it would still be good
to change the value of MAX_TAIL_CALL_CNT from 32 to 33 to make the code
more readable.

The 32-bit x86 JIT was using a limit of 32, just fix the wrong comments and
limit to 33 tail calls as the constant MAX_TAIL_CALL_CNT updated. For the
mips64 JIT, use "ori" instead of "addiu" as suggested by Johan Almbladh.
For the riscv JIT, use RV_REG_TCC directly to save one register move as
suggested by Björn Töpel. For the other implementations, no function changes,
it does not change the current limit 33, the new value of MAX_TAIL_CALL_CNT
can reflect the actual max tail call count, the related tail call testcases
in test_bpf module and selftests can work well for the interpreter and the
JIT.

Here are the test results on x86_64:

 # uname -m
 x86_64
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg | tail -1
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
 # rmmod test_bpf
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg | tail -1
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
 # rmmod test_bpf
 # ./test_progs -t tailcalls
 #142 tailcalls:OK
 Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/1636075800-3264-1-git-send-email-yangtiezhu@loongson.cn
2021-11-16 13:16:07 -08:00
Yonghong Song
219c8e11e0 libbpf: Fix a couple of missed btf_type_tag handling in btf.c
Commit 2dc1e488e5cd ("libbpf: Support BTF_KIND_TYPE_TAG") added the
BTF_KIND_TYPE_TAG support. But to test vmlinux build with ...

  #define __user __attribute__((btf_type_tag("user")))

... I needed to sync libbpf repo and manually copy libbpf sources to
pahole. To simplify process, I used BTF_KIND_RESTRICT to simulate the
BTF_KIND_TYPE_TAG with vmlinux build as "restrict" modifier is barely
used in kernel.

But this approach missed one case in dedup with structures where
BTF_KIND_RESTRICT is handled and BTF_KIND_TYPE_TAG is not handled in
btf_dedup_is_equiv(), and this will result in a pahole dedup failure.
This patch fixed this issue and a selftest is added in the subsequent
patch to test this scenario.

The other missed handling is in btf__resolve_size(). Currently the compiler
always emit like PTR->TYPE_TAG->... so in practice we don't hit the missing
BTF_KIND_TYPE_TAG handling issue with compiler generated code. But lets
add case BTF_KIND_TYPE_TAG in the switch statement to be future proof.

Fixes: 2dc1e488e5cd ("libbpf: Support BTF_KIND_TYPE_TAG")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211115163937.3922235-1-yhs@fb.com
2021-11-16 13:16:07 -08:00
Ilya Leoshkevich
140b902274 ci: add s390x vmtest
Run it on the self-hosted builder with tag "z15".

Also add the infrastructure code for the self-hosted builder.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Ilya Leoshkevich
1987a34fc9 vmtest: use libguestfs for disk image manipulations
Running vmtest inside a container removes the ability to use certain
root powers, among other things - mounting arbitrary images. Use
libguestfs in order to avoid having to mount anything.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Ilya Leoshkevich
3b1714aa92 vmtest: add s390x blacklist
A lot of tests in test_progs fail due to the missing trampoline
implementation on s390x (and a handful for other reasons). Yet, a lot
more pass, so disabling test_progs altogether is too heavy-handed.

So add a mechanism for arch-specific blacklists (as discussed in [1])
and introduce a s390x blacklist, that simply reflects the status quo.

[1] https://github.com/libbpf/libbpf/pull/204#discussion_r601768628

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Ilya Leoshkevich
554054d876 vmtest: tweak qemu invocation for s390x
We need a different binary and console. Also use a fixed number of
cores in order to avoid OOM in case a builder has too many of them.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Ilya Leoshkevich
26e196d449 vmtest: add s390x image
Generated by simply running mkrootfs_debian.sh.

Also use $ARCH as an image name prefix.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Ilya Leoshkevich
3fac0b3d08 vmtest: add s390x config
Select the current config based on $ARCH value and thus rename
the existing config to config-latest.$ARCH.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Ilya Leoshkevich
ac4a0fa400 vmtest: add debootstrap-based mkrootfs script
The existing mkrootfs.sh is based on Arch Linux, which supports only
x86_64.

Add mkrootfs_debian.sh: a debootstrap-based script. Debian was chosen,
because it supports more architectures than other mainstream distros.
Move init setup to mkrootfs_tweak.sh, rename the existing Arch script
to mkrootfs_arch.sh.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Ilya Leoshkevich
6ad73f5083 vmtest: do not install lld
s390x LLVM does not have it, and it is not needed for the libbpf CI.
So drop it.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Ilya Leoshkevich
8a52e49575 vmtest: use python3-docutils instead of python-docutils
There is no python-docutils on Debian Bullseye, but python3-docutils
exists everywhere. Since Python 2 is EOL anyway, use the Python 3
version.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-11-15 22:39:49 -08:00
Andrii Nakryiko
5b9d079c7f sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   b8b5cb55f5d3f03cc1479a3768d68173a10359ad
Checkpoint bpf-next commit: 9faaffbe85edcbdc54096f7f87baa3bc4842a7e2
Baseline bpf commit:        47b3708c6088a60e7dc3b809dbb0d4c46590b32f
Checkpoint bpf commit:      5833291ab6de9c3e2374336b51c814e515e8f3a5

Andrii Nakryiko (11):
  libbpf: Rename DECLARE_LIBBPF_OPTS into LIBBPF_OPTS
  libbpf: Pass number of prog load attempts explicitly
  libbpf: Unify low-level BPF_PROG_LOAD APIs into bpf_prog_load()
  libbpf: Remove internal use of deprecated bpf_prog_load() variants
  libbpf: Stop using to-be-deprecated APIs
  libbpf: Remove deprecation attribute from struct bpf_prog_prep_result
  libbpf: Free up resources used by inner map definition
  libbpf: Add ability to get/set per-program load flags
  libbpf: Turn btf_dedup_opts into OPTS-based struct
  libbpf: Ensure btf_dump__new() and btf_dump_opts are future-proof
  libbpf: Make perf_buffer__new() use OPTS-based interface

Mark Pashmfouroush (1):
  bpf: Add ingress_ifindex to bpf_sk_lookup

Song Liu (1):
  bpf: Introduce helper bpf_find_vma

Yonghong Song (2):
  bpf: Support BTF_KIND_TYPE_TAG for btf_type_tag attributes
  libbpf: Support BTF_KIND_TYPE_TAG

 include/uapi/linux/bpf.h |  21 +++
 include/uapi/linux/btf.h |   3 +-
 src/bpf.c                | 166 +++++++++++++---------
 src/bpf.h                |  74 +++++++++-
 src/bpf_gen_internal.h   |   8 +-
 src/btf.c                |  69 ++++++---
 src/btf.h                |  80 +++++++++--
 src/btf_dump.c           |  40 ++++--
 src/gen_loader.c         |  30 ++--
 src/libbpf.c             | 297 +++++++++++++++++++++++----------------
 src/libbpf.h             |  95 ++++++++++---
 src/libbpf.map           |  13 ++
 src/libbpf_common.h      |  14 +-
 src/libbpf_internal.h    |  33 +----
 src/libbpf_legacy.h      |   1 +
 src/libbpf_probes.c      |  20 ++-
 src/linker.c             |   4 +-
 src/xsk.c                |  34 ++---
 18 files changed, 670 insertions(+), 332 deletions(-)

--
2.30.2
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
bc66d28b68 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2021-11-12 23:46:09 -08:00
Yonghong Song
98181e0546 libbpf: Support BTF_KIND_TYPE_TAG
Add libbpf support for BTF_KIND_TYPE_TAG.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012614.1505315-1-yhs@fb.com
2021-11-12 23:46:09 -08:00
Yonghong Song
d932a1a46b bpf: Support BTF_KIND_TYPE_TAG for btf_type_tag attributes
LLVM patches ([1] for clang, [2] and [3] for BPF backend)
added support for btf_type_tag attributes. This patch
added support for the kernel.

The main motivation for btf_type_tag is to bring kernel
annotations __user, __rcu etc. to btf. With such information
available in btf, bpf verifier can detect mis-usages
and reject the program. For example, for __user tagged pointer,
developers can then use proper helper like bpf_probe_read_user()
etc. to read the data.

BTF_KIND_TYPE_TAG may also useful for other tracing
facility where instead of to require user to specify
kernel/user address type, the kernel can detect it
by itself with btf.

  [1] https://reviews.llvm.org/D111199
  [2] https://reviews.llvm.org/D113222
  [3] https://reviews.llvm.org/D113496

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012609.1505032-1-yhs@fb.com
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
011a01594c libbpf: Make perf_buffer__new() use OPTS-based interface
Add new variants of perf_buffer__new() and perf_buffer__new_raw() that
use OPTS-based options for future extensibility ([0]). Given all the
currently used API names are best fits, re-use them and use
___libbpf_override() approach and symbol versioning to preserve ABI and
source code compatibility. struct perf_buffer_opts and struct
perf_buffer_raw_opts are kept as well, but they are restructured such
that they are OPTS-based when used with new APIs. For struct
perf_buffer_raw_opts we keep few fields intact, so we have to also
preserve the memory location of them both when used as OPTS and for
legacy API variants. This is achieved with anonymous padding for OPTS
"incarnation" of the struct.  These pads can be eventually used for new
options.

  [0] Closes: https://github.com/libbpf/libbpf/issues/311

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-6-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
b9a88a4533 libbpf: Ensure btf_dump__new() and btf_dump_opts are future-proof
Change btf_dump__new() and corresponding struct btf_dump_ops structure
to be extensible by using OPTS "framework" ([0]). Given we don't change
the names, we use a similar approach as with bpf_prog_load(), but this
time we ended up with two APIs with the same name and same number of
arguments, so overloading based on number of arguments with
___libbpf_override() doesn't work.

Instead, use "overloading" based on types. In this particular case,
print callback has to be specified, so we detect which argument is
a callback. If it's 4th (last) argument, old implementation of API is
used by user code. If not, it must be 2nd, and thus new implementation
is selected. The rest is handled by the same symbol versioning approach.

btf_ext argument is dropped as it was never used and isn't necessary
either. If in the future we'll need btf_ext, that will be added into
OPTS-based struct btf_dump_opts.

struct btf_dump_opts is reused for both old API and new APIs. ctx field
is marked deprecated in v0.7+ and it's put at the same memory location
as OPTS's sz field. Any user of new-style btf_dump__new() will have to
set sz field and doesn't/shouldn't use ctx, as ctx is now passed along
the callback as mandatory input argument, following the other APIs in
libbpf that accept callbacks consistently.

Again, this is quite ugly in implementation, but is done in the name of
backwards compatibility and uniform and extensible future APIs (at the
same time, sigh). And it will be gone in libbpf 1.0.

  [0] Closes: https://github.com/libbpf/libbpf/issues/283

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-5-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
969018545d libbpf: Turn btf_dedup_opts into OPTS-based struct
btf__dedup() and struct btf_dedup_opts were added before we figured out
OPTS mechanism. As such, btf_dedup_opts is non-extensible without
breaking an ABI and potentially crashing user application.

Unfortunately, btf__dedup() and btf_dedup_opts are short and succinct
names that would be great to preserve and use going forward. So we use
___libbpf_override() macro approach, used previously for bpf_prog_load()
API, to define a new btf__dedup() variant that accepts only struct btf *
and struct btf_dedup_opts * arguments, and rename the old btf__dedup()
implementation into btf__dedup_deprecated(). This keeps both source and
binary compatibility with old and new applications.

The biggest problem was struct btf_dedup_opts, which wasn't OPTS-based,
and as such doesn't have `size_t sz;` as a first field. But btf__dedup()
is a pretty rarely used API and I believe that the only currently known
users (besides selftests) are libbpf's own bpf_linker and pahole.
Neither use case actually uses options and just passes NULL. So instead
of doing extra hacks, just rewrite struct btf_dedup_opts into OPTS-based
one, move btf_ext argument into those opts (only bpf_linker needs to
dedup btf_ext, so it's not a typical thing to specify), and drop never
used `dont_resolve_fwds` option (it was never used anywhere, AFAIK, it
makes BTF dedup much less useful and efficient).

Just in case, for old implementation, btf__dedup_deprecated(), detect
non-NULL options and error out with helpful message, to help users
migrate, if there are any user playing with btf__dedup().

The last remaining piece is dedup_table_size, which is another
anachronism from very early days of BTF dedup. Since then it has been
reduced to the only valid value, 1, to request forced hash collisions.
This is only used during testing. So instead introduce a bool flag to
force collisions explicitly.

This patch also adapts selftests to new btf__dedup() and btf_dedup_opts
use to avoid selftests breakage.

  [0] Closes: https://github.com/libbpf/libbpf/issues/281

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-4-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
0e80b7dc3f libbpf: Add ability to get/set per-program load flags
Add bpf_program__flags() API to retrieve prog_flags that will be (or
were) supplied to BPF_PROG_LOAD command.

Also add bpf_program__set_extra_flags() API to allow to set *extra*
flags, in addition to those determined by program's SEC() definition.
Such flags are logically OR'ed with libbpf-derived flags.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111051758.92283-2-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Mark Pashmfouroush
932800b20b bpf: Add ingress_ifindex to bpf_sk_lookup
It may be helpful to have access to the ifindex during bpf socket
lookup. An example may be to scope certain socket lookup logic to
specific interfaces, i.e. an interface may be made exempt from custom
lookup code.

Add the ifindex of the arriving connection to the bpf_sk_lookup API.

Signed-off-by: Mark Pashmfouroush <markpash@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211110111016.5670-2-markpash@cloudflare.com
2021-11-12 23:46:09 -08:00
Song Liu
cfc69268e5 bpf: Introduce helper bpf_find_vma
In some profiler use cases, it is necessary to map an address to the
backing file, e.g., a shared library. bpf_find_vma helper provides a
flexible way to achieve this. bpf_find_vma maps an address of a task to
the vma (vm_area_struct) for this address, and feed the vma to an callback
BPF function. The callback function is necessary here, as we need to
ensure mmap_sem is unlocked.

It is necessary to lock mmap_sem for find_vma. To lock and unlock mmap_sem
safely when irqs are disable, we use the same mechanism as stackmap with
build_id. Specifically, when irqs are disabled, the unlocked is postponed
in an irq_work. Refactor stackmap.c so that the irq_work is shared among
bpf_find_vma and stackmap helpers.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211105232330.1936330-2-songliubraving@fb.com
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
9b2bbdefd5 libbpf: Free up resources used by inner map definition
It's not enough to just free(map->inner_map), as inner_map itself can
have extra memory allocated, like map name.

Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/20211107165521.9240-3-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
c7236a5342 libbpf: Remove deprecation attribute from struct bpf_prog_prep_result
This deprecation annotation has no effect because for struct deprecation
attribute has to be declared after struct definition. But instead of
moving it to the end of struct definition, remove it. When deprecation
will go in effect at libbpf v0.7, this deprecation attribute will cause
libbpf's own source code compilation to trigger deprecation warnings,
which is unavoidable because libbpf still has to support that API.

So keep deprecation of APIs, but don't mark structs used in API as
deprecated.

Fixes: e21d585cb3db ("libbpf: Deprecate multi-instance bpf_program APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-8-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
a611094604 libbpf: Stop using to-be-deprecated APIs
Remove all the internal uses of libbpf APIs that are slated to be
deprecated in v0.7.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-6-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
9b422137af libbpf: Remove internal use of deprecated bpf_prog_load() variants
Remove all the internal uses of bpf_load_program_xattr(), which is
slated for deprecation in v0.7.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-5-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
65cdd0c73d libbpf: Unify low-level BPF_PROG_LOAD APIs into bpf_prog_load()
Add a new unified OPTS-based low-level API for program loading,
bpf_prog_load() ([0]).  bpf_prog_load() accepts few "mandatory"
parameters as input arguments (program type, name, license,
instructions) and all the other optional (as in not required to specify
for all types of BPF programs) fields into struct bpf_prog_load_opts.

This makes all the other non-extensible APIs variant for BPF_PROG_LOAD
obsolete and they are slated for deprecation in libbpf v0.7:
  - bpf_load_program();
  - bpf_load_program_xattr();
  - bpf_verify_program().

Implementation-wise, internal helper libbpf__bpf_prog_load is refactored
to become a public bpf_prog_load() API. struct bpf_prog_load_params used
internally is replaced by public struct bpf_prog_load_opts.

Unfortunately, while conceptually all this is pretty straightforward,
the biggest complication comes from the already existing bpf_prog_load()
*high-level* API, which has nothing to do with BPF_PROG_LOAD command.

We try really hard to have a new API named bpf_prog_load(), though,
because it maps naturally to BPF_PROG_LOAD command.

For that, we rename old bpf_prog_load() into bpf_prog_load_deprecated()
and mark it as COMPAT_VERSION() for shared library users compiled
against old version of libbpf. Statically linked users and shared lib
users compiled against new version of libbpf headers will get "rerouted"
to bpf_prog_deprecated() through a macro helper that decides whether to
use new or old bpf_prog_load() based on number of input arguments (see
___libbpf_overload in libbpf_common.h).

To test that existing
bpf_prog_load()-using code compiles and works as expected, I've compiled
and ran selftests as is. I had to remove (locally) selftest/bpf/Makefile
-Dbpf_prog_load=bpf_prog_test_load hack because it was conflicting with
the macro-based overload approach. I don't expect anyone else to do
something like this in practice, though. This is testing-specific way to
replace bpf_prog_load() calls with special testing variant of it, which
adds extra prog_flags value. After testing I kept this selftests hack,
but ensured that we use a new bpf_prog_load_deprecated name for this.

This patch also marks bpf_prog_load() and bpf_prog_load_xattr() as deprecated.
bpf_object interface has to be used for working with struct bpf_program.
Libbpf doesn't support loading just a bpf_program.

The silver lining is that when we get to libbpf 1.0 all these
complication will be gone and we'll have one clean bpf_prog_load()
low-level API with no backwards compatibility hackery surrounding it.

  [0] Closes: https://github.com/libbpf/libbpf/issues/284

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-4-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
6b2db898cc libbpf: Pass number of prog load attempts explicitly
Allow to control number of BPF_PROG_LOAD attempts from outside the
sys_bpf_prog_load() helper.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-3-andrii@kernel.org
2021-11-12 23:46:09 -08:00
Andrii Nakryiko
ea6c242fc6 libbpf: Rename DECLARE_LIBBPF_OPTS into LIBBPF_OPTS
It's confusing that libbpf-provided helper macro doesn't start with
LIBBPF. Also "declare" vs "define" is confusing terminology, I can never
remember and always have to look up previous examples.

Bypass both issues by renaming DECLARE_LIBBPF_OPTS into a short and
clean LIBBPF_OPTS. To avoid breaking existing code, provide:

  #define DECLARE_LIBBPF_OPTS LIBBPF_OPTS

in libbpf_legacy.h. We can decide later if we ever want to remove it or
we'll keep it forever because it doesn't add any maintainability burden.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-2-andrii@kernel.org
2021-11-12 23:46:09 -08:00
130 changed files with 114979 additions and 90998 deletions

1
.gitattributes vendored Normal file
View File

@@ -0,0 +1 @@
assets/** export-ignore

3
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,3 @@
Thank you for considering a contribution!
Please note that the `libbpf` authoritative source code is developed as part of bpf-next Linux source tree under tools/lib/bpf subdirectory and is periodically synced to Github. As such, all the libbpf changes should be sent to BPF mailing list, please don't open PRs here unless you are changing Github-specific parts of libbpf (e.g., Github-specific Makefile).

View File

@@ -0,0 +1,31 @@
name: 'build-selftests'
description: 'Build BPF selftests'
inputs:
repo-path:
description: 'where is the source code'
required: true
kernel:
description: 'kernel version or LATEST'
required: true
default: 'LATEST'
vmlinux:
description: 'where is vmlinux file'
required: true
default: '${{ github.workspace }}/vmlinux'
runs:
using: "composite"
steps:
- shell: bash
run: |
source $GITHUB_ACTION_PATH/../../../ci/vmtest/helpers.sh
foldable start "Setup Env"
sudo apt-get install -y qemu-kvm zstd binutils-dev elfutils libcap-dev libelf-dev libdw-dev python3-docutils
foldable end
- shell: bash
run: |
export KERNEL=${{ inputs.kernel }}
export REPO_ROOT="${{ github.workspace }}"
export REPO_PATH="${{ inputs.repo-path }}"
export VMLINUX_BTF="${{ inputs.vmlinux }}"
${{ github.action_path }}/build_selftests.sh

View File

@@ -0,0 +1,60 @@
#!/bin/bash
set -euo pipefail
THISDIR="$(cd $(dirname $0) && pwd)"
source ${THISDIR}/helpers.sh
foldable start prepare_selftests "Building selftests"
LIBBPF_PATH="${REPO_ROOT}"
llvm_default_version() {
echo "16"
}
llvm_latest_version() {
echo "17"
}
LLVM_VERSION=$(llvm_default_version)
if [[ "${LLVM_VERSION}" == $(llvm_latest_version) ]]; then
REPO_DISTRO_SUFFIX=""
else
REPO_DISTRO_SUFFIX="-${LLVM_VERSION}"
fi
echo "deb https://apt.llvm.org/focal/ llvm-toolchain-focal${REPO_DISTRO_SUFFIX} main" \
| sudo tee /etc/apt/sources.list.d/llvm.list
PREPARE_SELFTESTS_SCRIPT=${THISDIR}/prepare_selftests-${KERNEL}.sh
if [ -f "${PREPARE_SELFTESTS_SCRIPT}" ]; then
(cd "${REPO_ROOT}/${REPO_PATH}/tools/testing/selftests/bpf" && ${PREPARE_SELFTESTS_SCRIPT})
fi
if [[ "${KERNEL}" = 'LATEST' ]]; then
VMLINUX_H=
else
VMLINUX_H=${THISDIR}/vmlinux.h
fi
cd ${REPO_ROOT}/${REPO_PATH}
make headers
make \
CLANG=clang-${LLVM_VERSION} \
LLC=llc-${LLVM_VERSION} \
LLVM_STRIP=llvm-strip-${LLVM_VERSION} \
VMLINUX_BTF="${VMLINUX_BTF}" \
VMLINUX_H=${VMLINUX_H} \
-C "${REPO_ROOT}/${REPO_PATH}/tools/testing/selftests/bpf" \
-j $((4*$(nproc))) > /dev/null
cd -
mkdir ${LIBBPF_PATH}/selftests
cp -R "${REPO_ROOT}/${REPO_PATH}/tools/testing/selftests/bpf" \
${LIBBPF_PATH}/selftests
cd ${LIBBPF_PATH}
rm selftests/bpf/.gitignore
git add selftests
foldable end prepare_selftests

View File

@@ -0,0 +1,38 @@
# shellcheck shell=bash
# $1 - start or end
# $2 - fold identifier, no spaces
# $3 - fold section description
foldable() {
local YELLOW='\033[1;33m'
local NOCOLOR='\033[0m'
if [ $1 = "start" ]; then
line="::group::$2"
if [ ! -z "${3:-}" ]; then
line="$line - ${YELLOW}$3${NOCOLOR}"
fi
else
line="::endgroup::"
fi
echo -e "$line"
}
__print() {
local TITLE=""
if [[ -n $2 ]]; then
TITLE=" title=$2"
fi
echo "::$1${TITLE}::$3"
}
# $1 - title
# $2 - message
print_error() {
__print error $1 $2
}
# $1 - title
# $2 - message
print_notice() {
__print notice $1 $2
}

View File

@@ -1,3 +1,5 @@
#!/bin/bash
printf "all:\n\ttouch bpf_testmod.ko\n\nclean:\n" > bpf_testmod/Makefile
printf "all:\n\ttouch bpf_test_no_cfi.ko\n\nclean:\n" > bpf_test_no_cfi/Makefile

View File

@@ -1,3 +1,5 @@
#!/bin/bash
printf "all:\n\ttouch bpf_testmod.ko\n\nclean:\n" > bpf_testmod/Makefile
printf "all:\n\ttouch bpf_test_no_cfi.ko\n\nclean:\n" > bpf_test_no_cfi/Makefile

File diff suppressed because it is too large Load Diff

View File

@@ -6,17 +6,17 @@ runs:
- id: variables
run: |
export REPO_ROOT=$GITHUB_WORKSPACE
export CI_ROOT=$REPO_ROOT/travis-ci
export CI_ROOT=$REPO_ROOT/ci
# this is somewhat ugly, but that is the easiest way to share this code with
# arch specific docker
echo 'echo ::group::Env setup' > /tmp/ci_setup
echo export DEBIAN_FRONTEND=noninteractive >> /tmp/ci_setup
echo sudo apt-get update >> /tmp/ci_setup
echo sudo apt-get install -y aptitude qemu-kvm zstd binutils-dev elfutils libcap-dev libelf-dev libdw-dev >> /tmp/ci_setup
echo sudo apt-get install -y aptitude qemu-kvm zstd binutils-dev elfutils libcap-dev libelf-dev libdw-dev libguestfs-tools >> /tmp/ci_setup
echo export PROJECT_NAME='libbpf' >> /tmp/ci_setup
echo export AUTHOR_EMAIL="$(git log -1 --pretty=\"%aE\")" >> /tmp/ci_setup
echo export REPO_ROOT=$GITHUB_WORKSPACE >> /tmp/ci_setup
echo export CI_ROOT=$REPO_ROOT/travis-ci >> /tmp/ci_setup
echo export CI_ROOT=$REPO_ROOT/ci >> /tmp/ci_setup
echo export VMTEST_ROOT=$CI_ROOT/vmtest >> /tmp/ci_setup
echo 'echo ::endgroup::' >> /tmp/ci_setup
shell: bash

View File

@@ -5,31 +5,114 @@ inputs:
description: 'kernel version or LATEST'
required: true
default: 'LATEST'
kernel-rev:
description: 'CHECKPOINT or rev/tag/branch'
arch:
description: 'what arch to test'
required: true
default: 'CHECKPOINT'
kernel-origin:
description: 'kernel repo'
required: true
default: 'https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git'
default: 'x86_64'
pahole:
description: 'pahole rev/tag/branch'
description: 'pahole rev or master'
required: true
default: 'master'
pahole-origin:
description: 'pahole repo'
required: true
default: 'https://git.kernel.org/pub/scm/devel/pahole/pahole.git'
runs:
using: "composite"
steps:
- run: |
source /tmp/ci_setup
export KERNEL=${{ inputs.kernel }}
export KERNEL_BRANCH=${{ inputs.kernel-rev }}
export KERNEL_ORIGIN=${{ inputs.kernel-origin }}
export PAHOLE_BRANCH=${{ inputs.pahole }}
export PAHOLE_ORIGIN=${{ inputs.pahole-origin }}
$CI_ROOT/vmtest/run_vmtest.sh
# Allow CI user to access /dev/kvm (via qemu) w/o group change/relogin
# by changing permissions set by udev.
- name: Set /dev/kvm permissions
shell: bash
run: |
if [ -e /dev/kvm ]; then
echo "/dev/kvm exists"
if [ $(id -u) != 0 ]; then
echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' \
| sudo tee /etc/udev/rules.d/99-kvm4all.rules > /dev/null
sudo udevadm control --reload-rules
sudo udevadm trigger --name-match=kvm
fi
else
echo "/dev/kvm does not exist"
fi
# setup environment
- name: Setup environment
uses: libbpf/ci/setup-build-env@main
with:
pahole: ${{ inputs.pahole }}
arch: ${{ inputs.arch }}
# 1. download CHECKPOINT kernel source
- name: Get checkpoint commit
shell: bash
run: |
cat CHECKPOINT-COMMIT
echo "CHECKPOINT=$(cat CHECKPOINT-COMMIT)" >> $GITHUB_ENV
- name: Get kernel source at checkpoint
uses: libbpf/ci/get-linux-source@main
with:
repo: 'https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git'
rev: ${{ env.CHECKPOINT }}
dest: '${{ github.workspace }}/.kernel'
- name: Patch kernel source
uses: libbpf/ci/patch-kernel@main
with:
patches-root: '${{ github.workspace }}/ci/diffs'
repo-root: '.kernel'
- name: Prepare to build BPF selftests
shell: bash
run: |
source $GITHUB_ACTION_PATH/../../../ci/vmtest/helpers.sh
foldable start "Prepare building selftest"
cd .kernel
cat tools/testing/selftests/bpf/config \
tools/testing/selftests/bpf/config.${{ inputs.arch }} > .config
# this file might or mihgt not exist depending on kernel version
cat tools/testing/selftests/bpf/config.vm >> .config || :
make olddefconfig && make prepare
cd -
foldable end
# 2. if kernel == LATEST, build kernel image from tree
- name: Build kernel image
if: ${{ inputs.kernel == 'LATEST' }}
shell: bash
run: |
source $GITHUB_ACTION_PATH/../../../ci/vmtest/helpers.sh
foldable start "Build Kernel Image"
cd .kernel
make -j $((4*$(nproc))) all > /dev/null
cp vmlinux ${{ github.workspace }}
cd -
foldable end
# else, just download prebuilt kernel image
- name: Download prebuilt kernel
if: ${{ inputs.kernel != 'LATEST' }}
uses: libbpf/ci/download-vmlinux@main
with:
kernel: ${{ inputs.kernel }}
arch: ${{ inputs.arch }}
# 3. build selftests
- name: Build BPF selftests
uses: ./.github/actions/build-selftests
with:
repo-path: '.kernel'
kernel: ${{ inputs.kernel }}
# 4. prepare rootfs
- name: prepare rootfs
uses: libbpf/ci/prepare-rootfs@main
env:
KBUILD_OUTPUT: '.kernel'
with:
project-name: 'libbpf'
arch: ${{ inputs.arch }}
kernel: ${{ inputs.kernel }}
kernel-root: '.kernel'
kbuild-output: ${{ env.KBUILD_OUTPUT }}
image-output: '/tmp/root.img'
# 5. run selftest in QEMU
- name: Run selftests
env:
KERNEL: ${{ inputs.kernel }}
REPO_ROOT: ${{ github.workspace }}
uses: libbpf/ci/run-qemu@main
with:
arch: ${{ inputs.arch }}
img: '/tmp/root.img'
vmlinuz: 'vmlinuz'
kernel-root: '.kernel'

91
.github/workflows/build.yml vendored Normal file
View File

@@ -0,0 +1,91 @@
name: libbpf-build
on:
pull_request:
push:
schedule:
- cron: '0 18 * * *'
concurrency:
group: ci-build-${{ github.head_ref }}
cancel-in-progress: true
jobs:
debian:
runs-on: ubuntu-latest
name: Debian Build (${{ matrix.name }})
strategy:
fail-fast: false
matrix:
include:
- name: default
target: RUN
- name: ASan+UBSan
target: RUN_ASAN
- name: clang ASan+UBSan
target: RUN_CLANG_ASAN
- name: gcc-10 ASan+UBSan
target: RUN_GCC10_ASAN
- name: clang
target: RUN_CLANG
- name: clang-14
target: RUN_CLANG14
- name: clang-15
target: RUN_CLANG15
- name: clang-16
target: RUN_CLANG16
- name: gcc-10
target: RUN_GCC10
- name: gcc-11
target: RUN_GCC11
- name: gcc-12
target: RUN_GCC12
steps:
- uses: actions/checkout@v4
name: Checkout
- uses: ./.github/actions/setup
name: Setup
- uses: ./.github/actions/debian
name: Build
with:
target: ${{ matrix.target }}
ubuntu:
runs-on: ubuntu-latest
name: Ubuntu Focal Build (${{ matrix.arch }})
strategy:
fail-fast: false
matrix:
include:
- arch: aarch64
- arch: ppc64le
- arch: s390x
- arch: x86
steps:
- uses: actions/checkout@v4
name: Checkout
- uses: ./.github/actions/setup
name: Pre-Setup
- run: source /tmp/ci_setup && sudo -E $CI_ROOT/managers/ubuntu.sh
if: matrix.arch == 'x86'
name: Setup
- uses: uraimo/run-on-arch-action@v2.7.1
name: Build in docker
if: matrix.arch != 'x86'
with:
distro:
ubuntu20.04
arch:
${{ matrix.arch }}
setup:
cp /tmp/ci_setup $GITHUB_WORKSPACE
dockerRunArgs: |
--volume "${GITHUB_WORKSPACE}:${GITHUB_WORKSPACE}"
shell: /bin/bash
install: |
export DEBIAN_FRONTEND=noninteractive
export TZ="America/Los_Angeles"
apt-get update -y
apt-get install -y tzdata build-essential sudo
run: source ${GITHUB_WORKSPACE}/ci_setup && $CI_ROOT/managers/ubuntu.sh

52
.github/workflows/codeql.yml vendored Normal file
View File

@@ -0,0 +1,52 @@
---
# vi: ts=2 sw=2 et:
name: "CodeQL"
on:
push:
branches:
- master
pull_request:
branches:
- master
permissions:
contents: read
jobs:
analyze:
name: Analyze
runs-on: ubuntu-22.04
concurrency:
group: ${{ github.workflow }}-${{ matrix.language }}-${{ github.ref }}
cancel-in-progress: true
permissions:
actions: read
security-events: write
strategy:
fail-fast: false
matrix:
language: ['cpp', 'python']
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
queries: +security-extended,security-and-quality
- name: Setup
uses: ./.github/actions/setup
- name: Build
run: |
source /tmp/ci_setup
make -C ./src
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2

View File

@@ -11,16 +11,17 @@ jobs:
if: github.repository == 'libbpf/libbpf'
name: Coverity
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: ./.github/actions/setup
- name: Run coverity
run: |
echo ::group::Setup CI env
source "${GITHUB_WORKSPACE}"/ci/vmtest/helpers.sh
foldable start "Setup CI env"
source /tmp/ci_setup
export COVERITY_SCAN_NOTIFICATION_EMAIL="${AUTHOR_EMAIL}"
export COVERITY_SCAN_BRANCH_PATTERN=${GITHUB_REF##refs/*/}
export TRAVIS_BRANCH=${COVERITY_SCAN_BRANCH_PATTERN}
echo ::endgroup::
foldable end
scripts/coverity.sh
env:
COVERITY_SCAN_TOKEN: ${{ secrets.COVERITY_SCAN_TOKEN }}

19
.github/workflows/lint.yml vendored Normal file
View File

@@ -0,0 +1,19 @@
name: "lint"
on:
pull_request:
push:
branches:
- master
jobs:
shellcheck:
name: ShellCheck
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Run ShellCheck
uses: ludeeus/action-shellcheck@master
env:
SHELLCHECK_OPTS: --severity=error

View File

@@ -25,7 +25,7 @@ jobs:
runs-on: ubuntu-latest
name: vmtest with customized pahole/Kernel
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: ./.github/actions/setup
- uses: ./.github/actions/vmtest
with:

View File

@@ -7,12 +7,12 @@ on:
jobs:
vmtest:
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
name: Kernel LATEST + staging pahole
env:
STAGING: tmp.master
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: ./.github/actions/setup
- uses: ./.github/actions/vmtest
with:

View File

@@ -12,17 +12,26 @@ concurrency:
jobs:
vmtest:
runs-on: ubuntu-latest
name: Kernel ${{ matrix.kernel }} + selftests
runs-on: ${{ matrix.runs_on }}
name: Kernel ${{ matrix.kernel }} on ${{ matrix.runs_on }} + selftests
strategy:
fail-fast: false
matrix:
include:
- kernel: 'LATEST'
runs_on: ubuntu-20.04
arch: 'x86_64'
- kernel: '5.5.0'
runs_on: ubuntu-20.04
arch: 'x86_64'
- kernel: '4.9.0'
runs_on: ubuntu-20.04
arch: 'x86_64'
- kernel: 'LATEST'
runs_on: s390x
arch: 's390x'
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
name: Checkout
- uses: ./.github/actions/setup
name: Setup
@@ -30,71 +39,4 @@ jobs:
name: vmtest
with:
kernel: ${{ matrix.kernel }}
debian:
runs-on: ubuntu-latest
name: Debian Build (${{ matrix.name }})
strategy:
fail-fast: false
matrix:
include:
- name: default
target: RUN
- name: ASan+UBSan
target: RUN_ASAN
- name: clang
target: RUN_CLANG
- name: clang ASan+UBSan
target: RUN_CLANG_ASAN
- name: gcc-10
target: RUN_GCC10
- name: gcc-10 ASan+UBSan
target: RUN_GCC10_ASAN
steps:
- uses: actions/checkout@v2
name: Checkout
- uses: ./.github/actions/setup
name: Setup
- uses: ./.github/actions/debian
name: Build
with:
target: ${{ matrix.target }}
ubuntu:
runs-on: ubuntu-latest
name: Ubuntu Focal Build (${{ matrix.arch }})
strategy:
fail-fast: false
matrix:
include:
- arch: aarch64
- arch: ppc64le
- arch: s390x
- arch: x86
steps:
- uses: actions/checkout@v2
name: Checkout
- uses: ./.github/actions/setup
name: Pre-Setup
- run: source /tmp/ci_setup && sudo -E $CI_ROOT/managers/ubuntu.sh
if: matrix.arch == 'x86'
name: Setup
- uses: uraimo/run-on-arch-action@v2.0.5
name: Build in docker
if: matrix.arch != 'x86'
with:
distro:
ubuntu20.04
arch:
${{ matrix.arch }}
setup:
cp /tmp/ci_setup $GITHUB_WORKSPACE
dockerRunArgs: |
--volume "${GITHUB_WORKSPACE}:${GITHUB_WORKSPACE}"
shell: /bin/bash
install: |
export DEBIAN_FRONTEND=noninteractive
export TZ="America/Los_Angeles"
apt-get update -y
apt-get install -y tzdata build-essential sudo
run: source ${GITHUB_WORKSPACE}/ci_setup && $CI_ROOT/managers/ubuntu.sh
arch: ${{ matrix.arch }}

View File

@@ -1,14 +0,0 @@
# vi: set ts=2 sw=2:
extraction:
cpp:
prepare:
packages:
- libelf-dev
- pkg-config
after_prepare:
# As the buildsystem detection by LGTM is performed _only_ during the
# 'configure' phase, we need to trick LGTM we use a supported build
# system (configure, meson, cmake, etc.). This way LGTM correctly detects
# that our sources are in the src/ subfolder.
- touch src/configure
- chmod +x src/configure

17
.mailmap Normal file
View File

@@ -0,0 +1,17 @@
Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com>
Antoine Tenart <atenart@kernel.org> <antoine.tenart@bootlin.com>
Benjamin Tissoires <bentiss@kernel.org> <benjamin.tissoires@redhat.com>
Björn Töpel <bjorn@kernel.org> <bjorn.topel@intel.com>
Changbin Du <changbin.du@intel.com> <changbin.du@gmail.com>
Colin Ian King <colin.i.king@gmail.com> <colin.king@canonical.com>
Dan Carpenter <error27@gmail.com> <dan.carpenter@oracle.com>
Geliang Tang <geliang@kernel.org> <geliang.tang@suse.com>
Herbert Xu <herbert@gondor.apana.org.au>
Jakub Kicinski <kuba@kernel.org> <jakub.kicinski@netronome.com>
Leo Yan <leo.yan@linux.dev> <leo.yan@linaro.org>
Mark Starovoytov <mstarovo@pm.me> <mstarovoitov@marvell.com>
Maxim Mikityanskiy <maxtram95@gmail.com> <maximmi@mellanox.com>
Maxim Mikityanskiy <maxtram95@gmail.com> <maximmi@nvidia.com>
Quentin Monnet <qmo@kernel.org> <quentin@isovalent.com>
Quentin Monnet <qmo@kernel.org> <quentin.monnet@netronome.com>
Vadim Fedorenko <vadim.fedorenko@linux.dev> <vfedorenko@novek.ru>

View File

@@ -5,13 +5,22 @@
# Required
version: 2
build:
os: "ubuntu-22.04"
tools:
python: "3.11"
# Build documentation in the docs/ directory with Sphinx
sphinx:
builder: html
configuration: docs/conf.py
formats:
- htmlzip
- pdf
- epub
# Optionally set the version of Python and requirements required to build your docs
python:
version: 3.7
install:
- requirements: docs/sphinx/requirements.txt
- requirements: docs/sphinx/requirements.txt

View File

@@ -1,130 +0,0 @@
sudo: required
language: bash
dist: focal
services:
- docker
env:
global:
- PROJECT_NAME='libbpf'
- AUTHOR_EMAIL="$(git log -1 --pretty=\"%aE\")"
- REPO_ROOT="$TRAVIS_BUILD_DIR"
- CI_ROOT="$REPO_ROOT/travis-ci"
- VMTEST_ROOT="$CI_ROOT/vmtest"
addons:
apt:
packages:
- qemu-kvm
- zstd
- binutils-dev
- elfutils
- libcap-dev
- libelf-dev
- libdw-dev
stages:
# Run Coverity periodically instead of for each PR for following reasons:
# 1) Coverity jobs are heavily rate-limited
# 2) Due to security restrictions of encrypted environment variables
# in Travis CI, pull requests made from forks can't access encrypted
# env variables, making Coverity unusable
# See: https://docs.travis-ci.com/user/pull-requests#pull-requests-and-security-restrictions
- name: Coverity
if: type = cron
jobs:
include:
- stage: Builds & Tests
name: Kernel 5.5.0 + selftests
language: bash
env: KERNEL=5.5.0
script: $CI_ROOT/vmtest/run_vmtest.sh || travis_terminate 1
- name: Kernel LATEST + selftests
language: bash
env: KERNEL=LATEST
script: $CI_ROOT/vmtest/run_vmtest.sh || travis_terminate 1
- name: Kernel 4.9.0 + selftests
language: bash
env: KERNEL=4.9.0
script: $CI_ROOT/vmtest/run_vmtest.sh || travis_terminate 1
- name: Debian Build
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
script: $CI_ROOT/managers/debian.sh RUN || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Debian Build (ASan+UBSan)
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
script: $CI_ROOT/managers/debian.sh RUN_ASAN || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Debian Build (clang)
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
script: $CI_ROOT/managers/debian.sh RUN_CLANG || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Debian Build (clang ASan+UBSan)
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
script: $CI_ROOT/managers/debian.sh RUN_CLANG_ASAN || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Debian Build (gcc-10)
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
script: $CI_ROOT/managers/debian.sh RUN_GCC10 || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Debian Build (gcc-10 ASan+UBSan)
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
script: $CI_ROOT/managers/debian.sh RUN_GCC10_ASAN || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Ubuntu Focal Build
language: bash
script: sudo $CI_ROOT/managers/ubuntu.sh || travis_terminate 1
- name: Ubuntu Focal Build (arm)
arch: arm64
language: bash
script: sudo $CI_ROOT/managers/ubuntu.sh || travis_terminate 1
- name: Ubuntu Focal Build (s390x)
arch: s390x
language: bash
script: sudo $CI_ROOT/managers/ubuntu.sh || travis_terminate 1
- name: Ubuntu Focal Build (ppc64le)
arch: ppc64le
language: bash
script: sudo $CI_ROOT/managers/ubuntu.sh || travis_terminate 1
- stage: Coverity
language: bash
env:
# Coverity configuration
# COVERITY_SCAN_TOKEN=xxx
# Encrypted using `travis encrypt --repo libbpf/libbpf COVERITY_SCAN_TOKEN=xxx`
- secure: "I9OsMRHbb82IUivDp+I+w/jEQFOJgBDAqYqf1ollqCM1QhocxMcS9bwIAgfPhdXi2hohV7sRrVMZstahY67FAvJLGxNopi4tAPDIAaIFxgO0yDxMhaTMx5xDfMwlIm2FOP/9gB9BQsd6M7CmoQZgXYwBIv7xd1ooxoQrh2rOK1YrRl7UQu3+c3zPTjDfIYZzR3bFttMqZ9/c4U0v8Ry5IFXrel3hCshndHA1TtttJrUSrILlZcmVc1ch7JIy6zCbCU/2lGv0B/7rWXfF8MT7O9jPtFOhJ1DEcd2zhw2n4j9YT3a8OhtnM61LA6ask632mwCOsxpFLTun7AzuR1Cb5mdPHsxhxnCHcXXARa2mJjem0QG1NhwxwJE8sbRDapojexxCvweYlEN40ofwMDSnj/qNt95XIcrk0tiIhGFx0gVNWvAdmZwx+N4mwGPMTAN0AEOFjpgI+ZdB89m+tL/CbEgE1flc8QxUxJhcp5OhH6yR0z9qYOp0nXIbHsIaCiRvt/7LqFRQfheifztWVz4mdQlCdKS9gcOQ09oKicPevKO1L0Ue3cb7Ug7jOpMs+cdh3XokJtUeYEr1NijMHT9+CTAhhO5RToWXIZRon719z3fwoUBNDREATwVFMlVxqSO/pbYgaKminigYbl785S89YYaZ6E5UvaKRHM6KHKMDszs="
- COVERITY_SCAN_PROJECT_NAME="libbpf"
- COVERITY_SCAN_NOTIFICATION_EMAIL="${AUTHOR_EMAIL}"
- COVERITY_SCAN_BRANCH_PATTERN="$TRAVIS_BRANCH"
# Note: `make -C src/` as a BUILD_COMMAND will not work here
- COVERITY_SCAN_BUILD_COMMAND_PREPEND="cd src/"
- COVERITY_SCAN_BUILD_COMMAND="make"
install:
- sudo echo 'deb-src http://archive.ubuntu.com/ubuntu/ focal main restricted universe multiverse' >>/etc/apt/sources.list
- sudo apt-get update
- sudo apt-get -y build-dep libelf-dev
- sudo apt-get install -y libelf-dev pkg-config
script:
- scripts/coverity.sh || travis_terminate 1
allow_failures:
- env: KERNEL=x.x.x

View File

@@ -1 +1 @@
47b3708c6088a60e7dc3b809dbb0d4c46590b32f
3e9bc0472b910d4115e16e9c2d684c7757cb6c60

View File

@@ -1 +1 @@
b8b5cb55f5d3f03cc1479a3768d68173a10359ad
0737df6de94661ae55fd3343ce9abec32c687e62

View File

@@ -7,7 +7,7 @@ Usage-Guide:
SPDX-License-Identifier: BSD-2-Clause
License-Text:
Copyright (c) <year> <owner> . All rights reserved.
Copyright (c) 2015 The Libbpf Authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

116
README.md
View File

@@ -1,17 +1,33 @@
This is a mirror of [bpf-next Linux source
tree](https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next)'s
`tools/lib/bpf` directory plus its supporting header files.
<picture>
<source media="(prefers-color-scheme: dark)" srcset="assets/libbpf-logo-sideways-darkbg.png" width="40%">
<img src="assets/libbpf-logo-sideways.png" width="40%">
</picture>
All the gory details of syncing can be found in `scripts/sync-kernel.sh`
script.
libbpf
[![Github Actions Builds & Tests](https://github.com/libbpf/libbpf/actions/workflows/test.yml/badge.svg)](https://github.com/libbpf/libbpf/actions/workflows/test.yml)
[![Coverity](https://img.shields.io/coverity/scan/18195.svg)](https://scan.coverity.com/projects/libbpf)
[![CodeQL](https://github.com/libbpf/libbpf/workflows/CodeQL/badge.svg?branch=master)](https://github.com/libbpf/libbpf/actions?query=workflow%3ACodeQL+branch%3Amaster)
[![OSS-Fuzz Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/libbpf.svg)](https://oss-fuzz-build-logs.storage.googleapis.com/index.html#libbpf)
[![Read the Docs](https://readthedocs.org/projects/libbpf/badge/?version=latest)](https://libbpf.readthedocs.io/en/latest/)
======
Some header files in this repo (`include/linux/*.h`) are reduced versions of
their counterpart files at
[bpf-next](https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/)'s
`tools/include/linux/*.h` to make compilation successful.
**This is the official home of the libbpf library.**
BPF/libbpf usage and questions
==============================
*Please use this Github repository for building and packaging libbpf
and when using it in your projects through Git submodule.*
Libbpf *authoritative source code* is developed as part of [bpf-next Linux source
tree](https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next) under
`tools/lib/bpf` subdirectory and is periodically synced to Github. As such, all the
libbpf changes should be sent to [BPF mailing list](http://vger.kernel.org/vger-lists.html#bpf),
please don't open PRs here unless you are changing Github-specific parts of libbpf
(e.g., Github-specific Makefile).
Libbpf and general BPF usage questions
======================================
Libbpf documentation can be found [here](https://libbpf.readthedocs.io/en/latest/api.html).
It's an ongoing effort and has ways to go, but please take a look and consider contributing as well.
Please check out [libbpf-bootstrap](https://github.com/libbpf/libbpf-bootstrap)
and [the companion blog post](https://nakryiko.com/posts/libbpf-bootstrap/) for
@@ -36,12 +52,8 @@ to help you with whatever issue you have. This repository's PRs and issues
should be opened only for dealing with issues pertaining to specific way this
libbpf mirror repo is set up and organized.
Build
[![Github Actions Builds & Tests](https://github.com/libbpf/libbpf/actions/workflows/test.yml/badge.svg)](https://github.com/libbpf/libbpf/actions/workflows/test.yml)
[![Total alerts](https://img.shields.io/lgtm/alerts/g/libbpf/libbpf.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/libbpf/libbpf/alerts/)
[![Coverity](https://img.shields.io/coverity/scan/18195.svg)](https://scan.coverity.com/projects/libbpf)
[![OSS-Fuzz Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/libbpf.svg)](https://oss-fuzz-build-logs.storage.googleapis.com/index.html#libbpf)
=====
Building libbpf
===============
libelf is an internal dependency of libbpf and thus it is required to link
against and must be installed on the system for applications to work.
pkg-config is used by default to find libelf, and the program called can be
@@ -73,34 +85,6 @@ $ cd src
$ PKG_CONFIG_PATH=/build/root/lib64/pkgconfig DESTDIR=/build/root make install
```
Distributions
=============
Distributions packaging libbpf from this mirror:
- [Fedora](https://src.fedoraproject.org/rpms/libbpf)
- [Gentoo](https://packages.gentoo.org/packages/dev-libs/libbpf)
- [Debian](https://packages.debian.org/source/sid/libbpf)
- [Arch](https://www.archlinux.org/packages/extra/x86_64/libbpf/)
- [Ubuntu](https://packages.ubuntu.com/source/groovy/libbpf)
- [Alpine](https://pkgs.alpinelinux.org/packages?name=libbpf)
Benefits of packaging from the mirror over packaging from kernel sources:
- Consistent versioning across distributions.
- No ties to any specific kernel, transparent handling of older kernels.
Libbpf is designed to be kernel-agnostic and work across multitude of
kernel versions. It has built-in mechanisms to gracefully handle older
kernels, that are missing some of the features, by working around or
gracefully degrading functionality. Thus libbpf is not tied to a specific
kernel version and can/should be packaged and versioned independently.
- Continuous integration testing via
[TravisCI](https://travis-ci.org/libbpf/libbpf).
- Static code analysis via [LGTM](https://lgtm.com/projects/g/libbpf/libbpf)
and [Coverity](https://scan.coverity.com/projects/libbpf).
Package dependencies of libbpf, package names may vary across distros:
- zlib
- libelf
BPF CO-RE (Compile Once Run Everywhere)
=========================================
@@ -154,6 +138,48 @@ use it:
converting some more to both contribute to the BPF community and gain some
more experience with it.
Distributions
=============
Distributions packaging libbpf from this mirror:
- [Fedora](https://src.fedoraproject.org/rpms/libbpf)
- [Gentoo](https://packages.gentoo.org/packages/dev-libs/libbpf)
- [Debian](https://packages.debian.org/source/sid/libbpf)
- [Arch](https://archlinux.org/packages/core/x86_64/libbpf/)
- [Ubuntu](https://packages.ubuntu.com/source/jammy/libbpf)
- [Alpine](https://pkgs.alpinelinux.org/packages?name=libbpf)
Benefits of packaging from the mirror over packaging from kernel sources:
- Consistent versioning across distributions.
- No ties to any specific kernel, transparent handling of older kernels.
Libbpf is designed to be kernel-agnostic and work across multitude of
kernel versions. It has built-in mechanisms to gracefully handle older
kernels, that are missing some of the features, by working around or
gracefully degrading functionality. Thus libbpf is not tied to a specific
kernel version and can/should be packaged and versioned independently.
- Continuous integration testing via
[GitHub Actions](https://github.com/libbpf/libbpf/actions).
- Static code analysis via [LGTM](https://lgtm.com/projects/g/libbpf/libbpf)
and [Coverity](https://scan.coverity.com/projects/libbpf).
Package dependencies of libbpf, package names may vary across distros:
- zlib
- libelf
[![libbpf distro packaging status](https://repology.org/badge/vertical-allrepos/libbpf.svg)](https://repology.org/project/libbpf/versions)
bpf-next to Github sync
=======================
All the gory details of syncing can be found in `scripts/sync-kernel.sh`
script. See [SYNC.md](SYNC.md) for instruction.
Some header files in this repo (`include/linux/*.h`) are reduced versions of
their counterpart files at
[bpf-next](https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/)'s
`tools/include/linux/*.h` to make compilation successful.
License
=======

281
SYNC.md Normal file
View File

@@ -0,0 +1,281 @@
<picture>
<source media="(prefers-color-scheme: dark)" srcset="assets/libbpf-logo-sideways-darkbg.png" width="40%">
<img src="assets/libbpf-logo-sideways.png" width="40%">
</picture>
Libbpf sync
===========
Libbpf *authoritative source code* is developed as part of [bpf-next Linux source
tree](https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next) under
`tools/lib/bpf` subdirectory and is periodically synced to Github.
Most of the mundane mechanical things like bpf and bpf-next tree merge, Git
history transformation, cherry-picking relevant commits, re-generating
auto-generated headers, etc. are taken care by
[sync-kernel.sh script](https://github.com/libbpf/libbpf/blob/master/scripts/sync-kernel.sh).
But occasionally human needs to do few extra things to make everything work
nicely.
This document goes over the process of syncing libbpf sources from Linux repo
to this Github repository. Feel free to contribute fixes and additions if you
run into new problems not outlined here.
Setup expectations
------------------
Sync script has particular expectation of upstream Linux repo setup. It
expects that current HEAD of that repo points to bpf-next's master branch and
that there is a separate local branch pointing to bpf tree's master branch.
This is important, as the script will automatically merge their histories for
the purpose of libbpf sync.
Below, we assume that Linux repo is located at `~/linux`, it's current head is
at latest `bpf-next/master`, and libbpf's Github repo is located at
`~/libbpf`, checked out to latest commit on `master` branch. It doesn't matter
from where to run `sync-kernel.sh` script, but we'll be running it from inside
`~/libbpf`.
```
$ cd ~/linux && git remote -v | grep -E '^(bpf|bpf-next)'
bpf https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git (fetch)
bpf ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
(push)
bpf-next
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git (fetch)
bpf-next
ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git (push)
$ git branch -vv | grep -E '^? (master|bpf-master)'
* bpf-master 2d311f480b52 [bpf/master] riscv, bpf: Fix patch_text implicit declaration
master c8ee37bde402 [bpf-next/master] libbpf: Fix bpf_xdp_query() in old kernels
$ git checkout bpf-master && git pull && git checkout master && git pull
...
$ git log --oneline -n1
c8ee37bde402 (HEAD -> master, bpf-next/master) libbpf: Fix bpf_xdp_query() in old kernels
$ cd ~/libbpf && git checkout master && git pull
Your branch is up to date with 'libbpf/master'.
Already up to date.
```
Running setup script
--------------------
First step is to always run `sync-kernel.sh` script. It expects three arguments:
```
$ scripts/sync-kernel.sh <libbpf-repo> <kernel-repo> <bpf-branch>
```
Note, that we'll store script's entire output in `/tmp/libbpf-sync.txt` and
put it into PR summary later on. **Please store scripts output and include it
in PR summary for others to check for anything unexpected and suspicious.**
```
$ scripts/sync-kernel.sh ~/libbpf ~/linux bpf-master | tee /tmp/libbpf-sync.txt
Dumping existing libbpf commit signatures...
WORKDIR: /home/andriin/libbpf
LINUX REPO: /home/andriin/linux
LIBBPF REPO: /home/andriin/libbpf
...
```
Most of the time this will go very uneventful. One expected case when sync
script might require user intervention is if `bpf` tree has some libbpf fixes,
which is nowadays not a very frequent occurence. But if that happens, script
will show you a diff between expected state as of latest bpf-next and synced
Github repo state. And will ask if these changes look good. Please use your
best judgement to verify that differences are indeed from expected `bpf` tree
fixes. E.g., it might look like below:
```
Comparing list of files...
Comparing file contents...
--- /home/andriin/linux/include/uapi/linux/netdev.h 2023-02-27 16:54:42.270583372 -0800
+++ /home/andriin/libbpf/include/uapi/linux/netdev.h 2023-02-27 16:54:34.615530796 -0800
@@ -19,7 +19,7 @@
* @NETDEV_XDP_ACT_XSK_ZEROCOPY: This feature informs if netdev supports AF_XDP
* in zero copy mode.
* @NETDEV_XDP_ACT_HW_OFFLOAD: This feature informs if netdev supports XDP hw
- * oflloading.
+ * offloading.
* @NETDEV_XDP_ACT_RX_SG: This feature informs if netdev implements non-linear
* XDP buffer support in the driver napi callback.
* @NETDEV_XDP_ACT_NDO_XMIT_SG: This feature informs if netdev implements
/home/andriin/linux/include/uapi/linux/netdev.h and /home/andriin/libbpf/include/uapi/linux/netdev.h are different!
Unfortunately, there are some inconsistencies, please double check.
Does everything look good? [y/N]:
```
If it looks sensible and expected, type `y` and script will proceed.
If sync is successful, your `~/linux` repo will be left in original state on
the original HEAD commit. `~/libbpf` repo will now be on a new branch, named
`libbpf-sync-<timestamp>` (e.g., `libbpf-sync-2023-02-28T00-53-40.072Z`).
Push this branch into your fork of `libbpf/libbpf` Github repo and create a PR:
```
$ git push --set-upstream origin libbpf-sync-2023-02-28T00-53-40.072Z
Enumerating objects: 130, done.
Counting objects: 100% (115/115), done.
Delta compression using up to 80 threads
Compressing objects: 100% (28/28), done.
Writing objects: 100% (32/32), 5.57 KiB | 1.86 MiB/s, done.
Total 32 (delta 21), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (21/21), completed with 9 local objects.
remote:
remote: Create a pull request for 'libbpf-sync-2023-02-28T00-53-40.072Z' on GitHub by visiting:
remote: https://github.com/anakryiko/libbpf/pull/new/libbpf-sync-2023-02-28T00-53-40.072Z
remote:
To github.com:anakryiko/libbpf.git
* [new branch] libbpf-sync-2023-02-28T00-53-40.072Z -> libbpf-sync-2023-02-28T00-53-40.072Z
Branch 'libbpf-sync-2023-02-28T00-53-40.072Z' set up to track remote branch 'libbpf-sync-2023-02-28T00-53-40.072Z' from 'origin'.
```
**Please, adjust PR name to have a properly looking timestamp. Libbpf
maintainers will be very thankful for that!**
By default Github will turn above branch name into PR with subject "Libbpf sync
2023 02 28 t00 53 40.072 z". Please fix this into a proper timestamp, e.g.:
"Libbpf sync 2023-02-28T00:53:40.072Z". Thank you!
**Please don't forget to paste contents of /tmp/libbpf-sync.txt into PR
summary!**
Once PR is created, libbpf CI will run a bunch of tests to check that
everything is good. In simple cases that would be all you'd need to do. In more
complicated cases some extra adjustments might be necessary.
**Please, keep naming and style consistent.** Prefix CI-related fixes with `ci: `
prefix. If you had to modify sync script, prefix it with `sync: `. Also make
sure that each such commit has `Signed-off-by: Your Full Name <your@email.com>`,
just like you'd do that for Linux upstream patch. Libbpf closely follows kernel
conventions and styling, so please help maintaining that.
Including new sources
---------------------
If entirely new source files (typically `*.c`) were added to the library in the
kernel repository, it may be necessary to add these to the build system
manually (you may notice linker errors otherwise), because the script cannot
handle such changes automatically. To that end, edit `src/Makefile` as
necessary. Commit
[c2495832ced4](https://github.com/libbpf/libbpf/commit/c2495832ced4239bcd376b9954db38a6addd89ca)
is an example of how to go about doing that.
Similarly, if new public API header files were added, the `Makefile` will need
to be adjusted as well.
Updating allow/deny lists
-------------------------
Libbpf CI intentionally runs a subset of latest BPF selftests on old kernel
(4.9 and 5.5, currently). It happens from time to time that some tests that
previously were successfully running on old kernels now don't, typically due to
reliance on some freshly added kernel feature. It might look something like this in [CI logs](https://github.com/libbpf/libbpf/actions/runs/4206303272/jobs/7299609578#step:4:2733):
```
All error logs:
serial_test_xdp_info:FAIL:get_xdp_none errno=2
#283 xdp_info:FAIL
Summary: 49/166 PASSED, 5 SKIPPED, 1 FAILED
```
In such case we can either work with upstream to fix test to be compatible with
old kernels, or we'll have to add a test into a denylist (or remove it from
allowlist, like was [done](https://github.com/libbpf/libbpf/commit/ea284299025bf85b85b4923191de6463cd43ccd6)
for the case above).
```
$ find . -name '*LIST*'
./ci/vmtest/configs/ALLOWLIST-4.9.0
./ci/vmtest/configs/DENYLIST-5.5.0
./ci/vmtest/configs/DENYLIST-latest.s390x
./ci/vmtest/configs/DENYLIST-latest
./ci/vmtest/configs/ALLOWLIST-5.5.0
```
Please determine which tests need to be added/removed from which list. And then
add that as a separate commit. **Please keep using the same branch name, so
that the same PR can be updated.** There is no need to open new PRs for each
such fix.
Regenerating vmlinux.h header
-----------------------------
To compile latest BPF selftests against old kernels, we check in pre-generated
[vmlinux.h](https://github.com/libbpf/libbpf/blob/master/.github/actions/build-selftests/vmlinux.h)
header file, located at `.github/actions/build-selftests/vmlinux.h`, which
contains type definitions from latest upstream kernel. When after libbpf sync
upstream BPF selftests require new kernel types, we'd need to regenerate
`vmlinux.h` and check it in as well.
This will looks something like this in [CI logs](https://github.com/libbpf/libbpf/actions/runs/4198939244/jobs/7283214243#step:4:1903):
```
In file included from progs/test_spin_lock_fail.c:5:
/home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/bpf_experimental.h:73:53: error: declaration of 'struct bpf_rb_root' will not be visible outside of this function [-Werror,-Wvisibility]
extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
^
/home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/bpf_experimental.h:81:35: error: declaration of 'struct bpf_rb_root' will not be visible outside of this function [-Werror,-Wvisibility]
extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
^
/home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/bpf_experimental.h:90:52: error: declaration of 'struct bpf_rb_root' will not be visible outside of this function [-Werror,-Wvisibility]
extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
^
3 errors generated.
make: *** [Makefile:572: /home/runner/work/libbpf/libbpf/.kernel/tools/testing/selftests/bpf/test_spin_lock_fail.bpf.o] Error 1
make: *** Waiting for unfinished jobs....
Error: Process completed with exit code 2.
```
You'll need to build latest upstream kernel from `bpf-next` tree, using BPF
selftest configs. Concat arch-agnostic and arch-specific configs, build kernel,
then use bpftool to dump `vmlinux.h`:
```
$ cd ~/linux
$ cat tools/testing/selftests/bpf/config \
tools/testing/selftests/bpf/config.x86_64 > .config
$ make -j$(nproc) olddefconfig all
...
$ bpftool btf dump file ~/linux/vmlinux format c > ~/libbpf/.github/actions/build-selftests/vmlinux.h
$ cd ~/libbpf && git add . && git commit -s
```
Check in generated `vmlinux.h`, don't forget to use `ci: ` commit prefix, add
it on top of sync commits. Push to Github and let libbpf CI do the checking for
you. See [this commit](https://github.com/libbpf/libbpf/commit/34212c94a64df8eeb1dd5d064630a65e1dfd4c20)
for reference.
Troubleshooting
---------------
If something goes wrong and sync script exits early or is terminated early by
user, you might end up with `~/linux` repo on temporary sync-related branch.
Don't worry, though, sync script never destroys repo state, it follows
"copy-on-write" philosophy and creates new branches where necessary. So it's
very easy to restore previous state. So if anything goes wrong, it's easy to
start fresh:
```
$ git branch | grep -E 'libbpf-.*Z'
libbpf-baseline-2023-02-28T00-43-35.146Z
libbpf-bpf-baseline-2023-02-28T00-43-35.146Z
libbpf-bpf-tip-2023-02-28T00-43-35.146Z
libbpf-squash-base-2023-02-28T00-43-35.146Z
* libbpf-squash-tip-2023-02-28T00-43-35.146Z
$ git cherry-pick --abort
$ git checkout master && git branch | grep -E 'libbpf-.*Z' | xargs git br -D
Switched to branch 'master'
Your branch is up to date with 'bpf-next/master'.
Deleted branch libbpf-baseline-2023-02-28T00-43-35.146Z (was 951bce29c898).
Deleted branch libbpf-bpf-baseline-2023-02-28T00-43-35.146Z (was 3a70e0d4c9d7).
Deleted branch libbpf-bpf-tip-2023-02-28T00-43-35.146Z (was 2d311f480b52).
Deleted branch libbpf-squash-base-2023-02-28T00-43-35.146Z (was 957f109ef883).
Deleted branch libbpf-squash-tip-2023-02-28T00-43-35.146Z (was be66130d2339).
Deleted branch libbpf-tip-2023-02-28T00-43-35.146Z (was 2d311f480b52).
```
You might need to do the same for your `~/libbpf` repo sometimes, depending at
which stage sync script was terminated.

Binary file not shown.

After

Width:  |  Height:  |  Size: 262 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 128 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 116 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 284 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 142 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 140 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 352 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 206 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 236 KiB

0
ci/diffs/.keep Normal file
View File

View File

@@ -0,0 +1,69 @@
From c71766e8ff7a7f950522d25896fba758585500df Mon Sep 17 00:00:00 2001
From: Song Liu <song@kernel.org>
Date: Mon, 22 Apr 2024 21:14:40 -0700
Subject: [PATCH] arch/Kconfig: Move SPECULATION_MITIGATIONS to arch/Kconfig
SPECULATION_MITIGATIONS is currently defined only for x86. As a result,
IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) is always false for other
archs. f337a6a21e2f effectively set "mitigations=off" by default on
non-x86 archs, which is not desired behavior. Jakub observed this
change when running bpf selftests on s390 and arm64.
Fix this by moving SPECULATION_MITIGATIONS to arch/Kconfig so that it is
available in all archs and thus can be used safely in kernel/cpu.c
Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
Cc: stable@vger.kernel.org
Cc: Sean Christopherson <seanjc@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
---
arch/Kconfig | 10 ++++++++++
arch/x86/Kconfig | 10 ----------
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71..8f4af75005f8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1609,4 +1609,14 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
# strict alignment always, even with -falign-functions.
def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
+menuconfig SPECULATION_MITIGATIONS
+ bool "Mitigations for speculative execution vulnerabilities"
+ default y
+ help
+ Say Y here to enable options which enable mitigations for
+ speculative execution hardware vulnerabilities.
+
+ If you say N, all mitigations will be disabled. You really
+ should know what you are doing to say so.
+
endmenu
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 39886bab943a..50c890fce5e0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2486,16 +2486,6 @@ config PREFIX_SYMBOLS
def_bool y
depends on CALL_PADDING && !CFI_CLANG
-menuconfig SPECULATION_MITIGATIONS
- bool "Mitigations for speculative execution vulnerabilities"
- default y
- help
- Say Y here to enable options which enable mitigations for
- speculative execution hardware vulnerabilities.
-
- If you say N, all mitigations will be disabled. You really
- should know what you are doing to say so.
-
if SPECULATION_MITIGATIONS
config MITIGATION_PAGE_TABLE_ISOLATION
--
2.43.0

View File

@@ -0,0 +1,56 @@
From f267f262815033452195f46c43b572159262f533 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 5 Mar 2024 10:08:28 +0100
Subject: [PATCH 2/2] xdp, bonding: Fix feature flags when there are no slave
devs anymore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
changed the driver from reporting everything as supported before a device
was bonded into having the driver report that no XDP feature is supported
until a real device is bonded as it seems to be more truthful given
eventually real underlying devices decide what XDP features are supported.
The change however did not take into account when all slave devices get
removed from the bond device. In this case after 9b0ed890ac2a, the driver
keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK &
~NETDEV_XDP_ACT_XSK_ZEROCOPY whereas it should have reported a feature
mask of 0.
Fix it by resetting XDP feature flags in the same way as if no XDP program
is attached to the bond device. This was uncovered by the XDP bond selftest
which let BPF CI fail. After adjusting the starting masks on the latter
to 0 instead of NETDEV_XDP_ACT_MASK the test passes again together with
this fix.
Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Prashant Batra <prbatra.mail@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240305090829.17131-1-daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
drivers/net/bonding/bond_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a11748b8d69b..cd0683bcca03 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1811,7 +1811,7 @@ void bond_xdp_set_features(struct net_device *bond_dev)
ASSERT_RTNL();
- if (!bond_xdp_check(bond)) {
+ if (!bond_xdp_check(bond) || !bond_has_slaves(bond)) {
xdp_clear_features_flag(bond_dev);
return;
}
--
2.43.0

View File

@@ -6,8 +6,9 @@ CONT_NAME="${CONT_NAME:-libbpf-debian-$DEBIAN_RELEASE}"
ENV_VARS="${ENV_VARS:-}"
DOCKER_RUN="${DOCKER_RUN:-docker run}"
REPO_ROOT="${REPO_ROOT:-$PWD}"
ADDITIONAL_DEPS=(clang pkg-config gcc-10)
CFLAGS="-g -O2 -Werror -Wall"
ADDITIONAL_DEPS=(pkgconf)
EXTRA_CFLAGS=""
EXTRA_LDFLAGS=""
function info() {
echo -e "\033[33;1m$1\033[0m"
@@ -42,30 +43,35 @@ for phase in "${PHASES[@]}"; do
docker_exec bash -c "echo deb-src http://deb.debian.org/debian $DEBIAN_RELEASE main >>/etc/apt/sources.list"
docker_exec apt-get -y update
docker_exec apt-get -y install aptitude
docker_exec aptitude -y build-dep libelf-dev
docker_exec aptitude -y install libelf-dev
docker_exec aptitude -y install make libz-dev libelf-dev
docker_exec aptitude -y install "${ADDITIONAL_DEPS[@]}"
echo -e "::endgroup::"
;;
RUN|RUN_CLANG|RUN_GCC10|RUN_ASAN|RUN_CLANG_ASAN|RUN_GCC10_ASAN)
RUN|RUN_CLANG|RUN_CLANG14|RUN_CLANG15|RUN_CLANG16|RUN_GCC10|RUN_GCC11|RUN_GCC12|RUN_ASAN|RUN_CLANG_ASAN|RUN_GCC10_ASAN)
CC="cc"
if [[ "$phase" = *"CLANG"* ]]; then
if [[ "$phase" =~ "RUN_CLANG(\d+)(_ASAN)?" ]]; then
ENV_VARS="-e CC=clang-${BASH_REMATCH[1]} -e CXX=clang++-${BASH_REMATCH[1]}"
CC="clang-${BASH_REMATCH[1]}"
elif [[ "$phase" = *"CLANG"* ]]; then
ENV_VARS="-e CC=clang -e CXX=clang++"
CC="clang"
elif [[ "$phase" = *"GCC10"* ]]; then
ENV_VARS="-e CC=gcc-10 -e CXX=g++-10"
CC="gcc-10"
CFLAGS="${CFLAGS} -Wno-stringop-truncation"
else
CFLAGS="${CFLAGS} -Wno-stringop-truncation"
elif [[ "$phase" =~ "RUN_GCC(\d+)(_ASAN)?" ]]; then
ENV_VARS="-e CC=gcc-${BASH_REMATCH[1]} -e CXX=g++-${BASH_REMATCH[1]}"
CC="gcc-${BASH_REMATCH[1]}"
fi
if [[ "$phase" = *"ASAN"* ]]; then
CFLAGS="${CFLAGS} -fsanitize=address,undefined"
EXTRA_CFLAGS="${EXTRA_CFLAGS} -fsanitize=address,undefined"
EXTRA_LDFLAGS="${EXTRA_LDFLAGS} -fsanitize=address,undefined"
fi
if [[ "$CC" != "cc" ]]; then
docker_exec aptitude -y install "$CC"
else
docker_exec aptitude -y install gcc
fi
docker_exec mkdir build install
docker_exec ${CC} --version
info "build"
docker_exec make -j$((4*$(nproc))) CFLAGS="${CFLAGS}" -C ./src -B OBJDIR=../build
docker_exec make -j$((4*$(nproc))) EXTRA_CFLAGS="${EXTRA_CFLAGS}" EXTRA_LDFLAGS="${EXTRA_LDFLAGS}" -C ./src -B OBJDIR=../build
info "ldd build/libbpf.so:"
docker_exec ldd build/libbpf.so
if ! docker_exec ldd build/libbpf.so | grep -q libelf; then
@@ -75,7 +81,7 @@ for phase in "${PHASES[@]}"; do
info "install"
docker_exec make -j$((4*$(nproc))) -C src OBJDIR=../build DESTDIR=../install install
info "link binary"
docker_exec bash -c "CFLAGS=\"${CFLAGS}\" ./travis-ci/managers/test_compile.sh"
docker_exec bash -c "EXTRA_CFLAGS=\"${EXTRA_CFLAGS}\" EXTRA_LDFLAGS=\"${EXTRA_LDFLAGS}\" ./ci/managers/test_compile.sh"
;;
CLEANUP)
info "Cleanup phase"

15
ci/managers/test_compile.sh Executable file
View File

@@ -0,0 +1,15 @@
#!/bin/bash
set -euox pipefail
EXTRA_CFLAGS=${EXTRA_CFLAGS:-}
EXTRA_LDFLAGS=${EXTRA_LDFLAGS:-}
cat << EOF > main.c
#include <bpf/libbpf.h>
int main() {
return bpf_object__open(0) < 0;
}
EOF
# static linking
${CC:-cc} ${EXTRA_CFLAGS} ${EXTRA_LDFLAGS} -o main -I./include/uapi -I./install/usr/include main.c ./build/libbpf.a -lelf -lz

View File

@@ -10,14 +10,15 @@ source "$(dirname $0)/travis_wait.bash"
cd $REPO_ROOT
CFLAGS="-g -O2 -Werror -Wall -fsanitize=address,undefined -Wno-stringop-truncation"
EXTRA_CFLAGS="-Werror -Wall -fsanitize=address,undefined"
EXTRA_LDFLAGS="-Werror -Wall -fsanitize=address,undefined"
mkdir build install
cc --version
make -j$((4*$(nproc))) CFLAGS="${CFLAGS}" -C ./src -B OBJDIR=../build
make -j$((4*$(nproc))) EXTRA_CFLAGS="${EXTRA_CFLAGS}" EXTRA_LDFLAGS="${EXTRA_LDFLAGS}" -C ./src -B OBJDIR=../build
ldd build/libbpf.so
if ! ldd build/libbpf.so | grep -q libelf; then
echo "FAIL: No reference to libelf.so in libbpf.so!"
exit 1
fi
make -j$((4*$(nproc))) -C src OBJDIR=../build DESTDIR=../install install
CFLAGS=${CFLAGS} $(dirname $0)/test_compile.sh
EXTRA_CFLAGS=${EXTRA_CFLAGS} EXTRA_LDFLAGS=${EXTRA_LDFLAGS} $(dirname $0)/test_compile.sh

View File

@@ -2,6 +2,7 @@
core_retro
cpu_mask
hashmap
legacy_printk
perf_buffer
section_names

View File

@@ -1,4 +1,4 @@
attach_probe
# attach_probe
autoload
bpf_verif_scale
cgroup_attach_autodetach
@@ -10,14 +10,13 @@ core_reloc
core_retro
cpu_mask
endian
fexit_stress
get_branch_snapshot
get_stackid_cannot_attach
global_data
global_data_init
global_func_args
hashmap
l4lb_all
legacy_printk
linked_funcs
linked_maps
map_lock
@@ -33,23 +32,18 @@ raw_tp_writable_test_run
rdonly_maps
section_names
signal_pending
skeleton
sockmap_ktls
sockopt
sockopt_inherit
sockopt_multi
spinlock
stacktrace_map
stacktrace_map_raw_tp
static_linked
subprogs
task_fd_query_rawtp
task_fd_query_tp
tc_bpf
tcp_estats
tcp_rtt
test_global_funcs/arg_tag_ctx*
tp_attach_query
usdt/urand_pid_attach
xdp
xdp_info
xdp_noinline
xdp_perf

View File

@@ -0,0 +1,14 @@
# TEMPORARY
btf_dump/btf_dump: syntax
kprobe_multi_bench_attach
core_reloc/enum64val
core_reloc/size___diff_sz
core_reloc/type_based___diff_sz
test_ima # All of CI is broken on it following 6.3-rc1 merge
lwt_reroute # crashes kernel after netnext merge from 2ab1efad60ad "net/sched: cls_api: complement tcf_tfilter_dump_policy"
tc_links_ingress # started failing after net-next merge from 2ab1efad60ad "net/sched: cls_api: complement tcf_tfilter_dump_policy"
xdp_bonding/xdp_bonding_features # started failing after net merge from 359e54a93ab4 "l2tp: pass correct message length to ip6_append_data"
tc_redirect/tc_redirect_dtime # uapi breakage after net-next commit 885c36e59f46 ("net: Re-use and set mono_delivery_time bit for userspace tstamp packets")
migrate_reuseport/IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler # flaky, under investigation
migrate_reuseport/IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler # flaky, under investigation

View File

@@ -0,0 +1,5 @@
# This complements ALLOWLIST-5.5.0 but excludes subtest that can't work on 5.5
btf # "size check test", "func (Non zero vlen)"
tailcalls # tailcall_bpf2bpf_1, tailcall_bpf2bpf_2, tailcall_bpf2bpf_3
tc_bpf/tc_bpf_non_root

View File

@@ -0,0 +1,13 @@
decap_sanity # weird failure with decap_sanity_ns netns already existing, TBD
empty_skb # waiting the fix in bpf tree to make it to bpf-next
bpf_nf/tc-bpf-ct # test consistently failing on x86: https://github.com/libbpf/libbpf/pull/698#issuecomment-1590341200
bpf_nf/xdp-ct # test consistently failing on x86: https://github.com/libbpf/libbpf/pull/698#issuecomment-1590341200
kprobe_multi_bench_attach # suspected to cause crashes in CI
find_vma # test consistently fails on latest kernel, see https://github.com/libbpf/libbpf/issues/754 for details
bpf_cookie/perf_event
send_signal/send_signal_nmi
send_signal/send_signal_nmi_thread
lwt_reroute # crashes kernel, fix pending upstream
tc_links_ingress # fails, same fix is pending upstream
tc_redirect # enough is enough, banned for life for flakiness

View File

@@ -0,0 +1,17 @@
# TEMPORARY
sockmap_listen/sockhash VSOCK test_vsock_redir
usdt/basic # failing verifier due to bounds check after LLVM update
usdt/multispec # same as above
deny_namespace # not yet in bpf denylist
tc_redirect/tc_redirect_dtime # very flaky
lru_bug # not yet in bpf-next denylist
# Disabled temporarily for a crash.
# https://lore.kernel.org/bpf/c9923c1d-971d-4022-8dc8-1364e929d34c@gmail.com/
dummy_st_ops/dummy_init_ptr_arg
fexit_bpf2bpf
tailcalls
trace_ext
xdp_bpf2bpf
xdp_metadata

38
ci/vmtest/helpers.sh Executable file
View File

@@ -0,0 +1,38 @@
# shellcheck shell=bash
# $1 - start or end
# $2 - fold identifier, no spaces
# $3 - fold section description
foldable() {
local YELLOW='\033[1;33m'
local NOCOLOR='\033[0m'
if [ $1 = "start" ]; then
line="::group::$2"
if [ ! -z "${3:-}" ]; then
line="$line - ${YELLOW}$3${NOCOLOR}"
fi
else
line="::endgroup::"
fi
echo -e "$line"
}
__print() {
local TITLE=""
if [[ -n $2 ]]; then
TITLE=" title=$2"
fi
echo "::$1${TITLE}::$3"
}
# $1 - title
# $2 - message
print_error() {
__print error $1 $2
}
# $1 - title
# $2 - message
print_notice() {
__print notice $1 $2
}

94
ci/vmtest/run_selftests.sh Executable file
View File

@@ -0,0 +1,94 @@
#!/bin/bash
set -euo pipefail
source $(cd $(dirname $0) && pwd)/helpers.sh
ARCH=$(uname -m)
STATUS_FILE=/exitstatus
read_lists() {
(for path in "$@"; do
if [[ -s "$path" ]]; then
cat "$path"
fi;
done) | cut -d'#' -f1 | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | tr -s '\n' ','
}
test_progs() {
if [[ "${KERNEL}" != '4.9.0' ]]; then
foldable start test_progs "Testing test_progs"
# "&& true" does not change the return code (it is not executed
# if the Python script fails), but it prevents exiting on a
# failure due to the "set -e".
./test_progs ${DENYLIST:+-d"$DENYLIST"} ${ALLOWLIST:+-a"$ALLOWLIST"} && true
echo "test_progs:$?" >> "${STATUS_FILE}"
foldable end test_progs
fi
}
test_progs_no_alu32() {
foldable start test_progs-no_alu32 "Testing test_progs-no_alu32"
./test_progs-no_alu32 ${DENYLIST:+-d"$DENYLIST"} ${ALLOWLIST:+-a"$ALLOWLIST"} && true
echo "test_progs-no_alu32:$?" >> "${STATUS_FILE}"
foldable end test_progs-no_alu32
}
test_maps() {
if [[ "${KERNEL}" == 'latest' ]]; then
foldable start test_maps "Testing test_maps"
./test_maps && true
echo "test_maps:$?" >> "${STATUS_FILE}"
foldable end test_maps
fi
}
test_verifier() {
if [[ "${KERNEL}" == 'latest' ]]; then
foldable start test_verifier "Testing test_verifier"
./test_verifier && true
echo "test_verifier:$?" >> "${STATUS_FILE}"
foldable end test_verifier
fi
}
foldable end vm_init
foldable start kernel_config "Kconfig"
zcat /proc/config.gz
foldable end kernel_config
configs_path=/${PROJECT_NAME}/selftests/bpf
local_configs_path=${PROJECT_NAME}/vmtest/configs
DENYLIST=$(read_lists \
"$configs_path/DENYLIST" \
"$configs_path/DENYLIST.${ARCH}" \
"$local_configs_path/DENYLIST-${KERNEL}" \
"$local_configs_path/DENYLIST-${KERNEL}.${ARCH}" \
)
ALLOWLIST=$(read_lists \
"$configs_path/ALLOWLIST" \
"$configs_path/ALLOWLIST.${ARCH}" \
"$local_configs_path/ALLOWLIST-${KERNEL}" \
"$local_configs_path/ALLOWLIST-${KERNEL}.${ARCH}" \
)
echo "DENYLIST: ${DENYLIST}"
echo "ALLOWLIST: ${ALLOWLIST}"
cd ${PROJECT_NAME}/selftests/bpf
if [ $# -eq 0 ]; then
test_progs
test_progs_no_alu32
# test_maps
test_verifier
else
for test_name in "$@"; do
"${test_name}"
done
fi

View File

@@ -6,7 +6,49 @@
LIBBPF API
==================
==========
Error Handling
--------------
When libbpf is used in "libbpf 1.0 mode", API functions can return errors in one of two ways.
You can set "libbpf 1.0" mode with the following line:
.. code-block::
libbpf_set_strict_mode(LIBBPF_STRICT_DIRECT_ERRS | LIBBPF_STRICT_CLEAN_PTRS);
If the function returns an error code directly, it uses 0 to indicate success
and a negative error code to indicate what caused the error. In this case the
error code should be checked directly from the return, you do not need to check
errno.
For example:
.. code-block::
err = some_libbpf_api_with_error_return(...);
if (err < 0) {
/* Handle error accordingly */
}
If the function returns a pointer, it will return NULL to indicate there was
an error. In this case errno should be checked for the error code.
For example:
.. code-block::
ptr = some_libbpf_api_returning_ptr();
if (!ptr) {
/* note no minus sign for EINVAL and E2BIG below */
if (errno == EINVAL) {
/* handle EINVAL error */
} else if (errno == E2BIG) {
/* handle E2BIG error */
}
}
libbpf.h
--------

View File

@@ -18,6 +18,7 @@ extensions = [
'sphinx.ext.viewcode',
'sphinx.ext.imgmath',
'sphinx.ext.todo',
'sphinx_rtd_theme',
'breathe',
]

View File

@@ -1,22 +1,33 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
.. _libbpf:
======
libbpf
======
For API documentation see the `versioned API documentation site <https://libbpf.readthedocs.io/en/latest/api.html>`_.
If you are looking to develop BPF applications using the libbpf library, this
directory contains important documentation that you should read.
To get started, it is recommended to begin with the :doc:`libbpf Overview
<libbpf_overview>` document, which provides a high-level understanding of the
libbpf APIs and their usage. This will give you a solid foundation to start
exploring and utilizing the various features of libbpf to develop your BPF
applications.
.. toctree::
:maxdepth: 1
libbpf_overview
API Documentation <https://libbpf.readthedocs.io/en/latest/api.html>
program_types
libbpf_naming_convention
libbpf_build
This is documentation for libbpf, a userspace library for loading and
interacting with bpf programs.
All general BPF questions, including kernel functionality, libbpf APIs and
their application, should be sent to bpf@vger.kernel.org mailing list.
You can `subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the
mailing list search its `archive <https://lore.kernel.org/bpf/>`_.
Please search the archive before asking new questions. It very well might
be that this was already addressed or answered before.
All general BPF questions, including kernel functionality, libbpf APIs and their
application, should be sent to bpf@vger.kernel.org mailing list. You can
`subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the mailing list
search its `archive <https://lore.kernel.org/bpf/>`_. Please search the archive
before asking new questions. It may be that this was already addressed or
answered before.

View File

@@ -9,8 +9,8 @@ described here. It's recommended to follow these conventions whenever a
new function or type is added to keep libbpf API clean and consistent.
All types and functions provided by libbpf API should have one of the
following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``,
``btf_dump_``, ``ring_buffer_``, ``perf_buffer_``.
following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``btf_dump_``,
``ring_buffer_``, ``perf_buffer_``.
System call wrappers
--------------------
@@ -59,15 +59,6 @@ Auxiliary functions and types that don't fit well in any of categories
described above should have ``libbpf_`` prefix, e.g.
``libbpf_get_error`` or ``libbpf_prog_type_by_name``.
AF_XDP functions
-------------------
AF_XDP functions should have an ``xsk_`` prefix, e.g.
``xsk_umem__get_data`` or ``xsk_umem__create``. The interface consists
of both low-level ring access functions and high-level configuration
functions. These can be mixed and matched. Note that these functions
are not reentrant for performance reasons.
ABI
---
@@ -92,8 +83,8 @@ This prevents from accidentally exporting a symbol, that is not supposed
to be a part of ABI what, in turn, improves both libbpf developer- and
user-experiences.
ABI versionning
---------------
ABI versioning
--------------
To make future ABI extensions possible libbpf ABI is versioned.
Versioning is implemented by ``libbpf.map`` version script that is
@@ -157,7 +148,7 @@ API documentation convention
The libbpf API is documented via comments above definitions in
header files. These comments can be rendered by doxygen and sphinx
for well organized html output. This section describes the
convention in which these comments should be formated.
convention in which these comments should be formatted.
Here is an example from btf.h:

228
docs/libbpf_overview.rst Normal file
View File

@@ -0,0 +1,228 @@
.. SPDX-License-Identifier: GPL-2.0
===============
libbpf Overview
===============
libbpf is a C-based library containing a BPF loader that takes compiled BPF
object files and prepares and loads them into the Linux kernel. libbpf takes the
heavy lifting of loading, verifying, and attaching BPF programs to various
kernel hooks, allowing BPF application developers to focus only on BPF program
correctness and performance.
The following are the high-level features supported by libbpf:
* Provides high-level and low-level APIs for user space programs to interact
with BPF programs. The low-level APIs wrap all the bpf system call
functionality, which is useful when users need more fine-grained control
over the interactions between user space and BPF programs.
* Provides overall support for the BPF object skeleton generated by bpftool.
The skeleton file simplifies the process for the user space programs to access
global variables and work with BPF programs.
* Provides BPF-side APIS, including BPF helper definitions, BPF maps support,
and tracing helpers, allowing developers to simplify BPF code writing.
* Supports BPF CO-RE mechanism, enabling BPF developers to write portable
BPF programs that can be compiled once and run across different kernel
versions.
This document will delve into the above concepts in detail, providing a deeper
understanding of the capabilities and advantages of libbpf and how it can help
you develop BPF applications efficiently.
BPF App Lifecycle and libbpf APIs
==================================
A BPF application consists of one or more BPF programs (either cooperating or
completely independent), BPF maps, and global variables. The global
variables are shared between all BPF programs, which allows them to cooperate on
a common set of data. libbpf provides APIs that user space programs can use to
manipulate the BPF programs by triggering different phases of a BPF application
lifecycle.
The following section provides a brief overview of each phase in the BPF life
cycle:
* **Open phase**: In this phase, libbpf parses the BPF
object file and discovers BPF maps, BPF programs, and global variables. After
a BPF app is opened, user space apps can make additional adjustments
(setting BPF program types, if necessary; pre-setting initial values for
global variables, etc.) before all the entities are created and loaded.
* **Load phase**: In the load phase, libbpf creates BPF
maps, resolves various relocations, and verifies and loads BPF programs into
the kernel. At this point, libbpf validates all the parts of a BPF application
and loads the BPF program into the kernel, but no BPF program has yet been
executed. After the load phase, its possible to set up the initial BPF map
state without racing with the BPF program code execution.
* **Attachment phase**: In this phase, libbpf
attaches BPF programs to various BPF hook points (e.g., tracepoints, kprobes,
cgroup hooks, network packet processing pipeline, etc.). During this
phase, BPF programs perform useful work such as processing
packets, or updating BPF maps and global variables that can be read from user
space.
* **Tear down phase**: In the tear down phase,
libbpf detaches BPF programs and unloads them from the kernel. BPF maps are
destroyed, and all the resources used by the BPF app are freed.
BPF Object Skeleton File
========================
BPF skeleton is an alternative interface to libbpf APIs for working with BPF
objects. Skeleton code abstract away generic libbpf APIs to significantly
simplify code for manipulating BPF programs from user space. Skeleton code
includes a bytecode representation of the BPF object file, simplifying the
process of distributing your BPF code. With BPF bytecode embedded, there are no
extra files to deploy along with your application binary.
You can generate the skeleton header file ``(.skel.h)`` for a specific object
file by passing the BPF object to the bpftool. The generated BPF skeleton
provides the following custom functions that correspond to the BPF lifecycle,
each of them prefixed with the specific object name:
* ``<name>__open()`` creates and opens BPF application (``<name>`` stands for
the specific bpf object name)
* ``<name>__load()`` instantiates, loads,and verifies BPF application parts
* ``<name>__attach()`` attaches all auto-attachable BPF programs (its
optional, you can have more control by using libbpf APIs directly)
* ``<name>__destroy()`` detaches all BPF programs and
frees up all used resources
Using the skeleton code is the recommended way to work with bpf programs. Keep
in mind, BPF skeleton provides access to the underlying BPF object, so whatever
was possible to do with generic libbpf APIs is still possible even when the BPF
skeleton is used. It's an additive convenience feature, with no syscalls, and no
cumbersome code.
Other Advantages of Using Skeleton File
---------------------------------------
* BPF skeleton provides an interface for user space programs to work with BPF
global variables. The skeleton code memory maps global variables as a struct
into user space. The struct interface allows user space programs to initialize
BPF programs before the BPF load phase and fetch and update data from user
space afterward.
* The ``skel.h`` file reflects the object file structure by listing out the
available maps, programs, etc. BPF skeleton provides direct access to all the
BPF maps and BPF programs as struct fields. This eliminates the need for
string-based lookups with ``bpf_object_find_map_by_name()`` and
``bpf_object_find_program_by_name()`` APIs, reducing errors due to BPF source
code and user-space code getting out of sync.
* The embedded bytecode representation of the object file ensures that the
skeleton and the BPF object file are always in sync.
BPF Helpers
===========
libbpf provides BPF-side APIs that BPF programs can use to interact with the
system. The BPF helpers definition allows developers to use them in BPF code as
any other plain C function. For example, there are helper functions to print
debugging messages, get the time since the system was booted, interact with BPF
maps, manipulate network packets, etc.
For a complete description of what the helpers do, the arguments they take, and
the return value, see the `bpf-helpers
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man page.
BPF CO-RE (Compile Once Run Everywhere)
=========================================
BPF programs work in the kernel space and have access to kernel memory and data
structures. One limitation that BPF applications come across is the lack of
portability across different kernel versions and configurations. `BCC
<https://github.com/iovisor/bcc/>`_ is one of the solutions for BPF
portability. However, it comes with runtime overhead and a large binary size
from embedding the compiler with the application.
libbpf steps up the BPF program portability by supporting the BPF CO-RE concept.
BPF CO-RE brings together BTF type information, libbpf, and the compiler to
produce a single executable binary that you can run on multiple kernel versions
and configurations.
To make BPF programs portable libbpf relies on the BTF type information of the
running kernel. Kernel also exposes this self-describing authoritative BTF
information through ``sysfs`` at ``/sys/kernel/btf/vmlinux``.
You can generate the BTF information for the running kernel with the following
command:
::
$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
The command generates a ``vmlinux.h`` header file with all kernel types
(:doc:`BTF types <../btf>`) that the running kernel uses. Including
``vmlinux.h`` in your BPF program eliminates dependency on system-wide kernel
headers.
libbpf enables portability of BPF programs by looking at the BPF programs
recorded BTF type and relocation information and matching them to BTF
information (vmlinux) provided by the running kernel. libbpf then resolves and
matches all the types and fields, and updates necessary offsets and other
relocatable data to ensure that BPF programs logic functions correctly for a
specific kernel on the host. BPF CO-RE concept thus eliminates overhead
associated with BPF development and allows developers to write portable BPF
applications without modifications and runtime source code compilation on the
target machine.
The following code snippet shows how to read the parent field of a kernel
``task_struct`` using BPF CO-RE and libbf. The basic helper to read a field in a
CO-RE relocatable manner is ``bpf_core_read(dst, sz, src)``, which will read
``sz`` bytes from the field referenced by ``src`` into the memory pointed to by
``dst``.
.. code-block:: C
:emphasize-lines: 6
//...
struct task_struct *task = (void *)bpf_get_current_task();
struct task_struct *parent_task;
int err;
err = bpf_core_read(&parent_task, sizeof(void *), &task->parent);
if (err) {
/* handle error */
}
/* parent_task contains the value of task->parent pointer */
In the code snippet, we first get a pointer to the current ``task_struct`` using
``bpf_get_current_task()``. We then use ``bpf_core_read()`` to read the parent
field of task struct into the ``parent_task`` variable. ``bpf_core_read()`` is
just like ``bpf_probe_read_kernel()`` BPF helper, except it records information
about the field that should be relocated on the target kernel. i.e, if the
``parent`` field gets shifted to a different offset within
``struct task_struct`` due to some new field added in front of it, libbpf will
automatically adjust the actual offset to the proper value.
Getting Started with libbpf
===========================
Check out the `libbpf-bootstrap <https://github.com/libbpf/libbpf-bootstrap>`_
repository with simple examples of using libbpf to build various BPF
applications.
See also `libbpf API documentation
<https://libbpf.readthedocs.io/en/latest/api.html>`_.
libbpf and Rust
===============
If you are building BPF applications in Rust, it is recommended to use the
`Libbpf-rs <https://github.com/libbpf/libbpf-rs>`_ library instead of bindgen
bindings directly to libbpf. Libbpf-rs wraps libbpf functionality in
Rust-idiomatic interfaces and provides libbpf-cargo plugin to handle BPF code
compilation and skeleton generation. Using Libbpf-rs will make building user
space part of the BPF application easier. Note that the BPF program themselves
must still be written in plain C.
Additional Documentation
========================
* `Program types and ELF Sections <https://libbpf.readthedocs.io/en/latest/program_types.html>`_
* `API naming convention <https://libbpf.readthedocs.io/en/latest/libbpf_naming_convention.html>`_
* `Building libbpf <https://libbpf.readthedocs.io/en/latest/libbpf_build.html>`_
* `API documentation Convention <https://libbpf.readthedocs.io/en/latest/libbpf_naming_convention.html#api-documentation-convention>`_

213
docs/program_types.rst Normal file
View File

@@ -0,0 +1,213 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
.. _program_types_and_elf:
Program Types and ELF Sections
==============================
The table below lists the program types, their attach types where relevant and the ELF section
names supported by libbpf for them. The ELF section names follow these rules:
- ``type`` is an exact match, e.g. ``SEC("socket")``
- ``type+`` means it can be either exact ``SEC("type")`` or well-formed ``SEC("type/extras")``
with a '``/``' separator between ``type`` and ``extras``.
When ``extras`` are specified, they provide details of how to auto-attach the BPF program. The
format of ``extras`` depends on the program type, e.g. ``SEC("tracepoint/<category>/<name>")``
for tracepoints or ``SEC("usdt/<path>:<provider>:<name>")`` for USDT probes. The extras are
described in more detail in the footnotes.
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| Program Type | Attach Type | ELF Section Name | Sleepable |
+===========================================+========================================+==================================+===========+
| ``BPF_PROG_TYPE_CGROUP_DEVICE`` | ``BPF_CGROUP_DEVICE`` | ``cgroup/dev`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SKB`` | | ``cgroup/skb`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_EGRESS`` | ``cgroup_skb/egress`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_INGRESS`` | ``cgroup_skb/ingress`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` | ``BPF_CGROUP_GETSOCKOPT`` | ``cgroup/getsockopt`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_SETSOCKOPT`` | ``cgroup/setsockopt`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCK_ADDR`` | ``BPF_CGROUP_INET4_BIND`` | ``cgroup/bind4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_CONNECT`` | ``cgroup/connect4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_GETPEERNAME`` | ``cgroup/getpeername4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_GETSOCKNAME`` | ``cgroup/getsockname4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_BIND`` | ``cgroup/bind6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_CONNECT`` | ``cgroup/connect6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_GETPEERNAME`` | ``cgroup/getpeername6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_GETSOCKNAME`` | ``cgroup/getsockname6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP4_RECVMSG`` | ``cgroup/recvmsg4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP4_SENDMSG`` | ``cgroup/sendmsg4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP6_RECVMSG`` | ``cgroup/recvmsg6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP6_SENDMSG`` | ``cgroup/sendmsg6`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_CONNECT`` | ``cgroup/connect_unix`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_SENDMSG`` | ``cgroup/sendmsg_unix`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_RECVMSG`` | ``cgroup/recvmsg_unix`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_GETPEERNAME`` | ``cgroup/getpeername_unix`` | |
| +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UNIX_GETSOCKNAME`` | ``cgroup/getsockname_unix`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCK`` | ``BPF_CGROUP_INET4_POST_BIND`` | ``cgroup/post_bind4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_POST_BIND`` | ``cgroup/post_bind6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_SOCK_CREATE`` | ``cgroup/sock_create`` | |
+ + +----------------------------------+-----------+
| | | ``cgroup/sock`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_SOCK_RELEASE`` | ``cgroup/sock_release`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SYSCTL`` | ``BPF_CGROUP_SYSCTL`` | ``cgroup/sysctl`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_EXT`` | | ``freplace+`` [#fentry]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_FLOW_DISSECTOR`` | ``BPF_FLOW_DISSECTOR`` | ``flow_dissector`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_KPROBE`` | | ``kprobe+`` [#kprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``kretprobe+`` [#kprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``ksyscall+`` [#ksyscall]_ | |
+ + +----------------------------------+-----------+
| | | ``kretsyscall+`` [#ksyscall]_ | |
+ + +----------------------------------+-----------+
| | | ``uprobe+`` [#uprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``uprobe.s+`` [#uprobe]_ | Yes |
+ + +----------------------------------+-----------+
| | | ``uretprobe+`` [#uprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``uretprobe.s+`` [#uprobe]_ | Yes |
+ + +----------------------------------+-----------+
| | | ``usdt+`` [#usdt]_ | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_KPROBE_MULTI`` | ``kprobe.multi+`` [#kpmulti]_ | |
+ + +----------------------------------+-----------+
| | | ``kretprobe.multi+`` [#kpmulti]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LIRC_MODE2`` | ``BPF_LIRC_MODE2`` | ``lirc_mode2`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LSM`` | ``BPF_LSM_CGROUP`` | ``lsm_cgroup+`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_LSM_MAC`` | ``lsm+`` [#lsm]_ | |
+ + +----------------------------------+-----------+
| | | ``lsm.s+`` [#lsm]_ | Yes |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_IN`` | | ``lwt_in`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_OUT`` | | ``lwt_out`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` | | ``lwt_seg6local`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_XMIT`` | | ``lwt_xmit`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_PERF_EVENT`` | | ``perf_event`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE`` | | ``raw_tp.w+`` [#rawtp]_ | |
+ + +----------------------------------+-----------+
| | | ``raw_tracepoint.w+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_RAW_TRACEPOINT`` | | ``raw_tp+`` [#rawtp]_ | |
+ + +----------------------------------+-----------+
| | | ``raw_tracepoint+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SCHED_ACT`` | | ``action`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SCHED_CLS`` | | ``classifier`` | |
+ + +----------------------------------+-----------+
| | | ``tc`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_LOOKUP`` | ``BPF_SK_LOOKUP`` | ``sk_lookup`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_MSG`` | ``BPF_SK_MSG_VERDICT`` | ``sk_msg`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_REUSEPORT`` | ``BPF_SK_REUSEPORT_SELECT_OR_MIGRATE`` | ``sk_reuseport/migrate`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_REUSEPORT_SELECT`` | ``sk_reuseport`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_SKB`` | | ``sk_skb`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_SKB_STREAM_PARSER`` | ``sk_skb/stream_parser`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_SKB_STREAM_VERDICT`` | ``sk_skb/stream_verdict`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SOCKET_FILTER`` | | ``socket`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SOCK_OPS`` | ``BPF_CGROUP_SOCK_OPS`` | ``sockops`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_STRUCT_OPS`` | | ``struct_ops+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SYSCALL`` | | ``syscall`` | Yes |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_TRACEPOINT`` | | ``tp+`` [#tp]_ | |
+ + +----------------------------------+-----------+
| | | ``tracepoint+`` [#tp]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_TRACING`` | ``BPF_MODIFY_RETURN`` | ``fmod_ret+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fmod_ret.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_FENTRY`` | ``fentry+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fentry.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_FEXIT`` | ``fexit+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fexit.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_ITER`` | ``iter+`` [#iter]_ | |
+ + +----------------------------------+-----------+
| | | ``iter.s+`` [#iter]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_RAW_TP`` | ``tp_btf+`` [#fentry]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_XDP`` | ``BPF_XDP_CPUMAP`` | ``xdp.frags/cpumap`` | |
+ + +----------------------------------+-----------+
| | | ``xdp/cpumap`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_XDP_DEVMAP`` | ``xdp.frags/devmap`` | |
+ + +----------------------------------+-----------+
| | | ``xdp/devmap`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_XDP`` | ``xdp.frags`` | |
+ + +----------------------------------+-----------+
| | | ``xdp`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
.. rubric:: Footnotes
.. [#fentry] The ``fentry`` attach format is ``fentry[.s]/<function>``.
.. [#kprobe] The ``kprobe`` attach format is ``kprobe/<function>[+<offset>]``. Valid
characters for ``function`` are ``a-zA-Z0-9_.`` and ``offset`` must be a valid
non-negative integer.
.. [#ksyscall] The ``ksyscall`` attach format is ``ksyscall/<syscall>``.
.. [#uprobe] The ``uprobe`` attach format is ``uprobe[.s]/<path>:<function>[+<offset>]``.
.. [#usdt] The ``usdt`` attach format is ``usdt/<path>:<provider>:<name>``.
.. [#kpmulti] The ``kprobe.multi`` attach format is ``kprobe.multi/<pattern>`` where ``pattern``
supports ``*`` and ``?`` wildcards. Valid characters for pattern are
``a-zA-Z0-9_.*?``.
.. [#lsm] The ``lsm`` attachment format is ``lsm[.s]/<hook>``.
.. [#rawtp] The ``raw_tp`` attach format is ``raw_tracepoint[.w]/<tracepoint>``.
.. [#tp] The ``tracepoint`` attach format is ``tracepoint/<category>/<name>``.
.. [#iter] The ``iter`` attach format is ``iter[.s]/<struct-name>``.

View File

@@ -1 +1,2 @@
breathe
breathe
sphinx_rtd_theme

23
fuzz/bpf-object-fuzzer.c Normal file
View File

@@ -0,0 +1,23 @@
#include "libbpf.h"
static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
{
return 0;
}
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
struct bpf_object *obj = NULL;
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts);
int err;
libbpf_set_print(libbpf_print_fn);
opts.object_name = "fuzz-object";
obj = bpf_object__open_mem(data, size, &opts);
err = libbpf_get_error(obj);
if (err)
return 0;
bpf_object__close(obj);
return 0;
}

Binary file not shown.

View File

@@ -37,6 +37,14 @@
.off = 0, \
.imm = IMM })
#define BPF_CALL_REL(DST) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_CALL, \
.dst_reg = 0, \
.src_reg = BPF_PSEUDO_CALL, \
.off = 0, \
.imm = DST })
#define BPF_EXIT_INSN() \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_EXIT, \

View File

@@ -3,6 +3,8 @@
#ifndef __LINUX_KERNEL_H
#define __LINUX_KERNEL_H
#include <linux/compiler.h>
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif

View File

@@ -7,6 +7,8 @@
#include <stddef.h>
#include <stdint.h>
#include <linux/stddef.h>
#include <asm/types.h>
#include <asm/posix_types.h>

File diff suppressed because it is too large Load Diff

View File

@@ -33,17 +33,17 @@ struct btf_type {
/* "info" bits arrangement
* bits 0-15: vlen (e.g. # of struct's members)
* bits 16-23: unused
* bits 24-27: kind (e.g. int, ptr, array...etc)
* bits 28-30: unused
* bits 24-28: kind (e.g. int, ptr, array...etc)
* bits 29-30: unused
* bit 31: kind_flag, currently used by
* struct, union and fwd
* struct, union, enum, fwd and enum64
*/
__u32 info;
/* "size" is used by INT, ENUM, STRUCT, UNION and DATASEC.
/* "size" is used by INT, ENUM, STRUCT, UNION, DATASEC and ENUM64.
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
* FUNC, FUNC_PROTO, VAR and DECL_TAG.
* FUNC, FUNC_PROTO, VAR, DECL_TAG and TYPE_TAG.
* "type" is a type_id referring to another type.
*/
union {
@@ -63,7 +63,7 @@ enum {
BTF_KIND_ARRAY = 3, /* Array */
BTF_KIND_STRUCT = 4, /* Struct */
BTF_KIND_UNION = 5, /* Union */
BTF_KIND_ENUM = 6, /* Enumeration */
BTF_KIND_ENUM = 6, /* Enumeration up to 32-bit values */
BTF_KIND_FWD = 7, /* Forward */
BTF_KIND_TYPEDEF = 8, /* Typedef */
BTF_KIND_VOLATILE = 9, /* Volatile */
@@ -75,6 +75,8 @@ enum {
BTF_KIND_DATASEC = 15, /* Section */
BTF_KIND_FLOAT = 16, /* Floating point */
BTF_KIND_DECL_TAG = 17, /* Decl Tag */
BTF_KIND_TYPE_TAG = 18, /* Type Tag */
BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */
NR_BTF_KINDS,
BTF_KIND_MAX = NR_BTF_KINDS - 1,
@@ -185,4 +187,14 @@ struct btf_decl_tag {
__s32 component_idx;
};
/* BTF_KIND_ENUM64 is followed by multiple "struct btf_enum64".
* The exact number of btf_enum64 is stored in the vlen (of the
* info in "struct btf_type").
*/
struct btf_enum64 {
__u32 name_off;
__u32 val_lo32;
__u32 val_hi32;
};
#endif /* _UAPI__LINUX_BTF_H__ */

123
include/uapi/linux/fcntl.h Normal file
View File

@@ -0,0 +1,123 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_FCNTL_H
#define _UAPI_LINUX_FCNTL_H
#include <asm/fcntl.h>
#include <linux/openat2.h>
#define F_SETLEASE (F_LINUX_SPECIFIC_BASE + 0)
#define F_GETLEASE (F_LINUX_SPECIFIC_BASE + 1)
/*
* Cancel a blocking posix lock; internal use only until we expose an
* asynchronous lock api to userspace:
*/
#define F_CANCELLK (F_LINUX_SPECIFIC_BASE + 5)
/* Create a file descriptor with FD_CLOEXEC set. */
#define F_DUPFD_CLOEXEC (F_LINUX_SPECIFIC_BASE + 6)
/*
* Request nofications on a directory.
* See below for events that may be notified.
*/
#define F_NOTIFY (F_LINUX_SPECIFIC_BASE+2)
/*
* Set and get of pipe page size array
*/
#define F_SETPIPE_SZ (F_LINUX_SPECIFIC_BASE + 7)
#define F_GETPIPE_SZ (F_LINUX_SPECIFIC_BASE + 8)
/*
* Set/Get seals
*/
#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
/*
* Types of seals
*/
#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
#define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */
/* (1U << 31) is reserved for signed error codes */
/*
* Set/Get write life time hints. {GET,SET}_RW_HINT operate on the
* underlying inode, while {GET,SET}_FILE_RW_HINT operate only on
* the specific file.
*/
#define F_GET_RW_HINT (F_LINUX_SPECIFIC_BASE + 11)
#define F_SET_RW_HINT (F_LINUX_SPECIFIC_BASE + 12)
#define F_GET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 13)
#define F_SET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 14)
/*
* Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
* used to clear any hints previously set.
*/
#define RWH_WRITE_LIFE_NOT_SET 0
#define RWH_WRITE_LIFE_NONE 1
#define RWH_WRITE_LIFE_SHORT 2
#define RWH_WRITE_LIFE_MEDIUM 3
#define RWH_WRITE_LIFE_LONG 4
#define RWH_WRITE_LIFE_EXTREME 5
/*
* The originally introduced spelling is remained from the first
* versions of the patch set that introduced the feature, see commit
* v4.13-rc1~212^2~51.
*/
#define RWF_WRITE_LIFE_NOT_SET RWH_WRITE_LIFE_NOT_SET
/*
* Types of directory notifications that may be requested.
*/
#define DN_ACCESS 0x00000001 /* File accessed */
#define DN_MODIFY 0x00000002 /* File modified */
#define DN_CREATE 0x00000004 /* File created */
#define DN_DELETE 0x00000008 /* File removed */
#define DN_RENAME 0x00000010 /* File renamed */
#define DN_ATTRIB 0x00000020 /* File changed attibutes */
#define DN_MULTISHOT 0x80000000 /* Don't remove notifier */
/*
* The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is
* meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
* unlinkat. The two functions do completely different things and therefore,
* the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to
* faccessat would be undefined behavior and thus treating it equivalent to
* AT_EACCESS is valid undefined behavior.
*/
#define AT_FDCWD -100 /* Special value used to indicate
openat should use the current
working directory. */
#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
#define AT_EACCESS 0x200 /* Test access permitted for
effective IDs, not real IDs. */
#define AT_REMOVEDIR 0x200 /* Remove directory instead of
unlinking file. */
#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to
compare object identity and may not
be usable to open_by_handle_at(2) */
#if defined(__KERNEL__)
#define AT_GETATTR_NOSEC 0x80000000
#endif
#endif /* _UAPI_LINUX_FCNTL_H */

View File

@@ -7,24 +7,23 @@
/* This struct should be in sync with struct rtnl_link_stats64 */
struct rtnl_link_stats {
__u32 rx_packets; /* total packets received */
__u32 tx_packets; /* total packets transmitted */
__u32 rx_bytes; /* total bytes received */
__u32 tx_bytes; /* total bytes transmitted */
__u32 rx_errors; /* bad packets received */
__u32 tx_errors; /* packet transmit problems */
__u32 rx_dropped; /* no space in linux buffers */
__u32 tx_dropped; /* no space available in linux */
__u32 multicast; /* multicast packets received */
__u32 rx_packets;
__u32 tx_packets;
__u32 rx_bytes;
__u32 tx_bytes;
__u32 rx_errors;
__u32 tx_errors;
__u32 rx_dropped;
__u32 tx_dropped;
__u32 multicast;
__u32 collisions;
/* detailed rx_errors: */
__u32 rx_length_errors;
__u32 rx_over_errors; /* receiver ring buff overflow */
__u32 rx_crc_errors; /* recved pkt with crc error */
__u32 rx_frame_errors; /* recv'd frame alignment error */
__u32 rx_fifo_errors; /* recv'r fifo overrun */
__u32 rx_missed_errors; /* receiver missed packet */
__u32 rx_over_errors;
__u32 rx_crc_errors;
__u32 rx_frame_errors;
__u32 rx_fifo_errors;
__u32 rx_missed_errors;
/* detailed tx_errors */
__u32 tx_aborted_errors;
@@ -37,29 +36,204 @@ struct rtnl_link_stats {
__u32 rx_compressed;
__u32 tx_compressed;
__u32 rx_nohandler; /* dropped, no handler found */
__u32 rx_nohandler;
};
/* The main device statistics structure */
/**
* struct rtnl_link_stats64 - The main device statistics structure.
*
* @rx_packets: Number of good packets received by the interface.
* For hardware interfaces counts all good packets received from the device
* by the host, including packets which host had to drop at various stages
* of processing (even in the driver).
*
* @tx_packets: Number of packets successfully transmitted.
* For hardware interfaces counts packets which host was able to successfully
* hand over to the device, which does not necessarily mean that packets
* had been successfully transmitted out of the device, only that device
* acknowledged it copied them out of host memory.
*
* @rx_bytes: Number of good received bytes, corresponding to @rx_packets.
*
* For IEEE 802.3 devices should count the length of Ethernet Frames
* excluding the FCS.
*
* @tx_bytes: Number of good transmitted bytes, corresponding to @tx_packets.
*
* For IEEE 802.3 devices should count the length of Ethernet Frames
* excluding the FCS.
*
* @rx_errors: Total number of bad packets received on this network device.
* This counter must include events counted by @rx_length_errors,
* @rx_crc_errors, @rx_frame_errors and other errors not otherwise
* counted.
*
* @tx_errors: Total number of transmit problems.
* This counter must include events counter by @tx_aborted_errors,
* @tx_carrier_errors, @tx_fifo_errors, @tx_heartbeat_errors,
* @tx_window_errors and other errors not otherwise counted.
*
* @rx_dropped: Number of packets received but not processed,
* e.g. due to lack of resources or unsupported protocol.
* For hardware interfaces this counter may include packets discarded
* due to L2 address filtering but should not include packets dropped
* by the device due to buffer exhaustion which are counted separately in
* @rx_missed_errors (since procfs folds those two counters together).
*
* @tx_dropped: Number of packets dropped on their way to transmission,
* e.g. due to lack of resources.
*
* @multicast: Multicast packets received.
* For hardware interfaces this statistic is commonly calculated
* at the device level (unlike @rx_packets) and therefore may include
* packets which did not reach the host.
*
* For IEEE 802.3 devices this counter may be equivalent to:
*
* - 30.3.1.1.21 aMulticastFramesReceivedOK
*
* @collisions: Number of collisions during packet transmissions.
*
* @rx_length_errors: Number of packets dropped due to invalid length.
* Part of aggregate "frame" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter should be equivalent to a sum
* of the following attributes:
*
* - 30.3.1.1.23 aInRangeLengthErrors
* - 30.3.1.1.24 aOutOfRangeLengthField
* - 30.3.1.1.25 aFrameTooLongErrors
*
* @rx_over_errors: Receiver FIFO overflow event counter.
*
* Historically the count of overflow events. Such events may be
* reported in the receive descriptors or via interrupts, and may
* not correspond one-to-one with dropped packets.
*
* The recommended interpretation for high speed interfaces is -
* number of packets dropped because they did not fit into buffers
* provided by the host, e.g. packets larger than MTU or next buffer
* in the ring was not available for a scatter transfer.
*
* Part of aggregate "frame" errors in `/proc/net/dev`.
*
* This statistics was historically used interchangeably with
* @rx_fifo_errors.
*
* This statistic corresponds to hardware events and is not commonly used
* on software devices.
*
* @rx_crc_errors: Number of packets received with a CRC error.
* Part of aggregate "frame" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter must be equivalent to:
*
* - 30.3.1.1.6 aFrameCheckSequenceErrors
*
* @rx_frame_errors: Receiver frame alignment errors.
* Part of aggregate "frame" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter should be equivalent to:
*
* - 30.3.1.1.7 aAlignmentErrors
*
* @rx_fifo_errors: Receiver FIFO error counter.
*
* Historically the count of overflow events. Those events may be
* reported in the receive descriptors or via interrupts, and may
* not correspond one-to-one with dropped packets.
*
* This statistics was used interchangeably with @rx_over_errors.
* Not recommended for use in drivers for high speed interfaces.
*
* This statistic is used on software devices, e.g. to count software
* packet queue overflow (can) or sequencing errors (GRE).
*
* @rx_missed_errors: Count of packets missed by the host.
* Folded into the "drop" counter in `/proc/net/dev`.
*
* Counts number of packets dropped by the device due to lack
* of buffer space. This usually indicates that the host interface
* is slower than the network interface, or host is not keeping up
* with the receive packet rate.
*
* This statistic corresponds to hardware events and is not used
* on software devices.
*
* @tx_aborted_errors:
* Part of aggregate "carrier" errors in `/proc/net/dev`.
* For IEEE 802.3 devices capable of half-duplex operation this counter
* must be equivalent to:
*
* - 30.3.1.1.11 aFramesAbortedDueToXSColls
*
* High speed interfaces may use this counter as a general device
* discard counter.
*
* @tx_carrier_errors: Number of frame transmission errors due to loss
* of carrier during transmission.
* Part of aggregate "carrier" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter must be equivalent to:
*
* - 30.3.1.1.13 aCarrierSenseErrors
*
* @tx_fifo_errors: Number of frame transmission errors due to device
* FIFO underrun / underflow. This condition occurs when the device
* begins transmission of a frame but is unable to deliver the
* entire frame to the transmitter in time for transmission.
* Part of aggregate "carrier" errors in `/proc/net/dev`.
*
* @tx_heartbeat_errors: Number of Heartbeat / SQE Test errors for
* old half-duplex Ethernet.
* Part of aggregate "carrier" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices possibly equivalent to:
*
* - 30.3.2.1.4 aSQETestErrors
*
* @tx_window_errors: Number of frame transmission errors due
* to late collisions (for Ethernet - after the first 64B of transmission).
* Part of aggregate "carrier" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter must be equivalent to:
*
* - 30.3.1.1.10 aLateCollisions
*
* @rx_compressed: Number of correctly received compressed packets.
* This counters is only meaningful for interfaces which support
* packet compression (e.g. CSLIP, PPP).
*
* @tx_compressed: Number of transmitted compressed packets.
* This counters is only meaningful for interfaces which support
* packet compression (e.g. CSLIP, PPP).
*
* @rx_nohandler: Number of packets received on the interface
* but dropped by the networking stack because the device is
* not designated to receive packets (e.g. backup link in a bond).
*
* @rx_otherhost_dropped: Number of packets dropped due to mismatch
* in destination MAC address.
*/
struct rtnl_link_stats64 {
__u64 rx_packets; /* total packets received */
__u64 tx_packets; /* total packets transmitted */
__u64 rx_bytes; /* total bytes received */
__u64 tx_bytes; /* total bytes transmitted */
__u64 rx_errors; /* bad packets received */
__u64 tx_errors; /* packet transmit problems */
__u64 rx_dropped; /* no space in linux buffers */
__u64 tx_dropped; /* no space available in linux */
__u64 multicast; /* multicast packets received */
__u64 rx_packets;
__u64 tx_packets;
__u64 rx_bytes;
__u64 tx_bytes;
__u64 rx_errors;
__u64 tx_errors;
__u64 rx_dropped;
__u64 tx_dropped;
__u64 multicast;
__u64 collisions;
/* detailed rx_errors: */
__u64 rx_length_errors;
__u64 rx_over_errors; /* receiver ring buff overflow */
__u64 rx_crc_errors; /* recved pkt with crc error */
__u64 rx_frame_errors; /* recv'd frame alignment error */
__u64 rx_fifo_errors; /* recv'r fifo overrun */
__u64 rx_missed_errors; /* receiver missed packet */
__u64 rx_over_errors;
__u64 rx_crc_errors;
__u64 rx_frame_errors;
__u64 rx_fifo_errors;
__u64 rx_missed_errors;
/* detailed tx_errors */
__u64 tx_aborted_errors;
@@ -71,8 +245,24 @@ struct rtnl_link_stats64 {
/* for cslip etc */
__u64 rx_compressed;
__u64 tx_compressed;
__u64 rx_nohandler;
__u64 rx_nohandler; /* dropped, no handler found */
__u64 rx_otherhost_dropped;
};
/* Subset of link stats useful for in-HW collection. Meaning of the fields is as
* for struct rtnl_link_stats64.
*/
struct rtnl_hw_stats64 {
__u64 rx_packets;
__u64 tx_packets;
__u64 rx_bytes;
__u64 tx_bytes;
__u64 rx_errors;
__u64 tx_errors;
__u64 rx_dropped;
__u64 tx_dropped;
__u64 multicast;
};
/* The struct should be in sync with struct ifmap */
@@ -170,12 +360,38 @@ enum {
IFLA_PROP_LIST,
IFLA_ALT_IFNAME, /* Alternative ifname */
IFLA_PERM_ADDRESS,
IFLA_PROTO_DOWN_REASON,
/* device (sysfs) name as parent, used instead
* of IFLA_LINK where there's no parent netdev
*/
IFLA_PARENT_DEV_NAME,
IFLA_PARENT_DEV_BUS_NAME,
IFLA_GRO_MAX_SIZE,
IFLA_TSO_MAX_SIZE,
IFLA_TSO_MAX_SEGS,
IFLA_ALLMULTI, /* Allmulti count: > 0 means acts ALLMULTI */
IFLA_DEVLINK_PORT,
IFLA_GSO_IPV4_MAX_SIZE,
IFLA_GRO_IPV4_MAX_SIZE,
IFLA_DPLL_PIN,
__IFLA_MAX
};
#define IFLA_MAX (__IFLA_MAX - 1)
enum {
IFLA_PROTO_DOWN_REASON_UNSPEC,
IFLA_PROTO_DOWN_REASON_MASK, /* u32, mask for reason bits */
IFLA_PROTO_DOWN_REASON_VALUE, /* u32, reason bit value */
__IFLA_PROTO_DOWN_REASON_CNT,
IFLA_PROTO_DOWN_REASON_MAX = __IFLA_PROTO_DOWN_REASON_CNT - 1
};
/* backwards compatibility for userspace */
#ifndef __KERNEL__
#define IFLA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg))))
@@ -293,6 +509,7 @@ enum {
IFLA_BR_MCAST_MLD_VERSION,
IFLA_BR_VLAN_STATS_PER_PORT,
IFLA_BR_MULTI_BOOLOPT,
IFLA_BR_MCAST_QUERIER_STATE,
__IFLA_BR_MAX,
};
@@ -346,6 +563,14 @@ enum {
IFLA_BRPORT_BACKUP_PORT,
IFLA_BRPORT_MRP_RING_OPEN,
IFLA_BRPORT_MRP_IN_OPEN,
IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT,
IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
IFLA_BRPORT_LOCKED,
IFLA_BRPORT_MAB,
IFLA_BRPORT_MCAST_N_GROUPS,
IFLA_BRPORT_MCAST_MAX_GROUPS,
IFLA_BRPORT_NEIGH_VLAN_SUPPRESS,
IFLA_BRPORT_BACKUP_NHID,
__IFLA_BRPORT_MAX
};
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
@@ -412,6 +637,7 @@ enum {
IFLA_MACVLAN_MACADDR_COUNT,
IFLA_MACVLAN_BC_QUEUE_LEN,
IFLA_MACVLAN_BC_QUEUE_LEN_USED,
IFLA_MACVLAN_BC_CUTOFF,
__IFLA_MACVLAN_MAX,
};
@@ -433,6 +659,7 @@ enum macvlan_macaddr_mode {
};
#define MACVLAN_FLAG_NOPROMISC 1
#define MACVLAN_FLAG_NODST 2 /* skip dst macvlan if matching src macvlan */
/* VRF section */
enum {
@@ -479,6 +706,7 @@ enum {
IFLA_XFRM_UNSPEC,
IFLA_XFRM_LINK,
IFLA_XFRM_IF_ID,
IFLA_XFRM_COLLECT_METADATA,
__IFLA_XFRM_MAX
};
@@ -520,7 +748,79 @@ enum ipvlan_mode {
#define IPVLAN_F_PRIVATE 0x01
#define IPVLAN_F_VEPA 0x02
/* Tunnel RTM header */
struct tunnel_msg {
__u8 family;
__u8 flags;
__u16 reserved2;
__u32 ifindex;
};
/* netkit section */
enum netkit_action {
NETKIT_NEXT = -1,
NETKIT_PASS = 0,
NETKIT_DROP = 2,
NETKIT_REDIRECT = 7,
};
enum netkit_mode {
NETKIT_L2,
NETKIT_L3,
};
enum {
IFLA_NETKIT_UNSPEC,
IFLA_NETKIT_PEER_INFO,
IFLA_NETKIT_PRIMARY,
IFLA_NETKIT_POLICY,
IFLA_NETKIT_PEER_POLICY,
IFLA_NETKIT_MODE,
__IFLA_NETKIT_MAX,
};
#define IFLA_NETKIT_MAX (__IFLA_NETKIT_MAX - 1)
/* VXLAN section */
/* include statistics in the dump */
#define TUNNEL_MSG_FLAG_STATS 0x01
#define TUNNEL_MSG_VALID_USER_FLAGS TUNNEL_MSG_FLAG_STATS
/* Embedded inside VXLAN_VNIFILTER_ENTRY_STATS */
enum {
VNIFILTER_ENTRY_STATS_UNSPEC,
VNIFILTER_ENTRY_STATS_RX_BYTES,
VNIFILTER_ENTRY_STATS_RX_PKTS,
VNIFILTER_ENTRY_STATS_RX_DROPS,
VNIFILTER_ENTRY_STATS_RX_ERRORS,
VNIFILTER_ENTRY_STATS_TX_BYTES,
VNIFILTER_ENTRY_STATS_TX_PKTS,
VNIFILTER_ENTRY_STATS_TX_DROPS,
VNIFILTER_ENTRY_STATS_TX_ERRORS,
VNIFILTER_ENTRY_STATS_PAD,
__VNIFILTER_ENTRY_STATS_MAX
};
#define VNIFILTER_ENTRY_STATS_MAX (__VNIFILTER_ENTRY_STATS_MAX - 1)
enum {
VXLAN_VNIFILTER_ENTRY_UNSPEC,
VXLAN_VNIFILTER_ENTRY_START,
VXLAN_VNIFILTER_ENTRY_END,
VXLAN_VNIFILTER_ENTRY_GROUP,
VXLAN_VNIFILTER_ENTRY_GROUP6,
VXLAN_VNIFILTER_ENTRY_STATS,
__VXLAN_VNIFILTER_ENTRY_MAX
};
#define VXLAN_VNIFILTER_ENTRY_MAX (__VXLAN_VNIFILTER_ENTRY_MAX - 1)
enum {
VXLAN_VNIFILTER_UNSPEC,
VXLAN_VNIFILTER_ENTRY,
__VXLAN_VNIFILTER_MAX
};
#define VXLAN_VNIFILTER_MAX (__VXLAN_VNIFILTER_MAX - 1)
enum {
IFLA_VXLAN_UNSPEC,
IFLA_VXLAN_ID,
@@ -552,6 +852,8 @@ enum {
IFLA_VXLAN_GPE,
IFLA_VXLAN_TTL_INHERIT,
IFLA_VXLAN_DF,
IFLA_VXLAN_VNIFILTER, /* only applicable with COLLECT_METADATA mode */
IFLA_VXLAN_LOCALBYPASS,
__IFLA_VXLAN_MAX
};
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
@@ -585,6 +887,7 @@ enum {
IFLA_GENEVE_LABEL,
IFLA_GENEVE_TTL_INHERIT,
IFLA_GENEVE_DF,
IFLA_GENEVE_INNER_PROTO_INHERIT,
__IFLA_GENEVE_MAX
};
#define IFLA_GENEVE_MAX (__IFLA_GENEVE_MAX - 1)
@@ -597,6 +900,18 @@ enum ifla_geneve_df {
GENEVE_DF_MAX = __GENEVE_DF_END - 1,
};
/* Bareudp section */
enum {
IFLA_BAREUDP_UNSPEC,
IFLA_BAREUDP_PORT,
IFLA_BAREUDP_ETHERTYPE,
IFLA_BAREUDP_SRCPORT_MIN,
IFLA_BAREUDP_MULTIPROTO_MODE,
__IFLA_BAREUDP_MAX
};
#define IFLA_BAREUDP_MAX (__IFLA_BAREUDP_MAX - 1)
/* PPP section */
enum {
IFLA_PPP_UNSPEC,
@@ -618,6 +933,8 @@ enum {
IFLA_GTP_FD1,
IFLA_GTP_PDP_HASHSIZE,
IFLA_GTP_ROLE,
IFLA_GTP_CREATE_SOCKETS,
IFLA_GTP_RESTART_COUNT,
__IFLA_GTP_MAX,
};
#define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
@@ -655,6 +972,9 @@ enum {
IFLA_BOND_TLB_DYNAMIC_LB,
IFLA_BOND_PEER_NOTIF_DELAY,
IFLA_BOND_AD_LACP_ACTIVE,
IFLA_BOND_MISSED_MAX,
IFLA_BOND_NS_IP6_TARGET,
IFLA_BOND_COUPLED_CONTROL,
__IFLA_BOND_MAX,
};
@@ -682,6 +1002,7 @@ enum {
IFLA_BOND_SLAVE_AD_AGGREGATOR_ID,
IFLA_BOND_SLAVE_AD_ACTOR_OPER_PORT_STATE,
IFLA_BOND_SLAVE_AD_PARTNER_OPER_PORT_STATE,
IFLA_BOND_SLAVE_PRIO,
__IFLA_BOND_SLAVE_MAX,
};
@@ -899,7 +1220,14 @@ enum {
#define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1)
/* HSR section */
/* HSR/PRP section, both uses same interface */
/* Different redundancy protocols for hsr device */
enum {
HSR_PROTOCOL_HSR,
HSR_PROTOCOL_PRP,
HSR_PROTOCOL_MAX,
};
enum {
IFLA_HSR_UNSPEC,
@@ -909,6 +1237,9 @@ enum {
IFLA_HSR_SUPERVISION_ADDR, /* Supervision frame multicast addr */
IFLA_HSR_SEQ_NR,
IFLA_HSR_VERSION, /* HSR version */
IFLA_HSR_PROTOCOL, /* Indicate different protocol than
* HSR. For example PRP.
*/
__IFLA_HSR_MAX,
};
@@ -941,6 +1272,17 @@ enum {
#define IFLA_STATS_FILTER_BIT(ATTR) (1 << (ATTR - 1))
enum {
IFLA_STATS_GETSET_UNSPEC,
IFLA_STATS_GET_FILTERS, /* Nest of IFLA_STATS_LINK_xxx, each a u32 with
* a filter mask for the corresponding group.
*/
IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS, /* 0 or 1 as u8 */
__IFLA_STATS_GETSET_MAX,
};
#define IFLA_STATS_GETSET_MAX (__IFLA_STATS_GETSET_MAX - 1)
/* These are embedded into IFLA_STATS_LINK_XSTATS:
* [IFLA_STATS_LINK_XSTATS]
* -> [LINK_XSTATS_TYPE_xxx]
@@ -958,10 +1300,21 @@ enum {
enum {
IFLA_OFFLOAD_XSTATS_UNSPEC,
IFLA_OFFLOAD_XSTATS_CPU_HIT, /* struct rtnl_link_stats64 */
IFLA_OFFLOAD_XSTATS_HW_S_INFO, /* HW stats info. A nest */
IFLA_OFFLOAD_XSTATS_L3_STATS, /* struct rtnl_hw_stats64 */
__IFLA_OFFLOAD_XSTATS_MAX
};
#define IFLA_OFFLOAD_XSTATS_MAX (__IFLA_OFFLOAD_XSTATS_MAX - 1)
enum {
IFLA_OFFLOAD_XSTATS_HW_S_INFO_UNSPEC,
IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST, /* u8 */
IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED, /* u8 */
__IFLA_OFFLOAD_XSTATS_HW_S_INFO_MAX,
};
#define IFLA_OFFLOAD_XSTATS_HW_S_INFO_MAX \
(__IFLA_OFFLOAD_XSTATS_HW_S_INFO_MAX - 1)
/* XDP section */
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
@@ -1033,6 +1386,8 @@ enum {
#define RMNET_FLAGS_INGRESS_MAP_COMMANDS (1U << 1)
#define RMNET_FLAGS_INGRESS_MAP_CKSUMV4 (1U << 2)
#define RMNET_FLAGS_EGRESS_MAP_CKSUMV4 (1U << 3)
#define RMNET_FLAGS_INGRESS_MAP_CKSUMV5 (1U << 4)
#define RMNET_FLAGS_EGRESS_MAP_CKSUMV5 (1U << 5)
enum {
IFLA_RMNET_UNSPEC,
@@ -1048,4 +1403,24 @@ struct ifla_rmnet_flags {
__u32 mask;
};
/* MCTP section */
enum {
IFLA_MCTP_UNSPEC,
IFLA_MCTP_NET,
__IFLA_MCTP_MAX,
};
#define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
/* DSA section */
enum {
IFLA_DSA_UNSPEC,
IFLA_DSA_MASTER,
__IFLA_DSA_MAX,
};
#define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1)
#endif /* _UAPI_LINUX_IF_LINK_H */

View File

@@ -25,9 +25,21 @@
* application.
*/
#define XDP_USE_NEED_WAKEUP (1 << 3)
/* By setting this option, userspace application indicates that it can
* handle multiple descriptors per packet thus enabling AF_XDP to split
* multi-buffer XDP frames into multiple Rx descriptors. Without this set
* such frames will be dropped.
*/
#define XDP_USE_SG (1 << 4)
/* Flags for xsk_umem_config flags */
#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
/* Force checksum calculation in software. Can be used for testing or
* working around potential HW issues. This option causes performance
* degradation and only works in XDP_COPY mode.
*/
#define XDP_UMEM_TX_SW_CSUM (1 << 1)
struct sockaddr_xdp {
__u16 sxdp_family;
@@ -70,6 +82,7 @@ struct xdp_umem_reg {
__u32 chunk_size;
__u32 headroom;
__u32 flags;
__u32 tx_metadata_len;
};
struct xdp_statistics {
@@ -99,6 +112,41 @@ struct xdp_options {
#define XSK_UNALIGNED_BUF_ADDR_MASK \
((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
/* Request transmit timestamp. Upon completion, put it into tx_timestamp
* field of union xsk_tx_metadata.
*/
#define XDP_TXMD_FLAGS_TIMESTAMP (1 << 0)
/* Request transmit checksum offload. Checksum start position and offset
* are communicated via csum_start and csum_offset fields of union
* xsk_tx_metadata.
*/
#define XDP_TXMD_FLAGS_CHECKSUM (1 << 1)
/* AF_XDP offloads request. 'request' union member is consumed by the driver
* when the packet is being transmitted. 'completion' union member is
* filled by the driver when the transmit completion arrives.
*/
struct xsk_tx_metadata {
__u64 flags;
union {
struct {
/* XDP_TXMD_FLAGS_CHECKSUM */
/* Offset from desc->addr where checksumming should start. */
__u16 csum_start;
/* Offset from csum_start where checksum should be stored. */
__u16 csum_offset;
} request;
struct {
/* XDP_TXMD_FLAGS_TIMESTAMP */
__u64 tx_timestamp;
} completion;
};
};
/* Rx/Tx descriptor */
struct xdp_desc {
__u64 addr;
@@ -108,4 +156,14 @@ struct xdp_desc {
/* UMEM descriptor is __u64 */
/* Flag indicating that the packet continues with the buffer pointed out by the
* next frame in the ring. The end of the packet is signalled by setting this
* bit to zero. For single buffer packets, every descriptor has 'options' set
* to 0 and this maintains backward compatibility.
*/
#define XDP_PKT_CONTD (1 << 0)
/* TX packet carries valid metadata. */
#define XDP_TX_METADATA (1 << 1)
#endif /* _LINUX_IF_XDP_H */

175
include/uapi/linux/netdev.h Normal file
View File

@@ -0,0 +1,175 @@
/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
/* Do not edit directly, auto-generated from: */
/* Documentation/netlink/specs/netdev.yaml */
/* YNL-GEN uapi header */
#ifndef _UAPI_LINUX_NETDEV_H
#define _UAPI_LINUX_NETDEV_H
#define NETDEV_FAMILY_NAME "netdev"
#define NETDEV_FAMILY_VERSION 1
/**
* enum netdev_xdp_act
* @NETDEV_XDP_ACT_BASIC: XDP features set supported by all drivers
* (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX)
* @NETDEV_XDP_ACT_REDIRECT: The netdev supports XDP_REDIRECT
* @NETDEV_XDP_ACT_NDO_XMIT: This feature informs if netdev implements
* ndo_xdp_xmit callback.
* @NETDEV_XDP_ACT_XSK_ZEROCOPY: This feature informs if netdev supports AF_XDP
* in zero copy mode.
* @NETDEV_XDP_ACT_HW_OFFLOAD: This feature informs if netdev supports XDP hw
* offloading.
* @NETDEV_XDP_ACT_RX_SG: This feature informs if netdev implements non-linear
* XDP buffer support in the driver napi callback.
* @NETDEV_XDP_ACT_NDO_XMIT_SG: This feature informs if netdev implements
* non-linear XDP buffer support in ndo_xdp_xmit callback.
*/
enum netdev_xdp_act {
NETDEV_XDP_ACT_BASIC = 1,
NETDEV_XDP_ACT_REDIRECT = 2,
NETDEV_XDP_ACT_NDO_XMIT = 4,
NETDEV_XDP_ACT_XSK_ZEROCOPY = 8,
NETDEV_XDP_ACT_HW_OFFLOAD = 16,
NETDEV_XDP_ACT_RX_SG = 32,
NETDEV_XDP_ACT_NDO_XMIT_SG = 64,
/* private: */
NETDEV_XDP_ACT_MASK = 127,
};
/**
* enum netdev_xdp_rx_metadata
* @NETDEV_XDP_RX_METADATA_TIMESTAMP: Device is capable of exposing receive HW
* timestamp via bpf_xdp_metadata_rx_timestamp().
* @NETDEV_XDP_RX_METADATA_HASH: Device is capable of exposing receive packet
* hash via bpf_xdp_metadata_rx_hash().
* @NETDEV_XDP_RX_METADATA_VLAN_TAG: Device is capable of exposing receive
* packet VLAN tag via bpf_xdp_metadata_rx_vlan_tag().
*/
enum netdev_xdp_rx_metadata {
NETDEV_XDP_RX_METADATA_TIMESTAMP = 1,
NETDEV_XDP_RX_METADATA_HASH = 2,
NETDEV_XDP_RX_METADATA_VLAN_TAG = 4,
};
/**
* enum netdev_xsk_flags
* @NETDEV_XSK_FLAGS_TX_TIMESTAMP: HW timestamping egress packets is supported
* by the driver.
* @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
* driver.
*/
enum netdev_xsk_flags {
NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
};
enum netdev_queue_type {
NETDEV_QUEUE_TYPE_RX,
NETDEV_QUEUE_TYPE_TX,
};
enum netdev_qstats_scope {
NETDEV_QSTATS_SCOPE_QUEUE = 1,
};
enum {
NETDEV_A_DEV_IFINDEX = 1,
NETDEV_A_DEV_PAD,
NETDEV_A_DEV_XDP_FEATURES,
NETDEV_A_DEV_XDP_ZC_MAX_SEGS,
NETDEV_A_DEV_XDP_RX_METADATA_FEATURES,
NETDEV_A_DEV_XSK_FEATURES,
__NETDEV_A_DEV_MAX,
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
NETDEV_A_PAGE_POOL_NAPI_ID,
NETDEV_A_PAGE_POOL_INFLIGHT,
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
NETDEV_A_PAGE_POOL_DETACH_TIME,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_STATS_INFO = 1,
NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST = 8,
NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
__NETDEV_A_PAGE_POOL_STATS_MAX,
NETDEV_A_PAGE_POOL_STATS_MAX = (__NETDEV_A_PAGE_POOL_STATS_MAX - 1)
};
enum {
NETDEV_A_NAPI_IFINDEX = 1,
NETDEV_A_NAPI_ID,
NETDEV_A_NAPI_IRQ,
NETDEV_A_NAPI_PID,
__NETDEV_A_NAPI_MAX,
NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
};
enum {
NETDEV_A_QUEUE_ID = 1,
NETDEV_A_QUEUE_IFINDEX,
NETDEV_A_QUEUE_TYPE,
NETDEV_A_QUEUE_NAPI_ID,
__NETDEV_A_QUEUE_MAX,
NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
};
enum {
NETDEV_A_QSTATS_IFINDEX = 1,
NETDEV_A_QSTATS_QUEUE_TYPE,
NETDEV_A_QSTATS_QUEUE_ID,
NETDEV_A_QSTATS_SCOPE,
NETDEV_A_QSTATS_RX_PACKETS = 8,
NETDEV_A_QSTATS_RX_BYTES,
NETDEV_A_QSTATS_TX_PACKETS,
NETDEV_A_QSTATS_TX_BYTES,
NETDEV_A_QSTATS_RX_ALLOC_FAIL,
__NETDEV_A_QSTATS_MAX,
NETDEV_A_QSTATS_MAX = (__NETDEV_A_QSTATS_MAX - 1)
};
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_GET,
NETDEV_CMD_PAGE_POOL_ADD_NTF,
NETDEV_CMD_PAGE_POOL_DEL_NTF,
NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_STATS_GET,
NETDEV_CMD_QUEUE_GET,
NETDEV_CMD_NAPI_GET,
NETDEV_CMD_QSTATS_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
};
#define NETDEV_MCGRP_MGMT "mgmt"
#define NETDEV_MCGRP_PAGE_POOL "page-pool"
#endif /* _UAPI_LINUX_NETDEV_H */

View File

@@ -0,0 +1,43 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_OPENAT2_H
#define _UAPI_LINUX_OPENAT2_H
#include <linux/types.h>
/*
* Arguments for how openat2(2) should open the target path. If only @flags and
* @mode are non-zero, then openat2(2) operates very similarly to openat(2).
*
* However, unlike openat(2), unknown or invalid bits in @flags result in
* -EINVAL rather than being silently ignored. @mode must be zero unless one of
* {O_CREAT, O_TMPFILE} are set.
*
* @flags: O_* flags.
* @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
*/
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
/* how->resolve flags for openat2(2). */
#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
(includes bind-mounts). */
#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
"magic-links". */
#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
(implies OEXT_NO_MAGICLINKS) */
#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
"..", symlinks, and absolute
paths which escape the dirfd. */
#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
be scoped inside the dirfd
(similar to chroot(2)). */
#define RESOLVE_CACHED 0x20 /* Only complete if resolution can be
completed through cached lookup. May
return -EAGAIN if that's not
possible. */
#endif /* _UAPI_LINUX_OPENAT2_H */

File diff suppressed because it is too large Load Diff

View File

@@ -180,7 +180,7 @@ struct tc_u32_sel {
short hoff;
__be32 hmask;
struct tc_u32_key keys[0];
struct tc_u32_key keys[];
};
struct tc_u32_mark {
@@ -192,7 +192,7 @@ struct tc_u32_mark {
struct tc_u32_pcnt {
__u64 rcnt;
__u64 rhit;
__u64 kcnts[0];
__u64 kcnts[];
};
/* Flags */
@@ -204,37 +204,6 @@ struct tc_u32_pcnt {
#define TC_U32_MAXDEPTH 8
/* RSVP filter */
enum {
TCA_RSVP_UNSPEC,
TCA_RSVP_CLASSID,
TCA_RSVP_DST,
TCA_RSVP_SRC,
TCA_RSVP_PINFO,
TCA_RSVP_POLICE,
TCA_RSVP_ACT,
__TCA_RSVP_MAX
};
#define TCA_RSVP_MAX (__TCA_RSVP_MAX - 1 )
struct tc_rsvp_gpi {
__u32 key;
__u32 mask;
int offset;
};
struct tc_rsvp_pinfo {
struct tc_rsvp_gpi dpi;
struct tc_rsvp_gpi spi;
__u8 protocol;
__u8 tunnelid;
__u8 tunnelhdr;
__u8 pad;
};
/* ROUTE filter */
enum {
@@ -265,22 +234,6 @@ enum {
#define TCA_FW_MAX (__TCA_FW_MAX - 1)
/* TC index filter */
enum {
TCA_TCINDEX_UNSPEC,
TCA_TCINDEX_HASH,
TCA_TCINDEX_MASK,
TCA_TCINDEX_SHIFT,
TCA_TCINDEX_FALL_THROUGH,
TCA_TCINDEX_CLASSID,
TCA_TCINDEX_POLICE,
TCA_TCINDEX_ACT,
__TCA_TCINDEX_MAX
};
#define TCA_TCINDEX_MAX (__TCA_TCINDEX_MAX - 1)
/* Flow filter */
enum {

View File

@@ -457,115 +457,6 @@ enum {
#define TCA_HFSC_MAX (__TCA_HFSC_MAX - 1)
/* CBQ section */
#define TC_CBQ_MAXPRIO 8
#define TC_CBQ_MAXLEVEL 8
#define TC_CBQ_DEF_EWMA 5
struct tc_cbq_lssopt {
unsigned char change;
unsigned char flags;
#define TCF_CBQ_LSS_BOUNDED 1
#define TCF_CBQ_LSS_ISOLATED 2
unsigned char ewma_log;
unsigned char level;
#define TCF_CBQ_LSS_FLAGS 1
#define TCF_CBQ_LSS_EWMA 2
#define TCF_CBQ_LSS_MAXIDLE 4
#define TCF_CBQ_LSS_MINIDLE 8
#define TCF_CBQ_LSS_OFFTIME 0x10
#define TCF_CBQ_LSS_AVPKT 0x20
__u32 maxidle;
__u32 minidle;
__u32 offtime;
__u32 avpkt;
};
struct tc_cbq_wrropt {
unsigned char flags;
unsigned char priority;
unsigned char cpriority;
unsigned char __reserved;
__u32 allot;
__u32 weight;
};
struct tc_cbq_ovl {
unsigned char strategy;
#define TC_CBQ_OVL_CLASSIC 0
#define TC_CBQ_OVL_DELAY 1
#define TC_CBQ_OVL_LOWPRIO 2
#define TC_CBQ_OVL_DROP 3
#define TC_CBQ_OVL_RCLASSIC 4
unsigned char priority2;
__u16 pad;
__u32 penalty;
};
struct tc_cbq_police {
unsigned char police;
unsigned char __res1;
unsigned short __res2;
};
struct tc_cbq_fopt {
__u32 split;
__u32 defmap;
__u32 defchange;
};
struct tc_cbq_xstats {
__u32 borrows;
__u32 overactions;
__s32 avgidle;
__s32 undertime;
};
enum {
TCA_CBQ_UNSPEC,
TCA_CBQ_LSSOPT,
TCA_CBQ_WRROPT,
TCA_CBQ_FOPT,
TCA_CBQ_OVL_STRATEGY,
TCA_CBQ_RATE,
TCA_CBQ_RTAB,
TCA_CBQ_POLICE,
__TCA_CBQ_MAX,
};
#define TCA_CBQ_MAX (__TCA_CBQ_MAX - 1)
/* dsmark section */
enum {
TCA_DSMARK_UNSPEC,
TCA_DSMARK_INDICES,
TCA_DSMARK_DEFAULT_INDEX,
TCA_DSMARK_SET_TC_INDEX,
TCA_DSMARK_MASK,
TCA_DSMARK_VALUE,
__TCA_DSMARK_MAX,
};
#define TCA_DSMARK_MAX (__TCA_DSMARK_MAX - 1)
/* ATM section */
enum {
TCA_ATM_UNSPEC,
TCA_ATM_FD, /* file/socket descriptor */
TCA_ATM_PTR, /* pointer to descriptor - later */
TCA_ATM_HDR, /* LL header */
TCA_ATM_EXCESS, /* excess traffic class (0 for CLP) */
TCA_ATM_ADDR, /* PVC address (for output only) */
TCA_ATM_STATE, /* VC state (ATM_VS_*; for output only) */
__TCA_ATM_MAX,
};
#define TCA_ATM_MAX (__TCA_ATM_MAX - 1)
/* Network emulator */
enum {

82
scripts/build-fuzzers.sh Executable file
View File

@@ -0,0 +1,82 @@
#!/bin/bash
set -eux
SANITIZER=${SANITIZER:-address}
flags="-O1 -fno-omit-frame-pointer -g -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=$SANITIZER -fsanitize=fuzzer-no-link"
export CC=${CC:-clang}
export CFLAGS=${CFLAGS:-$flags}
export CXX=${CXX:-clang++}
export CXXFLAGS=${CXXFLAGS:-$flags}
cd "$(dirname -- "$0")/.."
export OUT=${OUT:-"$(pwd)/out"}
mkdir -p "$OUT"
export LIB_FUZZING_ENGINE=${LIB_FUZZING_ENGINE:--fsanitize=fuzzer}
# libelf is compiled with _FORTIFY_SOURCE by default and it
# isn't compatible with MSan. It was borrowed
# from https://github.com/google/oss-fuzz/pull/7422
if [[ "$SANITIZER" == memory ]]; then
CFLAGS+=" -U_FORTIFY_SOURCE"
CXXFLAGS+=" -U_FORTIFY_SOURCE"
fi
# The alignment check is turned off by default on OSS-Fuzz/CFLite so it should be
# turned on explicitly there. It was borrowed from
# https://github.com/google/oss-fuzz/pull/7092
if [[ "$SANITIZER" == undefined ]]; then
additional_ubsan_checks=alignment
UBSAN_FLAGS="-fsanitize=$additional_ubsan_checks -fno-sanitize-recover=$additional_ubsan_checks"
CFLAGS+=" $UBSAN_FLAGS"
CXXFLAGS+=" $UBSAN_FLAGS"
fi
# Ideally libbelf should be built using release tarballs available
# at https://sourceware.org/elfutils/ftp/. Unfortunately sometimes they
# fail to compile (for example, elfutils-0.185 fails to compile with LDFLAGS enabled
# due to https://bugs.gentoo.org/794601) so let's just point the script to
# commits referring to versions of libelf that actually can be built
rm -rf elfutils
git clone https://sourceware.org/git/elfutils.git
(
cd elfutils
git checkout 67a187d4c1790058fc7fd218317851cb68bb087c
git log --oneline -1
# ASan isn't compatible with -Wl,--no-undefined: https://github.com/google/sanitizers/issues/380
sed -i 's/^\(NO_UNDEFINED=\).*/\1/' configure.ac
# ASan isn't compatible with -Wl,-z,defs either:
# https://clang.llvm.org/docs/AddressSanitizer.html#usage
sed -i 's/^\(ZDEFS_LDFLAGS=\).*/\1/' configure.ac
if [[ "$SANITIZER" == undefined ]]; then
# That's basicaly what --enable-sanitize-undefined does to turn off unaligned access
# elfutils heavily relies on on i386/x86_64 but without changing compiler flags along the way
sed -i 's/\(check_undefined_val\)=[0-9]/\1=1/' configure.ac
fi
autoreconf -i -f
if ! ./configure --enable-maintainer-mode --disable-debuginfod --disable-libdebuginfod \
--disable-demangler --without-bzlib --without-lzma --without-zstd \
CC="$CC" CFLAGS="-Wno-error $CFLAGS" CXX="$CXX" CXXFLAGS="-Wno-error $CXXFLAGS" LDFLAGS="$CFLAGS"; then
cat config.log
exit 1
fi
make -C config -j$(nproc) V=1
make -C lib -j$(nproc) V=1
make -C libelf -j$(nproc) V=1
)
make -C src BUILD_STATIC_ONLY=y V=1 clean
make -C src -j$(nproc) CFLAGS="-I$(pwd)/elfutils/libelf $CFLAGS" BUILD_STATIC_ONLY=y V=1
$CC $CFLAGS -Isrc -Iinclude -Iinclude/uapi -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -c fuzz/bpf-object-fuzzer.c -o bpf-object-fuzzer.o
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE bpf-object-fuzzer.o src/libbpf.a "$(pwd)/elfutils/libelf/libelf.a" -l:libz.a -o "$OUT/bpf-object-fuzzer"
cp fuzz/bpf-object-fuzzer_seed_corpus.zip "$OUT"

37
scripts/mailmap-update.sh Executable file
View File

@@ -0,0 +1,37 @@
#!/usr/bin/env bash
set -eu
usage () {
echo "USAGE: ./mailmap-update.sh <libbpf-repo> <linux-repo>"
exit 1
}
LIBBPF_REPO="${1-""}"
LINUX_REPO="${2-""}"
if [ -z "${LIBBPF_REPO}" ] || [ -z "${LINUX_REPO}" ]; then
echo "Error: libbpf or linux repos are not specified"
usage
fi
LIBBPF_MAILMAP="${LIBBPF_REPO}/.mailmap"
LINUX_MAILMAP="${LINUX_REPO}/.mailmap"
tmpfile="$(mktemp)"
cleanup() {
rm -f "${tmpfile}"
}
trap cleanup EXIT
grep_lines() {
local pattern="$1"
local file="$2"
grep "${pattern}" "${file}" || true
}
while read -r email; do
grep_lines "${email}$" "${LINUX_MAILMAP}" >> "${tmpfile}"
done < <(git log --format='<%ae>' | sort -u)
sort -u "${tmpfile}" > "${LIBBPF_MAILMAP}"

View File

@@ -42,16 +42,20 @@ PATH_MAP=( \
[tools/include/uapi/linux/bpf_common.h]=include/uapi/linux/bpf_common.h \
[tools/include/uapi/linux/bpf.h]=include/uapi/linux/bpf.h \
[tools/include/uapi/linux/btf.h]=include/uapi/linux/btf.h \
[tools/include/uapi/linux/fcntl.h]=include/uapi/linux/fcntl.h \
[tools/include/uapi/linux/openat2.h]=include/uapi/linux/openat2.h \
[tools/include/uapi/linux/if_link.h]=include/uapi/linux/if_link.h \
[tools/include/uapi/linux/if_xdp.h]=include/uapi/linux/if_xdp.h \
[tools/include/uapi/linux/netdev.h]=include/uapi/linux/netdev.h \
[tools/include/uapi/linux/netlink.h]=include/uapi/linux/netlink.h \
[tools/include/uapi/linux/pkt_cls.h]=include/uapi/linux/pkt_cls.h \
[tools/include/uapi/linux/pkt_sched.h]=include/uapi/linux/pkt_sched.h \
[include/uapi/linux/perf_event.h]=include/uapi/linux/perf_event.h \
[Documentation/bpf/libbpf]=docs \
)
LIBBPF_PATHS="${!PATH_MAP[@]} :^tools/lib/bpf/Makefile :^tools/lib/bpf/Build :^tools/lib/bpf/.gitignore :^tools/include/tools/libc_compat.h"
LIBBPF_VIEW_PATHS="${PATH_MAP[@]}"
LIBBPF_PATHS=("${!PATH_MAP[@]}" ":^tools/lib/bpf/Makefile" ":^tools/lib/bpf/Build" ":^tools/lib/bpf/.gitignore" ":^tools/include/tools/libc_compat.h")
LIBBPF_VIEW_PATHS=("${PATH_MAP[@]}")
LIBBPF_VIEW_EXCLUDE_REGEX='^src/(Makefile|Build|test_libbpf\.c|bpf_helper_defs\.h|\.gitignore)$|^docs/(\.gitignore|api\.rst|conf\.py)$|^docs/sphinx/.*'
LINUX_VIEW_EXCLUDE_REGEX='^include/tools/libc_compat.h$'
@@ -84,7 +88,9 @@ commit_desc()
# $2 - paths filter
commit_signature()
{
git show --pretty='("%s")|%aI|%b' --shortstat $1 -- ${2-.} | tr '\n' '|'
local ref=$1
shift
git show --pretty='("%s")|%aI|%b' --shortstat $ref -- "${@-.}" | tr '\n' '|'
}
# Cherry-pick commits touching libbpf-related files
@@ -103,7 +109,7 @@ cherry_pick_commits()
local libbpf_conflict_cnt
local desc
new_commits=$(git rev-list --no-merges --topo-order --reverse ${baseline_tag}..${tip_tag} ${LIBBPF_PATHS[@]})
new_commits=$(git rev-list --no-merges --topo-order --reverse ${baseline_tag}..${tip_tag} -- "${LIBBPF_PATHS[@]}")
for new_commit in ${new_commits}; do
desc="$(commit_desc ${new_commit})"
signature="$(commit_signature ${new_commit} "${LIBBPF_PATHS[@]}")"
@@ -137,7 +143,7 @@ cherry_pick_commits()
echo "Picking '${desc}'..."
if ! git cherry-pick ${new_commit} &>/dev/null; then
echo "Warning! Cherry-picking '${desc} failed, checking if it's non-libbpf files causing problems..."
libbpf_conflict_cnt=$(git diff --name-only --diff-filter=U -- ${LIBBPF_PATHS[@]} | wc -l)
libbpf_conflict_cnt=$(git diff --name-only --diff-filter=U -- "${LIBBPF_PATHS[@]}" | wc -l)
conflict_cnt=$(git diff --name-only | wc -l)
prompt_resolution=1
@@ -256,15 +262,18 @@ if ((${COMMIT_CNT} <= 0)); then
fi
# Exclude baseline commit and generate nice cover letter with summary
git format-patch ${SQUASH_BASE_TAG}..${SQUASH_TIP_TAG} --cover-letter -o ${TMP_DIR}/patches
git format-patch --no-signature ${SQUASH_BASE_TAG}..${SQUASH_TIP_TAG} --cover-letter -o ${TMP_DIR}/patches
# Now is time to re-apply libbpf-related linux patches to libbpf repo
cd_to ${LIBBPF_REPO}
git checkout -b ${LIBBPF_SYNC_TAG}
for patch in $(ls -1 ${TMP_DIR}/patches | tail -n +2); do
if ! git am --3way --committer-date-is-author-date "${TMP_DIR}/patches/${patch}"; then
read -p "Applying ${TMP_DIR}/patches/${patch} failed, please resolve manually and press <return> to proceed..."
if ! git am -3 --committer-date-is-author-date "${TMP_DIR}/patches/${patch}"; then
if ! patch -p1 --merge < "${TMP_DIR}/patches/${patch}"; then
read -p "Applying ${TMP_DIR}/patches/${patch} failed, please resolve manually and press <return> to proceed..."
fi
git am --continue
fi
done
@@ -280,12 +289,28 @@ cd_to ${LIBBPF_REPO}
helpers_changes=$(git status --porcelain src/bpf_helper_defs.h | wc -l)
if ((${helpers_changes} == 1)); then
git add src/bpf_helper_defs.h
git commit -m "sync: auto-generate latest BPF helpers
git commit -s -m "sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
" -- src/bpf_helper_defs.h
fi
echo "Regenerating .mailmap..."
cd_to "${LINUX_REPO}"
git checkout "${TIP_SYM_REF}"
cd_to "${LIBBPF_REPO}"
"${LIBBPF_REPO}"/scripts/mailmap-update.sh "${LIBBPF_REPO}" "${LINUX_REPO}"
# if anything changed, commit it
mailmap_changes=$(git status --porcelain .mailmap | wc -l)
if ((${mailmap_changes} == 1)); then
git add .mailmap
git commit -s -m "sync: update .mailmap
Update .mailmap based on libbpf's list of contributors and on the latest
.mailmap version in the upstream repository.
" -- .mailmap
fi
# Use generated cover-letter as a template for "sync commit" with
# baseline and checkpoint commits from kernel repo (and leave summary
# from cover letter intact, of course)
@@ -302,7 +327,7 @@ Baseline bpf-next commit: ${BASELINE_COMMIT}\n\
Checkpoint bpf-next commit: ${TIP_COMMIT}\n\
Baseline bpf commit: ${BPF_BASELINE_COMMIT}\n\
Checkpoint bpf commit: ${BPF_TIP_COMMIT}/" | \
git commit --file=-
git commit -s --file=-
echo "SUCCESS! ${COMMIT_CNT} commits synced."
@@ -312,10 +337,10 @@ cd_to ${LINUX_REPO}
git checkout -b ${VIEW_TAG} ${TIP_COMMIT}
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --tree-filter "${LIBBPF_TREE_FILTER}" ${VIEW_TAG}^..${VIEW_TAG}
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --subdirectory-filter __libbpf ${VIEW_TAG}^..${VIEW_TAG}
git ls-files -- ${LIBBPF_VIEW_PATHS[@]} | grep -v -E "${LINUX_VIEW_EXCLUDE_REGEX}" > ${TMP_DIR}/linux-view.ls
git ls-files -- "${LIBBPF_VIEW_PATHS[@]}" | grep -v -E "${LINUX_VIEW_EXCLUDE_REGEX}" > ${TMP_DIR}/linux-view.ls
cd_to ${LIBBPF_REPO}
git ls-files -- ${LIBBPF_VIEW_PATHS[@]} | grep -v -E "${LIBBPF_VIEW_EXCLUDE_REGEX}" > ${TMP_DIR}/github-view.ls
git ls-files -- "${LIBBPF_VIEW_PATHS[@]}" | grep -v -E "${LIBBPF_VIEW_EXCLUDE_REGEX}" > ${TMP_DIR}/github-view.ls
echo "Comparing list of files..."
diff -u ${TMP_DIR}/linux-view.ls ${TMP_DIR}/github-view.ls

View File

@@ -5,13 +5,27 @@ ifeq ($(V),1)
msg =
else
Q = @
msg = @printf ' %-8s %s%s\n' "$(1)" "$(notdir $(2))" "$(if $(3), $(3))";
msg = @printf ' %-8s %s%s\n' "$(1)" "$(2)" "$(if $(3), $(3))";
endif
LIBBPF_VERSION := $(shell \
grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \
sort -rV | head -n1 | cut -d'_' -f2)
LIBBPF_MAJOR_VERSION := $(firstword $(subst ., ,$(LIBBPF_VERSION)))
LIBBPF_MAJOR_VERSION := 1
LIBBPF_MINOR_VERSION := 5
LIBBPF_PATCH_VERSION := 0
LIBBPF_VERSION := $(LIBBPF_MAJOR_VERSION).$(LIBBPF_MINOR_VERSION).$(LIBBPF_PATCH_VERSION)
LIBBPF_MAJMIN_VERSION := $(LIBBPF_MAJOR_VERSION).$(LIBBPF_MINOR_VERSION).0
LIBBPF_MAP_VERSION := $(shell grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | sort -rV | head -n1 | cut -d'_' -f2)
ifneq ($(LIBBPF_MAJMIN_VERSION), $(LIBBPF_MAP_VERSION))
$(error Libbpf release ($(LIBBPF_VERSION)) and map ($(LIBBPF_MAP_VERSION)) versions are out of sync!)
endif
define allow-override
$(if $(or $(findstring environment,$(origin $(1))),\
$(findstring command line,$(origin $(1)))),,\
$(eval $(1) = $(2)))
endef
$(call allow-override,CC,$(CROSS_COMPILE)cc)
$(call allow-override,LD,$(CROSS_COMPILE)ld)
TOPDIR = ..
@@ -20,9 +34,13 @@ ALL_CFLAGS := $(INCLUDES)
SHARED_CFLAGS += -fPIC -fvisibility=hidden -DSHARED
CFLAGS ?= -g -O2 -Werror -Wall
ALL_CFLAGS += $(CFLAGS) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
ALL_LDFLAGS += $(LDFLAGS)
CFLAGS ?= -g -O2 -Werror -Wall -std=gnu89
ALL_CFLAGS += $(CFLAGS) \
-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 \
-Wno-unknown-warning-option -Wno-format-overflow \
$(EXTRA_CFLAGS)
ALL_LDFLAGS += $(LDFLAGS) $(EXTRA_LDFLAGS)
ifdef NO_PKG_CONFIG
ALL_LDFLAGS += -lelf -lz
else
@@ -35,9 +53,9 @@ OBJDIR ?= .
SHARED_OBJDIR := $(OBJDIR)/sharedobjs
STATIC_OBJDIR := $(OBJDIR)/staticobjs
OBJS := bpf.o btf.o libbpf.o libbpf_errno.o netlink.o \
nlattr.o str_error.o libbpf_probes.o bpf_prog_linfo.o xsk.o \
nlattr.o str_error.o libbpf_probes.o bpf_prog_linfo.o \
btf_dump.o hashmap.o ringbuf.o strset.o linker.o gen_loader.o \
relo_core.o
relo_core.o usdt.o zip.o elf.o features.o
SHARED_OBJS := $(addprefix $(SHARED_OBJDIR)/,$(OBJS))
STATIC_OBJS := $(addprefix $(STATIC_OBJDIR)/,$(OBJS))
@@ -49,9 +67,10 @@ ifndef BUILD_STATIC_ONLY
VERSION_SCRIPT := libbpf.map
endif
HEADERS := bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h xsk.h \
HEADERS := bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h \
bpf_helpers.h bpf_helper_defs.h bpf_tracing.h \
bpf_endian.h bpf_core_read.h skel_internal.h libbpf_version.h
bpf_endian.h bpf_core_read.h skel_internal.h libbpf_version.h \
usdt.bpf.h
UAPI_HEADERS := $(addprefix $(TOPDIR)/include/uapi/linux/,\
bpf.h bpf_common.h btf.h)
@@ -61,7 +80,8 @@ INSTALL = install
DESTDIR ?=
ifeq ($(filter-out %64 %64be %64eb %64le %64el s390x, $(shell uname -m)),)
HOSTARCH = $(firstword $(subst -, ,$(shell $(CC) -dumpmachine)))
ifeq ($(filter-out %64 %64be %64eb %64le %64el s390x, $(HOSTARCH)),)
LIBSUBDIR := lib64
else
LIBSUBDIR := lib
@@ -99,7 +119,7 @@ $(OBJDIR)/libbpf.so.$(LIBBPF_VERSION): $(SHARED_OBJS)
-Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
$^ $(ALL_LDFLAGS) -o $@
$(OBJDIR)/libbpf.pc:
$(OBJDIR)/libbpf.pc: force
$(Q)sed -e "s|@PREFIX@|$(PREFIX)|" \
-e "s|@LIBDIR@|$(LIBDIR_PC)|" \
-e "s|@VERSION@|$(LIBBPF_VERSION)|" \
@@ -152,7 +172,7 @@ clean:
$(call msg,CLEAN)
$(Q)rm -rf *.o *.a *.so *.so.* *.pc $(SHARED_OBJDIR) $(STATIC_OBJDIR)
.PHONY: cscope tags
.PHONY: cscope tags force
cscope:
$(call msg,CSCOPE)
$(Q)ls *.c *.h > cscope.files
@@ -162,3 +182,5 @@ tags:
$(call msg,CTAGS)
$(Q)rm -f TAGS tags
$(Q)ls *.c *.h | xargs $(TAGS_PROG) -a
force:

1135
src/bpf.c

File diff suppressed because it is too large Load Diff

590
src/bpf.h
View File

@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/*
* common eBPF ELF operations.
* Common BPF ELF operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
@@ -29,92 +29,121 @@
#include <stdint.h>
#include "libbpf_common.h"
#include "libbpf_legacy.h"
#ifdef __cplusplus
extern "C" {
#endif
struct bpf_create_map_attr {
const char *name;
enum bpf_map_type map_type;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
__u32 max_entries;
__u32 numa_node;
LIBBPF_API int libbpf_set_memlock_rlim(size_t memlock_bytes);
struct bpf_map_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 btf_fd;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
__u32 btf_vmlinux_value_type_id;
__u32 inner_map_fd;
__u32 map_flags;
__u64 map_extra;
__u32 numa_node;
__u32 map_ifindex;
union {
__u32 inner_map_fd;
__u32 btf_vmlinux_value_type_id;
};
__s32 value_type_btf_obj_fd;
__u32 token_fd;
size_t :0;
};
#define bpf_map_create_opts__last_field token_fd
LIBBPF_API int
bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr);
LIBBPF_API int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
int key_size, int value_size,
int max_entries, __u32 map_flags, int node);
LIBBPF_API int bpf_create_map_name(enum bpf_map_type map_type, const char *name,
int key_size, int value_size,
int max_entries, __u32 map_flags);
LIBBPF_API int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries, __u32 map_flags);
LIBBPF_API int bpf_create_map_in_map_node(enum bpf_map_type map_type,
const char *name, int key_size,
int inner_map_fd, int max_entries,
__u32 map_flags, int node);
LIBBPF_API int bpf_create_map_in_map(enum bpf_map_type map_type,
const char *name, int key_size,
int inner_map_fd, int max_entries,
__u32 map_flags);
LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
const char *map_name,
__u32 key_size,
__u32 value_size,
__u32 max_entries,
const struct bpf_map_create_opts *opts);
struct bpf_prog_load_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
/* libbpf can retry BPF_PROG_LOAD command if bpf() syscall returns
* -EAGAIN. This field determines how many attempts libbpf has to
* make. If not specified, libbpf will use default value of 5.
*/
int attempts;
struct bpf_load_program_attr {
enum bpf_prog_type prog_type;
enum bpf_attach_type expected_attach_type;
const char *name;
const struct bpf_insn *insns;
size_t insns_cnt;
const char *license;
union {
__u32 kern_version;
__u32 attach_prog_fd;
};
union {
__u32 prog_ifindex;
__u32 attach_btf_id;
};
__u32 prog_btf_fd;
__u32 func_info_rec_size;
__u32 prog_flags;
__u32 prog_ifindex;
__u32 kern_version;
__u32 attach_btf_id;
__u32 attach_prog_fd;
__u32 attach_btf_obj_fd;
const int *fd_array;
/* .BTF.ext func info data */
const void *func_info;
__u32 func_info_cnt;
__u32 line_info_rec_size;
__u32 func_info_rec_size;
/* .BTF.ext line info data */
const void *line_info;
__u32 line_info_cnt;
__u32 line_info_rec_size;
/* verifier log options */
__u32 log_level;
__u32 prog_flags;
__u32 log_size;
char *log_buf;
/* output: actual total log contents size (including termintaing zero).
* It could be both larger than original log_size (if log was
* truncated), or smaller (if log buffer wasn't filled completely).
* If kernel doesn't support this feature, log_size is left unchanged.
*/
__u32 log_true_size;
__u32 token_fd;
size_t :0;
};
#define bpf_prog_load_opts__last_field token_fd
LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type,
const char *prog_name, const char *license,
const struct bpf_insn *insns, size_t insn_cnt,
struct bpf_prog_load_opts *opts);
/* Flags to direct loading requirements */
#define MAPS_RELAX_COMPAT 0x01
/* Recommend log buffer size */
/* Recommended log buffer size */
#define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8) /* verifier maximum in kernels <= 5.1 */
LIBBPF_API int
bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
char *log_buf, size_t log_buf_sz);
LIBBPF_API int bpf_load_program(enum bpf_prog_type type,
const struct bpf_insn *insns, size_t insns_cnt,
const char *license, __u32 kern_version,
char *log_buf, size_t log_buf_sz);
LIBBPF_API int bpf_verify_program(enum bpf_prog_type type,
const struct bpf_insn *insns,
size_t insns_cnt, __u32 prog_flags,
const char *license, __u32 kern_version,
char *log_buf, size_t log_buf_sz,
int log_level);
struct bpf_btf_load_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
/* kernel log options */
char *log_buf;
__u32 log_level;
__u32 log_size;
/* output: actual total log contents size (including termintaing zero).
* It could be both larger than original log_size (if log was
* truncated), or smaller (if log buffer wasn't filled completely).
* If kernel doesn't support this feature, log_size is left unchanged.
*/
__u32 log_true_size;
__u32 btf_flags;
__u32 token_fd;
size_t :0;
};
#define bpf_btf_load_opts__last_field token_fd
LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
struct bpf_btf_load_opts *opts);
LIBBPF_API int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
@@ -127,6 +156,7 @@ LIBBPF_API int bpf_map_lookup_and_delete_elem(int fd, const void *key,
LIBBPF_API int bpf_map_lookup_and_delete_elem_flags(int fd, const void *key,
void *value, __u64 flags);
LIBBPF_API int bpf_map_delete_elem(int fd, const void *key);
LIBBPF_API int bpf_map_delete_elem_flags(int fd, const void *key, __u64 flags);
LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
LIBBPF_API int bpf_map_freeze(int fd);
@@ -137,39 +167,228 @@ struct bpf_map_batch_opts {
};
#define bpf_map_batch_opts__last_field flags
LIBBPF_API int bpf_map_delete_batch(int fd, void *keys,
/**
* @brief **bpf_map_delete_batch()** allows for batch deletion of multiple
* elements in a BPF map.
*
* @param fd BPF map file descriptor
* @param keys pointer to an array of *count* keys
* @param count input and output parameter; on input **count** represents the
* number of elements in the map to delete in batch;
* on output if a non-EFAULT error is returned, **count** represents the number of deleted
* elements if the output **count** value is not equal to the input **count** value
* If EFAULT is returned, **count** should not be trusted to be correct.
* @param opts options for configuring the way the batch deletion works
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_map_delete_batch(int fd, const void *keys,
__u32 *count,
const struct bpf_map_batch_opts *opts);
/**
* @brief **bpf_map_lookup_batch()** allows for batch lookup of BPF map elements.
*
* The parameter *in_batch* is the address of the first element in the batch to
* read. *out_batch* is an output parameter that should be passed as *in_batch*
* to subsequent calls to **bpf_map_lookup_batch()**. NULL can be passed for
* *in_batch* to indicate that the batched lookup starts from the beginning of
* the map. Both *in_batch* and *out_batch* must point to memory large enough to
* hold a single key, except for maps of type **BPF_MAP_TYPE_{HASH, PERCPU_HASH,
* LRU_HASH, LRU_PERCPU_HASH}**, for which the memory size must be at
* least 4 bytes wide regardless of key size.
*
* The *keys* and *values* are output parameters which must point to memory large enough to
* hold *count* items based on the key and value size of the map *map_fd*. The *keys*
* buffer must be of *key_size* * *count*. The *values* buffer must be of
* *value_size* * *count*.
*
* @param fd BPF map file descriptor
* @param in_batch address of the first element in batch to read, can pass NULL to
* indicate that the batched lookup starts from the beginning of the map.
* @param out_batch output parameter that should be passed to next call as *in_batch*
* @param keys pointer to an array large enough for *count* keys
* @param values pointer to an array large enough for *count* values
* @param count input and output parameter; on input it's the number of elements
* in the map to read in batch; on output it's the number of elements that were
* successfully read.
* If a non-EFAULT error is returned, count will be set as the number of elements
* that were read before the error occurred.
* If EFAULT is returned, **count** should not be trusted to be correct.
* @param opts options for configuring the way the batch lookup works
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_map_lookup_batch(int fd, void *in_batch, void *out_batch,
void *keys, void *values, __u32 *count,
const struct bpf_map_batch_opts *opts);
/**
* @brief **bpf_map_lookup_and_delete_batch()** allows for batch lookup and deletion
* of BPF map elements where each element is deleted after being retrieved.
*
* @param fd BPF map file descriptor
* @param in_batch address of the first element in batch to read, can pass NULL to
* get address of the first element in *out_batch*. If not NULL, must be large
* enough to hold a key. For **BPF_MAP_TYPE_{HASH, PERCPU_HASH, LRU_HASH,
* LRU_PERCPU_HASH}**, the memory size must be at least 4 bytes wide regardless
* of key size.
* @param out_batch output parameter that should be passed to next call as *in_batch*
* @param keys pointer to an array of *count* keys
* @param values pointer to an array large enough for *count* values
* @param count input and output parameter; on input it's the number of elements
* in the map to read and delete in batch; on output it represents the number of
* elements that were successfully read and deleted
* If a non-**EFAULT** error code is returned and if the output **count** value
* is not equal to the input **count** value, up to **count** elements may
* have been deleted.
* if **EFAULT** is returned up to *count* elements may have been deleted without
* being returned via the *keys* and *values* output parameters.
* @param opts options for configuring the way the batch lookup and delete works
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
void *out_batch, void *keys,
void *values, __u32 *count,
const struct bpf_map_batch_opts *opts);
LIBBPF_API int bpf_map_update_batch(int fd, void *keys, void *values,
/**
* @brief **bpf_map_update_batch()** updates multiple elements in a map
* by specifying keys and their corresponding values.
*
* The *keys* and *values* parameters must point to memory large enough
* to hold *count* items based on the key and value size of the map.
*
* The *opts* parameter can be used to control how *bpf_map_update_batch()*
* should handle keys that either do or do not already exist in the map.
* In particular the *flags* parameter of *bpf_map_batch_opts* can be
* one of the following:
*
* Note that *count* is an input and output parameter, where on output it
* represents how many elements were successfully updated. Also note that if
* **EFAULT** then *count* should not be trusted to be correct.
*
* **BPF_ANY**
* Create new elements or update existing.
*
* **BPF_NOEXIST**
* Create new elements only if they do not exist.
*
* **BPF_EXIST**
* Update existing elements.
*
* **BPF_F_LOCK**
* Update spin_lock-ed map elements. This must be
* specified if the map value contains a spinlock.
*
* @param fd BPF map file descriptor
* @param keys pointer to an array of *count* keys
* @param values pointer to an array of *count* values
* @param count input and output parameter; on input it's the number of elements
* in the map to update in batch; on output if a non-EFAULT error is returned,
* **count** represents the number of updated elements if the output **count**
* value is not equal to the input **count** value.
* If EFAULT is returned, **count** should not be trusted to be correct.
* @param opts options for configuring the way the batch update works
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_map_update_batch(int fd, const void *keys, const void *values,
__u32 *count,
const struct bpf_map_batch_opts *opts);
LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
LIBBPF_API int bpf_obj_get(const char *pathname);
struct bpf_prog_attach_opts {
struct bpf_obj_pin_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
unsigned int flags;
int replace_prog_fd;
__u32 file_flags;
int path_fd;
size_t :0;
};
#define bpf_prog_attach_opts__last_field replace_prog_fd
#define bpf_obj_pin_opts__last_field path_fd
LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
LIBBPF_API int bpf_obj_pin_opts(int fd, const char *pathname,
const struct bpf_obj_pin_opts *opts);
struct bpf_obj_get_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 file_flags;
int path_fd;
size_t :0;
};
#define bpf_obj_get_opts__last_field path_fd
LIBBPF_API int bpf_obj_get(const char *pathname);
LIBBPF_API int bpf_obj_get_opts(const char *pathname,
const struct bpf_obj_get_opts *opts);
LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
enum bpf_attach_type type, unsigned int flags);
LIBBPF_API int bpf_prog_attach_xattr(int prog_fd, int attachable_fd,
enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts);
LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
enum bpf_attach_type type);
struct bpf_prog_attach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
union {
int replace_prog_fd;
int replace_fd;
};
int relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_prog_attach_opts__last_field expected_revision
struct bpf_prog_detach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
int relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_prog_detach_opts__last_field expected_revision
/**
* @brief **bpf_prog_attach_opts()** attaches the BPF program corresponding to
* *prog_fd* to a *target* which can represent a file descriptor or netdevice
* ifindex.
*
* @param prog_fd BPF program file descriptor
* @param target attach location file descriptor or ifindex
* @param type attach type for the BPF program
* @param opts options for configuring the attachment
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int target,
enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts);
/**
* @brief **bpf_prog_detach_opts()** detaches the BPF program corresponding to
* *prog_fd* from a *target* which can represent a file descriptor or netdevice
* ifindex.
*
* @param prog_fd BPF program file descriptor
* @param target detach location file descriptor or ifindex
* @param type detach type for the BPF program
* @param opts options for configuring the detachment
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_detach_opts(int prog_fd, int target,
enum bpf_attach_type type,
const struct bpf_prog_detach_opts *opts);
union bpf_iter_link_info; /* defined in up-to-date linux/bpf.h */
struct bpf_link_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@@ -181,10 +400,45 @@ struct bpf_link_create_opts {
struct {
__u64 bpf_cookie;
} perf_event;
struct {
__u32 flags;
__u32 cnt;
const char **syms;
const unsigned long *addrs;
const __u64 *cookies;
} kprobe_multi;
struct {
__u32 flags;
__u32 cnt;
const char *path;
const unsigned long *offsets;
const unsigned long *ref_ctr_offsets;
const __u64 *cookies;
__u32 pid;
} uprobe_multi;
struct {
__u64 cookie;
} tracing;
struct {
__u32 pf;
__u32 hooknum;
__s32 priority;
__u32 flags;
} netfilter;
struct {
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
} tcx;
struct {
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
} netkit;
};
size_t :0;
};
#define bpf_link_create_opts__last_field perf_event
#define bpf_link_create_opts__last_field uprobe_multi.pid
LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
enum bpf_attach_type attach_type,
@@ -196,8 +450,9 @@ struct bpf_link_update_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags; /* extra flags */
__u32 old_prog_fd; /* expected old program FD */
__u32 old_map_fd; /* expected old map FD */
};
#define bpf_link_update_opts__last_field old_prog_fd
#define bpf_link_update_opts__last_field old_map_fd
LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
const struct bpf_link_update_opts *opts);
@@ -221,36 +476,170 @@ struct bpf_prog_test_run_attr {
* out: length of cxt_out */
};
LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
/*
* bpf_prog_test_run does not check that data_out is large enough. Consider
* using bpf_prog_test_run_xattr instead.
*/
LIBBPF_API int bpf_prog_test_run(int prog_fd, int repeat, void *data,
__u32 size, void *data_out, __u32 *size_out,
__u32 *retval, __u32 *duration);
LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id);
LIBBPF_API int bpf_link_get_next_id(__u32 start_id, __u32 *next_id);
struct bpf_get_fd_by_id_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 open_flags; /* permissions requested for the operation on fd */
size_t :0;
};
#define bpf_get_fd_by_id_opts__last_field open_flags
LIBBPF_API int bpf_prog_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_prog_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts);
LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_map_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts);
LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_btf_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts);
LIBBPF_API int bpf_link_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_link_get_fd_by_id_opts(__u32 id,
const struct bpf_get_fd_by_id_opts *opts);
LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len);
/**
* @brief **bpf_prog_get_info_by_fd()** obtains information about the BPF
* program corresponding to *prog_fd*.
*
* Populates up to *info_len* bytes of *info* and updates *info_len* with the
* actual number of bytes written to *info*. Note that *info* should be
* zero-initialized or initialized as expected by the requested *info*
* type. Failing to (zero-)initialize *info* under certain circumstances can
* result in this helper returning an error.
*
* @param prog_fd BPF program file descriptor
* @param info pointer to **struct bpf_prog_info** that will be populated with
* BPF program information
* @param info_len pointer to the size of *info*; on success updated with the
* number of bytes written to *info*
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_get_info_by_fd(int prog_fd, struct bpf_prog_info *info, __u32 *info_len);
/**
* @brief **bpf_map_get_info_by_fd()** obtains information about the BPF
* map corresponding to *map_fd*.
*
* Populates up to *info_len* bytes of *info* and updates *info_len* with the
* actual number of bytes written to *info*. Note that *info* should be
* zero-initialized or initialized as expected by the requested *info*
* type. Failing to (zero-)initialize *info* under certain circumstances can
* result in this helper returning an error.
*
* @param map_fd BPF map file descriptor
* @param info pointer to **struct bpf_map_info** that will be populated with
* BPF map information
* @param info_len pointer to the size of *info*; on success updated with the
* number of bytes written to *info*
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_map_get_info_by_fd(int map_fd, struct bpf_map_info *info, __u32 *info_len);
/**
* @brief **bpf_btf_get_info_by_fd()** obtains information about the
* BTF object corresponding to *btf_fd*.
*
* Populates up to *info_len* bytes of *info* and updates *info_len* with the
* actual number of bytes written to *info*. Note that *info* should be
* zero-initialized or initialized as expected by the requested *info*
* type. Failing to (zero-)initialize *info* under certain circumstances can
* result in this helper returning an error.
*
* @param btf_fd BTF object file descriptor
* @param info pointer to **struct bpf_btf_info** that will be populated with
* BTF object information
* @param info_len pointer to the size of *info*; on success updated with the
* number of bytes written to *info*
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_btf_get_info_by_fd(int btf_fd, struct bpf_btf_info *info, __u32 *info_len);
/**
* @brief **bpf_btf_get_info_by_fd()** obtains information about the BPF
* link corresponding to *link_fd*.
*
* Populates up to *info_len* bytes of *info* and updates *info_len* with the
* actual number of bytes written to *info*. Note that *info* should be
* zero-initialized or initialized as expected by the requested *info*
* type. Failing to (zero-)initialize *info* under certain circumstances can
* result in this helper returning an error.
*
* @param link_fd BPF link file descriptor
* @param info pointer to **struct bpf_link_info** that will be populated with
* BPF link information
* @param info_len pointer to the size of *info*; on success updated with the
* number of bytes written to *info*
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_link_get_info_by_fd(int link_fd, struct bpf_link_info *info, __u32 *info_len);
struct bpf_prog_query_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 query_flags;
__u32 attach_flags; /* output argument */
__u32 *prog_ids;
union {
/* input+output argument */
__u32 prog_cnt;
__u32 count;
};
__u32 *prog_attach_flags;
__u32 *link_ids;
__u32 *link_attach_flags;
__u64 revision;
size_t :0;
};
#define bpf_prog_query_opts__last_field revision
/**
* @brief **bpf_prog_query_opts()** queries the BPF programs and BPF links
* which are attached to *target* which can represent a file descriptor or
* netdevice ifindex.
*
* @param target query location file descriptor or ifindex
* @param type attach type for the BPF program
* @param opts options for configuring the query
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_query_opts(int target, enum bpf_attach_type type,
struct bpf_prog_query_opts *opts);
LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
__u32 query_flags, __u32 *attach_flags,
__u32 *prog_ids, __u32 *prog_cnt);
struct bpf_raw_tp_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
const char *tp_name;
__u64 cookie;
size_t :0;
};
#define bpf_raw_tp_opts__last_field cookie
LIBBPF_API int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts);
LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf,
__u32 log_buf_size, bool do_log);
LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
__u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
__u64 *probe_offset, __u64 *probe_addr);
#ifdef __cplusplus
/* forward-declaring enums in C++ isn't compatible with pure C enums, so
* instead define bpf_enable_stats() as accepting int as an input
*/
LIBBPF_API int bpf_enable_stats(int type);
#else
enum bpf_stats_type; /* defined in up-to-date linux/bpf.h */
LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type);
#endif
struct bpf_prog_bind_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@@ -280,12 +669,37 @@ struct bpf_test_run_opts {
__u32 duration; /* out: average per repetition in ns */
__u32 flags;
__u32 cpu;
__u32 batch_size;
};
#define bpf_test_run_opts__last_field cpu
#define bpf_test_run_opts__last_field batch_size
LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
struct bpf_test_run_opts *opts);
struct bpf_token_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
size_t :0;
};
#define bpf_token_create_opts__last_field flags
/**
* @brief **bpf_token_create()** creates a new instance of BPF token derived
* from specified BPF FS mount point.
*
* BPF token created with this API can be passed to bpf() syscall for
* commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc.
*
* @param bpffs_fd FD for BPF FS instance from which to derive a BPF token
* instance.
* @param opts optional BPF token creation options, can be NULL
*
* @return BPF token FD > 0, on success; negative error code, otherwise (errno
* is also set to the error code)
*/
LIBBPF_API int bpf_token_create(int bpffs_fd,
struct bpf_token_create_opts *opts);
#ifdef __cplusplus
} /* extern "C" */
#endif

View File

@@ -2,6 +2,8 @@
#ifndef __BPF_CORE_READ_H__
#define __BPF_CORE_READ_H__
#include "bpf_helpers.h"
/*
* enum bpf_field_info_kind is passed as a second argument into
* __builtin_preserve_field_info() built-in to get a specific aspect of
@@ -29,6 +31,7 @@ enum bpf_type_id_kind {
enum bpf_type_info_kind {
BPF_TYPE_EXISTS = 0, /* type existence in target kernel */
BPF_TYPE_SIZE = 1, /* type size in target kernel */
BPF_TYPE_MATCHES = 2, /* type match in target kernel */
};
/* second argument to __builtin_preserve_enum_value() built-in */
@@ -43,7 +46,7 @@ enum bpf_enum_value_kind {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \
bpf_probe_read_kernel( \
(void *)dst, \
(void *)dst, \
__CORE_RELO(src, fld, BYTE_SIZE), \
(const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET))
#else
@@ -110,21 +113,103 @@ enum bpf_enum_value_kind {
val; \
})
/*
* Write to a bitfield, identified by s->field.
* This is the inverse of BPF_CORE_WRITE_BITFIELD().
*/
#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \
void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \
unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE); \
unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64); \
unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64); \
unsigned long long mask, val, nval = new_val; \
unsigned int rpad = rshift - lshift; \
\
asm volatile("" : "+r"(p)); \
\
switch (byte_size) { \
case 1: val = *(unsigned char *)p; break; \
case 2: val = *(unsigned short *)p; break; \
case 4: val = *(unsigned int *)p; break; \
case 8: val = *(unsigned long long *)p; break; \
} \
\
mask = (~0ULL << rshift) >> lshift; \
val = (val & ~mask) | ((nval << rpad) & mask); \
\
switch (byte_size) { \
case 1: *(unsigned char *)p = val; break; \
case 2: *(unsigned short *)p = val; break; \
case 4: *(unsigned int *)p = val; break; \
case 8: *(unsigned long long *)p = val; break; \
} \
})
/* Differentiator between compilers builtin implementations. This is a
* requirement due to the compiler parsing differences where GCC optimizes
* early in parsing those constructs of type pointers to the builtin specific
* type, resulting in not being possible to collect the required type
* information in the builtin expansion.
*/
#ifdef __clang__
#define ___bpf_typeof(type) ((typeof(type) *) 0)
#else
#define ___bpf_typeof1(type, NR) ({ \
extern typeof(type) *___concat(bpf_type_tmp_, NR); \
___concat(bpf_type_tmp_, NR); \
})
#define ___bpf_typeof(type) ___bpf_typeof1(type, __COUNTER__)
#endif
#ifdef __clang__
#define ___bpf_field_ref1(field) (field)
#define ___bpf_field_ref2(type, field) (___bpf_typeof(type)->field)
#else
#define ___bpf_field_ref1(field) (&(field))
#define ___bpf_field_ref2(type, field) (&(___bpf_typeof(type)->field))
#endif
#define ___bpf_field_ref(args...) \
___bpf_apply(___bpf_field_ref, ___bpf_narg(args))(args)
/*
* Convenience macro to check that field actually exists in target kernel's.
* Returns:
* 1, if matching field is present in target kernel;
* 0, if no matching field found.
*
* Supports two forms:
* - field reference through variable access:
* bpf_core_field_exists(p->my_field);
* - field reference through type and field names:
* bpf_core_field_exists(struct my_type, my_field).
*/
#define bpf_core_field_exists(field) \
__builtin_preserve_field_info(field, BPF_FIELD_EXISTS)
#define bpf_core_field_exists(field...) \
__builtin_preserve_field_info(___bpf_field_ref(field), BPF_FIELD_EXISTS)
/*
* Convenience macro to get the byte size of a field. Works for integers,
* struct/unions, pointers, arrays, and enums.
*
* Supports two forms:
* - field reference through variable access:
* bpf_core_field_size(p->my_field);
* - field reference through type and field names:
* bpf_core_field_size(struct my_type, my_field).
*/
#define bpf_core_field_size(field) \
__builtin_preserve_field_info(field, BPF_FIELD_BYTE_SIZE)
#define bpf_core_field_size(field...) \
__builtin_preserve_field_info(___bpf_field_ref(field), BPF_FIELD_BYTE_SIZE)
/*
* Convenience macro to get field's byte offset.
*
* Supports two forms:
* - field reference through variable access:
* bpf_core_field_offset(p->my_field);
* - field reference through type and field names:
* bpf_core_field_offset(struct my_type, my_field).
*/
#define bpf_core_field_offset(field...) \
__builtin_preserve_field_info(___bpf_field_ref(field), BPF_FIELD_BYTE_OFFSET)
/*
* Convenience macro to get BTF type ID of a specified type, using a local BTF
@@ -132,7 +217,7 @@ enum bpf_enum_value_kind {
* BTF. Always succeeds.
*/
#define bpf_core_type_id_local(type) \
__builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_LOCAL)
__builtin_btf_type_id(*___bpf_typeof(type), BPF_TYPE_ID_LOCAL)
/*
* Convenience macro to get BTF type ID of a target kernel's type that matches
@@ -142,7 +227,7 @@ enum bpf_enum_value_kind {
* - 0, if no matching type was found in a target kernel BTF.
*/
#define bpf_core_type_id_kernel(type) \
__builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET)
__builtin_btf_type_id(*___bpf_typeof(type), BPF_TYPE_ID_TARGET)
/*
* Convenience macro to check that provided named type
@@ -152,7 +237,17 @@ enum bpf_enum_value_kind {
* 0, if no matching type is found.
*/
#define bpf_core_type_exists(type) \
__builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_EXISTS)
__builtin_preserve_type_info(*___bpf_typeof(type), BPF_TYPE_EXISTS)
/*
* Convenience macro to check that provided named type
* (struct/union/enum/typedef) "matches" that in a target kernel.
* Returns:
* 1, if the type matches in the target kernel's BTF;
* 0, if the type does not match any in the target kernel
*/
#define bpf_core_type_matches(type) \
__builtin_preserve_type_info(*___bpf_typeof(type), BPF_TYPE_MATCHES)
/*
* Convenience macro to get the byte size of a provided named type
@@ -162,7 +257,7 @@ enum bpf_enum_value_kind {
* 0, if no matching type is found.
*/
#define bpf_core_type_size(type) \
__builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE)
__builtin_preserve_type_info(*___bpf_typeof(type), BPF_TYPE_SIZE)
/*
* Convenience macro to check that provided enumerator value is defined in
@@ -172,8 +267,13 @@ enum bpf_enum_value_kind {
* kernel's BTF;
* 0, if no matching enum and/or enum value within that enum is found.
*/
#ifdef __clang__
#define bpf_core_enum_value_exists(enum_type, enum_value) \
__builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_EXISTS)
#else
#define bpf_core_enum_value_exists(enum_type, enum_value) \
__builtin_preserve_enum_value(___bpf_typeof(enum_type), enum_value, BPF_ENUMVAL_EXISTS)
#endif
/*
* Convenience macro to get the integer value of an enumerator value in
@@ -183,8 +283,13 @@ enum bpf_enum_value_kind {
* present in target kernel's BTF;
* 0, if no matching enum and/or enum value within that enum is found.
*/
#ifdef __clang__
#define bpf_core_enum_value(enum_type, enum_value) \
__builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_VALUE)
#else
#define bpf_core_enum_value(enum_type, enum_value) \
__builtin_preserve_enum_value(___bpf_typeof(enum_type), enum_value, BPF_ENUMVAL_VALUE)
#endif
/*
* bpf_core_read() abstracts away bpf_probe_read_kernel() call and captures
@@ -196,7 +301,7 @@ enum bpf_enum_value_kind {
* a relocation, which records BTF type ID describing root struct/union and an
* accessor string which describes exact embedded field that was used to take
* an address. See detailed description of this relocation format and
* semantics in comments to struct bpf_field_reloc in libbpf_internal.h.
* semantics in comments to struct bpf_core_relo in include/uapi/linux/bpf.h.
*
* This relocation allows libbpf to adjust BPF instruction to use correct
* actual field offset, based on target kernel BTF type that matches original
@@ -220,6 +325,17 @@ enum bpf_enum_value_kind {
#define bpf_core_read_user_str(dst, sz, src) \
bpf_probe_read_user_str(dst, sz, (const void *)__builtin_preserve_access_index(src))
extern void *bpf_rdonly_cast(const void *obj, __u32 btf_id) __ksym __weak;
/*
* Cast provided pointer *ptr* into a pointer to a specified *type* in such
* a way that BPF verifier will become aware of associated kernel-side BTF
* type. This allows to access members of kernel types directly without the
* need to use BPF_CORE_READ() macros.
*/
#define bpf_core_cast(ptr, type) \
((typeof(type) *)bpf_rdonly_cast((ptr), bpf_core_type_id_kernel(type)))
#define ___concat(a, b) a ## b
#define ___apply(fn, n) ___concat(fn, n)
#define ___nth(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, __11, N, ...) N
@@ -324,7 +440,7 @@ enum bpf_enum_value_kind {
/* Non-CO-RE variant of BPF_CORE_READ_INTO() */
#define BPF_PROBE_READ_INTO(dst, src, a, ...) ({ \
___core_read(bpf_probe_read, bpf_probe_read, \
___core_read(bpf_probe_read_kernel, bpf_probe_read_kernel, \
dst, (src), a, ##__VA_ARGS__) \
})
@@ -360,7 +476,7 @@ enum bpf_enum_value_kind {
/* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */
#define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({ \
___core_read(bpf_probe_read_str, bpf_probe_read, \
___core_read(bpf_probe_read_kernel_str, bpf_probe_read_kernel, \
dst, (src), a, ##__VA_ARGS__) \
})

View File

@@ -3,12 +3,15 @@
#ifndef __BPF_GEN_INTERNAL_H
#define __BPF_GEN_INTERNAL_H
#include "bpf.h"
struct ksym_relo_desc {
const char *name;
int kind;
int insn_idx;
bool is_weak;
bool is_typeless;
bool is_ld64;
};
struct ksym_desc {
@@ -22,6 +25,7 @@ struct ksym_desc {
bool typeless;
};
int insn;
bool is_ld64;
};
struct bpf_gen {
@@ -37,6 +41,8 @@ struct bpf_gen {
int error;
struct ksym_relo_desc *relos;
int relo_cnt;
struct bpf_core_relo *core_relos;
int core_relo_cnt;
char attach_target[128];
int attach_kind;
struct ksym_desc *ksyms;
@@ -45,17 +51,24 @@ struct bpf_gen {
int nr_fd_array;
};
void bpf_gen__init(struct bpf_gen *gen, int log_level);
int bpf_gen__finish(struct bpf_gen *gen);
void bpf_gen__init(struct bpf_gen *gen, int log_level, int nr_progs, int nr_maps);
int bpf_gen__finish(struct bpf_gen *gen, int nr_progs, int nr_maps);
void bpf_gen__free(struct bpf_gen *gen);
void bpf_gen__load_btf(struct bpf_gen *gen, const void *raw_data, __u32 raw_size);
void bpf_gen__map_create(struct bpf_gen *gen, struct bpf_create_map_params *map_attr, int map_idx);
struct bpf_prog_load_params;
void bpf_gen__prog_load(struct bpf_gen *gen, struct bpf_prog_load_params *load_attr, int prog_idx);
void bpf_gen__map_create(struct bpf_gen *gen,
enum bpf_map_type map_type, const char *map_name,
__u32 key_size, __u32 value_size, __u32 max_entries,
struct bpf_map_create_opts *map_attr, int map_idx);
void bpf_gen__prog_load(struct bpf_gen *gen,
enum bpf_prog_type prog_type, const char *prog_name,
const char *license, struct bpf_insn *insns, size_t insn_cnt,
struct bpf_prog_load_opts *load_attr, int prog_idx);
void bpf_gen__map_update_elem(struct bpf_gen *gen, int map_idx, void *value, __u32 value_size);
void bpf_gen__map_freeze(struct bpf_gen *gen, int map_idx);
void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *name, enum bpf_attach_type type);
void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
bool is_typeless, int kind, int insn_idx);
bool is_typeless, bool is_ld64, int kind, int insn_idx);
void bpf_gen__record_relo_core(struct bpf_gen *gen, const struct bpf_core_relo *core_relo);
void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int key, int inner_map_idx);
#endif

File diff suppressed because it is too large Load Diff

View File

@@ -13,6 +13,7 @@
#define __uint(name, val) int (*name)[val]
#define __type(name, val) typeof(val) *name
#define __array(name, val) typeof(val) *name[]
#define __ulong(name, val) enum { ___bpf_concat(__unique_value, __COUNTER__) = val } name
/*
* Helper macro to place programs, maps, license in
@@ -22,12 +23,25 @@
* To allow use of SEC() with externs (e.g., for extern .maps declarations),
* make sure __attribute__((unused)) doesn't trigger compilation warning.
*/
#if __GNUC__ && !__clang__
/*
* Pragma macros are broken on GCC
* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578
* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90400
*/
#define SEC(name) __attribute__((section(name), used))
#else
#define SEC(name) \
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wignored-attributes\"") \
__attribute__((section(name), used)) \
_Pragma("GCC diagnostic pop") \
#endif
/* Avoid 'linux/stddef.h' definition of '__always_inline'. */
#undef __always_inline
#define __always_inline inline __attribute__((always_inline))
@@ -64,15 +78,44 @@
/*
* Helper macros to manipulate data structures
*/
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((unsigned long)&((TYPE *)0)->MEMBER)
#endif
#ifndef container_of
/* offsetof() definition that uses __builtin_offset() might not preserve field
* offset CO-RE relocation properly, so force-redefine offsetof() using
* old-school approach which works with CO-RE correctly
*/
#undef offsetof
#define offsetof(type, member) ((unsigned long)&((type *)0)->member)
/* redefined container_of() to ensure we use the above offsetof() macro */
#undef container_of
#define container_of(ptr, type, member) \
({ \
void *__mptr = (void *)(ptr); \
((type *)(__mptr - offsetof(type, member))); \
})
/*
* Compiler (optimization) barrier.
*/
#ifndef barrier
#define barrier() asm volatile("" ::: "memory")
#endif
/* Variable-specific compiler (optimization) barrier. It's a no-op which makes
* compiler believe that there is some black box modification of a given
* variable and thus prevents compiler from making extra assumption about its
* value and potential simplifications and optimizations on this variable.
*
* E.g., compiler might often delay or even omit 32-bit to 64-bit casting of
* a variable, making some code patterns unverifiable. Putting barrier_var()
* in place will ensure that cast is performed before the barrier_var()
* invocation, because compiler has to pessimistically assume that embedded
* asm section might perform some extra operations on that variable.
*
* This is a variable-specific variant of more global barrier().
*/
#ifndef barrier_var
#define barrier_var(var) asm volatile("" : "+r"(var))
#endif
/*
@@ -94,7 +137,8 @@
/*
* Helper function to perform a tail call with a constant/immediate map slot.
*/
#if __clang_major__ >= 8 && defined(__bpf__)
#if (defined(__clang__) && __clang_major__ >= 8) || (!defined(__clang__) && __GNUC__ > 12)
#if defined(__bpf__)
static __always_inline void
bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
{
@@ -122,18 +166,7 @@ bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
: "r0", "r1", "r2", "r3", "r4", "r5");
}
#endif
/*
* Helper structure used by eBPF C program
* to describe BPF map attributes to libbpf loader
*/
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
unsigned int map_flags;
};
#endif
enum libbpf_pin_type {
LIBBPF_PIN_NONE,
@@ -149,6 +182,20 @@ enum libbpf_tristate {
#define __kconfig __attribute__((section(".kconfig")))
#define __ksym __attribute__((section(".ksyms")))
#define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted")))
#define __kptr __attribute__((btf_type_tag("kptr")))
#define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))
#define bpf_ksym_exists(sym) ({ \
_Static_assert(!__builtin_constant_p(!!sym), #sym " should be marked as __weak"); \
!!sym; \
})
#define __arg_ctx __attribute__((btf_decl_tag("arg:ctx")))
#define __arg_nonnull __attribute((btf_decl_tag("arg:nonnull")))
#define __arg_nullable __attribute((btf_decl_tag("arg:nullable")))
#define __arg_trusted __attribute((btf_decl_tag("arg:trusted")))
#define __arg_arena __attribute((btf_decl_tag("arg:arena")))
#ifndef ___bpf_concat
#define ___bpf_concat(a, b) a ## b
@@ -259,4 +306,107 @@ enum libbpf_tristate {
/* Helper macro to print out debug messages */
#define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)
struct bpf_iter_num;
extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __weak __ksym;
extern int *bpf_iter_num_next(struct bpf_iter_num *it) __weak __ksym;
extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
#ifndef bpf_for_each
/* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
* using BPF open-coded iterators without having to write mundane explicit
* low-level loop logic. Instead, it provides for()-like generic construct
* that can be used pretty naturally. E.g., for some hypothetical cgroup
* iterator, you'd write:
*
* struct cgroup *cg, *parent_cg = <...>;
*
* bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) {
* bpf_printk("Child cgroup id = %d", cg->cgroup_id);
* if (cg->cgroup_id == 123)
* break;
* }
*
* I.e., it looks almost like high-level for each loop in other languages,
* supports continue/break, and is verifiable by BPF verifier.
*
* For iterating integers, the difference betwen bpf_for_each(num, i, N, M)
* and bpf_for(i, N, M) is in that bpf_for() provides additional proof to
* verifier that i is in [N, M) range, and in bpf_for_each() case i is `int
* *`, not just `int`. So for integers bpf_for() is more convenient.
*
* Note: this macro relies on C99 feature of allowing to declare variables
* inside for() loop, bound to for() loop lifetime. It also utilizes GCC
* extension: __attribute__((cleanup(<func>))), supported by both GCC and
* Clang.
*/
#define bpf_for_each(type, cur, args...) for ( \
/* initialize and define destructor */ \
struct bpf_iter_##type ___it __attribute__((aligned(8), /* enforce, just in case */, \
cleanup(bpf_iter_##type##_destroy))), \
/* ___p pointer is just to call bpf_iter_##type##_new() *once* to init ___it */ \
*___p __attribute__((unused)) = ( \
bpf_iter_##type##_new(&___it, ##args), \
/* this is a workaround for Clang bug: it currently doesn't emit BTF */ \
/* for bpf_iter_##type##_destroy() when used from cleanup() attribute */ \
(void)bpf_iter_##type##_destroy, (void *)0); \
/* iteration and termination check */ \
(((cur) = bpf_iter_##type##_next(&___it))); \
)
#endif /* bpf_for_each */
#ifndef bpf_for
/* bpf_for(i, start, end) implements a for()-like looping construct that sets
* provided integer variable *i* to values starting from *start* through,
* but not including, *end*. It also proves to BPF verifier that *i* belongs
* to range [start, end), so this can be used for accessing arrays without
* extra checks.
*
* Note: *start* and *end* are assumed to be expressions with no side effects
* and whose values do not change throughout bpf_for() loop execution. They do
* not have to be statically known or constant, though.
*
* Note: similarly to bpf_for_each(), it relies on C99 feature of declaring for()
* loop bound variables and cleanup attribute, supported by GCC and Clang.
*/
#define bpf_for(i, start, end) for ( \
/* initialize and define destructor */ \
struct bpf_iter_num ___it __attribute__((aligned(8), /* enforce, just in case */ \
cleanup(bpf_iter_num_destroy))), \
/* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */ \
*___p __attribute__((unused)) = ( \
bpf_iter_num_new(&___it, (start), (end)), \
/* this is a workaround for Clang bug: it currently doesn't emit BTF */ \
/* for bpf_iter_num_destroy() when used from cleanup() attribute */ \
(void)bpf_iter_num_destroy, (void *)0); \
({ \
/* iteration step */ \
int *___t = bpf_iter_num_next(&___it); \
/* termination and bounds check */ \
(___t && ((i) = *___t, (i) >= (start) && (i) < (end))); \
}); \
)
#endif /* bpf_for */
#ifndef bpf_repeat
/* bpf_repeat(N) performs N iterations without exposing iteration number
*
* Note: similarly to bpf_for_each(), it relies on C99 feature of declaring for()
* loop bound variables and cleanup attribute, supported by GCC and Clang.
*/
#define bpf_repeat(N) for ( \
/* initialize and define destructor */ \
struct bpf_iter_num ___it __attribute__((aligned(8), /* enforce, just in case */ \
cleanup(bpf_iter_num_destroy))), \
/* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */ \
*___p __attribute__((unused)) = ( \
bpf_iter_num_new(&___it, 0, (N)), \
/* this is a workaround for Clang bug: it currently doesn't emit BTF */ \
/* for bpf_iter_num_destroy() when used from cleanup() attribute */ \
(void)bpf_iter_num_destroy, (void *)0); \
bpf_iter_num_next(&___it); \
/* nothing here */ \
)
#endif /* bpf_repeat */
#endif

File diff suppressed because it is too large Load Diff

1059
src/btf.c

File diff suppressed because it is too large Load Diff

138
src/btf.h
View File

@@ -116,24 +116,15 @@ LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_b
LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
LIBBPF_API struct btf *libbpf_find_kernel_btf(void);
LIBBPF_API struct btf *btf__load_from_kernel_by_id(__u32 id);
LIBBPF_API struct btf *btf__load_from_kernel_by_id_split(__u32 id, struct btf *base_btf);
LIBBPF_DEPRECATED_SINCE(0, 6, "use btf__load_from_kernel_by_id instead")
LIBBPF_API int btf__get_from_id(__u32 id, struct btf **btf);
LIBBPF_DEPRECATED_SINCE(0, 6, "intended for internal libbpf use only")
LIBBPF_API int btf__finalize_data(struct bpf_object *obj, struct btf *btf);
LIBBPF_DEPRECATED_SINCE(0, 6, "use btf__load_into_kernel instead")
LIBBPF_API int btf__load(struct btf *btf);
LIBBPF_API int btf__load_into_kernel(struct btf *btf);
LIBBPF_API __s32 btf__find_by_name(const struct btf *btf,
const char *type_name);
LIBBPF_API __s32 btf__find_by_name_kind(const struct btf *btf,
const char *type_name, __u32 kind);
LIBBPF_DEPRECATED_SINCE(0, 7, "use btf__type_cnt() instead; note that btf__get_nr_types() == btf__type_cnt() - 1")
LIBBPF_API __u32 btf__get_nr_types(const struct btf *btf);
LIBBPF_API __u32 btf__type_cnt(const struct btf *btf);
LIBBPF_API const struct btf *btf__base_btf(const struct btf *btf);
LIBBPF_API const struct btf_type *btf__type_by_id(const struct btf *btf,
@@ -147,32 +138,13 @@ LIBBPF_API int btf__resolve_type(const struct btf *btf, __u32 type_id);
LIBBPF_API int btf__align_of(const struct btf *btf, __u32 id);
LIBBPF_API int btf__fd(const struct btf *btf);
LIBBPF_API void btf__set_fd(struct btf *btf, int fd);
LIBBPF_DEPRECATED_SINCE(0, 7, "use btf__raw_data() instead")
LIBBPF_API const void *btf__get_raw_data(const struct btf *btf, __u32 *size);
LIBBPF_API const void *btf__raw_data(const struct btf *btf, __u32 *size);
LIBBPF_API const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
LIBBPF_API const char *btf__str_by_offset(const struct btf *btf, __u32 offset);
LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
__u32 expected_key_size,
__u32 expected_value_size,
__u32 *key_type_id, __u32 *value_type_id);
LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
LIBBPF_API struct btf_ext *btf_ext__new(const __u8 *data, __u32 size);
LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
LIBBPF_API const void *btf_ext__get_raw_data(const struct btf_ext *btf_ext,
__u32 *size);
LIBBPF_API LIBBPF_DEPRECATED("btf_ext__reloc_func_info was never meant as a public API and has wrong assumptions embedded in it; it will be removed in the future libbpf versions")
int btf_ext__reloc_func_info(const struct btf *btf,
const struct btf_ext *btf_ext,
const char *sec_name, __u32 insns_cnt,
void **func_info, __u32 *cnt);
LIBBPF_API LIBBPF_DEPRECATED("btf_ext__reloc_line_info was never meant as a public API and has wrong assumptions embedded in it; it will be removed in the future libbpf versions")
int btf_ext__reloc_line_info(const struct btf *btf,
const struct btf_ext *btf_ext,
const char *sec_name, __u32 insns_cnt,
void **line_info, __u32 *cnt);
LIBBPF_API __u32 btf_ext__func_info_rec_size(const struct btf_ext *btf_ext);
LIBBPF_API __u32 btf_ext__line_info_rec_size(const struct btf_ext *btf_ext);
LIBBPF_API const void *btf_ext__raw_data(const struct btf_ext *btf_ext, __u32 *size);
LIBBPF_API int btf__find_str(struct btf *btf, const char *s);
LIBBPF_API int btf__add_str(struct btf *btf, const char *s);
@@ -215,6 +187,8 @@ LIBBPF_API int btf__add_field(struct btf *btf, const char *name, int field_type_
/* enum construction APIs */
LIBBPF_API int btf__add_enum(struct btf *btf, const char *name, __u32 bytes_sz);
LIBBPF_API int btf__add_enum_value(struct btf *btf, const char *name, __s64 value);
LIBBPF_API int btf__add_enum64(struct btf *btf, const char *name, __u32 bytes_sz, bool is_signed);
LIBBPF_API int btf__add_enum64_value(struct btf *btf, const char *name, __u64 value);
enum btf_fwd_kind {
BTF_FWD_STRUCT = 0,
@@ -227,6 +201,7 @@ LIBBPF_API int btf__add_typedef(struct btf *btf, const char *name, int ref_type_
LIBBPF_API int btf__add_volatile(struct btf *btf, int ref_type_id);
LIBBPF_API int btf__add_const(struct btf *btf, int ref_type_id);
LIBBPF_API int btf__add_restrict(struct btf *btf, int ref_type_id);
LIBBPF_API int btf__add_type_tag(struct btf *btf, const char *value, int ref_type_id);
/* func and func_proto construction APIs */
LIBBPF_API int btf__add_func(struct btf *btf, const char *name,
@@ -245,25 +220,31 @@ LIBBPF_API int btf__add_decl_tag(struct btf *btf, const char *value, int ref_typ
int component_idx);
struct btf_dedup_opts {
unsigned int dedup_table_size;
bool dont_resolve_fwds;
size_t sz;
/* optional .BTF.ext info to dedup along the main BTF info */
struct btf_ext *btf_ext;
/* force hash collisions (used for testing) */
bool force_collisions;
size_t :0;
};
#define btf_dedup_opts__last_field force_collisions
LIBBPF_API int btf__dedup(struct btf *btf, struct btf_ext *btf_ext,
const struct btf_dedup_opts *opts);
LIBBPF_API int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts);
struct btf_dump;
struct btf_dump_opts {
void *ctx;
size_t sz;
};
#define btf_dump_opts__last_field sz
typedef void (*btf_dump_printf_fn_t)(void *ctx, const char *fmt, va_list args);
LIBBPF_API struct btf_dump *btf_dump__new(const struct btf *btf,
const struct btf_ext *btf_ext,
const struct btf_dump_opts *opts,
btf_dump_printf_fn_t printf_fn);
btf_dump_printf_fn_t printf_fn,
void *ctx,
const struct btf_dump_opts *opts);
LIBBPF_API void btf_dump__free(struct btf_dump *d);
LIBBPF_API int btf_dump__dump_type(struct btf_dump *d, __u32 id);
@@ -313,8 +294,29 @@ btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
const struct btf_dump_type_data_opts *opts);
/*
* A set of helpers for easier BTF types handling
* A set of helpers for easier BTF types handling.
*
* The inline functions below rely on constants from the kernel headers which
* may not be available for applications including this header file. To avoid
* compilation errors, we define all the constants here that were added after
* the initial introduction of the BTF_KIND* constants.
*/
#ifndef BTF_KIND_FUNC
#define BTF_KIND_FUNC 12 /* Function */
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
#endif
#ifndef BTF_KIND_VAR
#define BTF_KIND_VAR 14 /* Variable */
#define BTF_KIND_DATASEC 15 /* Section */
#endif
#ifndef BTF_KIND_FLOAT
#define BTF_KIND_FLOAT 16 /* Floating point */
#endif
/* The kernel header switched to enums, so the following were never #defined */
#define BTF_KIND_DECL_TAG 17 /* Decl Tag */
#define BTF_KIND_TYPE_TAG 18 /* Type Tag */
#define BTF_KIND_ENUM64 19 /* Enum for up-to 64bit values */
static inline __u16 btf_kind(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info);
@@ -372,6 +374,11 @@ static inline bool btf_is_enum(const struct btf_type *t)
return btf_kind(t) == BTF_KIND_ENUM;
}
static inline bool btf_is_enum64(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_ENUM64;
}
static inline bool btf_is_fwd(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_FWD;
@@ -403,7 +410,8 @@ static inline bool btf_is_mod(const struct btf_type *t)
return kind == BTF_KIND_VOLATILE ||
kind == BTF_KIND_CONST ||
kind == BTF_KIND_RESTRICT;
kind == BTF_KIND_RESTRICT ||
kind == BTF_KIND_TYPE_TAG;
}
static inline bool btf_is_func(const struct btf_type *t)
@@ -436,6 +444,23 @@ static inline bool btf_is_decl_tag(const struct btf_type *t)
return btf_kind(t) == BTF_KIND_DECL_TAG;
}
static inline bool btf_is_type_tag(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_TYPE_TAG;
}
static inline bool btf_is_any_enum(const struct btf_type *t)
{
return btf_is_enum(t) || btf_is_enum64(t);
}
static inline bool btf_kind_core_compat(const struct btf_type *t1,
const struct btf_type *t2)
{
return btf_kind(t1) == btf_kind(t2) ||
(btf_is_any_enum(t1) && btf_is_any_enum(t2));
}
static inline __u8 btf_int_encoding(const struct btf_type *t)
{
return BTF_INT_ENCODING(*(__u32 *)(t + 1));
@@ -461,6 +486,39 @@ static inline struct btf_enum *btf_enum(const struct btf_type *t)
return (struct btf_enum *)(t + 1);
}
struct btf_enum64;
static inline struct btf_enum64 *btf_enum64(const struct btf_type *t)
{
return (struct btf_enum64 *)(t + 1);
}
static inline __u64 btf_enum64_value(const struct btf_enum64 *e)
{
/* struct btf_enum64 is introduced in Linux 6.0, which is very
* bleeding-edge. Here we are avoiding relying on struct btf_enum64
* definition coming from kernel UAPI headers to support wider range
* of system-wide kernel headers.
*
* Given this header can be also included from C++ applications, that
* further restricts C tricks we can use (like using compatible
* anonymous struct). So just treat struct btf_enum64 as
* a three-element array of u32 and access second (lo32) and third
* (hi32) elements directly.
*
* For reference, here is a struct btf_enum64 definition:
*
* const struct btf_enum64 {
* __u32 name_off;
* __u32 val_lo32;
* __u32 val_hi32;
* };
*/
const __u32 *e64 = (const __u32 *)e;
return ((__u64)e64[2] << 32) | e64[1];
}
static inline struct btf_member *btf_members(const struct btf_type *t)
{
return (struct btf_member *)(t + 1);

View File

@@ -13,6 +13,7 @@
#include <ctype.h>
#include <endian.h>
#include <errno.h>
#include <limits.h>
#include <linux/err.h>
#include <linux/btf.h>
#include <linux/kernel.h>
@@ -77,9 +78,8 @@ struct btf_dump_data {
struct btf_dump {
const struct btf *btf;
const struct btf_ext *btf_ext;
btf_dump_printf_fn_t printf_fn;
struct btf_dump_opts opts;
void *cb_ctx;
int ptr_sz;
bool strip_mods;
bool skip_anon_defs;
@@ -118,14 +118,14 @@ struct btf_dump {
struct btf_dump_data *typed_dump;
};
static size_t str_hash_fn(const void *key, void *ctx)
static size_t str_hash_fn(long key, void *ctx)
{
return str_hash(key);
return str_hash((void *)key);
}
static bool str_equal_fn(const void *a, const void *b, void *ctx)
static bool str_equal_fn(long a, long b, void *ctx)
{
return strcmp(a, b) == 0;
return strcmp((void *)a, (void *)b) == 0;
}
static const char *btf_name_of(const struct btf_dump *d, __u32 name_off)
@@ -138,7 +138,7 @@ static void btf_dump_printf(const struct btf_dump *d, const char *fmt, ...)
va_list args;
va_start(args, fmt);
d->printf_fn(d->opts.ctx, fmt, args);
d->printf_fn(d->cb_ctx, fmt, args);
va_end(args);
}
@@ -146,21 +146,26 @@ static int btf_dump_mark_referenced(struct btf_dump *d);
static int btf_dump_resize(struct btf_dump *d);
struct btf_dump *btf_dump__new(const struct btf *btf,
const struct btf_ext *btf_ext,
const struct btf_dump_opts *opts,
btf_dump_printf_fn_t printf_fn)
btf_dump_printf_fn_t printf_fn,
void *ctx,
const struct btf_dump_opts *opts)
{
struct btf_dump *d;
int err;
if (!OPTS_VALID(opts, btf_dump_opts))
return libbpf_err_ptr(-EINVAL);
if (!printf_fn)
return libbpf_err_ptr(-EINVAL);
d = calloc(1, sizeof(struct btf_dump));
if (!d)
return libbpf_err_ptr(-ENOMEM);
d->btf = btf;
d->btf_ext = btf_ext;
d->printf_fn = printf_fn;
d->opts.ctx = opts ? opts->ctx : NULL;
d->cb_ctx = ctx;
d->ptr_sz = btf__pointer_size(btf) ? : sizeof(void *);
d->type_names = hashmap__new(str_hash_fn, str_equal_fn, NULL);
@@ -215,6 +220,17 @@ static int btf_dump_resize(struct btf_dump *d)
return 0;
}
static void btf_dump_free_names(struct hashmap *map)
{
size_t bkt;
struct hashmap_entry *cur;
hashmap__for_each_entry(map, cur, bkt)
free((void *)cur->pkey);
hashmap__free(map);
}
void btf_dump__free(struct btf_dump *d)
{
int i;
@@ -233,8 +249,8 @@ void btf_dump__free(struct btf_dump *d)
free(d->cached_names);
free(d->emit_queue);
free(d->decl_stack);
hashmap__free(d->type_names);
hashmap__free(d->ident_names);
btf_dump_free_names(d->type_names);
btf_dump_free_names(d->ident_names);
free(d);
}
@@ -305,6 +321,7 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
switch (btf_kind(t)) {
case BTF_KIND_INT:
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
case BTF_KIND_FWD:
case BTF_KIND_FLOAT:
break;
@@ -317,6 +334,7 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
case BTF_KIND_FUNC:
case BTF_KIND_VAR:
case BTF_KIND_DECL_TAG:
case BTF_KIND_TYPE_TAG:
d->type_states[t->type].referenced = 1;
break;
@@ -524,6 +542,7 @@ static int btf_dump_order_type(struct btf_dump *d, __u32 id, bool through_ptr)
return 1;
}
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
case BTF_KIND_FWD:
/*
* non-anonymous or non-referenced enums are top-level
@@ -560,6 +579,7 @@ static int btf_dump_order_type(struct btf_dump *d, __u32 id, bool through_ptr)
case BTF_KIND_VOLATILE:
case BTF_KIND_CONST:
case BTF_KIND_RESTRICT:
case BTF_KIND_TYPE_TAG:
return btf_dump_order_type(d, t->type, through_ptr);
case BTF_KIND_FUNC_PROTO: {
@@ -724,6 +744,7 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
tstate->emit_state = EMITTED;
break;
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
if (top_level_def) {
btf_dump_emit_enum_def(d, id, t, 0);
btf_dump_printf(d, ";\n\n");
@@ -734,6 +755,7 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
case BTF_KIND_VOLATILE:
case BTF_KIND_CONST:
case BTF_KIND_RESTRICT:
case BTF_KIND_TYPE_TAG:
btf_dump_emit_type(d, t->type, cont_id);
break;
case BTF_KIND_ARRAY:
@@ -812,14 +834,9 @@ static bool btf_is_struct_packed(const struct btf *btf, __u32 id,
const struct btf_type *t)
{
const struct btf_member *m;
int align, i, bit_sz;
int max_align = 1, align, i, bit_sz;
__u16 vlen;
align = btf__align_of(btf, id);
/* size of a non-packed struct has to be a multiple of its alignment*/
if (align && t->size % align)
return true;
m = btf_members(t);
vlen = btf_vlen(t);
/* all non-bitfield fields have to be naturally aligned */
@@ -828,8 +845,11 @@ static bool btf_is_struct_packed(const struct btf *btf, __u32 id,
bit_sz = btf_member_bitfield_size(t, i);
if (align && bit_sz == 0 && m->offset % (8 * align) != 0)
return true;
max_align = max(align, max_align);
}
/* size of a non-packed struct has to be a multiple of its alignment */
if (t->size % max_align != 0)
return true;
/*
* if original struct was marked as packed, but its layout is
* naturally aligned, we'll detect that it's not packed
@@ -837,44 +857,97 @@ static bool btf_is_struct_packed(const struct btf *btf, __u32 id,
return false;
}
static int chip_away_bits(int total, int at_most)
{
return total % at_most ? : at_most;
}
static void btf_dump_emit_bit_padding(const struct btf_dump *d,
int cur_off, int m_off, int m_bit_sz,
int align, int lvl)
int cur_off, int next_off, int next_align,
bool in_bitfield, int lvl)
{
int off_diff = m_off - cur_off;
int ptr_bits = d->ptr_sz * 8;
const struct {
const char *name;
int bits;
} pads[] = {
{"long", d->ptr_sz * 8}, {"int", 32}, {"short", 16}, {"char", 8}
};
int new_off, pad_bits, bits, i;
const char *pad_type;
if (off_diff <= 0)
/* no gap */
return;
if (m_bit_sz == 0 && off_diff < align * 8)
/* natural padding will take care of a gap */
return;
if (cur_off >= next_off)
return; /* no gap */
while (off_diff > 0) {
const char *pad_type;
int pad_bits;
/* For filling out padding we want to take advantage of
* natural alignment rules to minimize unnecessary explicit
* padding. First, we find the largest type (among long, int,
* short, or char) that can be used to force naturally aligned
* boundary. Once determined, we'll use such type to fill in
* the remaining padding gap. In some cases we can rely on
* compiler filling some gaps, but sometimes we need to force
* alignment to close natural alignment with markers like
* `long: 0` (this is always the case for bitfields). Note
* that even if struct itself has, let's say 4-byte alignment
* (i.e., it only uses up to int-aligned types), using `long:
* X;` explicit padding doesn't actually change struct's
* overall alignment requirements, but compiler does take into
* account that type's (long, in this example) natural
* alignment requirements when adding implicit padding. We use
* this fact heavily and don't worry about ruining correct
* struct alignment requirement.
*/
for (i = 0; i < ARRAY_SIZE(pads); i++) {
pad_bits = pads[i].bits;
pad_type = pads[i].name;
if (ptr_bits > 32 && off_diff > 32) {
pad_type = "long";
pad_bits = chip_away_bits(off_diff, ptr_bits);
} else if (off_diff > 16) {
pad_type = "int";
pad_bits = chip_away_bits(off_diff, 32);
} else if (off_diff > 8) {
pad_type = "short";
pad_bits = chip_away_bits(off_diff, 16);
} else {
pad_type = "char";
pad_bits = chip_away_bits(off_diff, 8);
new_off = roundup(cur_off, pad_bits);
if (new_off <= next_off)
break;
}
if (new_off > cur_off && new_off <= next_off) {
/* We need explicit `<type>: 0` aligning mark if next
* field is right on alignment offset and its
* alignment requirement is less strict than <type>'s
* alignment (so compiler won't naturally align to the
* offset we expect), or if subsequent `<type>: X`,
* will actually completely fit in the remaining hole,
* making compiler basically ignore `<type>: X`
* completely.
*/
if (in_bitfield ||
(new_off == next_off && roundup(cur_off, next_align * 8) != new_off) ||
(new_off != next_off && next_off - new_off <= new_off - cur_off))
/* but for bitfields we'll emit explicit bit count */
btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type,
in_bitfield ? new_off - cur_off : 0);
cur_off = new_off;
}
/* Now we know we start at naturally aligned offset for a chosen
* padding type (long, int, short, or char), and so the rest is just
* a straightforward filling of remaining padding gap with full
* `<type>: sizeof(<type>);` markers, except for the last one, which
* might need smaller than sizeof(<type>) padding.
*/
while (cur_off != next_off) {
bits = min(next_off - cur_off, pad_bits);
if (bits == pad_bits) {
btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type, pad_bits);
cur_off += bits;
continue;
}
/* For the remainder padding that doesn't cover entire
* pad_type bit length, we pick the smallest necessary type.
* This is pure aesthetics, we could have just used `long`,
* but having smallest necessary one communicates better the
* scale of the padding gap.
*/
for (i = ARRAY_SIZE(pads) - 1; i >= 0; i--) {
pad_type = pads[i].name;
pad_bits = pads[i].bits;
if (pad_bits < bits)
continue;
btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type, bits);
cur_off += bits;
break;
}
btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type, pad_bits);
off_diff -= pad_bits;
}
}
@@ -894,9 +967,11 @@ static void btf_dump_emit_struct_def(struct btf_dump *d,
{
const struct btf_member *m = btf_members(t);
bool is_struct = btf_is_struct(t);
int align, i, packed, off = 0;
bool packed, prev_bitfield = false;
int align, i, off = 0;
__u16 vlen = btf_vlen(t);
align = btf__align_of(d->btf, id);
packed = is_struct ? btf_is_struct_packed(d->btf, id, t) : 0;
btf_dump_printf(d, "%s%s%s {",
@@ -906,37 +981,47 @@ static void btf_dump_emit_struct_def(struct btf_dump *d,
for (i = 0; i < vlen; i++, m++) {
const char *fname;
int m_off, m_sz;
int m_off, m_sz, m_align;
bool in_bitfield;
fname = btf_name_of(d, m->name_off);
m_sz = btf_member_bitfield_size(t, i);
m_off = btf_member_bit_offset(t, i);
align = packed ? 1 : btf__align_of(d->btf, m->type);
m_align = packed ? 1 : btf__align_of(d->btf, m->type);
btf_dump_emit_bit_padding(d, off, m_off, m_sz, align, lvl + 1);
in_bitfield = prev_bitfield && m_sz != 0;
btf_dump_emit_bit_padding(d, off, m_off, m_align, in_bitfield, lvl + 1);
btf_dump_printf(d, "\n%s", pfx(lvl + 1));
btf_dump_emit_type_decl(d, m->type, fname, lvl + 1);
if (m_sz) {
btf_dump_printf(d, ": %d", m_sz);
off = m_off + m_sz;
prev_bitfield = true;
} else {
m_sz = max((__s64)0, btf__resolve_size(d->btf, m->type));
off = m_off + m_sz * 8;
prev_bitfield = false;
}
btf_dump_printf(d, ";");
}
/* pad at the end, if necessary */
if (is_struct) {
align = packed ? 1 : btf__align_of(d->btf, id);
btf_dump_emit_bit_padding(d, off, t->size * 8, 0, align,
lvl + 1);
}
if (is_struct)
btf_dump_emit_bit_padding(d, off, t->size * 8, align, false, lvl + 1);
if (vlen)
/*
* Keep `struct empty {}` on a single line,
* only print newline when there are regular or padding fields.
*/
if (vlen || t->size) {
btf_dump_printf(d, "\n");
btf_dump_printf(d, "%s}", pfx(lvl));
btf_dump_printf(d, "%s}", pfx(lvl));
} else {
btf_dump_printf(d, "}");
}
if (packed)
btf_dump_printf(d, " __attribute__((packed))");
}
@@ -973,38 +1058,118 @@ static void btf_dump_emit_enum_fwd(struct btf_dump *d, __u32 id,
btf_dump_printf(d, "enum %s", btf_dump_type_name(d, id));
}
static void btf_dump_emit_enum32_val(struct btf_dump *d,
const struct btf_type *t,
int lvl, __u16 vlen)
{
const struct btf_enum *v = btf_enum(t);
bool is_signed = btf_kflag(t);
const char *fmt_str;
const char *name;
size_t dup_cnt;
int i;
for (i = 0; i < vlen; i++, v++) {
name = btf_name_of(d, v->name_off);
/* enumerators share namespace with typedef idents */
dup_cnt = btf_dump_name_dups(d, d->ident_names, name);
if (dup_cnt > 1) {
fmt_str = is_signed ? "\n%s%s___%zd = %d," : "\n%s%s___%zd = %u,";
btf_dump_printf(d, fmt_str, pfx(lvl + 1), name, dup_cnt, v->val);
} else {
fmt_str = is_signed ? "\n%s%s = %d," : "\n%s%s = %u,";
btf_dump_printf(d, fmt_str, pfx(lvl + 1), name, v->val);
}
}
}
static void btf_dump_emit_enum64_val(struct btf_dump *d,
const struct btf_type *t,
int lvl, __u16 vlen)
{
const struct btf_enum64 *v = btf_enum64(t);
bool is_signed = btf_kflag(t);
const char *fmt_str;
const char *name;
size_t dup_cnt;
__u64 val;
int i;
for (i = 0; i < vlen; i++, v++) {
name = btf_name_of(d, v->name_off);
dup_cnt = btf_dump_name_dups(d, d->ident_names, name);
val = btf_enum64_value(v);
if (dup_cnt > 1) {
fmt_str = is_signed ? "\n%s%s___%zd = %lldLL,"
: "\n%s%s___%zd = %lluULL,";
btf_dump_printf(d, fmt_str,
pfx(lvl + 1), name, dup_cnt,
(unsigned long long)val);
} else {
fmt_str = is_signed ? "\n%s%s = %lldLL,"
: "\n%s%s = %lluULL,";
btf_dump_printf(d, fmt_str,
pfx(lvl + 1), name,
(unsigned long long)val);
}
}
}
static void btf_dump_emit_enum_def(struct btf_dump *d, __u32 id,
const struct btf_type *t,
int lvl)
{
const struct btf_enum *v = btf_enum(t);
__u16 vlen = btf_vlen(t);
const char *name;
size_t dup_cnt;
int i;
btf_dump_printf(d, "enum%s%s",
t->name_off ? " " : "",
btf_dump_type_name(d, id));
if (vlen) {
btf_dump_printf(d, " {");
for (i = 0; i < vlen; i++, v++) {
name = btf_name_of(d, v->name_off);
/* enumerators share namespace with typedef idents */
dup_cnt = btf_dump_name_dups(d, d->ident_names, name);
if (dup_cnt > 1) {
btf_dump_printf(d, "\n%s%s___%zu = %u,",
pfx(lvl + 1), name, dup_cnt,
(__u32)v->val);
} else {
btf_dump_printf(d, "\n%s%s = %u,",
pfx(lvl + 1), name,
(__u32)v->val);
if (!vlen)
return;
btf_dump_printf(d, " {");
if (btf_is_enum(t))
btf_dump_emit_enum32_val(d, t, lvl, vlen);
else
btf_dump_emit_enum64_val(d, t, lvl, vlen);
btf_dump_printf(d, "\n%s}", pfx(lvl));
/* special case enums with special sizes */
if (t->size == 1) {
/* one-byte enums can be forced with mode(byte) attribute */
btf_dump_printf(d, " __attribute__((mode(byte)))");
} else if (t->size == 8 && d->ptr_sz == 8) {
/* enum can be 8-byte sized if one of the enumerator values
* doesn't fit in 32-bit integer, or by adding mode(word)
* attribute (but probably only on 64-bit architectures); do
* our best here to try to satisfy the contract without adding
* unnecessary attributes
*/
bool needs_word_mode;
if (btf_is_enum(t)) {
/* enum can't represent 64-bit values, so we need word mode */
needs_word_mode = true;
} else {
/* enum64 needs mode(word) if none of its values has
* non-zero upper 32-bits (which means that all values
* fit in 32-bit integers and won't cause compiler to
* bump enum to be 64-bit naturally
*/
int i;
needs_word_mode = true;
for (i = 0; i < vlen; i++) {
if (btf_enum64(t)[i].val_hi32 != 0) {
needs_word_mode = false;
break;
}
}
}
btf_dump_printf(d, "\n%s}", pfx(lvl));
if (needs_word_mode)
btf_dump_printf(d, " __attribute__((mode(word)))");
}
}
static void btf_dump_emit_fwd_def(struct btf_dump *d, __u32 id,
@@ -1154,6 +1319,7 @@ skip_mod:
case BTF_KIND_CONST:
case BTF_KIND_RESTRICT:
case BTF_KIND_FUNC_PROTO:
case BTF_KIND_TYPE_TAG:
id = t->type;
break;
case BTF_KIND_ARRAY:
@@ -1161,6 +1327,7 @@ skip_mod:
break;
case BTF_KIND_INT:
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
case BTF_KIND_FWD:
case BTF_KIND_STRUCT:
case BTF_KIND_UNION:
@@ -1295,6 +1462,7 @@ static void btf_dump_emit_type_chain(struct btf_dump *d,
btf_dump_emit_struct_fwd(d, id, t);
break;
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
btf_dump_emit_mods(d, decls);
/* inline anonymous enum */
if (t->name_off == 0 && !d->skip_anon_defs)
@@ -1322,6 +1490,11 @@ static void btf_dump_emit_type_chain(struct btf_dump *d,
case BTF_KIND_RESTRICT:
btf_dump_printf(d, " restrict");
break;
case BTF_KIND_TYPE_TAG:
btf_dump_emit_mods(d, decls);
name = btf_name_of(d, t->name_off);
btf_dump_printf(d, " __attribute__((btf_type_tag(\"%s\")))", name);
break;
case BTF_KIND_ARRAY: {
const struct btf_array *a = btf_array(t);
const struct btf_type *next_t;
@@ -1459,11 +1632,22 @@ static void btf_dump_emit_type_cast(struct btf_dump *d, __u32 id,
static size_t btf_dump_name_dups(struct btf_dump *d, struct hashmap *name_map,
const char *orig_name)
{
char *old_name, *new_name;
size_t dup_cnt = 0;
int err;
hashmap__find(name_map, orig_name, (void **)&dup_cnt);
new_name = strdup(orig_name);
if (!new_name)
return 1;
(void)hashmap__find(name_map, orig_name, &dup_cnt);
dup_cnt++;
hashmap__set(name_map, orig_name, (void *)dup_cnt, NULL, NULL);
err = hashmap__set(name_map, new_name, dup_cnt, &old_name, NULL);
if (err)
free(new_name);
free(old_name);
return dup_cnt;
}
@@ -1483,6 +1667,11 @@ static const char *btf_dump_resolve_name(struct btf_dump *d, __u32 id,
if (s->name_resolved)
return *cached_name ? *cached_name : orig_name;
if (btf_is_fwd(t) || (btf_is_enum(t) && btf_vlen(t) == 0)) {
s->name_resolved = 1;
return orig_name;
}
dup_cnt = btf_dump_name_dups(d, name_map, orig_name);
if (dup_cnt > 1) {
const size_t max_len = 256;
@@ -1740,6 +1929,7 @@ static int btf_dump_int_data(struct btf_dump *d,
if (d->typed_dump->is_array_terminated)
break;
if (*(char *)data == '\0') {
btf_dump_type_values(d, "'\\0'");
d->typed_dump->is_array_terminated = true;
break;
}
@@ -1839,14 +2029,17 @@ static int btf_dump_array_data(struct btf_dump *d,
{
const struct btf_array *array = btf_array(t);
const struct btf_type *elem_type;
__u32 i, elem_size = 0, elem_type_id;
__u32 i, elem_type_id;
__s64 elem_size;
bool is_array_member;
bool is_array_terminated;
elem_type_id = array->type;
elem_type = skip_mods_and_typedefs(d->btf, elem_type_id, NULL);
elem_size = btf__resolve_size(d->btf, elem_type_id);
if (elem_size <= 0) {
pr_warn("unexpected elem size %d for array type [%u]\n", elem_size, id);
pr_warn("unexpected elem size %zd for array type [%u]\n",
(ssize_t)elem_size, id);
return -EINVAL;
}
@@ -1875,12 +2068,15 @@ static int btf_dump_array_data(struct btf_dump *d,
*/
is_array_member = d->typed_dump->is_array_member;
d->typed_dump->is_array_member = true;
is_array_terminated = d->typed_dump->is_array_terminated;
d->typed_dump->is_array_terminated = false;
for (i = 0; i < array->nelems; i++, data += elem_size) {
if (d->typed_dump->is_array_terminated)
break;
btf_dump_dump_type_data(d, NULL, elem_type, elem_type_id, data, 0, 0);
}
d->typed_dump->is_array_member = is_array_member;
d->typed_dump->is_array_terminated = is_array_terminated;
d->typed_dump->depth--;
btf_dump_data_pfx(d);
btf_dump_type_values(d, "]");
@@ -1895,7 +2091,7 @@ static int btf_dump_struct_data(struct btf_dump *d,
{
const struct btf_member *m = btf_members(t);
__u16 n = btf_vlen(t);
int i, err;
int i, err = 0;
/* note that we increment depth before calling btf_dump_print() below;
* this is intentional. btf_dump_data_newline() will not print a
@@ -1959,7 +2155,8 @@ static int btf_dump_get_enum_value(struct btf_dump *d,
__u32 id,
__s64 *value)
{
/* handle unaligned enum value */
bool is_signed = btf_kflag(t);
if (!ptr_is_aligned(d->btf, id, data)) {
__u64 val;
int err;
@@ -1976,13 +2173,13 @@ static int btf_dump_get_enum_value(struct btf_dump *d,
*value = *(__s64 *)data;
return 0;
case 4:
*value = *(__s32 *)data;
*value = is_signed ? (__s64)*(__s32 *)data : *(__u32 *)data;
return 0;
case 2:
*value = *(__s16 *)data;
*value = is_signed ? *(__s16 *)data : *(__u16 *)data;
return 0;
case 1:
*value = *(__s8 *)data;
*value = is_signed ? *(__s8 *)data : *(__u8 *)data;
return 0;
default:
pr_warn("unexpected size %d for enum, id:[%u]\n", t->size, id);
@@ -1995,7 +2192,7 @@ static int btf_dump_enum_data(struct btf_dump *d,
__u32 id,
const void *data)
{
const struct btf_enum *e;
bool is_signed;
__s64 value;
int i, err;
@@ -2003,14 +2200,31 @@ static int btf_dump_enum_data(struct btf_dump *d,
if (err)
return err;
for (i = 0, e = btf_enum(t); i < btf_vlen(t); i++, e++) {
if (value != e->val)
continue;
btf_dump_type_values(d, "%s", btf_name_of(d, e->name_off));
return 0;
}
is_signed = btf_kflag(t);
if (btf_is_enum(t)) {
const struct btf_enum *e;
btf_dump_type_values(d, "%d", value);
for (i = 0, e = btf_enum(t); i < btf_vlen(t); i++, e++) {
if (value != e->val)
continue;
btf_dump_type_values(d, "%s", btf_name_of(d, e->name_off));
return 0;
}
btf_dump_type_values(d, is_signed ? "%d" : "%u", value);
} else {
const struct btf_enum64 *e;
for (i = 0, e = btf_enum64(t); i < btf_vlen(t); i++, e++) {
if (value != btf_enum64_value(e))
continue;
btf_dump_type_values(d, "%s", btf_name_of(d, e->name_off));
return 0;
}
btf_dump_type_values(d, is_signed ? "%lldLL" : "%lluULL",
(unsigned long long)value);
}
return 0;
}
@@ -2041,9 +2255,25 @@ static int btf_dump_type_data_check_overflow(struct btf_dump *d,
const struct btf_type *t,
__u32 id,
const void *data,
__u8 bits_offset)
__u8 bits_offset,
__u8 bit_sz)
{
__s64 size = btf__resolve_size(d->btf, id);
__s64 size;
if (bit_sz) {
/* bits_offset is at most 7. bit_sz is at most 128. */
__u8 nr_bytes = (bits_offset + bit_sz + 7) / 8;
/* When bit_sz is non zero, it is called from
* btf_dump_struct_data() where it only cares about
* negative error value.
* Return nr_bytes in success case to make it
* consistent as the regular integer case below.
*/
return data + nr_bytes > d->typed_dump->data_end ? -E2BIG : nr_bytes;
}
size = btf__resolve_size(d->btf, id);
if (size < 0 || size >= INT_MAX) {
pr_warn("unexpected size [%zu] for id [%u]\n",
@@ -2070,6 +2300,7 @@ static int btf_dump_type_data_check_overflow(struct btf_dump *d,
case BTF_KIND_FLOAT:
case BTF_KIND_PTR:
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
if (data + bits_offset / 8 + size > d->typed_dump->data_end)
return -E2BIG;
break;
@@ -2174,6 +2405,7 @@ static int btf_dump_type_data_check_zero(struct btf_dump *d,
return -ENODATA;
}
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
err = btf_dump_get_enum_value(d, t, data, id, &value);
if (err)
return err;
@@ -2194,9 +2426,9 @@ static int btf_dump_dump_type_data(struct btf_dump *d,
__u8 bits_offset,
__u8 bit_sz)
{
int size, err;
int size, err = 0;
size = btf_dump_type_data_check_overflow(d, t, id, data, bits_offset);
size = btf_dump_type_data_check_overflow(d, t, id, data, bits_offset, bit_sz);
if (size < 0)
return size;
err = btf_dump_type_data_check_zero(d, t, id, data, bits_offset, bit_sz);
@@ -2246,6 +2478,7 @@ static int btf_dump_dump_type_data(struct btf_dump *d,
err = btf_dump_struct_data(d, t, id, data);
break;
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
/* handle bitfield and int enum values */
if (bit_sz) {
__u64 print_num;
@@ -2296,11 +2529,11 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
d->typed_dump->indent_lvl = OPTS_GET(opts, indent_level, 0);
/* default indent string is a tab */
if (!opts->indent_str)
if (!OPTS_GET(opts, indent_str, NULL))
d->typed_dump->indent_str[0] = '\t';
else
strncat(d->typed_dump->indent_str, opts->indent_str,
sizeof(d->typed_dump->indent_str) - 1);
libbpf_strlcpy(d->typed_dump->indent_str, opts->indent_str,
sizeof(d->typed_dump->indent_str));
d->typed_dump->compact = OPTS_GET(opts, compact, false);
d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false);

558
src/elf.c Normal file
View File

@@ -0,0 +1,558 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <libelf.h>
#include <gelf.h>
#include <fcntl.h>
#include <linux/kernel.h>
#include "libbpf_internal.h"
#include "str_error.h"
/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
* the symbol is hidden and can only be seen when referenced using an
* explicit version number. This is a GNU extension.
*/
#define VERSYM_HIDDEN 0x8000
/* This is the mask for the rest of the data in a word read from a
* SHT_GNU_versym section.
*/
#define VERSYM_VERSION 0x7fff
int elf_open(const char *binary_path, struct elf_fd *elf_fd)
{
char errmsg[STRERR_BUFSIZE];
int fd, ret;
Elf *elf;
if (elf_version(EV_CURRENT) == EV_NONE) {
pr_warn("elf: failed to init libelf for %s\n", binary_path);
return -LIBBPF_ERRNO__LIBELF;
}
fd = open(binary_path, O_RDONLY | O_CLOEXEC);
if (fd < 0) {
ret = -errno;
pr_warn("elf: failed to open %s: %s\n", binary_path,
libbpf_strerror_r(ret, errmsg, sizeof(errmsg)));
return ret;
}
elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
if (!elf) {
pr_warn("elf: could not read elf from %s: %s\n", binary_path, elf_errmsg(-1));
close(fd);
return -LIBBPF_ERRNO__FORMAT;
}
elf_fd->fd = fd;
elf_fd->elf = elf;
return 0;
}
void elf_close(struct elf_fd *elf_fd)
{
if (!elf_fd)
return;
elf_end(elf_fd->elf);
close(elf_fd->fd);
}
/* Return next ELF section of sh_type after scn, or first of that type if scn is NULL. */
static Elf_Scn *elf_find_next_scn_by_type(Elf *elf, int sh_type, Elf_Scn *scn)
{
while ((scn = elf_nextscn(elf, scn)) != NULL) {
GElf_Shdr sh;
if (!gelf_getshdr(scn, &sh))
continue;
if (sh.sh_type == sh_type)
return scn;
}
return NULL;
}
struct elf_sym {
const char *name;
GElf_Sym sym;
GElf_Shdr sh;
int ver;
bool hidden;
};
struct elf_sym_iter {
Elf *elf;
Elf_Data *syms;
Elf_Data *versyms;
Elf_Data *verdefs;
size_t nr_syms;
size_t strtabidx;
size_t verdef_strtabidx;
size_t next_sym_idx;
struct elf_sym sym;
int st_type;
};
static int elf_sym_iter_new(struct elf_sym_iter *iter,
Elf *elf, const char *binary_path,
int sh_type, int st_type)
{
Elf_Scn *scn = NULL;
GElf_Ehdr ehdr;
GElf_Shdr sh;
memset(iter, 0, sizeof(*iter));
if (!gelf_getehdr(elf, &ehdr)) {
pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1));
return -EINVAL;
}
scn = elf_find_next_scn_by_type(elf, sh_type, NULL);
if (!scn) {
pr_debug("elf: failed to find symbol table ELF sections in '%s'\n",
binary_path);
return -ENOENT;
}
if (!gelf_getshdr(scn, &sh))
return -EINVAL;
iter->strtabidx = sh.sh_link;
iter->syms = elf_getdata(scn, 0);
if (!iter->syms) {
pr_warn("elf: failed to get symbols for symtab section in '%s': %s\n",
binary_path, elf_errmsg(-1));
return -EINVAL;
}
iter->nr_syms = iter->syms->d_size / sh.sh_entsize;
iter->elf = elf;
iter->st_type = st_type;
/* Version symbol table is meaningful to dynsym only */
if (sh_type != SHT_DYNSYM)
return 0;
scn = elf_find_next_scn_by_type(elf, SHT_GNU_versym, NULL);
if (!scn)
return 0;
iter->versyms = elf_getdata(scn, 0);
scn = elf_find_next_scn_by_type(elf, SHT_GNU_verdef, NULL);
if (!scn)
return 0;
iter->verdefs = elf_getdata(scn, 0);
if (!iter->verdefs || !gelf_getshdr(scn, &sh)) {
pr_warn("elf: failed to get verdef ELF section in '%s'\n", binary_path);
return -EINVAL;
}
iter->verdef_strtabidx = sh.sh_link;
return 0;
}
static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
{
struct elf_sym *ret = &iter->sym;
GElf_Sym *sym = &ret->sym;
const char *name = NULL;
GElf_Versym versym;
Elf_Scn *sym_scn;
size_t idx;
for (idx = iter->next_sym_idx; idx < iter->nr_syms; idx++) {
if (!gelf_getsym(iter->syms, idx, sym))
continue;
if (GELF_ST_TYPE(sym->st_info) != iter->st_type)
continue;
name = elf_strptr(iter->elf, iter->strtabidx, sym->st_name);
if (!name)
continue;
sym_scn = elf_getscn(iter->elf, sym->st_shndx);
if (!sym_scn)
continue;
if (!gelf_getshdr(sym_scn, &ret->sh))
continue;
iter->next_sym_idx = idx + 1;
ret->name = name;
ret->ver = 0;
ret->hidden = false;
if (iter->versyms) {
if (!gelf_getversym(iter->versyms, idx, &versym))
continue;
ret->ver = versym & VERSYM_VERSION;
ret->hidden = versym & VERSYM_HIDDEN;
}
return ret;
}
return NULL;
}
static const char *elf_get_vername(struct elf_sym_iter *iter, int ver)
{
GElf_Verdaux verdaux;
GElf_Verdef verdef;
int offset;
if (!iter->verdefs)
return NULL;
offset = 0;
while (gelf_getverdef(iter->verdefs, offset, &verdef)) {
if (verdef.vd_ndx != ver) {
if (!verdef.vd_next)
break;
offset += verdef.vd_next;
continue;
}
if (!gelf_getverdaux(iter->verdefs, offset + verdef.vd_aux, &verdaux))
break;
return elf_strptr(iter->elf, iter->verdef_strtabidx, verdaux.vda_name);
}
return NULL;
}
static bool symbol_match(struct elf_sym_iter *iter, int sh_type, struct elf_sym *sym,
const char *name, size_t name_len, const char *lib_ver)
{
const char *ver_name;
/* Symbols are in forms of func, func@LIB_VER or func@@LIB_VER
* make sure the func part matches the user specified name
*/
if (strncmp(sym->name, name, name_len) != 0)
return false;
/* ...but we don't want a search for "foo" to match 'foo2" also, so any
* additional characters in sname should be of the form "@@LIB".
*/
if (sym->name[name_len] != '\0' && sym->name[name_len] != '@')
return false;
/* If user does not specify symbol version, then we got a match */
if (!lib_ver)
return true;
/* If user specifies symbol version, for dynamic symbols,
* get version name from ELF verdef section for comparison.
*/
if (sh_type == SHT_DYNSYM) {
ver_name = elf_get_vername(iter, sym->ver);
if (!ver_name)
return false;
return strcmp(ver_name, lib_ver) == 0;
}
/* For normal symbols, it is already in form of func@LIB_VER */
return strcmp(sym->name, name) == 0;
}
/* Transform symbol's virtual address (absolute for binaries and relative
* for shared libs) into file offset, which is what kernel is expecting
* for uprobe/uretprobe attachment.
* See Documentation/trace/uprobetracer.rst for more details. This is done
* by looking up symbol's containing section's header and using iter's virtual
* address (sh_addr) and corresponding file offset (sh_offset) to transform
* sym.st_value (virtual address) into desired final file offset.
*/
static unsigned long elf_sym_offset(struct elf_sym *sym)
{
return sym->sym.st_value - sym->sh.sh_addr + sym->sh.sh_offset;
}
/* Find offset of function name in the provided ELF object. "binary_path" is
* the path to the ELF binary represented by "elf", and only used for error
* reporting matters. "name" matches symbol name or name@@LIB for library
* functions.
*/
long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
{
int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
const char *at_symbol, *lib_ver;
bool is_shared_lib;
long ret = -ENOENT;
size_t name_len;
GElf_Ehdr ehdr;
if (!gelf_getehdr(elf, &ehdr)) {
pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1));
ret = -LIBBPF_ERRNO__FORMAT;
goto out;
}
/* for shared lib case, we do not need to calculate relative offset */
is_shared_lib = ehdr.e_type == ET_DYN;
/* Does name specify "@@LIB_VER" or "@LIB_VER" ? */
at_symbol = strchr(name, '@');
if (at_symbol) {
name_len = at_symbol - name;
/* skip second @ if it's @@LIB_VER case */
if (at_symbol[1] == '@')
at_symbol++;
lib_ver = at_symbol + 1;
} else {
name_len = strlen(name);
lib_ver = NULL;
}
/* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if
* a binary is stripped, it may only have SHT_DYNSYM, and a fully-statically
* linked binary may not have SHT_DYMSYM, so absence of a section should not be
* reported as a warning/error.
*/
for (i = 0; i < ARRAY_SIZE(sh_types); i++) {
struct elf_sym_iter iter;
struct elf_sym *sym;
int last_bind = -1;
int cur_bind;
ret = elf_sym_iter_new(&iter, elf, binary_path, sh_types[i], STT_FUNC);
if (ret == -ENOENT)
continue;
if (ret)
goto out;
while ((sym = elf_sym_iter_next(&iter))) {
if (!symbol_match(&iter, sh_types[i], sym, name, name_len, lib_ver))
continue;
cur_bind = GELF_ST_BIND(sym->sym.st_info);
if (ret > 0) {
/* handle multiple matches */
if (elf_sym_offset(sym) == ret) {
/* same offset, no problem */
continue;
} else if (last_bind != STB_WEAK && cur_bind != STB_WEAK) {
/* Only accept one non-weak bind. */
pr_warn("elf: ambiguous match for '%s', '%s' in '%s'\n",
sym->name, name, binary_path);
ret = -LIBBPF_ERRNO__FORMAT;
goto out;
} else if (cur_bind == STB_WEAK) {
/* already have a non-weak bind, and
* this is a weak bind, so ignore.
*/
continue;
}
}
ret = elf_sym_offset(sym);
last_bind = cur_bind;
}
if (ret > 0)
break;
}
if (ret > 0) {
pr_debug("elf: symbol address match for '%s' in '%s': 0x%lx\n", name, binary_path,
ret);
} else {
if (ret == 0) {
pr_warn("elf: '%s' is 0 in symtab for '%s': %s\n", name, binary_path,
is_shared_lib ? "should not be 0 in a shared library" :
"try using shared library path instead");
ret = -ENOENT;
} else {
pr_warn("elf: failed to find symbol '%s' in '%s'\n", name, binary_path);
}
}
out:
return ret;
}
/* Find offset of function name in ELF object specified by path. "name" matches
* symbol name or name@@LIB for library functions.
*/
long elf_find_func_offset_from_file(const char *binary_path, const char *name)
{
struct elf_fd elf_fd;
long ret = -ENOENT;
ret = elf_open(binary_path, &elf_fd);
if (ret)
return ret;
ret = elf_find_func_offset(elf_fd.elf, binary_path, name);
elf_close(&elf_fd);
return ret;
}
struct symbol {
const char *name;
int bind;
int idx;
};
static int symbol_cmp(const void *a, const void *b)
{
const struct symbol *sym_a = a;
const struct symbol *sym_b = b;
return strcmp(sym_a->name, sym_b->name);
}
/*
* Return offsets in @poffsets for symbols specified in @syms array argument.
* On success returns 0 and offsets are returned in allocated array with @cnt
* size, that needs to be released by the caller.
*/
int elf_resolve_syms_offsets(const char *binary_path, int cnt,
const char **syms, unsigned long **poffsets,
int st_type)
{
int sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
int err = 0, i, cnt_done = 0;
unsigned long *offsets;
struct symbol *symbols;
struct elf_fd elf_fd;
err = elf_open(binary_path, &elf_fd);
if (err)
return err;
offsets = calloc(cnt, sizeof(*offsets));
symbols = calloc(cnt, sizeof(*symbols));
if (!offsets || !symbols) {
err = -ENOMEM;
goto out;
}
for (i = 0; i < cnt; i++) {
symbols[i].name = syms[i];
symbols[i].idx = i;
}
qsort(symbols, cnt, sizeof(*symbols), symbol_cmp);
for (i = 0; i < ARRAY_SIZE(sh_types); i++) {
struct elf_sym_iter iter;
struct elf_sym *sym;
err = elf_sym_iter_new(&iter, elf_fd.elf, binary_path, sh_types[i], st_type);
if (err == -ENOENT)
continue;
if (err)
goto out;
while ((sym = elf_sym_iter_next(&iter))) {
unsigned long sym_offset = elf_sym_offset(sym);
int bind = GELF_ST_BIND(sym->sym.st_info);
struct symbol *found, tmp = {
.name = sym->name,
};
unsigned long *offset;
found = bsearch(&tmp, symbols, cnt, sizeof(*symbols), symbol_cmp);
if (!found)
continue;
offset = &offsets[found->idx];
if (*offset > 0) {
/* same offset, no problem */
if (*offset == sym_offset)
continue;
/* handle multiple matches */
if (found->bind != STB_WEAK && bind != STB_WEAK) {
/* Only accept one non-weak bind. */
pr_warn("elf: ambiguous match found '%s@%lu' in '%s' previous offset %lu\n",
sym->name, sym_offset, binary_path, *offset);
err = -ESRCH;
goto out;
} else if (bind == STB_WEAK) {
/* already have a non-weak bind, and
* this is a weak bind, so ignore.
*/
continue;
}
} else {
cnt_done++;
}
*offset = sym_offset;
found->bind = bind;
}
}
if (cnt != cnt_done) {
err = -ENOENT;
goto out;
}
*poffsets = offsets;
out:
free(symbols);
if (err)
free(offsets);
elf_close(&elf_fd);
return err;
}
/*
* Return offsets in @poffsets for symbols specified by @pattern argument.
* On success returns 0 and offsets are returned in allocated @poffsets
* array with the @pctn size, that needs to be released by the caller.
*/
int elf_resolve_pattern_offsets(const char *binary_path, const char *pattern,
unsigned long **poffsets, size_t *pcnt)
{
int sh_types[2] = { SHT_SYMTAB, SHT_DYNSYM };
unsigned long *offsets = NULL;
size_t cap = 0, cnt = 0;
struct elf_fd elf_fd;
int err = 0, i;
err = elf_open(binary_path, &elf_fd);
if (err)
return err;
for (i = 0; i < ARRAY_SIZE(sh_types); i++) {
struct elf_sym_iter iter;
struct elf_sym *sym;
err = elf_sym_iter_new(&iter, elf_fd.elf, binary_path, sh_types[i], STT_FUNC);
if (err == -ENOENT)
continue;
if (err)
goto out;
while ((sym = elf_sym_iter_next(&iter))) {
if (!glob_match(sym->name, pattern))
continue;
err = libbpf_ensure_mem((void **) &offsets, &cap, sizeof(*offsets),
cnt + 1);
if (err)
goto out;
offsets[cnt++] = elf_sym_offset(sym);
}
/* If we found anything in the first symbol section,
* do not search others to avoid duplicates.
*/
if (cnt)
break;
}
if (cnt) {
*poffsets = offsets;
*pcnt = cnt;
} else {
err = -ENOENT;
}
out:
if (err)
free(offsets);
elf_close(&elf_fd);
return err;
}

583
src/features.c Normal file
View File

@@ -0,0 +1,583 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
#include <linux/kernel.h>
#include <linux/filter.h>
#include "bpf.h"
#include "libbpf.h"
#include "libbpf_common.h"
#include "libbpf_internal.h"
#include "str_error.h"
static inline __u64 ptr_to_u64(const void *ptr)
{
return (__u64)(unsigned long)ptr;
}
int probe_fd(int fd)
{
if (fd >= 0)
close(fd);
return fd >= 0;
}
static int probe_kern_prog_name(int token_fd)
{
const size_t attr_sz = offsetofend(union bpf_attr, prog_name);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
union bpf_attr attr;
int ret;
memset(&attr, 0, attr_sz);
attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
attr.license = ptr_to_u64("GPL");
attr.insns = ptr_to_u64(insns);
attr.insn_cnt = (__u32)ARRAY_SIZE(insns);
attr.prog_token_fd = token_fd;
if (token_fd)
attr.prog_flags |= BPF_F_TOKEN_FD;
libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name));
/* make sure loading with name works */
ret = sys_bpf_prog_load(&attr, attr_sz, PROG_LOAD_ATTEMPTS);
return probe_fd(ret);
}
static int probe_kern_global_data(int token_fd)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_map_create_opts, map_opts,
.token_fd = token_fd,
.map_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_prog_load_opts, prog_opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, map, insn_cnt = ARRAY_SIZE(insns);
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, &map_opts);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
return ret;
}
insns[0].imm = map;
ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts);
close(map);
return probe_fd(ret);
}
static int probe_kern_btf(int token_fd)
{
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_func(int token_fd)
{
static const char strs[] = "\0int\0x\0a";
/* void x(int a) {} */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* FUNC_PROTO */ /* [2] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
BTF_PARAM_ENC(7, 1),
/* FUNC x */ /* [3] */
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_func_global(int token_fd)
{
static const char strs[] = "\0int\0x\0a";
/* static void x(int a) {} */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* FUNC_PROTO */ /* [2] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
BTF_PARAM_ENC(7, 1),
/* FUNC x BTF_FUNC_GLOBAL */ /* [3] */
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_datasec(int token_fd)
{
static const char strs[] = "\0x\0.data";
/* static int a; */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* DATASEC val */ /* [3] */
BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
BTF_VAR_SECINFO_ENC(2, 0, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_qmark_datasec(int token_fd)
{
static const char strs[] = "\0x\0?.data";
/* static int a; */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* DATASEC ?.data */ /* [3] */
BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
BTF_VAR_SECINFO_ENC(2, 0, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_float(int token_fd)
{
static const char strs[] = "\0float";
__u32 types[] = {
/* float */
BTF_TYPE_FLOAT_ENC(1, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_decl_tag(int token_fd)
{
static const char strs[] = "\0tag";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* attr */
BTF_TYPE_DECL_TAG_ENC(1, 2, -1),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_type_tag(int token_fd)
{
static const char strs[] = "\0tag";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* attr */
BTF_TYPE_TYPE_TAG_ENC(1, 1), /* [2] */
/* ptr */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2), /* [3] */
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_array_mmap(int token_fd)
{
LIBBPF_OPTS(bpf_map_create_opts, opts,
.map_flags = BPF_F_MMAPABLE | (token_fd ? BPF_F_TOKEN_FD : 0),
.token_fd = token_fd,
);
int fd;
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts);
return probe_fd(fd);
}
static int probe_kern_exp_attach_type(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
/* use any valid combination of program type and (optional)
* non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS)
* to see if kernel supports expected_attach_type field for
* BPF_PROG_LOAD command
*/
fd = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(fd);
}
static int probe_kern_probe_read_kernel(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
struct bpf_insn insns[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */
BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */
BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(fd);
}
static int probe_prog_bind_map(int token_fd)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_map_create_opts, map_opts,
.token_fd = token_fd,
.map_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_prog_load_opts, prog_opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, map, prog, insn_cnt = ARRAY_SIZE(insns);
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, &map_opts);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
return ret;
}
prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts);
if (prog < 0) {
close(map);
return 0;
}
ret = bpf_prog_bind_map(prog, map, NULL);
close(map);
close(prog);
return ret >= 0;
}
static int probe_module_btf(int token_fd)
{
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
struct bpf_btf_info info;
__u32 len = sizeof(info);
char name[16];
int fd, err;
fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), token_fd);
if (fd < 0)
return 0; /* BTF not supported at all */
memset(&info, 0, sizeof(info));
info.name = ptr_to_u64(name);
info.name_len = sizeof(name);
/* check that BPF_OBJ_GET_INFO_BY_FD supports specifying name pointer;
* kernel's module BTF support coincides with support for
* name/name_len fields in struct bpf_btf_info.
*/
err = bpf_btf_get_info_by_fd(fd, &info, &len);
close(fd);
return !err;
}
static int probe_perf_link(int token_fd)
{
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int prog_fd, link_fd, err;
prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL",
insns, ARRAY_SIZE(insns), &opts);
if (prog_fd < 0)
return -errno;
/* use invalid perf_event FD to get EBADF, if link is supported;
* otherwise EINVAL should be returned
*/
link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL);
err = -errno; /* close() can clobber errno */
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
return link_fd < 0 && err == -EBADF;
}
static int probe_uprobe_multi_link(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, load_opts,
.expected_attach_type = BPF_TRACE_UPROBE_MULTI,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_link_create_opts, link_opts);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int prog_fd, link_fd, err;
unsigned long offset = 0;
prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL",
insns, ARRAY_SIZE(insns), &load_opts);
if (prog_fd < 0)
return -errno;
/* Creating uprobe in '/' binary should fail with -EBADF. */
link_opts.uprobe_multi.path = "/";
link_opts.uprobe_multi.offsets = &offset;
link_opts.uprobe_multi.cnt = 1;
link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts);
err = -errno; /* close() can clobber errno */
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
return link_fd < 0 && err == -EBADF;
}
static int probe_kern_bpf_cookie(int token_fd)
{
struct bpf_insn insns[] = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, insn_cnt = ARRAY_SIZE(insns);
ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(ret);
}
static int probe_kern_btf_enum64(int token_fd)
{
static const char strs[] = "\0enum64";
__u32 types[] = {
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_ENUM64, 0, 0), 8),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_arg_ctx_tag(int token_fd)
{
static const char strs[] = "\0a\0b\0arg:ctx\0";
const __u32 types[] = {
/* [1] INT */
BTF_TYPE_INT_ENC(1 /* "a" */, BTF_INT_SIGNED, 0, 32, 4),
/* [2] PTR -> VOID */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 0),
/* [3] FUNC_PROTO `int(void *a)` */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 1),
BTF_PARAM_ENC(1 /* "a" */, 2),
/* [4] FUNC 'a' -> FUNC_PROTO (main prog) */
BTF_TYPE_ENC(1 /* "a" */, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 3),
/* [5] FUNC_PROTO `int(void *b __arg_ctx)` */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 1),
BTF_PARAM_ENC(3 /* "b" */, 2),
/* [6] FUNC 'b' -> FUNC_PROTO (subprog) */
BTF_TYPE_ENC(3 /* "b" */, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 5),
/* [7] DECL_TAG 'arg:ctx' -> func 'b' arg 'b' */
BTF_TYPE_DECL_TAG_ENC(5 /* "arg:ctx" */, 6, 0),
};
const struct bpf_insn insns[] = {
/* main prog */
BPF_CALL_REL(+1),
BPF_EXIT_INSN(),
/* global subprog */
BPF_EMIT_CALL(BPF_FUNC_get_func_ip), /* needs PTR_TO_CTX */
BPF_EXIT_INSN(),
};
const struct bpf_func_info_min func_infos[] = {
{ 0, 4 }, /* main prog -> FUNC 'a' */
{ 2, 6 }, /* subprog -> FUNC 'b' */
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int prog_fd, btf_fd, insn_cnt = ARRAY_SIZE(insns);
btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), token_fd);
if (btf_fd < 0)
return 0;
opts.prog_btf_fd = btf_fd;
opts.func_info = &func_infos;
opts.func_info_cnt = ARRAY_SIZE(func_infos);
opts.func_info_rec_size = sizeof(func_infos[0]);
prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, "det_arg_ctx",
"GPL", insns, insn_cnt, &opts);
close(btf_fd);
return probe_fd(prog_fd);
}
typedef int (*feature_probe_fn)(int /* token_fd */);
static struct kern_feature_cache feature_cache;
static struct kern_feature_desc {
const char *desc;
feature_probe_fn probe;
} feature_probes[__FEAT_CNT] = {
[FEAT_PROG_NAME] = {
"BPF program name", probe_kern_prog_name,
},
[FEAT_GLOBAL_DATA] = {
"global variables", probe_kern_global_data,
},
[FEAT_BTF] = {
"minimal BTF", probe_kern_btf,
},
[FEAT_BTF_FUNC] = {
"BTF functions", probe_kern_btf_func,
},
[FEAT_BTF_GLOBAL_FUNC] = {
"BTF global function", probe_kern_btf_func_global,
},
[FEAT_BTF_DATASEC] = {
"BTF data section and variable", probe_kern_btf_datasec,
},
[FEAT_ARRAY_MMAP] = {
"ARRAY map mmap()", probe_kern_array_mmap,
},
[FEAT_EXP_ATTACH_TYPE] = {
"BPF_PROG_LOAD expected_attach_type attribute",
probe_kern_exp_attach_type,
},
[FEAT_PROBE_READ_KERN] = {
"bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel,
},
[FEAT_PROG_BIND_MAP] = {
"BPF_PROG_BIND_MAP support", probe_prog_bind_map,
},
[FEAT_MODULE_BTF] = {
"module BTF support", probe_module_btf,
},
[FEAT_BTF_FLOAT] = {
"BTF_KIND_FLOAT support", probe_kern_btf_float,
},
[FEAT_PERF_LINK] = {
"BPF perf link support", probe_perf_link,
},
[FEAT_BTF_DECL_TAG] = {
"BTF_KIND_DECL_TAG support", probe_kern_btf_decl_tag,
},
[FEAT_BTF_TYPE_TAG] = {
"BTF_KIND_TYPE_TAG support", probe_kern_btf_type_tag,
},
[FEAT_MEMCG_ACCOUNT] = {
"memcg-based memory accounting", probe_memcg_account,
},
[FEAT_BPF_COOKIE] = {
"BPF cookie support", probe_kern_bpf_cookie,
},
[FEAT_BTF_ENUM64] = {
"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
},
[FEAT_SYSCALL_WRAPPER] = {
"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
},
[FEAT_UPROBE_MULTI_LINK] = {
"BPF multi-uprobe link support", probe_uprobe_multi_link,
},
[FEAT_ARG_CTX_TAG] = {
"kernel-side __arg_ctx tag", probe_kern_arg_ctx_tag,
},
[FEAT_BTF_QMARK_DATASEC] = {
"BTF DATASEC names starting from '?'", probe_kern_btf_qmark_datasec,
},
};
bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
{
struct kern_feature_desc *feat = &feature_probes[feat_id];
int ret;
/* assume global feature cache, unless custom one is provided */
if (!cache)
cache = &feature_cache;
if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) {
ret = feat->probe(cache->token_fd);
if (ret > 0) {
WRITE_ONCE(cache->res[feat_id], FEAT_SUPPORTED);
} else if (ret == 0) {
WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
} else {
pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
}
}
return READ_ONCE(cache->res[feat_id]) == FEAT_SUPPORTED;
}

View File

@@ -18,7 +18,7 @@
#define MAX_USED_MAPS 64
#define MAX_USED_PROGS 32
#define MAX_KFUNC_DESCS 256
#define MAX_FD_ARRAY_SZ (MAX_USED_PROGS + MAX_KFUNC_DESCS)
#define MAX_FD_ARRAY_SZ (MAX_USED_MAPS + MAX_KFUNC_DESCS)
/* The following structure describes the stack layout of the loader program.
* In addition R6 contains the pointer to context.
@@ -33,8 +33,8 @@
*/
struct loader_stack {
__u32 btf_fd;
__u32 prog_fd[MAX_USED_PROGS];
__u32 inner_map_fd;
__u32 prog_fd[MAX_USED_PROGS];
};
#define stack_off(field) \
@@ -42,6 +42,11 @@ struct loader_stack {
#define attr_field(attr, field) (attr + offsetof(union bpf_attr, field))
static int blob_fd_array_off(struct bpf_gen *gen, int index)
{
return gen->fd_array + index * sizeof(int);
}
static int realloc_insn_buf(struct bpf_gen *gen, __u32 size)
{
size_t off = gen->insn_cur - gen->insn_start;
@@ -102,11 +107,15 @@ static void emit2(struct bpf_gen *gen, struct bpf_insn insn1, struct bpf_insn in
emit(gen, insn2);
}
void bpf_gen__init(struct bpf_gen *gen, int log_level)
static int add_data(struct bpf_gen *gen, const void *data, __u32 size);
static void emit_sys_close_blob(struct bpf_gen *gen, int blob_off);
void bpf_gen__init(struct bpf_gen *gen, int log_level, int nr_progs, int nr_maps)
{
size_t stack_sz = sizeof(struct loader_stack);
size_t stack_sz = sizeof(struct loader_stack), nr_progs_sz;
int i;
gen->fd_array = add_data(gen, NULL, MAX_FD_ARRAY_SZ * sizeof(int));
gen->log_level = log_level;
/* save ctx pointer into R6 */
emit(gen, BPF_MOV64_REG(BPF_REG_6, BPF_REG_1));
@@ -118,19 +127,27 @@ void bpf_gen__init(struct bpf_gen *gen, int log_level)
emit(gen, BPF_MOV64_IMM(BPF_REG_3, 0));
emit(gen, BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel));
/* amount of stack actually used, only used to calculate iterations, not stack offset */
nr_progs_sz = offsetof(struct loader_stack, prog_fd[nr_progs]);
/* jump over cleanup code */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0,
/* size of cleanup code below */
(stack_sz / 4) * 3 + 2));
/* size of cleanup code below (including map fd cleanup) */
(nr_progs_sz / 4) * 3 + 2 +
/* 6 insns for emit_sys_close_blob,
* 6 insns for debug_regs in emit_sys_close_blob
*/
nr_maps * (6 + (gen->log_level ? 6 : 0))));
/* remember the label where all error branches will jump to */
gen->cleanup_label = gen->insn_cur - gen->insn_start;
/* emit cleanup code: close all temp FDs */
for (i = 0; i < stack_sz; i += 4) {
for (i = 0; i < nr_progs_sz; i += 4) {
emit(gen, BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_10, -stack_sz + i));
emit(gen, BPF_JMP_IMM(BPF_JSLE, BPF_REG_1, 0, 1));
emit(gen, BPF_EMIT_CALL(BPF_FUNC_sys_close));
}
for (i = 0; i < nr_maps; i++)
emit_sys_close_blob(gen, blob_fd_array_off(gen, i));
/* R7 contains the error code from sys_bpf. Copy it into R0 and exit. */
emit(gen, BPF_MOV64_REG(BPF_REG_0, BPF_REG_7));
emit(gen, BPF_EXIT_INSN());
@@ -160,8 +177,6 @@ static int add_data(struct bpf_gen *gen, const void *data, __u32 size)
*/
static int add_map_fd(struct bpf_gen *gen)
{
if (!gen->fd_array)
gen->fd_array = add_data(gen, NULL, MAX_FD_ARRAY_SZ * sizeof(int));
if (gen->nr_maps == MAX_USED_MAPS) {
pr_warn("Total maps exceeds %d\n", MAX_USED_MAPS);
gen->error = -E2BIG;
@@ -174,8 +189,6 @@ static int add_kfunc_btf_fd(struct bpf_gen *gen)
{
int cur;
if (!gen->fd_array)
gen->fd_array = add_data(gen, NULL, MAX_FD_ARRAY_SZ * sizeof(int));
if (gen->nr_fd_array == MAX_KFUNC_DESCS) {
cur = add_data(gen, NULL, sizeof(int));
return (cur - gen->fd_array) / sizeof(int);
@@ -183,11 +196,6 @@ static int add_kfunc_btf_fd(struct bpf_gen *gen)
return MAX_USED_MAPS + gen->nr_fd_array++;
}
static int blob_fd_array_off(struct bpf_gen *gen, int index)
{
return gen->fd_array + index * sizeof(int);
}
static int insn_bytes_to_bpf_size(__u32 sz)
{
switch (sz) {
@@ -359,10 +367,16 @@ static void emit_sys_close_blob(struct bpf_gen *gen, int blob_off)
__emit_sys_close(gen);
}
int bpf_gen__finish(struct bpf_gen *gen)
int bpf_gen__finish(struct bpf_gen *gen, int nr_progs, int nr_maps)
{
int i;
if (nr_progs < gen->nr_progs || nr_maps != gen->nr_maps) {
pr_warn("nr_progs %d/%d nr_maps %d/%d mismatch\n",
nr_progs, gen->nr_progs, nr_maps, gen->nr_maps);
gen->error = -EFAULT;
return gen->error;
}
emit_sys_close_stack(gen, stack_off(btf_fd));
for (i = 0; i < gen->nr_progs; i++)
move_stack2ctx(gen,
@@ -432,47 +446,32 @@ void bpf_gen__load_btf(struct bpf_gen *gen, const void *btf_raw_data,
}
void bpf_gen__map_create(struct bpf_gen *gen,
struct bpf_create_map_params *map_attr, int map_idx)
enum bpf_map_type map_type,
const char *map_name,
__u32 key_size, __u32 value_size, __u32 max_entries,
struct bpf_map_create_opts *map_attr, int map_idx)
{
int attr_size = offsetofend(union bpf_attr, btf_vmlinux_value_type_id);
int attr_size = offsetofend(union bpf_attr, map_extra);
bool close_inner_map_fd = false;
int map_create_attr, idx;
union bpf_attr attr;
memset(&attr, 0, attr_size);
attr.map_type = map_attr->map_type;
attr.key_size = map_attr->key_size;
attr.value_size = map_attr->value_size;
attr.map_type = map_type;
attr.key_size = key_size;
attr.value_size = value_size;
attr.map_flags = map_attr->map_flags;
attr.map_extra = map_attr->map_extra;
memcpy(attr.map_name, map_attr->name,
min((unsigned)strlen(map_attr->name), BPF_OBJ_NAME_LEN - 1));
if (map_name)
libbpf_strlcpy(attr.map_name, map_name, sizeof(attr.map_name));
attr.numa_node = map_attr->numa_node;
attr.map_ifindex = map_attr->map_ifindex;
attr.max_entries = map_attr->max_entries;
switch (attr.map_type) {
case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
case BPF_MAP_TYPE_CGROUP_ARRAY:
case BPF_MAP_TYPE_STACK_TRACE:
case BPF_MAP_TYPE_ARRAY_OF_MAPS:
case BPF_MAP_TYPE_HASH_OF_MAPS:
case BPF_MAP_TYPE_DEVMAP:
case BPF_MAP_TYPE_DEVMAP_HASH:
case BPF_MAP_TYPE_CPUMAP:
case BPF_MAP_TYPE_XSKMAP:
case BPF_MAP_TYPE_SOCKMAP:
case BPF_MAP_TYPE_SOCKHASH:
case BPF_MAP_TYPE_QUEUE:
case BPF_MAP_TYPE_STACK:
case BPF_MAP_TYPE_RINGBUF:
break;
default:
attr.btf_key_type_id = map_attr->btf_key_type_id;
attr.btf_value_type_id = map_attr->btf_value_type_id;
}
attr.max_entries = max_entries;
attr.btf_key_type_id = map_attr->btf_key_type_id;
attr.btf_value_type_id = map_attr->btf_value_type_id;
pr_debug("gen: map_create: %s idx %d type %d value_type_id %d\n",
attr.map_name, map_idx, map_attr->map_type, attr.btf_value_type_id);
attr.map_name, map_idx, map_type, attr.btf_value_type_id);
map_create_attr = add_data(gen, &attr, attr_size);
if (attr.btf_value_type_id)
@@ -499,7 +498,7 @@ void bpf_gen__map_create(struct bpf_gen *gen,
/* emit MAP_CREATE command */
emit_sys_bpf(gen, BPF_MAP_CREATE, map_create_attr, attr_size);
debug_ret(gen, "map_create %s idx %d type %d value_size %d value_btf_id %d",
attr.map_name, map_idx, map_attr->map_type, attr.value_size,
attr.map_name, map_idx, map_type, value_size,
attr.btf_value_type_id);
emit_check_err(gen);
/* remember map_fd in the stack, if successful */
@@ -534,7 +533,7 @@ void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *attach_name,
gen->attach_kind = kind;
ret = snprintf(gen->attach_target, sizeof(gen->attach_target), "%s%s",
prefix, attach_name);
if (ret == sizeof(gen->attach_target))
if (ret >= sizeof(gen->attach_target))
gen->error = -ENOSPC;
}
@@ -561,7 +560,7 @@ static void emit_find_attach_target(struct bpf_gen *gen)
}
void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
bool is_typeless, int kind, int insn_idx)
bool is_typeless, bool is_ld64, int kind, int insn_idx)
{
struct ksym_relo_desc *relo;
@@ -575,6 +574,7 @@ void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
relo->name = name;
relo->is_weak = is_weak;
relo->is_typeless = is_typeless;
relo->is_ld64 = is_ld64;
relo->kind = kind;
relo->insn_idx = insn_idx;
gen->relo_cnt++;
@@ -587,9 +587,11 @@ static struct ksym_desc *get_ksym_desc(struct bpf_gen *gen, struct ksym_relo_des
int i;
for (i = 0; i < gen->nr_ksyms; i++) {
if (!strcmp(gen->ksyms[i].name, relo->name)) {
gen->ksyms[i].ref++;
return &gen->ksyms[i];
kdesc = &gen->ksyms[i];
if (kdesc->kind == relo->kind && kdesc->is_ld64 == relo->is_ld64 &&
!strcmp(kdesc->name, relo->name)) {
kdesc->ref++;
return kdesc;
}
}
kdesc = libbpf_reallocarray(gen->ksyms, gen->nr_ksyms + 1, sizeof(*kdesc));
@@ -604,6 +606,7 @@ static struct ksym_desc *get_ksym_desc(struct bpf_gen *gen, struct ksym_relo_des
kdesc->ref = 1;
kdesc->off = 0;
kdesc->insn = 0;
kdesc->is_ld64 = relo->is_ld64;
return kdesc;
}
@@ -688,27 +691,29 @@ static void emit_relo_kfunc_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo
return;
}
kdesc->off = btf_fd_idx;
/* set a default value for imm */
/* jump to success case */
emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
/* set value for imm, off as 0 */
emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, offsetof(struct bpf_insn, imm), 0));
/* skip success case store if ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 1));
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
/* skip success case for ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 10));
/* store btf_id into insn[insn_idx].imm */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7, offsetof(struct bpf_insn, imm)));
/* obtain fd in BPF_REG_9 */
emit(gen, BPF_MOV64_REG(BPF_REG_9, BPF_REG_7));
emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_9, 32));
/* load fd_array slot pointer */
emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_0, BPF_PSEUDO_MAP_IDX_VALUE,
0, 0, 0, blob_fd_array_off(gen, btf_fd_idx)));
/* skip store of BTF fd if ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 3));
/* store BTF fd in slot */
emit(gen, BPF_MOV64_REG(BPF_REG_9, BPF_REG_7));
emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_9, 32));
/* store BTF fd in slot, 0 for vmlinux */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_9, 0));
/* set a default value for off */
/* jump to insn[insn_idx].off store if fd denotes module BTF */
emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0, 2));
/* set the default value for off */
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
/* skip insn->off store if ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 2));
/* skip if vmlinux BTF */
emit(gen, BPF_JMP_IMM(BPF_JEQ, BPF_REG_9, 0, 1));
/* skip BTF fd store for vmlinux BTF */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 1));
/* store index into insn[insn_idx].off */
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), btf_fd_idx));
log:
@@ -803,13 +808,14 @@ static void emit_relo_ksym_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo,
return;
/* try to copy from existing ldimm64 insn */
if (kdesc->ref > 1) {
move_blob2blob(gen, insn + offsetof(struct bpf_insn, imm), 4,
kdesc->insn + offsetof(struct bpf_insn, imm));
move_blob2blob(gen, insn + sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm), 4,
kdesc->insn + sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm));
emit(gen, BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_8, offsetof(struct bpf_insn, imm)));
/* jump over src_reg adjustment if imm is not 0 */
emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0, 3));
move_blob2blob(gen, insn + offsetof(struct bpf_insn, imm), 4,
kdesc->insn + offsetof(struct bpf_insn, imm));
/* jump over src_reg adjustment if imm (btf_id) is not 0, reuse BPF_REG_0 from move_blob2blob
* If btf_id is zero, clear BPF_PSEUDO_BTF_ID flag in src_reg of ld_imm64 insn
*/
emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3));
goto clear_src_reg;
}
/* remember insn offset, so we can copy BTF ID and FD later */
@@ -817,18 +823,21 @@ static void emit_relo_ksym_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo,
emit_bpf_find_by_name_kind(gen, relo);
if (!relo->is_weak)
emit_check_err(gen);
/* set default values as 0 */
/* jump to success case */
emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
/* set values for insn[insn_idx].imm, insn[insn_idx + 1].imm as 0 */
emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, offsetof(struct bpf_insn, imm), 0));
emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm), 0));
/* skip success case stores if ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 4));
/* skip success case for ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 4));
/* store btf_id into insn[insn_idx].imm */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7, offsetof(struct bpf_insn, imm)));
/* store btf_obj_fd into insn[insn_idx + 1].imm */
emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_7, 32));
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7,
sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm)));
emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
/* skip src_reg adjustment */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 3));
clear_src_reg:
/* clear bpf_object__relocate_data's src_reg assignment, otherwise we get a verifier failure */
reg_mask = src_reg_mask();
@@ -839,27 +848,37 @@ clear_src_reg:
emit_ksym_relo_log(gen, relo, kdesc->ref);
}
void bpf_gen__record_relo_core(struct bpf_gen *gen,
const struct bpf_core_relo *core_relo)
{
struct bpf_core_relo *relos;
relos = libbpf_reallocarray(gen->core_relos, gen->core_relo_cnt + 1, sizeof(*relos));
if (!relos) {
gen->error = -ENOMEM;
return;
}
gen->core_relos = relos;
relos += gen->core_relo_cnt;
memcpy(relos, core_relo, sizeof(*relos));
gen->core_relo_cnt++;
}
static void emit_relo(struct bpf_gen *gen, struct ksym_relo_desc *relo, int insns)
{
int insn;
pr_debug("gen: emit_relo (%d): %s at %d\n", relo->kind, relo->name, relo->insn_idx);
pr_debug("gen: emit_relo (%d): %s at %d %s\n",
relo->kind, relo->name, relo->insn_idx, relo->is_ld64 ? "ld64" : "call");
insn = insns + sizeof(struct bpf_insn) * relo->insn_idx;
emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_8, BPF_PSEUDO_MAP_IDX_VALUE, 0, 0, 0, insn));
switch (relo->kind) {
case BTF_KIND_VAR:
if (relo->is_ld64) {
if (relo->is_typeless)
emit_relo_ksym_typeless(gen, relo, insn);
else
emit_relo_ksym_btf(gen, relo, insn);
break;
case BTF_KIND_FUNC:
} else {
emit_relo_kfunc_btf(gen, relo, insn);
break;
default:
pr_warn("Unknown relocation kind '%d'\n", relo->kind);
gen->error = -EDOM;
return;
}
}
@@ -871,20 +890,31 @@ static void emit_relos(struct bpf_gen *gen, int insns)
emit_relo(gen, gen->relos + i, insns);
}
static void cleanup_core_relo(struct bpf_gen *gen)
{
if (!gen->core_relo_cnt)
return;
free(gen->core_relos);
gen->core_relo_cnt = 0;
gen->core_relos = NULL;
}
static void cleanup_relos(struct bpf_gen *gen, int insns)
{
struct ksym_desc *kdesc;
int i, insn;
for (i = 0; i < gen->nr_ksyms; i++) {
kdesc = &gen->ksyms[i];
/* only close fds for typed ksyms and kfuncs */
if (gen->ksyms[i].kind == BTF_KIND_VAR && !gen->ksyms[i].typeless) {
if (kdesc->is_ld64 && !kdesc->typeless) {
/* close fd recorded in insn[insn_idx + 1].imm */
insn = gen->ksyms[i].insn;
insn = kdesc->insn;
insn += sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm);
emit_sys_close_blob(gen, insn);
} else if (gen->ksyms[i].kind == BTF_KIND_FUNC) {
emit_sys_close_blob(gen, blob_fd_array_off(gen, gen->ksyms[i].off));
if (gen->ksyms[i].off < MAX_FD_ARRAY_SZ)
} else if (!kdesc->is_ld64) {
emit_sys_close_blob(gen, blob_fd_array_off(gen, kdesc->off));
if (kdesc->off < MAX_FD_ARRAY_SZ)
gen->nr_fd_array--;
}
}
@@ -898,30 +928,32 @@ static void cleanup_relos(struct bpf_gen *gen, int insns)
gen->relo_cnt = 0;
gen->relos = NULL;
}
cleanup_core_relo(gen);
}
void bpf_gen__prog_load(struct bpf_gen *gen,
struct bpf_prog_load_params *load_attr, int prog_idx)
enum bpf_prog_type prog_type, const char *prog_name,
const char *license, struct bpf_insn *insns, size_t insn_cnt,
struct bpf_prog_load_opts *load_attr, int prog_idx)
{
int attr_size = offsetofend(union bpf_attr, fd_array);
int prog_load_attr, license, insns, func_info, line_info;
int prog_load_attr, license_off, insns_off, func_info, line_info, core_relos;
int attr_size = offsetofend(union bpf_attr, core_relo_rec_size);
union bpf_attr attr;
memset(&attr, 0, attr_size);
pr_debug("gen: prog_load: type %d insns_cnt %zd\n",
load_attr->prog_type, load_attr->insn_cnt);
pr_debug("gen: prog_load: type %d insns_cnt %zd progi_idx %d\n",
prog_type, insn_cnt, prog_idx);
/* add license string to blob of bytes */
license = add_data(gen, load_attr->license, strlen(load_attr->license) + 1);
license_off = add_data(gen, license, strlen(license) + 1);
/* add insns to blob of bytes */
insns = add_data(gen, load_attr->insns,
load_attr->insn_cnt * sizeof(struct bpf_insn));
insns_off = add_data(gen, insns, insn_cnt * sizeof(struct bpf_insn));
attr.prog_type = load_attr->prog_type;
attr.prog_type = prog_type;
attr.expected_attach_type = load_attr->expected_attach_type;
attr.attach_btf_id = load_attr->attach_btf_id;
attr.prog_ifindex = load_attr->prog_ifindex;
attr.kern_version = 0;
attr.insn_cnt = (__u32)load_attr->insn_cnt;
attr.insn_cnt = (__u32)insn_cnt;
attr.prog_flags = load_attr->prog_flags;
attr.func_info_rec_size = load_attr->func_info_rec_size;
@@ -934,15 +966,19 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
line_info = add_data(gen, load_attr->line_info,
attr.line_info_cnt * attr.line_info_rec_size);
memcpy(attr.prog_name, load_attr->name,
min((unsigned)strlen(load_attr->name), BPF_OBJ_NAME_LEN - 1));
attr.core_relo_rec_size = sizeof(struct bpf_core_relo);
attr.core_relo_cnt = gen->core_relo_cnt;
core_relos = add_data(gen, gen->core_relos,
attr.core_relo_cnt * attr.core_relo_rec_size);
libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
prog_load_attr = add_data(gen, &attr, attr_size);
/* populate union bpf_attr with a pointer to license */
emit_rel_store(gen, attr_field(prog_load_attr, license), license);
emit_rel_store(gen, attr_field(prog_load_attr, license), license_off);
/* populate union bpf_attr with a pointer to instructions */
emit_rel_store(gen, attr_field(prog_load_attr, insns), insns);
emit_rel_store(gen, attr_field(prog_load_attr, insns), insns_off);
/* populate union bpf_attr with a pointer to func_info */
emit_rel_store(gen, attr_field(prog_load_attr, func_info), func_info);
@@ -950,6 +986,9 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
/* populate union bpf_attr with a pointer to line_info */
emit_rel_store(gen, attr_field(prog_load_attr, line_info), line_info);
/* populate union bpf_attr with a pointer to core_relos */
emit_rel_store(gen, attr_field(prog_load_attr, core_relos), core_relos);
/* populate union bpf_attr fd_array with a pointer to data where map_fds are saved */
emit_rel_store(gen, attr_field(prog_load_attr, fd_array), gen->fd_array);
@@ -974,15 +1013,17 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_7,
offsetof(union bpf_attr, attach_btf_obj_fd)));
}
emit_relos(gen, insns);
emit_relos(gen, insns_off);
/* emit PROG_LOAD command */
emit_sys_bpf(gen, BPF_PROG_LOAD, prog_load_attr, attr_size);
debug_ret(gen, "prog_load %s insn_cnt %d", attr.prog_name, attr.insn_cnt);
/* successful or not, close btf module FDs used in extern ksyms and attach_btf_obj_fd */
cleanup_relos(gen, insns);
if (gen->attach_kind)
cleanup_relos(gen, insns_off);
if (gen->attach_kind) {
emit_sys_close_blob(gen,
attr_field(prog_load_attr, attach_btf_obj_fd));
gen->attach_kind = 0;
}
emit_check_err(gen);
/* remember prog_fd in the stack, if successful */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_7,
@@ -1004,18 +1045,27 @@ void bpf_gen__map_update_elem(struct bpf_gen *gen, int map_idx, void *pvalue,
value = add_data(gen, pvalue, value_size);
key = add_data(gen, &zero, sizeof(zero));
/* if (map_desc[map_idx].initial_value)
* copy_from_user(value, initial_value, value_size);
/* if (map_desc[map_idx].initial_value) {
* if (ctx->flags & BPF_SKEL_KERNEL)
* bpf_probe_read_kernel(value, value_size, initial_value);
* else
* bpf_copy_from_user(value, value_size, initial_value);
* }
*/
emit(gen, BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_6,
sizeof(struct bpf_loader_ctx) +
sizeof(struct bpf_map_desc) * map_idx +
offsetof(struct bpf_map_desc, initial_value)));
emit(gen, BPF_JMP_IMM(BPF_JEQ, BPF_REG_3, 0, 4));
emit(gen, BPF_JMP_IMM(BPF_JEQ, BPF_REG_3, 0, 8));
emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_IDX_VALUE,
0, 0, 0, value));
emit(gen, BPF_MOV64_IMM(BPF_REG_2, value_size));
emit(gen, BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6,
offsetof(struct bpf_loader_ctx, flags)));
emit(gen, BPF_JMP_IMM(BPF_JSET, BPF_REG_0, BPF_SKEL_KERNEL, 2));
emit(gen, BPF_EMIT_CALL(BPF_FUNC_copy_from_user));
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 1));
emit(gen, BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel));
map_update_attr = add_data(gen, &attr, attr_size);
move_blob2blob(gen, attr_field(map_update_attr, map_fd), 4,
@@ -1028,6 +1078,33 @@ void bpf_gen__map_update_elem(struct bpf_gen *gen, int map_idx, void *pvalue,
emit_check_err(gen);
}
void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int slot,
int inner_map_idx)
{
int attr_size = offsetofend(union bpf_attr, flags);
int map_update_attr, key;
union bpf_attr attr;
memset(&attr, 0, attr_size);
pr_debug("gen: populate_outer_map: outer %d key %d inner %d\n",
outer_map_idx, slot, inner_map_idx);
key = add_data(gen, &slot, sizeof(slot));
map_update_attr = add_data(gen, &attr, attr_size);
move_blob2blob(gen, attr_field(map_update_attr, map_fd), 4,
blob_fd_array_off(gen, outer_map_idx));
emit_rel_store(gen, attr_field(map_update_attr, key), key);
emit_rel_store(gen, attr_field(map_update_attr, value),
blob_fd_array_off(gen, inner_map_idx));
/* emit MAP_UPDATE_ELEM command */
emit_sys_bpf(gen, BPF_MAP_UPDATE_ELEM, map_update_attr, attr_size);
debug_ret(gen, "populate_outer_map outer %d key %d inner %d",
outer_map_idx, slot, inner_map_idx);
emit_check_err(gen);
}
void bpf_gen__map_freeze(struct bpf_gen *gen, int map_idx)
{
int attr_size = offsetofend(union bpf_attr, map_fd);

View File

@@ -75,7 +75,7 @@ void hashmap__clear(struct hashmap *map)
void hashmap__free(struct hashmap *map)
{
if (!map)
if (IS_ERR_OR_NULL(map))
return;
hashmap__clear(map);
@@ -128,7 +128,7 @@ static int hashmap_grow(struct hashmap *map)
}
static bool hashmap_find_entry(const struct hashmap *map,
const void *key, size_t hash,
const long key, size_t hash,
struct hashmap_entry ***pprev,
struct hashmap_entry **entry)
{
@@ -151,18 +151,18 @@ static bool hashmap_find_entry(const struct hashmap *map,
return false;
}
int hashmap__insert(struct hashmap *map, const void *key, void *value,
enum hashmap_insert_strategy strategy,
const void **old_key, void **old_value)
int hashmap_insert(struct hashmap *map, long key, long value,
enum hashmap_insert_strategy strategy,
long *old_key, long *old_value)
{
struct hashmap_entry *entry;
size_t h;
int err;
if (old_key)
*old_key = NULL;
*old_key = 0;
if (old_value)
*old_value = NULL;
*old_value = 0;
h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
if (strategy != HASHMAP_APPEND &&
@@ -203,7 +203,7 @@ int hashmap__insert(struct hashmap *map, const void *key, void *value,
return 0;
}
bool hashmap__find(const struct hashmap *map, const void *key, void **value)
bool hashmap_find(const struct hashmap *map, long key, long *value)
{
struct hashmap_entry *entry;
size_t h;
@@ -217,8 +217,8 @@ bool hashmap__find(const struct hashmap *map, const void *key, void **value)
return true;
}
bool hashmap__delete(struct hashmap *map, const void *key,
const void **old_key, void **old_value)
bool hashmap_delete(struct hashmap *map, long key,
long *old_key, long *old_value)
{
struct hashmap_entry **pprev, *entry;
size_t h;
@@ -238,4 +238,3 @@ bool hashmap__delete(struct hashmap *map, const void *key,
return true;
}

View File

@@ -40,12 +40,32 @@ static inline size_t str_hash(const char *s)
return h;
}
typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx);
typedef bool (*hashmap_equal_fn)(const void *key1, const void *key2, void *ctx);
typedef size_t (*hashmap_hash_fn)(long key, void *ctx);
typedef bool (*hashmap_equal_fn)(long key1, long key2, void *ctx);
/*
* Hashmap interface is polymorphic, keys and values could be either
* long-sized integers or pointers, this is achieved as follows:
* - interface functions that operate on keys and values are hidden
* behind auxiliary macros, e.g. hashmap_insert <-> hashmap__insert;
* - these auxiliary macros cast the key and value parameters as
* long or long *, so the user does not have to specify the casts explicitly;
* - for pointer parameters (e.g. old_key) the size of the pointed
* type is verified by hashmap_cast_ptr using _Static_assert;
* - when iterating using hashmap__for_each_* forms
* hasmap_entry->key should be used for integer keys and
* hasmap_entry->pkey should be used for pointer keys,
* same goes for values.
*/
struct hashmap_entry {
const void *key;
void *value;
union {
long key;
const void *pkey;
};
union {
long value;
void *pvalue;
};
struct hashmap_entry *next;
};
@@ -60,16 +80,6 @@ struct hashmap {
size_t sz;
};
#define HASHMAP_INIT(hash_fn, equal_fn, ctx) { \
.hash_fn = (hash_fn), \
.equal_fn = (equal_fn), \
.ctx = (ctx), \
.buckets = NULL, \
.cap = 0, \
.cap_bits = 0, \
.sz = 0, \
}
void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
hashmap_equal_fn equal_fn, void *ctx);
struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
@@ -102,6 +112,13 @@ enum hashmap_insert_strategy {
HASHMAP_APPEND,
};
#define hashmap_cast_ptr(p) ({ \
_Static_assert((__builtin_constant_p((p)) ? (p) == NULL : 0) || \
sizeof(*(p)) == sizeof(long), \
#p " pointee should be a long-sized integer or a pointer"); \
(long *)(p); \
})
/*
* hashmap__insert() adds key/value entry w/ various semantics, depending on
* provided strategy value. If a given key/value pair replaced already
@@ -109,42 +126,38 @@ enum hashmap_insert_strategy {
* through old_key and old_value to allow calling code do proper memory
* management.
*/
int hashmap__insert(struct hashmap *map, const void *key, void *value,
enum hashmap_insert_strategy strategy,
const void **old_key, void **old_value);
int hashmap_insert(struct hashmap *map, long key, long value,
enum hashmap_insert_strategy strategy,
long *old_key, long *old_value);
static inline int hashmap__add(struct hashmap *map,
const void *key, void *value)
{
return hashmap__insert(map, key, value, HASHMAP_ADD, NULL, NULL);
}
#define hashmap__insert(map, key, value, strategy, old_key, old_value) \
hashmap_insert((map), (long)(key), (long)(value), (strategy), \
hashmap_cast_ptr(old_key), \
hashmap_cast_ptr(old_value))
static inline int hashmap__set(struct hashmap *map,
const void *key, void *value,
const void **old_key, void **old_value)
{
return hashmap__insert(map, key, value, HASHMAP_SET,
old_key, old_value);
}
#define hashmap__add(map, key, value) \
hashmap__insert((map), (key), (value), HASHMAP_ADD, NULL, NULL)
static inline int hashmap__update(struct hashmap *map,
const void *key, void *value,
const void **old_key, void **old_value)
{
return hashmap__insert(map, key, value, HASHMAP_UPDATE,
old_key, old_value);
}
#define hashmap__set(map, key, value, old_key, old_value) \
hashmap__insert((map), (key), (value), HASHMAP_SET, (old_key), (old_value))
static inline int hashmap__append(struct hashmap *map,
const void *key, void *value)
{
return hashmap__insert(map, key, value, HASHMAP_APPEND, NULL, NULL);
}
#define hashmap__update(map, key, value, old_key, old_value) \
hashmap__insert((map), (key), (value), HASHMAP_UPDATE, (old_key), (old_value))
bool hashmap__delete(struct hashmap *map, const void *key,
const void **old_key, void **old_value);
#define hashmap__append(map, key, value) \
hashmap__insert((map), (key), (value), HASHMAP_APPEND, NULL, NULL)
bool hashmap__find(const struct hashmap *map, const void *key, void **value);
bool hashmap_delete(struct hashmap *map, long key, long *old_key, long *old_value);
#define hashmap__delete(map, key, old_key, old_value) \
hashmap_delete((map), (long)(key), \
hashmap_cast_ptr(old_key), \
hashmap_cast_ptr(old_value))
bool hashmap_find(const struct hashmap *map, long key, long *value);
#define hashmap__find(map, key, value) \
hashmap_find((map), (long)(key), hashmap_cast_ptr(value))
/*
* hashmap__for_each_entry - iterate over all entries in hashmap

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,29 +1,14 @@
LIBBPF_0.0.1 {
global:
bpf_btf_get_fd_by_id;
bpf_create_map;
bpf_create_map_in_map;
bpf_create_map_in_map_node;
bpf_create_map_name;
bpf_create_map_node;
bpf_create_map_xattr;
bpf_load_btf;
bpf_load_program;
bpf_load_program_xattr;
bpf_map__btf_key_type_id;
bpf_map__btf_value_type_id;
bpf_map__def;
bpf_map__fd;
bpf_map__is_offload_neutral;
bpf_map__name;
bpf_map__next;
bpf_map__pin;
bpf_map__prev;
bpf_map__priv;
bpf_map__reuse_fd;
bpf_map__set_ifindex;
bpf_map__set_inner_map_fd;
bpf_map__set_priv;
bpf_map__unpin;
bpf_map_delete_elem;
bpf_map_get_fd_by_id;
@@ -38,79 +23,37 @@ LIBBPF_0.0.1 {
bpf_object__btf_fd;
bpf_object__close;
bpf_object__find_map_by_name;
bpf_object__find_map_by_offset;
bpf_object__find_program_by_title;
bpf_object__kversion;
bpf_object__load;
bpf_object__name;
bpf_object__next;
bpf_object__open;
bpf_object__open_buffer;
bpf_object__open_xattr;
bpf_object__pin;
bpf_object__pin_maps;
bpf_object__pin_programs;
bpf_object__priv;
bpf_object__set_priv;
bpf_object__unload;
bpf_object__unpin_maps;
bpf_object__unpin_programs;
bpf_perf_event_read_simple;
bpf_prog_attach;
bpf_prog_detach;
bpf_prog_detach2;
bpf_prog_get_fd_by_id;
bpf_prog_get_next_id;
bpf_prog_load;
bpf_prog_load_xattr;
bpf_prog_query;
bpf_prog_test_run;
bpf_prog_test_run_xattr;
bpf_program__fd;
bpf_program__is_kprobe;
bpf_program__is_perf_event;
bpf_program__is_raw_tracepoint;
bpf_program__is_sched_act;
bpf_program__is_sched_cls;
bpf_program__is_socket_filter;
bpf_program__is_tracepoint;
bpf_program__is_xdp;
bpf_program__load;
bpf_program__next;
bpf_program__nth_fd;
bpf_program__pin;
bpf_program__pin_instance;
bpf_program__prev;
bpf_program__priv;
bpf_program__set_expected_attach_type;
bpf_program__set_ifindex;
bpf_program__set_kprobe;
bpf_program__set_perf_event;
bpf_program__set_prep;
bpf_program__set_priv;
bpf_program__set_raw_tracepoint;
bpf_program__set_sched_act;
bpf_program__set_sched_cls;
bpf_program__set_socket_filter;
bpf_program__set_tracepoint;
bpf_program__set_type;
bpf_program__set_xdp;
bpf_program__title;
bpf_program__unload;
bpf_program__unpin;
bpf_program__unpin_instance;
bpf_prog_linfo__free;
bpf_prog_linfo__new;
bpf_prog_linfo__lfind_addr_func;
bpf_prog_linfo__lfind;
bpf_raw_tracepoint_open;
bpf_set_link_xdp_fd;
bpf_task_fd_query;
bpf_verify_program;
btf__fd;
btf__find_by_name;
btf__free;
btf__get_from_id;
btf__name_by_offset;
btf__new;
btf__resolve_size;
@@ -127,48 +70,24 @@ LIBBPF_0.0.1 {
LIBBPF_0.0.2 {
global:
bpf_probe_helper;
bpf_probe_map_type;
bpf_probe_prog_type;
bpf_map__resize;
bpf_map_lookup_elem_flags;
bpf_object__btf;
bpf_object__find_map_fd_by_name;
bpf_get_link_xdp_id;
btf__dedup;
btf__get_map_kv_tids;
btf__get_nr_types;
btf__get_raw_data;
btf__load;
btf_ext__free;
btf_ext__func_info_rec_size;
btf_ext__get_raw_data;
btf_ext__line_info_rec_size;
btf_ext__new;
btf_ext__reloc_func_info;
btf_ext__reloc_line_info;
xsk_umem__create;
xsk_socket__create;
xsk_umem__delete;
xsk_socket__delete;
xsk_umem__fd;
xsk_socket__fd;
bpf_program__get_prog_info_linear;
bpf_program__bpil_addr_to_offs;
bpf_program__bpil_offs_to_addr;
} LIBBPF_0.0.1;
LIBBPF_0.0.3 {
global:
bpf_map__is_internal;
bpf_map_freeze;
btf__finalize_data;
} LIBBPF_0.0.2;
LIBBPF_0.0.4 {
global:
bpf_link__destroy;
bpf_object__load_xattr;
bpf_program__attach_kprobe;
bpf_program__attach_perf_event;
bpf_program__attach_raw_tracepoint;
@@ -176,14 +95,10 @@ LIBBPF_0.0.4 {
bpf_program__attach_uprobe;
btf_dump__dump_type;
btf_dump__free;
btf_dump__new;
btf__parse_elf;
libbpf_num_possible_cpus;
perf_buffer__free;
perf_buffer__new;
perf_buffer__new_raw;
perf_buffer__poll;
xsk_umem__create;
} LIBBPF_0.0.3;
LIBBPF_0.0.5 {
@@ -193,7 +108,6 @@ LIBBPF_0.0.5 {
LIBBPF_0.0.6 {
global:
bpf_get_link_xdp_info;
bpf_map__get_pin_path;
bpf_map__is_pinned;
bpf_map__set_pin_path;
@@ -202,9 +116,6 @@ LIBBPF_0.0.6 {
bpf_program__attach_trace;
bpf_program__get_expected_attach_type;
bpf_program__get_type;
bpf_program__is_tracing;
bpf_program__set_tracing;
bpf_program__size;
btf__find_by_name_kind;
libbpf_find_vmlinux_btf_id;
} LIBBPF_0.0.5;
@@ -224,14 +135,8 @@ LIBBPF_0.0.7 {
bpf_object__detach_skeleton;
bpf_object__load_skeleton;
bpf_object__open_skeleton;
bpf_probe_large_insn_limit;
bpf_prog_attach_xattr;
bpf_program__attach;
bpf_program__name;
bpf_program__is_extension;
bpf_program__is_struct_ops;
bpf_program__set_extension;
bpf_program__set_struct_ops;
btf__align_of;
libbpf_find_kernel_btf;
} LIBBPF_0.0.6;
@@ -247,12 +152,10 @@ LIBBPF_0.0.8 {
bpf_link_create;
bpf_link_update;
bpf_map__set_initial_value;
bpf_prog_attach_opts;
bpf_program__attach_cgroup;
bpf_program__attach_lsm;
bpf_program__is_lsm;
bpf_program__set_attach_target;
bpf_program__set_lsm;
bpf_set_link_xdp_fd_opts;
} LIBBPF_0.0.7;
LIBBPF_0.0.9 {
@@ -290,9 +193,7 @@ LIBBPF_0.1.0 {
bpf_map__value_size;
bpf_program__attach_xdp;
bpf_program__autoload;
bpf_program__is_sk_lookup;
bpf_program__set_autoload;
bpf_program__set_sk_lookup;
btf__parse;
btf__parse_raw;
btf__pointer_size;
@@ -335,7 +236,6 @@ LIBBPF_0.2.0 {
perf_buffer__buffer_fd;
perf_buffer__epoll_fd;
perf_buffer__consume_buffer;
xsk_socket__create_shared;
} LIBBPF_0.1.0;
LIBBPF_0.3.0 {
@@ -345,10 +245,7 @@ LIBBPF_0.3.0 {
btf__parse_raw_split;
btf__parse_split;
btf__new_empty_split;
btf__new_split;
ring_buffer__epoll_fd;
xsk_setup_xdp_prog;
xsk_socket__update_xskmap;
} LIBBPF_0.2.0;
LIBBPF_0.4.0 {
@@ -391,14 +288,138 @@ LIBBPF_0.6.0 {
global:
bpf_map__map_extra;
bpf_map__set_map_extra;
bpf_map_create;
bpf_object__next_map;
bpf_object__next_program;
bpf_object__prev_map;
bpf_object__prev_program;
bpf_prog_load;
bpf_program__flags;
bpf_program__insn_cnt;
bpf_program__insns;
bpf_program__set_flags;
btf__add_btf;
btf__add_decl_tag;
btf__add_type_tag;
btf__dedup;
btf__raw_data;
btf__type_cnt;
btf_dump__new;
libbpf_major_version;
libbpf_minor_version;
libbpf_version_string;
perf_buffer__new;
perf_buffer__new_raw;
} LIBBPF_0.5.0;
LIBBPF_0.7.0 {
global:
bpf_btf_load;
bpf_program__expected_attach_type;
bpf_program__log_buf;
bpf_program__log_level;
bpf_program__set_log_buf;
bpf_program__set_log_level;
bpf_program__type;
bpf_xdp_attach;
bpf_xdp_detach;
bpf_xdp_query;
bpf_xdp_query_id;
libbpf_probe_bpf_helper;
libbpf_probe_bpf_map_type;
libbpf_probe_bpf_prog_type;
libbpf_set_memlock_rlim;
} LIBBPF_0.6.0;
LIBBPF_0.8.0 {
global:
bpf_map__autocreate;
bpf_map__get_next_key;
bpf_map__delete_elem;
bpf_map__lookup_and_delete_elem;
bpf_map__lookup_elem;
bpf_map__set_autocreate;
bpf_map__update_elem;
bpf_map_delete_elem_flags;
bpf_object__destroy_subskeleton;
bpf_object__open_subskeleton;
bpf_program__attach_kprobe_multi_opts;
bpf_program__attach_trace_opts;
bpf_program__attach_usdt;
bpf_program__set_insns;
libbpf_register_prog_handler;
libbpf_unregister_prog_handler;
} LIBBPF_0.7.0;
LIBBPF_1.0.0 {
global:
bpf_obj_get_opts;
bpf_prog_query_opts;
bpf_program__attach_ksyscall;
bpf_program__autoattach;
bpf_program__set_autoattach;
btf__add_enum64;
btf__add_enum64_value;
libbpf_bpf_attach_type_str;
libbpf_bpf_link_type_str;
libbpf_bpf_map_type_str;
libbpf_bpf_prog_type_str;
perf_buffer__buffer;
} LIBBPF_0.8.0;
LIBBPF_1.1.0 {
global:
bpf_btf_get_fd_by_id_opts;
bpf_link_get_fd_by_id_opts;
bpf_map_get_fd_by_id_opts;
bpf_prog_get_fd_by_id_opts;
user_ring_buffer__discard;
user_ring_buffer__free;
user_ring_buffer__new;
user_ring_buffer__reserve;
user_ring_buffer__reserve_blocking;
user_ring_buffer__submit;
} LIBBPF_1.0.0;
LIBBPF_1.2.0 {
global:
bpf_btf_get_info_by_fd;
bpf_link__update_map;
bpf_link_get_info_by_fd;
bpf_map_get_info_by_fd;
bpf_prog_get_info_by_fd;
} LIBBPF_1.1.0;
LIBBPF_1.3.0 {
global:
bpf_obj_pin_opts;
bpf_object__unpin;
bpf_prog_detach_opts;
bpf_program__attach_netfilter;
bpf_program__attach_netkit;
bpf_program__attach_tcx;
bpf_program__attach_uprobe_multi;
ring__avail_data_size;
ring__consume;
ring__consumer_pos;
ring__map_fd;
ring__producer_pos;
ring__size;
ring_buffer__ring;
} LIBBPF_1.2.0;
LIBBPF_1.4.0 {
global:
bpf_program__attach_raw_tracepoint_opts;
bpf_raw_tracepoint_open_opts;
bpf_token_create;
btf__new_split;
btf_ext__raw_data;
} LIBBPF_1.3.0;
LIBBPF_1.5.0 {
global:
bpf_program__attach_sockmap;
ring__consume_n;
ring_buffer__consume_n;
} LIBBPF_1.4.0;

View File

@@ -30,17 +30,24 @@
/* Add checks for other versions below when planning deprecation of API symbols
* with the LIBBPF_DEPRECATED_SINCE macro.
*/
#if __LIBBPF_CURRENT_VERSION_GEQ(0, 6)
#define __LIBBPF_MARK_DEPRECATED_0_6(X) X
#if __LIBBPF_CURRENT_VERSION_GEQ(1, 0)
#define __LIBBPF_MARK_DEPRECATED_1_0(X) X
#else
#define __LIBBPF_MARK_DEPRECATED_0_6(X)
#endif
#if __LIBBPF_CURRENT_VERSION_GEQ(0, 7)
#define __LIBBPF_MARK_DEPRECATED_0_7(X) X
#else
#define __LIBBPF_MARK_DEPRECATED_0_7(X)
#define __LIBBPF_MARK_DEPRECATED_1_0(X)
#endif
/* This set of internal macros allows to do "function overloading" based on
* number of arguments provided by used in backwards-compatible way during the
* transition to libbpf 1.0
* It's ugly but necessary evil that will be cleaned up when we get to 1.0.
* See bpf_prog_load() overload for example.
*/
#define ___libbpf_cat(A, B) A ## B
#define ___libbpf_select(NAME, NUM) ___libbpf_cat(NAME, NUM)
#define ___libbpf_nth(_1, _2, _3, _4, _5, _6, N, ...) N
#define ___libbpf_cnt(...) ___libbpf_nth(__VA_ARGS__, 6, 5, 4, 3, 2, 1)
#define ___libbpf_overload(NAME, ...) ___libbpf_select(NAME, ___libbpf_cnt(__VA_ARGS__))(__VA_ARGS__)
/* Helper macro to declare and initialize libbpf options struct
*
* This dance with uninitialized declaration, followed by memset to zero,
@@ -54,7 +61,7 @@
* including any extra padding, it with memset() and then assigns initial
* values provided by users in struct initializer-syntax as varargs.
*/
#define DECLARE_LIBBPF_OPTS(TYPE, NAME, ...) \
#define LIBBPF_OPTS(TYPE, NAME, ...) \
struct TYPE NAME = ({ \
memset(&NAME, 0, sizeof(struct TYPE)); \
(struct TYPE) { \
@@ -63,4 +70,23 @@
}; \
})
/* Helper macro to clear and optionally reinitialize libbpf options struct
*
* Small helper macro to reset all fields and to reinitialize the common
* structure size member. Values provided by users in struct initializer-
* syntax as varargs can be provided as well to reinitialize options struct
* specific members.
*/
#define LIBBPF_OPTS_RESET(NAME, ...) \
do { \
typeof(NAME) ___##NAME = ({ \
memset(&___##NAME, 0, sizeof(NAME)); \
(typeof(NAME)) { \
.sz = sizeof(NAME), \
__VA_ARGS__ \
}; \
}); \
memcpy(&NAME, &___##NAME, sizeof(NAME)); \
} while (0)
#endif /* __LIBBPF_LIBBPF_COMMON_H */

View File

@@ -39,14 +39,14 @@ static const char *libbpf_strerror_table[NR_ERRNO] = {
int libbpf_strerror(int err, char *buf, size_t size)
{
int ret;
if (!buf || !size)
return libbpf_err(-EINVAL);
err = err > 0 ? err : -err;
if (err < __LIBBPF_ERRNO__START) {
int ret;
ret = strerror_r(err, buf, size);
buf[size - 1] = '\0';
return libbpf_err_errno(ret);
@@ -56,12 +56,20 @@ int libbpf_strerror(int err, char *buf, size_t size)
const char *msg;
msg = libbpf_strerror_table[ERRNO_OFFSET(err)];
snprintf(buf, size, "%s", msg);
ret = snprintf(buf, size, "%s", msg);
buf[size - 1] = '\0';
/* The length of the buf and msg is positive.
* A negative number may be returned only when the
* size exceeds INT_MAX. Not likely to appear.
*/
if (ret >= size)
return libbpf_err(-ERANGE);
return 0;
}
snprintf(buf, size, "Unknown libbpf error %d", err);
ret = snprintf(buf, size, "Unknown libbpf error %d", err);
buf[size - 1] = '\0';
if (ret >= size)
return libbpf_err(-ERANGE);
return libbpf_err(-ENOENT);
}

View File

@@ -15,9 +15,24 @@
#include <linux/err.h>
#include <fcntl.h>
#include <unistd.h>
#include "libbpf_legacy.h"
#include <sys/syscall.h>
#include <libelf.h>
#include "relo_core.h"
/* Android's libc doesn't support AT_EACCESS in faccessat() implementation
* ([0]), and just returns -EINVAL even if file exists and is accessible.
* See [1] for issues caused by this.
*
* So just redefine it to 0 on Android.
*
* [0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50
* [1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250
*/
#ifdef __ANDROID__
#undef AT_EACCESS
#define AT_EACCESS 0
#endif
/* make sure libbpf doesn't use kernel-only integer typedefs */
#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
@@ -73,6 +88,8 @@
BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_FLOAT, 0, 0), sz)
#define BTF_TYPE_DECL_TAG_ENC(value, type, component_idx) \
BTF_TYPE_ENC(value, BTF_INFO_ENC(BTF_KIND_DECL_TAG, 0, 0), type), (component_idx)
#define BTF_TYPE_TYPE_TAG_ENC(value, type) \
BTF_TYPE_ENC(value, BTF_INFO_ENC(BTF_KIND_TYPE_TAG, 0, 0), type)
#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
@@ -90,6 +107,9 @@
# define offsetofend(TYPE, FIELD) \
(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))
#endif
#ifndef __alias
#define __alias(symbol) __attribute__((alias(#symbol)))
#endif
/* Check whether a string `str` has prefix `pfx`, regardless if `pfx` is
* a string literal known at compilation time or char * pointer known only at
@@ -98,6 +118,17 @@
#define str_has_pfx(str, pfx) \
(strncmp(str, pfx, __builtin_constant_p(pfx) ? sizeof(pfx) - 1 : strlen(pfx)) == 0)
/* suffix check */
static inline bool str_has_sfx(const char *str, const char *sfx)
{
size_t str_len = strlen(str);
size_t sfx_len = strlen(sfx);
if (sfx_len > str_len)
return false;
return strcmp(str + str_len - sfx_len, sfx) == 0;
}
/* Symbol versioning is different between static and shared library.
* Properly versioned symbols are needed for shared library, but
* only the symbol of the new version is needed for static library.
@@ -143,6 +174,15 @@ do { \
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif
struct bpf_link {
int (*detach)(struct bpf_link *link);
void (*dealloc)(struct bpf_link *link);
char *pin_path; /* NULL, if not pinned */
int fd; /* hook FD, -1 if not applicable */
bool disconnected;
};
/*
* Re-implement glibc's reallocarray() for libbpf internal-only use.
* reallocarray(), unfortunately, is not available in all versions of glibc,
@@ -167,10 +207,31 @@ static inline void *libbpf_reallocarray(void *ptr, size_t nmemb, size_t size)
return realloc(ptr, total);
}
/* Copy up to sz - 1 bytes from zero-terminated src string and ensure that dst
* is zero-terminated string no matter what (unless sz == 0, in which case
* it's a no-op). It's conceptually close to FreeBSD's strlcpy(), but differs
* in what is returned. Given this is internal helper, it's trivial to extend
* this, when necessary. Use this instead of strncpy inside libbpf source code.
*/
static inline void libbpf_strlcpy(char *dst, const char *src, size_t sz)
{
size_t i;
if (sz == 0)
return;
sz--;
for (i = 0; i < sz && src[i]; i++)
dst[i] = src[i];
dst[i] = '\0';
}
__u32 get_kernel_version(void);
struct btf;
struct btf_type;
struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id);
struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id);
const char *btf_kind_str(const struct btf_type *t);
const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
@@ -270,63 +331,80 @@ static inline bool libbpf_validate_opts(const char *opts,
(opts)->sz - __off); \
})
enum kern_feature_id {
/* v4.14: kernel support for program & map names. */
FEAT_PROG_NAME,
/* v5.2: kernel support for global data sections. */
FEAT_GLOBAL_DATA,
/* BTF support */
FEAT_BTF,
/* BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO support */
FEAT_BTF_FUNC,
/* BTF_KIND_VAR and BTF_KIND_DATASEC support */
FEAT_BTF_DATASEC,
/* BTF_FUNC_GLOBAL is supported */
FEAT_BTF_GLOBAL_FUNC,
/* BPF_F_MMAPABLE is supported for arrays */
FEAT_ARRAY_MMAP,
/* kernel support for expected_attach_type in BPF_PROG_LOAD */
FEAT_EXP_ATTACH_TYPE,
/* bpf_probe_read_{kernel,user}[_str] helpers */
FEAT_PROBE_READ_KERN,
/* BPF_PROG_BIND_MAP is supported */
FEAT_PROG_BIND_MAP,
/* Kernel support for module BTFs */
FEAT_MODULE_BTF,
/* BTF_KIND_FLOAT support */
FEAT_BTF_FLOAT,
/* BPF perf link support */
FEAT_PERF_LINK,
/* BTF_KIND_DECL_TAG support */
FEAT_BTF_DECL_TAG,
/* BTF_KIND_TYPE_TAG support */
FEAT_BTF_TYPE_TAG,
/* memcg-based accounting for BPF maps and progs */
FEAT_MEMCG_ACCOUNT,
/* BPF cookie (bpf_get_attach_cookie() BPF helper) support */
FEAT_BPF_COOKIE,
/* BTF_KIND_ENUM64 support and BTF_KIND_ENUM kflag support */
FEAT_BTF_ENUM64,
/* Kernel uses syscall wrapper (CONFIG_ARCH_HAS_SYSCALL_WRAPPER) */
FEAT_SYSCALL_WRAPPER,
/* BPF multi-uprobe link support */
FEAT_UPROBE_MULTI_LINK,
/* Kernel supports arg:ctx tag (__arg_ctx) for global subprogs natively */
FEAT_ARG_CTX_TAG,
/* Kernel supports '?' at the front of datasec names */
FEAT_BTF_QMARK_DATASEC,
__FEAT_CNT,
};
enum kern_feature_result {
FEAT_UNKNOWN = 0,
FEAT_SUPPORTED = 1,
FEAT_MISSING = 2,
};
struct kern_feature_cache {
enum kern_feature_result res[__FEAT_CNT];
int token_fd;
};
bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id);
bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id);
int probe_kern_syscall_wrapper(int token_fd);
int probe_memcg_account(int token_fd);
int bump_rlimit_memlock(void);
int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len);
struct bpf_prog_load_params {
enum bpf_prog_type prog_type;
enum bpf_attach_type expected_attach_type;
const char *name;
const struct bpf_insn *insns;
size_t insn_cnt;
const char *license;
__u32 kern_version;
__u32 attach_prog_fd;
__u32 attach_btf_obj_fd;
__u32 attach_btf_id;
__u32 prog_ifindex;
__u32 prog_btf_fd;
__u32 prog_flags;
__u32 func_info_rec_size;
const void *func_info;
__u32 func_info_cnt;
__u32 line_info_rec_size;
const void *line_info;
__u32 line_info_cnt;
__u32 log_level;
char *log_buf;
size_t log_buf_sz;
int *fd_array;
};
int libbpf__bpf_prog_load(const struct bpf_prog_load_params *load_attr);
struct bpf_create_map_params {
const char *name;
enum bpf_map_type map_type;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
__u32 max_entries;
__u32 numa_node;
__u32 btf_fd;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
__u32 map_ifindex;
union {
__u32 inner_map_fd;
__u32 btf_vmlinux_value_type_id;
};
__u64 map_extra;
};
int libbpf__bpf_create_map_xattr(const struct bpf_create_map_params *create_attr);
const char *str_sec, size_t str_len,
int token_fd);
int btf_load_into_kernel(struct btf *btf,
char *log_buf, size_t log_sz, __u32 log_level,
int token_fd);
struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
@@ -340,6 +418,13 @@ struct btf_ext_info {
void *info;
__u32 rec_size;
__u32 len;
/* optional (maintained internally by libbpf) mapping between .BTF.ext
* section and corresponding ELF section. This is used to join
* information like CO-RE relocation records with corresponding BPF
* programs defined in ELF sections
*/
__u32 *sec_idxs;
int sec_cnt;
};
#define for_each_btf_ext_sec(seg, sec) \
@@ -433,8 +518,6 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
__s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name,
__u32 kind);
extern enum libbpf_strict_mode libbpf_mode;
/* handle direct returned errors */
static inline int libbpf_err(int ret)
{
@@ -448,12 +531,8 @@ static inline int libbpf_err(int ret)
*/
static inline int libbpf_err_errno(int ret)
{
if (libbpf_mode & LIBBPF_STRICT_DIRECT_ERRS)
/* errno is already assumed to be set on error */
return ret < 0 ? -errno : ret;
/* legacy: on error return -1 directly and don't touch errno */
return ret;
/* errno is already assumed to be set on error */
return ret < 0 ? -errno : ret;
}
/* handle error for pointer-returning APIs, err is assumed to be < 0 always */
@@ -461,12 +540,7 @@ static inline void *libbpf_err_ptr(int err)
{
/* set errno on error, this doesn't break anything */
errno = -err;
if (libbpf_mode & LIBBPF_STRICT_CLEAN_PTRS)
return NULL;
/* legacy: encode err as ptr */
return ERR_PTR(err);
return NULL;
}
/* handle pointer-returning APIs' error handling */
@@ -476,11 +550,7 @@ static inline void *libbpf_ptr(void *ret)
if (IS_ERR(ret))
errno = -PTR_ERR(ret);
if (libbpf_mode & LIBBPF_STRICT_CLEAN_PTRS)
return IS_ERR(ret) ? NULL : ret;
/* legacy: pass-through original pointer */
return ret;
return IS_ERR(ret) ? NULL : ret;
}
static inline bool str_is_empty(const char *s)
@@ -493,6 +563,17 @@ static inline bool is_ldimm64_insn(struct bpf_insn *insn)
return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
}
/* Unconditionally dup FD, ensuring it doesn't use [0, 2] range.
* Original FD is not closed or altered in any other way.
* Preserves original FD value, if it's invalid (negative).
*/
static inline int dup_good_fd(int fd)
{
if (fd < 0)
return fd;
return fcntl(fd, F_DUPFD_CLOEXEC, 3);
}
/* if fd is stdin, stdout, or stderr, dup to a fd greater than 2
* Takes ownership of the fd passed in, and closes it if calling
* fcntl(fd, F_DUPFD_CLOEXEC, 3).
@@ -504,9 +585,10 @@ static inline int ensure_good_fd(int fd)
if (fd < 0)
return fd;
if (fd < 3) {
fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
fd = dup_good_fd(fd);
saved_errno = errno;
close(old_fd);
errno = saved_errno;
if (fd < 0) {
pr_warn("failed to dup FD %d to FD > 2: %d\n", old_fd, -saved_errno);
errno = saved_errno;
@@ -515,4 +597,73 @@ static inline int ensure_good_fd(int fd)
return fd;
}
static inline int sys_dup2(int oldfd, int newfd)
{
#ifdef __NR_dup2
return syscall(__NR_dup2, oldfd, newfd);
#else
return syscall(__NR_dup3, oldfd, newfd, 0);
#endif
}
/* Point *fixed_fd* to the same file that *tmp_fd* points to.
* Regardless of success, *tmp_fd* is closed.
* Whatever *fixed_fd* pointed to is closed silently.
*/
static inline int reuse_fd(int fixed_fd, int tmp_fd)
{
int err;
err = sys_dup2(tmp_fd, fixed_fd);
err = err < 0 ? -errno : 0;
close(tmp_fd); /* clean up temporary FD */
return err;
}
/* The following two functions are exposed to bpftool */
int bpf_core_add_cands(struct bpf_core_cand *local_cand,
size_t local_essent_len,
const struct btf *targ_btf,
const char *targ_btf_name,
int targ_start_id,
struct bpf_core_cand_list *cands);
void bpf_core_free_cands(struct bpf_core_cand_list *cands);
struct usdt_manager *usdt_manager_new(struct bpf_object *obj);
void usdt_manager_free(struct usdt_manager *man);
struct bpf_link * usdt_manager_attach_usdt(struct usdt_manager *man,
const struct bpf_program *prog,
pid_t pid, const char *path,
const char *usdt_provider, const char *usdt_name,
__u64 usdt_cookie);
static inline bool is_pow_of_2(size_t x)
{
return x && (x & (x - 1)) == 0;
}
#define PROG_LOAD_ATTEMPTS 5
int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int attempts);
bool glob_match(const char *str, const char *pat);
long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name);
long elf_find_func_offset_from_file(const char *binary_path, const char *name);
struct elf_fd {
Elf *elf;
int fd;
};
int elf_open(const char *binary_path, struct elf_fd *elf_fd);
void elf_close(struct elf_fd *elf_fd);
int elf_resolve_syms_offsets(const char *binary_path, int cnt,
const char **syms, unsigned long **poffsets,
int st_type);
int elf_resolve_pattern_offsets(const char *binary_path, const char *pattern,
unsigned long **poffsets, size_t *pcnt);
int probe_fd(int fd);
#endif /* __LIBBPF_LIBBPF_INTERNAL_H */

View File

@@ -20,6 +20,11 @@
extern "C" {
#endif
/* As of libbpf 1.0 libbpf_set_strict_mode() and enum libbpf_struct_mode have
* no effect. But they are left in libbpf_legacy.h so that applications that
* prepared for libbpf 1.0 before final release by using
* libbpf_set_strict_mode() still work with libbpf 1.0+ without any changes.
*/
enum libbpf_strict_mode {
/* Turn on all supported strict features of libbpf to simulate libbpf
* v1.0 behavior.
@@ -45,7 +50,6 @@ enum libbpf_strict_mode {
* (positive) error code.
*/
LIBBPF_STRICT_DIRECT_ERRS = 0x02,
/*
* Enforce strict BPF program section (SEC()) names.
* E.g., while prefiously SEC("xdp_whatever") or SEC("perf_event_blah") were
@@ -55,6 +59,10 @@ enum libbpf_strict_mode {
*
* Note, in this mode the program pin path will be based on the
* function name instead of section name.
*
* Additionally, routines in the .text section are always considered
* sub-programs. Legacy behavior allows for a single routine in .text
* to be a program.
*/
LIBBPF_STRICT_SEC_NAME = 0x04,
/*
@@ -63,12 +71,67 @@ enum libbpf_strict_mode {
* Clients can maintain it on their own if it is valuable for them.
*/
LIBBPF_STRICT_NO_OBJECT_LIST = 0x08,
/*
* Automatically bump RLIMIT_MEMLOCK using setrlimit() before the
* first BPF program or map creation operation. This is done only if
* kernel is too old to support memcg-based memory accounting for BPF
* subsystem. By default, RLIMIT_MEMLOCK limit is set to RLIM_INFINITY,
* but it can be overriden with libbpf_set_memlock_rlim() API.
* Note that libbpf_set_memlock_rlim() needs to be called before
* the very first bpf_prog_load(), bpf_map_create() or bpf_object__load()
* operation.
*/
LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK = 0x10,
/*
* Error out on any SEC("maps") map definition, which are deprecated
* in favor of BTF-defined map definitions in SEC(".maps").
*/
LIBBPF_STRICT_MAP_DEFINITIONS = 0x20,
__LIBBPF_STRICT_LAST,
};
LIBBPF_API int libbpf_set_strict_mode(enum libbpf_strict_mode mode);
/**
* @brief **libbpf_get_error()** extracts the error code from the passed
* pointer
* @param ptr pointer returned from libbpf API function
* @return error code; or 0 if no error occured
*
* Note, as of libbpf 1.0 this function is not necessary and not recommended
* to be used. Libbpf doesn't return error code embedded into the pointer
* itself. Instead, NULL is returned on error and error code is passed through
* thread-local errno variable. **libbpf_get_error()** is just returning -errno
* value if it receives NULL, which is correct only if errno hasn't been
* modified between libbpf API call and corresponding **libbpf_get_error()**
* call. Prefer to check return for NULL and use errno directly.
*
* This API is left in libbpf 1.0 to allow applications that were 1.0-ready
* before final libbpf 1.0 without needing to change them.
*/
LIBBPF_API long libbpf_get_error(const void *ptr);
#define DECLARE_LIBBPF_OPTS LIBBPF_OPTS
/* "Discouraged" APIs which don't follow consistent libbpf naming patterns.
* They are normally a trivial aliases or wrappers for proper APIs and are
* left to minimize unnecessary disruption for users of libbpf. But they
* shouldn't be used going forward.
*/
struct bpf_program;
struct bpf_map;
struct btf;
struct btf_ext;
LIBBPF_API struct btf *libbpf_find_kernel_btf(void);
LIBBPF_API enum bpf_prog_type bpf_program__get_type(const struct bpf_program *prog);
LIBBPF_API enum bpf_attach_type bpf_program__get_expected_attach_type(const struct bpf_program *prog);
LIBBPF_API const char *bpf_map__get_pin_path(const struct bpf_map *map);
LIBBPF_API const void *btf__get_raw_data(const struct btf *btf, __u32 *size);
LIBBPF_API const void *btf_ext__get_raw_data(const struct btf_ext *btf_ext, __u32 *size);
#ifdef __cplusplus
} /* extern "C" */

View File

@@ -12,77 +12,154 @@
#include <linux/btf.h>
#include <linux/filter.h>
#include <linux/kernel.h>
#include <linux/version.h>
#include "bpf.h"
#include "libbpf.h"
#include "libbpf_internal.h"
static bool grep(const char *buffer, const char *pattern)
/* On Ubuntu LINUX_VERSION_CODE doesn't correspond to info.release,
* but Ubuntu provides /proc/version_signature file, as described at
* https://ubuntu.com/kernel, with an example contents below, which we
* can use to get a proper LINUX_VERSION_CODE.
*
* Ubuntu 5.4.0-12.15-generic 5.4.8
*
* In the above, 5.4.8 is what kernel is actually expecting, while
* uname() call will return 5.4.0 in info.release.
*/
static __u32 get_ubuntu_kernel_version(void)
{
return !!strstr(buffer, pattern);
}
const char *ubuntu_kver_file = "/proc/version_signature";
__u32 major, minor, patch;
int ret;
FILE *f;
static int get_vendor_id(int ifindex)
{
char ifname[IF_NAMESIZE], path[64], buf[8];
ssize_t len;
int fd;
if (!if_indextoname(ifindex, ifname))
return -1;
snprintf(path, sizeof(path), "/sys/class/net/%s/device/vendor", ifname);
fd = open(path, O_RDONLY | O_CLOEXEC);
if (fd < 0)
return -1;
len = read(fd, buf, sizeof(buf));
close(fd);
if (len < 0)
return -1;
if (len >= (ssize_t)sizeof(buf))
return -1;
buf[len] = '\0';
return strtol(buf, NULL, 0);
}
static int get_kernel_version(void)
{
int version, subversion, patchlevel;
struct utsname utsn;
/* Return 0 on failure, and attempt to probe with empty kversion */
if (uname(&utsn))
if (faccessat(AT_FDCWD, ubuntu_kver_file, R_OK, AT_EACCESS) != 0)
return 0;
if (sscanf(utsn.release, "%d.%d.%d",
&version, &subversion, &patchlevel) != 3)
f = fopen(ubuntu_kver_file, "re");
if (!f)
return 0;
return (version << 16) + (subversion << 8) + patchlevel;
ret = fscanf(f, "%*s %*s %u.%u.%u\n", &major, &minor, &patch);
fclose(f);
if (ret != 3)
return 0;
return KERNEL_VERSION(major, minor, patch);
}
static void
probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
size_t insns_cnt, char *buf, size_t buf_len, __u32 ifindex)
/* On Debian LINUX_VERSION_CODE doesn't correspond to info.release.
* Instead, it is provided in info.version. An example content of
* Debian 10 looks like the below.
*
* utsname::release 4.19.0-22-amd64
* utsname::version #1 SMP Debian 4.19.260-1 (2022-09-29)
*
* In the above, 4.19.260 is what kernel is actually expecting, while
* uname() call will return 4.19.0 in info.release.
*/
static __u32 get_debian_kernel_version(struct utsname *info)
{
struct bpf_load_program_attr xattr = {};
int fd;
__u32 major, minor, patch;
char *p;
p = strstr(info->version, "Debian ");
if (!p) {
/* This is not a Debian kernel. */
return 0;
}
if (sscanf(p, "Debian %u.%u.%u", &major, &minor, &patch) != 3)
return 0;
return KERNEL_VERSION(major, minor, patch);
}
__u32 get_kernel_version(void)
{
__u32 major, minor, patch, version;
struct utsname info;
/* Check if this is an Ubuntu kernel. */
version = get_ubuntu_kernel_version();
if (version != 0)
return version;
uname(&info);
/* Check if this is a Debian kernel. */
version = get_debian_kernel_version(&info);
if (version != 0)
return version;
if (sscanf(info.release, "%u.%u.%u", &major, &minor, &patch) != 3)
return 0;
if (major == 4 && minor == 19 && patch > 255)
return KERNEL_VERSION(major, minor, 255);;
return KERNEL_VERSION(major, minor, patch);
}
static int probe_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, size_t insns_cnt,
char *log_buf, size_t log_buf_sz)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = log_buf,
.log_size = log_buf_sz,
.log_level = log_buf ? 1 : 0,
);
int fd, err, exp_err = 0;
const char *exp_msg = NULL;
char buf[4096];
switch (prog_type) {
case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
xattr.expected_attach_type = BPF_CGROUP_INET4_CONNECT;
opts.expected_attach_type = BPF_CGROUP_INET4_CONNECT;
break;
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
xattr.expected_attach_type = BPF_CGROUP_GETSOCKOPT;
opts.expected_attach_type = BPF_CGROUP_GETSOCKOPT;
break;
case BPF_PROG_TYPE_SK_LOOKUP:
xattr.expected_attach_type = BPF_SK_LOOKUP;
opts.expected_attach_type = BPF_SK_LOOKUP;
break;
case BPF_PROG_TYPE_KPROBE:
xattr.kern_version = get_kernel_version();
opts.kern_version = get_kernel_version();
break;
case BPF_PROG_TYPE_LIRC_MODE2:
opts.expected_attach_type = BPF_LIRC_MODE2;
break;
case BPF_PROG_TYPE_TRACING:
case BPF_PROG_TYPE_LSM:
opts.log_buf = buf;
opts.log_size = sizeof(buf);
opts.log_level = 1;
if (prog_type == BPF_PROG_TYPE_TRACING)
opts.expected_attach_type = BPF_TRACE_FENTRY;
else
opts.expected_attach_type = BPF_MODIFY_RETURN;
opts.attach_btf_id = 1;
exp_err = -EINVAL;
exp_msg = "attach_btf_id 1 is not a function";
break;
case BPF_PROG_TYPE_EXT:
opts.log_buf = buf;
opts.log_size = sizeof(buf);
opts.log_level = 1;
opts.attach_btf_id = 1;
exp_err = -EINVAL;
exp_msg = "Cannot replace kernel functions";
break;
case BPF_PROG_TYPE_SYSCALL:
opts.prog_flags = BPF_F_SLEEPABLE;
break;
case BPF_PROG_TYPE_STRUCT_OPS:
exp_err = -524; /* -ENOTSUPP */
break;
case BPF_PROG_TYPE_UNSPEC:
case BPF_PROG_TYPE_SOCKET_FILTER:
@@ -103,48 +180,50 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
case BPF_PROG_TYPE_RAW_TRACEPOINT:
case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
case BPF_PROG_TYPE_LIRC_MODE2:
case BPF_PROG_TYPE_SK_REUSEPORT:
case BPF_PROG_TYPE_FLOW_DISSECTOR:
case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_TRACING:
case BPF_PROG_TYPE_STRUCT_OPS:
case BPF_PROG_TYPE_EXT:
case BPF_PROG_TYPE_LSM:
default:
break;
case BPF_PROG_TYPE_NETFILTER:
opts.expected_attach_type = BPF_NETFILTER;
break;
default:
return -EOPNOTSUPP;
}
xattr.prog_type = prog_type;
xattr.insns = insns;
xattr.insns_cnt = insns_cnt;
xattr.license = "GPL";
xattr.prog_ifindex = ifindex;
fd = bpf_load_program_xattr(&xattr, buf, buf_len);
fd = bpf_prog_load(prog_type, NULL, "GPL", insns, insns_cnt, &opts);
err = -errno;
if (fd >= 0)
close(fd);
if (exp_err) {
if (fd >= 0 || err != exp_err)
return 0;
if (exp_msg && !strstr(buf, exp_msg))
return 0;
return 1;
}
return fd >= 0 ? 1 : 0;
}
bool bpf_probe_prog_type(enum bpf_prog_type prog_type, __u32 ifindex)
int libbpf_probe_bpf_prog_type(enum bpf_prog_type prog_type, const void *opts)
{
struct bpf_insn insns[2] = {
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN()
};
const size_t insn_cnt = ARRAY_SIZE(insns);
int ret;
if (ifindex && prog_type == BPF_PROG_TYPE_SCHED_CLS)
/* nfp returns -EINVAL on exit(0) with TC offload */
insns[0].imm = 2;
if (opts)
return libbpf_err(-EINVAL);
errno = 0;
probe_load(prog_type, insns, ARRAY_SIZE(insns), NULL, 0, ifindex);
return errno != EINVAL && errno != EOPNOTSUPP;
ret = probe_prog_load(prog_type, insns, insn_cnt, NULL, 0);
return libbpf_err(ret);
}
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len)
const char *str_sec, size_t str_len,
int token_fd)
{
struct btf_header hdr = {
.magic = BTF_MAGIC,
@@ -154,6 +233,10 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
.str_off = types_len,
.str_len = str_len,
};
LIBBPF_OPTS(bpf_btf_load_opts, opts,
.token_fd = token_fd,
.btf_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int btf_fd, btf_len;
__u8 *raw_btf;
@@ -166,7 +249,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
memcpy(raw_btf + hdr.hdr_len, raw_types, hdr.type_len);
memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len);
btf_fd = bpf_load_btf(raw_btf, btf_len, NULL, 0, false);
btf_fd = bpf_btf_load(raw_btf, btf_len, &opts);
free(raw_btf);
return btf_fd;
@@ -196,20 +279,19 @@ static int load_local_storage_btf(void)
};
return libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs));
strs, sizeof(strs), 0);
}
bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
static int probe_map_create(enum bpf_map_type map_type)
{
int key_size, value_size, max_entries, map_flags;
LIBBPF_OPTS(bpf_map_create_opts, opts);
int key_size, value_size, max_entries;
__u32 btf_key_type_id = 0, btf_value_type_id = 0;
struct bpf_create_map_attr attr = {};
int fd = -1, btf_fd = -1, fd_inner;
int fd = -1, btf_fd = -1, fd_inner = -1, exp_err = 0, err = 0;
key_size = sizeof(__u32);
value_size = sizeof(__u32);
max_entries = 1;
map_flags = 0;
switch (map_type) {
case BPF_MAP_TYPE_STACK_TRACE:
@@ -218,7 +300,7 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
case BPF_MAP_TYPE_LPM_TRIE:
key_size = sizeof(__u64);
value_size = sizeof(__u64);
map_flags = BPF_F_NO_PREALLOC;
opts.map_flags = BPF_F_NO_PREALLOC;
break;
case BPF_MAP_TYPE_CGROUP_STORAGE:
case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
@@ -233,21 +315,39 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
case BPF_MAP_TYPE_SK_STORAGE:
case BPF_MAP_TYPE_INODE_STORAGE:
case BPF_MAP_TYPE_TASK_STORAGE:
case BPF_MAP_TYPE_CGRP_STORAGE:
btf_key_type_id = 1;
btf_value_type_id = 3;
value_size = 8;
max_entries = 0;
map_flags = BPF_F_NO_PREALLOC;
opts.map_flags = BPF_F_NO_PREALLOC;
btf_fd = load_local_storage_btf();
if (btf_fd < 0)
return false;
return btf_fd;
break;
case BPF_MAP_TYPE_RINGBUF:
case BPF_MAP_TYPE_USER_RINGBUF:
key_size = 0;
value_size = 0;
max_entries = 4096;
max_entries = sysconf(_SC_PAGE_SIZE);
break;
case BPF_MAP_TYPE_STRUCT_OPS:
/* we'll get -ENOTSUPP for invalid BTF type ID for struct_ops */
opts.btf_vmlinux_value_type_id = 1;
opts.value_type_btf_obj_fd = -1;
exp_err = -524; /* -ENOTSUPP */
break;
case BPF_MAP_TYPE_BLOOM_FILTER:
key_size = 0;
max_entries = 1;
break;
case BPF_MAP_TYPE_ARENA:
key_size = 0;
value_size = 0;
max_entries = 1; /* one page */
opts.map_extra = 0; /* can mmap() at any address */
opts.map_flags = BPF_F_MMAPABLE;
break;
case BPF_MAP_TYPE_UNSPEC:
case BPF_MAP_TYPE_HASH:
case BPF_MAP_TYPE_ARRAY:
case BPF_MAP_TYPE_PROG_ARRAY:
@@ -266,95 +366,103 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
case BPF_MAP_TYPE_XSKMAP:
case BPF_MAP_TYPE_SOCKHASH:
case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
case BPF_MAP_TYPE_STRUCT_OPS:
default:
break;
case BPF_MAP_TYPE_UNSPEC:
default:
return -EOPNOTSUPP;
}
if (map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
/* TODO: probe for device, once libbpf has a function to create
* map-in-map for offload
*/
if (ifindex)
return false;
fd_inner = bpf_create_map(BPF_MAP_TYPE_HASH,
sizeof(__u32), sizeof(__u32), 1, 0);
fd_inner = bpf_map_create(BPF_MAP_TYPE_HASH, NULL,
sizeof(__u32), sizeof(__u32), 1, NULL);
if (fd_inner < 0)
return false;
fd = bpf_create_map_in_map(map_type, NULL, sizeof(__u32),
fd_inner, 1, 0);
close(fd_inner);
} else {
/* Note: No other restriction on map type probes for offload */
attr.map_type = map_type;
attr.key_size = key_size;
attr.value_size = value_size;
attr.max_entries = max_entries;
attr.map_flags = map_flags;
attr.map_ifindex = ifindex;
if (btf_fd >= 0) {
attr.btf_fd = btf_fd;
attr.btf_key_type_id = btf_key_type_id;
attr.btf_value_type_id = btf_value_type_id;
}
goto cleanup;
fd = bpf_create_map_xattr(&attr);
opts.inner_map_fd = fd_inner;
}
if (btf_fd >= 0) {
opts.btf_fd = btf_fd;
opts.btf_key_type_id = btf_key_type_id;
opts.btf_value_type_id = btf_value_type_id;
}
fd = bpf_map_create(map_type, NULL, key_size, value_size, max_entries, &opts);
err = -errno;
cleanup:
if (fd >= 0)
close(fd);
if (fd_inner >= 0)
close(fd_inner);
if (btf_fd >= 0)
close(btf_fd);
return fd >= 0;
if (exp_err)
return fd < 0 && err == exp_err ? 1 : 0;
else
return fd >= 0 ? 1 : 0;
}
bool bpf_probe_helper(enum bpf_func_id id, enum bpf_prog_type prog_type,
__u32 ifindex)
int libbpf_probe_bpf_map_type(enum bpf_map_type map_type, const void *opts)
{
struct bpf_insn insns[2] = {
BPF_EMIT_CALL(id),
BPF_EXIT_INSN()
int ret;
if (opts)
return libbpf_err(-EINVAL);
ret = probe_map_create(map_type);
return libbpf_err(ret);
}
int libbpf_probe_bpf_helper(enum bpf_prog_type prog_type, enum bpf_func_id helper_id,
const void *opts)
{
struct bpf_insn insns[] = {
BPF_EMIT_CALL((__u32)helper_id),
BPF_EXIT_INSN(),
};
char buf[4096] = {};
bool res;
const size_t insn_cnt = ARRAY_SIZE(insns);
char buf[4096];
int ret;
probe_load(prog_type, insns, ARRAY_SIZE(insns), buf, sizeof(buf),
ifindex);
res = !grep(buf, "invalid func ") && !grep(buf, "unknown func ");
if (opts)
return libbpf_err(-EINVAL);
if (ifindex) {
switch (get_vendor_id(ifindex)) {
case 0x19ee: /* Netronome specific */
res = res && !grep(buf, "not supported by FW") &&
!grep(buf, "unsupported function id");
break;
default:
break;
}
/* we can't successfully load all prog types to check for BPF helper
* support, so bail out with -EOPNOTSUPP error
*/
switch (prog_type) {
case BPF_PROG_TYPE_TRACING:
case BPF_PROG_TYPE_EXT:
case BPF_PROG_TYPE_LSM:
case BPF_PROG_TYPE_STRUCT_OPS:
return -EOPNOTSUPP;
default:
break;
}
return res;
}
/*
* Probe for availability of kernel commit (5.3):
*
* c04c0d2b968a ("bpf: increase complexity limit and maximum program size")
*/
bool bpf_probe_large_insn_limit(__u32 ifindex)
{
struct bpf_insn insns[BPF_MAXINSNS + 1];
int i;
for (i = 0; i < BPF_MAXINSNS; i++)
insns[i] = BPF_MOV64_IMM(BPF_REG_0, 1);
insns[BPF_MAXINSNS] = BPF_EXIT_INSN();
errno = 0;
probe_load(BPF_PROG_TYPE_SCHED_CLS, insns, ARRAY_SIZE(insns), NULL, 0,
ifindex);
return errno != E2BIG && errno != EINVAL;
buf[0] = '\0';
ret = probe_prog_load(prog_type, insns, insn_cnt, buf, sizeof(buf));
if (ret < 0)
return libbpf_err(ret);
/* If BPF verifier doesn't recognize BPF helper ID (enum bpf_func_id)
* at all, it will emit something like "invalid func unknown#181".
* If BPF verifier recognizes BPF helper but it's not supported for
* given BPF program type, it will emit "unknown func bpf_sys_bpf#166"
* or "program of this type cannot use helper bpf_sys_bpf#166".
* In both cases, provided combination of BPF program type and BPF
* helper is not supported by the kernel.
* In all other cases, probe_prog_load() above will either succeed (e.g.,
* because BPF helper happens to accept no input arguments or it
* accepts one input argument and initial PTR_TO_CTX is fine for
* that), or we'll get some more specific BPF verifier error about
* some unsatisfied conditions.
*/
if (ret == 0 && (strstr(buf, "invalid func ") || strstr(buf, "unknown func ") ||
strstr(buf, "program of this type cannot use helper ")))
return 0;
return 1; /* assume supported */
}

View File

@@ -3,7 +3,7 @@
#ifndef __LIBBPF_VERSION_H
#define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 0
#define LIBBPF_MINOR_VERSION 6
#define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 5
#endif /* __LIBBPF_VERSION_H */

Some files were not shown because too many files have changed in this diff Show More