Commit Graph

2381 Commits

Author SHA1 Message Date
Quentin Deslandes
2c6f445a8e libbpf: Fix misaligned array closing bracket
In btf_dump_array_data(), libbpf will call btf_dump_dump_type_data() for
each element. For an array of characters, each element will be
processed the following way:

- btf_dump_dump_type_data() is called to print the character
- btf_dump_data_pfx() prefixes the current line with the proper number
  of indentations
- btf_dump_int_data() is called to print the character
- After the last character is printed, btf_dump_dump_type_data() calls
  btf_dump_data_pfx() before writing the closing bracket

However, for an array containing characters, btf_dump_int_data() won't
print any '\0' and subsequent characters. This leads to situations where
the line prefix is written, no character is added, then the prefix is
written again before adding the closing bracket:

(struct sk_metadata){
    .str_array = (__u8[14])[
        'H',
        'e',
        'l',
        'l',
        'o',
                ],

This change solves this issue by printing the '\0' character, which
has two benefits:

- The bracket closing the array is properly aligned
- It's clear from a user point of view that libbpf uses '\0' as a
  terminator for arrays of characters.

Signed-off-by: Quentin Deslandes <qde@naccy.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240413211258.134421-2-qde@naccy.de
2024-04-24 15:16:35 -07:00
Yonghong Song
09397e309a libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP
Introduce a libbpf API function bpf_program__attach_sockmap()
which allow user to get a bpf_link for their corresponding programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240410043532.3737722-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Yonghong Song
62217fb32a bpf: Add bpf_link support for sk_msg and sk_skb progs
Add bpf_link support for sk_msg and sk_skb programs. We have an
internal request to support bpf_link for sk_msg programs so user
space can have a uniform handling with bpf_link based libbpf
APIs. Using bpf_link based libbpf API also has a benefit which
makes system robust by decoupling prog life cycle and
attachment life cycle.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240410043527.3737160-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Andrea Righi
b521a722b9 libbpf: Add ring__consume_n / ring_buffer__consume_n
Introduce a new API to consume items from a ring buffer, limited to a
specified amount, and return to the caller the actual number of items
consumed.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/lkml/20240310154726.734289-1-andrea.righi@canonical.com/T
Link: https://lore.kernel.org/bpf/20240406092005.92399-4-andrea.righi@canonical.com
2024-04-24 15:16:35 -07:00
Andrea Righi
98de9ace4d libbpf: ringbuf: Allow to consume up to a certain amount of items
In some cases, instead of always consuming all items from ring buffers
in a greedy way, we may want to consume up to a certain amount of items,
for example when we need to copy items from the BPF ring buffer to a
limited user buffer.

This change allows to set an upper limit to the amount of items consumed
from one or more ring buffers.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240406092005.92399-3-andrea.righi@canonical.com
2024-04-24 15:16:35 -07:00
Andrea Righi
26d9ab5f78 libbpf: Start v1.5 development cycle
Bump libbpf.map to v1.5.0 to start a new libbpf version cycle.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240406092005.92399-2-andrea.righi@canonical.com
2024-04-24 15:16:35 -07:00
Anton Protopopov
c5219d1b3d bpf: Pack struct bpf_fib_lookup
The struct bpf_fib_lookup is supposed to be of size 64. A recent commit
59b418c7063d ("bpf: Add a check for struct bpf_fib_lookup size") added
a static assertion to check this property so that future changes to the
structure will not accidentally break this assumption.

As it immediately turned out, on some 32-bit arm systems, when AEABI=n,
the total size of the structure was equal to 68, see [1]. This happened
because the bpf_fib_lookup structure contains a union of two 16-bit
fields:

    union {
            __u16 tot_len;
            __u16 mtu_result;
    };

which was supposed to compile to a 16-bit-aligned 16-bit field. On the
aforementioned setups it was instead both aligned and padded to 32-bits.

Declare this inner union as __attribute__((packed, aligned(2))) such
that it always is of size 2 and is aligned to 16 bits.

  [1] https://lore.kernel.org/all/CA+G9fYtsoP51f-oP_Sp5MOq-Ffv8La2RztNpwvE6+R1VtFiLrw@mail.gmail.com/#t

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: e1850ea9bd9e ("bpf: bpf_fib_lookup return MTU value as output when looked up")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240403123303.1452184-1-aspsk@isovalent.com
2024-04-24 15:16:35 -07:00
Tobias Böhm
8d3a3e138b libbpf: Use local bpf_helpers.h include
Commit 20d59ee55172fdf6 ("libbpf: add bpf_core_cast() macro") added a
bpf_helpers include in bpf_core_read.h as a system include. Usually, the
includes are local, though, like in bpf_tracing.h. This commit adjusts
the include to be local as well.

Signed-off-by: Tobias Böhm <tobias@aibor.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/q5d5bgc6vty2fmaazd5e73efd6f5bhiru2le6fxn43vkw45bls@fhlw2s5ootdb
2024-04-24 15:16:35 -07:00
David Lechner
9f2853a352 bpf: Fix typo in uapi doc comments
In a few places in the bpf uapi headers, EOPNOTSUPP is missing a "P" in
the doc comments. This adds the missing "P".

Signed-off-by: David Lechner <dlechner@baylibre.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240329152900.398260-2-dlechner@baylibre.com
2024-04-24 15:16:35 -07:00
Yonghong Song
d2f83fb976 libbpf: Handle <orig_name>.llvm.<hash> symbol properly
With CONFIG_LTO_CLANG_THIN enabled, with some of previous
version of kernel code base ([1]), I hit the following
error:
   test_ksyms:PASS:kallsyms_fopen 0 nsec
   test_ksyms:FAIL:ksym_find symbol 'bpf_link_fops' not found
   #118     ksyms:FAIL

The reason is that 'bpf_link_fops' is renamed to
   bpf_link_fops.llvm.8325593422554671469
Due to cross-file inlining, the static variable 'bpf_link_fops'
in syscall.c is used by a function in another file. To avoid
potential duplicated names, the llvm added suffix
'.llvm.<hash>' ([2]) to 'bpf_link_fops' variable.
Such renaming caused a problem in libbpf if 'bpf_link_fops'
is used in bpf prog as a ksym but 'bpf_link_fops' does not
match any symbol in /proc/kallsyms.

To fix this issue, libbpf needs to understand that suffix '.llvm.<hash>'
is caused by clang lto kernel and to process such symbols properly.

With latest bpf-next code base built with CONFIG_LTO_CLANG_THIN,
I cannot reproduce the above failure any more. But such an issue
could happen with other symbols or in the future for bpf_link_fops symbol.

For example, with my current kernel, I got the following from
/proc/kallsyms:
  ffffffff84782154 d __func__.net_ratelimit.llvm.6135436931166841955
  ffffffff85f0a500 d tk_core.llvm.726630847145216431
  ffffffff85fdb960 d __fs_reclaim_map.llvm.10487989720912350772
  ffffffff864c7300 d fake_dst_ops.llvm.54750082607048300

I could not easily create a selftest to test newly-added
libbpf functionality with a static C test since I do not know
which symbol is cross-file inlined. But based on my particular kernel,
the following test change can run successfully.

>  diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms.c b/tools/testing/selftests/bpf/prog_tests/ksyms.c
>  index 6a86d1f07800..904a103f7b1d 100644
>  --- a/tools/testing/selftests/bpf/prog_tests/ksyms.c
>  +++ b/tools/testing/selftests/bpf/prog_tests/ksyms.c
>  @@ -42,6 +42,7 @@ void test_ksyms(void)
>          ASSERT_EQ(data->out__bpf_link_fops, link_fops_addr, "bpf_link_fops");
>          ASSERT_EQ(data->out__bpf_link_fops1, 0, "bpf_link_fops1");
>          ASSERT_EQ(data->out__btf_size, btf_size, "btf_size");
>  +       ASSERT_NEQ(data->out__fake_dst_ops, 0, "fake_dst_ops");
>          ASSERT_EQ(data->out__per_cpu_start, per_cpu_start_addr, "__per_cpu_start");
>
>   cleanup:
>  diff --git a/tools/testing/selftests/bpf/progs/test_ksyms.c b/tools/testing/selftests/bpf/progs/test_ksyms.c
>  index 6c9cbb5a3bdf..fe91eef54b66 100644
>  --- a/tools/testing/selftests/bpf/progs/test_ksyms.c
>  +++ b/tools/testing/selftests/bpf/progs/test_ksyms.c
>  @@ -9,11 +9,13 @@ __u64 out__bpf_link_fops = -1;
>   __u64 out__bpf_link_fops1 = -1;
>   __u64 out__btf_size = -1;
>   __u64 out__per_cpu_start = -1;
>  +__u64 out__fake_dst_ops = -1;
>
>   extern const void bpf_link_fops __ksym;
>   extern const void __start_BTF __ksym;
>   extern const void __stop_BTF __ksym;
>   extern const void __per_cpu_start __ksym;
>  +extern const void fake_dst_ops __ksym;
>   /* non-existing symbol, weak, default to zero */
>   extern const void bpf_link_fops1 __ksym __weak;
>
>  @@ -23,6 +25,7 @@ int handler(const void *ctx)
>          out__bpf_link_fops = (__u64)&bpf_link_fops;
>          out__btf_size = (__u64)(&__stop_BTF - &__start_BTF);
>          out__per_cpu_start = (__u64)&__per_cpu_start;
>  +       out__fake_dst_ops = (__u64)&fake_dst_ops;
>
>          out__bpf_link_fops1 = (__u64)&bpf_link_fops1;

This patch fixed the issue in libbpf such that
the suffix '.llvm.<hash>' will be ignored during comparison of
bpf prog ksym vs. symbols in /proc/kallsyms, this resolved the issue.
Currently, only static variables in /proc/kallsyms are checked
with '.llvm.<hash>' suffix since in bpf programs function ksyms
with '.llvm.<hash>' suffix are most likely kfunc's and unlikely
to be cross-file inlined.

Note that currently kernel does not support gcc build with lto.

  [1] https://lore.kernel.org/bpf/20240302165017.1627295-1-yonghong.song@linux.dev/
  [2] https://github.com/llvm/llvm-project/blob/release/18.x/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1714-L1719

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240326041458.1198161-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Yonghong Song
8b9cb7d479 libbpf: Mark libbpf_kallsyms_parse static function
Currently libbpf_kallsyms_parse() function is declared as a global
function but actually it is not a API and there is no external
users in bpftool/bpf-selftests. So let us mark the function as
static.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240326041453.1197949-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Mykyta Yatsenko
b062410166 bpf: improve error message for unsupported helper
BPF verifier emits "unknown func" message when given BPF program type
does not support BPF helper. This message may be confusing for users, as
important context that helper is unknown only to current program type is
not provided.

This patch changes message to "program of this type cannot use helper "
and aligns dependent code in libbpf and tests. Any suggestions on
improving/changing this message are welcome.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20240325152210.377548-1-yatsenko@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Anton Protopopov
89d8cdf741 bpf: Add support for passing mark with bpf_fib_lookup
Extend the bpf_fib_lookup() helper by making it to utilize mark if
the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark the
four bytes of struct bpf_fib_lookup are used, shared with the
output-only smac/dmac fields.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24 15:16:35 -07:00
Song Liu
8a2054f417 ci/diffs: Add temporary fix for mitigation config
Upstream is discussing the exact config to ship. In the meanwhile, which
would unblock CI.

More discussions here:

https://lore.kernel.org/lkml/20240423045548.1324969-1-song@kernel.org/T/#u

Signed-off-by: Song Liu <song@kernel.org>
2024-04-24 10:42:01 -07:00
Geyslan Gregório
46eafba62e ci: bump run-on-arch action to v2.7.1
More info: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/

Signed-off-by: Geyslan Gregório <geyslan@gmail.com>
2024-04-02 21:51:25 -07:00
Geyslan Gregório
6d3595d215 ci: bump checkout action to v4
Due to the transition from Node 16 to Node 20, the checkout action
needs to be updated to v4.

More info: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/

Signed-off-by: Geyslan Gregório <geyslan@gmail.com>
2024-04-02 10:25:11 -07:00
Andrii Nakryiko
20ea95b450 ci: sync DENYLISTs with BPF CI
Keep all the denylisted tests in sync.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
902af6913a ci: clean up temporary patch
It's already applied upstream.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
25a9cc27d7 ci: regenerate latest vmlinux.h
Update vmlinux.h to make BPF selftests compile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
8db4a2feeb sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   e63985ecd22681c7f5975f2e8637187a326b6791
Checkpoint bpf-next commit: 14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5
Baseline bpf commit:        2487007aa3b9fafbd2cb14068f49791ce1d7ede5
Checkpoint bpf commit:      443574b033876c85a35de4c65c14f7fe092222b2

Alexei Starovoitov (6):
  libbpf: Allow specifying 64-bit integers in map BTF.
  bpf: Introduce bpf_arena.
  bpf: Disasm support for addr_space_cast instruction.
  libbpf: Add __arg_arena to bpf_helpers.h
  libbpf: Add support for bpf_arena.
  libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM

Andrii Nakryiko (4):
  libbpf: Recognize __arena global variables.
  bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
  libbpf: add support for BPF cookie for raw_tp/tp_btf programs
  libbpf: fix u64-to-pointer cast on 32-bit arches

Arnaldo Carvalho de Melo (1):
  libbpf: Define MFD_CLOEXEC if not available

Jakub Kicinski (2):
  netdev: add per-queue statistics
  netdev: add queue stat for alloc failures

Kui-Feng Lee (1):
  libbpf: Skip zeroed or null fields if not found in the kernel type.

Mykyta Yatsenko (1):
  libbpbpf: Check bpf_map/bpf_program fd validity

Quentin Monnet (1):
  libbpf: Prevent null-pointer dereference when prog to load has no BTF

Yonghong Song (2):
  libbpf: Add new sec_def "sk_skb/verdict"
  bpf: Sync uapi bpf.h to tools directory

 include/uapi/linux/bpf.h    |  20 ++-
 include/uapi/linux/netdev.h |  20 +++
 src/bpf.c                   |  16 +-
 src/bpf.h                   |   9 +
 src/bpf_helpers.h           |   2 +
 src/libbpf.c                | 322 +++++++++++++++++++++++++++++++-----
 src/libbpf.h                |  13 +-
 src/libbpf.map              |   2 +
 src/libbpf_probes.c         |   7 +
 9 files changed, 366 insertions(+), 45 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-25 21:58:26 -07:00
Arnaldo Carvalho de Melo
7fee466676 libbpf: Define MFD_CLOEXEC if not available
Since its going directly to the syscall to avoid not having
memfd_create() available in some systems, do the same for its
MFD_CLOEXEC flags, defining it if not available.

This fixes the build in those systems, noticed while building perf on a
set of build containers.

Fixes: 9fa5e1a180aa639f ("libbpf: Call memfd_create() syscall directly")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/ZfxZ9nCyKvwmpKkE@x1
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
ddf722fb5c libbpf: fix u64-to-pointer cast on 32-bit arches
It's been reported that (void *)map->map_extra is causing compilation
warnings on 32-bit architectures. It's easy enough to fix this by
casting to long first.

Fixes: 79ff13e99169 ("libbpf: Add support for bpf_arena.")
Reported-by: Ryan Eatmon <reatmon@ti.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319215143.1279312-1-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
137193b655 libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM
The selftests use
to tell LLVM about special pointers. For LLVM there is nothing "arena"
about them. They are simply pointers in a different address space.
Hence LLVM diff https://github.com/llvm/llvm-project/pull/85161 renamed:
. macro __BPF_FEATURE_ARENA_CAST -> __BPF_FEATURE_ADDR_SPACE_CAST
. global variables in __attribute__((address_space(N))) are now
  placed in section named ".addr_space.N" instead of ".arena.N".

Adjust libbpf, bpftool, and selftests to match LLVM.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240315021834.62988-3-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Yonghong Song
d2676a58de bpf: Sync uapi bpf.h to tools directory
There is a difference between kernel uapi bpf.h and tools
uapi bpf.h. There is no functionality difference, but let
us sync properly to make it easy for later bpf.h update.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240325033842.1693553-1-yonghong.song@linux.dev
2024-03-25 21:58:26 -07:00
Yonghong Song
4d95d8b7f0 libbpf: Add new sec_def "sk_skb/verdict"
The new sec_def specifies sk_skb program type with
BPF_SK_SKB_VERDICT attachment type. This way, libbpf
will set expected_attach_type properly for the program.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240319175412.2941149-1-yonghong.song@linux.dev
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
f5828cc352 libbpf: add support for BPF cookie for raw_tp/tp_btf programs
Wire up BPF cookie passing or raw_tp and tp_btf programs, both in
low-level and high-level APIs.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319233852.1977493-5-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
cbd6e3596c bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
aware variants). This brings them up to part w.r.t. BPF cookie usage
with classic tracepoint and fentry/fexit programs.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319233852.1977493-4-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 21:58:26 -07:00
Mykyta Yatsenko
7cfc365995 libbpbpf: Check bpf_map/bpf_program fd validity
libbpf creates bpf_program/bpf_map structs for each program/map that
user defines, but it allows to disable creating/loading those objects in
kernel, in that case they won't have associated file descriptor
(fd < 0). Such functionality is used for backward compatibility
with some older kernels.

Nothing prevents users from passing these maps or programs with no
kernel counterpart to libbpf APIs. This change introduces explicit
checks for kernel objects existence, aiming to improve visibility of
those edge cases and provide meaningful warnings to users.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240318131808.95959-1-yatsenko@meta.com
2024-03-25 21:58:26 -07:00
Kui-Feng Lee
a5459eac49 libbpf: Skip zeroed or null fields if not found in the kernel type.
Accept additional fields of a struct_ops type with all zero values even if
these fields are not in the corresponding type in the kernel. This provides
a way to be backward compatible. User space programs can use the same map
on a machine running an old kernel by clearing fields that do not exist in
the kernel.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240313214139.685112-2-thinker.li@gmail.com
2024-03-25 21:58:26 -07:00
Quentin Monnet
f84ee80801 libbpf: Prevent null-pointer dereference when prog to load has no BTF
In bpf_objec_load_prog(), there's no guarantee that obj->btf is non-NULL
when passing it to btf__fd(), and this function does not perform any
check before dereferencing its argument (as bpf_object__btf_fd() used to
do). As a consequence, we get segmentation fault errors in bpftool (for
example) when trying to load programs that come without BTF information.

v2: Keep btf__fd() in the fix instead of reverting to bpf_object__btf_fd().

Fixes: df7c3f7d3a3d ("libbpf: make uniform use of btf__fd() accessor inside libbpf")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240314150438.232462-1-qmo@kernel.org
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
2d042d22a7 libbpf: Recognize __arena global variables.
LLVM automatically places __arena variables into ".arena.1" ELF section.
In order to use such global variables bpf program must include definition
of arena map in ".maps" section, like:
struct {
       __uint(type, BPF_MAP_TYPE_ARENA);
       __uint(map_flags, BPF_F_MMAPABLE);
       __uint(max_entries, 1000);         /* number of pages */
       __ulong(map_extra, 2ull << 44);    /* start of mmap() region */
} arena SEC(".maps");

libbpf recognizes both uses of arena and creates single `struct bpf_map *`
instance in libbpf APIs.
".arena.1" ELF section data is used as initial data image, which is exposed
through skeleton and bpf_map__initial_value() to the user, if they need to tune
it before the load phase. During load phase, this initial image is copied over
into mmap()'ed region corresponding to arena, and discarded.

Few small checks here and there had to be added to make sure this
approach works with bpf_map__initial_value(), mostly due to hard-coded
assumption that map->mmaped is set up with mmap() syscall and should be
munmap()'ed. For arena, .arena.1 can be (much) smaller than maximum
arena size, so this smaller data size has to be tracked separately.
Given it is enforced that there is only one arena for entire bpf_object
instance, we just keep it in a separate field. This can be generalized
if necessary later.

All global variables from ".arena.1" section are accessible from user space
via skel->arena->name_of_var.

For bss/data/rodata the skeleton/libbpf perform the following sequence:
1. addr = mmap(MAP_ANONYMOUS)
2. user space optionally modifies global vars
3. map_fd = bpf_create_map()
4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
5. mmap(addr, MAP_FIXED, map_fd)
after step 5 user spaces see the values it wrote at step 2 at the same addresses

arena doesn't support update_map_elem. Hence skeleton/libbpf do:
1. addr = malloc(sizeof SEC ".arena.1")
2. user space optionally modifies global vars
3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
5. memcpy(real_addr, addr) // this will fault-in and allocate pages

At the end look and feel of global data vs __arena global data is the same from
bpf prog pov.

Another complication is:
struct {
  __uint(type, BPF_MAP_TYPE_ARENA);
} arena SEC(".maps");

int __arena foo;
int bar;

  ptr1 = &foo;   // relocation against ".arena.1" section
  ptr2 = &arena; // relocation against ".maps" section
  ptr3 = &bar;   // relocation against ".bss" section

Fo the kernel ptr1 and ptr2 has point to the same arena's map_fd
while ptr3 points to a different global array's map_fd.
For the verifier:
ptr1->type == unknown_scalar
ptr2->type == const_ptr_to_map
ptr3->type == ptr_to_map_value

After verification, from JIT pov all 3 ptr-s are normal ld_imm64 insns.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-11-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
4524a45a2a libbpf: Add support for bpf_arena.
mmap() bpf_arena right after creation, since the kernel needs to
remember the address returned from mmap. This is user_vm_start.
LLVM will generate bpf_arena_cast_user() instructions where
necessary and JIT will add upper 32-bit of user_vm_start
to such pointers.

Fix up bpf_map_mmap_sz() to compute mmap size as
map->value_size * map->max_entries for arrays and
PAGE_SIZE * map->max_entries for arena.

Don't set BTF at arena creation time, since it doesn't support it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-9-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
086825355f libbpf: Add __arg_arena to bpf_helpers.h
Add __arg_arena to bpf_helpers.h

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-8-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
6de941bc1e bpf: Disasm support for addr_space_cast instruction.
LLVM generates rX = addr_space_cast(rY, dst_addr_space, src_addr_space)
instruction when pointers in non-zero address space are used by the bpf
program. Recognize this insn in uapi and in bpf disassembler.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-3-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
1675c13fae bpf: Introduce bpf_arena.
Introduce bpf_arena, which is a sparse shared memory region between the bpf
program and user space.

Use cases:
1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
   anonymous region, like memcached or any key/value storage. The bpf
   program implements an in-kernel accelerator. XDP prog can search for
   a key in bpf_arena and return a value without going to user space.
2. The bpf program builds arbitrary data structures in bpf_arena (hash
   tables, rb-trees, sparse arrays), while user space consumes it.
3. bpf_arena is a "heap" of memory from the bpf program's point of view.
   The user space may mmap it, but bpf program will not convert pointers
   to user base at run-time to improve bpf program speed.

Initially, the kernel vm_area and user vma are not populated. User space
can fault in pages within the range. While servicing a page fault,
bpf_arena logic will insert a new page into the kernel and user vmas. The
bpf program can allocate pages from that region via
bpf_arena_alloc_pages(). This kernel function will insert pages into the
kernel vm_area. The subsequent fault-in from user space will populate that
page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
can be used to prevent fault-in from user space. In such a case, if a page
is not allocated by the bpf program and not present in the kernel vm_area,
the user process will segfault. This is useful for use cases 2 and 3 above.

bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
either at a specific address within the arena or allocates a range with the
maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
pages and removes the range from the kernel vm_area and from user process
vmas.

bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of
bpf program is more important than ease of sharing with user space. This is
use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
instruction as a 32-bit move wX = wY, which will improve bpf prog
performance. Otherwise, bpf_arena_cast_user is translated by JIT to
conditionally add the upper 32 bits of user vm_start (if the pointer is not
NULL) to arena pointers before they are stored into memory. This way, user
space sees them as valid 64-bit pointers.

Diff https://github.com/llvm/llvm-project/pull/84410 enables LLVM BPF
backend generate the bpf_addr_space_cast() instruction to cast pointers
between address_space(1) which is reserved for bpf_arena pointers and
default address space zero. All arena pointers in a bpf program written in
C language are tagged as __attribute__((address_space(1))). Hence, clang
provides helpful diagnostics when pointers cross address space. Libbpf and
the kernel support only address_space == 1. All other address space
identifiers are reserved.

rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
verifier that rX->type = PTR_TO_ARENA. Any further operations on
PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
to copy_from_kernel_nofault() except that no address checks are necessary.
The address is guaranteed to be in the 4GB range. If the page is not
present, the destination register is zeroed on read, and the operation is
ignored on write.

rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
verifier converts such cast instructions to mov32. Otherwise, JIT will emit
native code equivalent to:
rX = (u32)rY;
if (rY)
  rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

After such conversion, the pointer becomes a valid user pointer within
bpf_arena range. The user process can access data structures created in
bpf_arena without any additional computations. For example, a linked list
built by a bpf program can be walked natively by user space.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com
2024-03-25 21:58:26 -07:00
Alexei Starovoitov
385d344839 libbpf: Allow specifying 64-bit integers in map BTF.
__uint() macro that is used to specify map attributes like:
  __uint(type, BPF_MAP_TYPE_ARRAY);
  __uint(map_flags, BPF_F_MMAPABLE);
It is limited to 32-bit, since BTF_KIND_ARRAY has u32 "number of elements"
field in "struct btf_array".

Introduce __ulong() macro that allows specifying values bigger than 32-bit.
In map definition "map_extra" is the only u64 field, so far.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20240307031228.42896-5-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-25 21:58:26 -07:00
Jakub Kicinski
d71c0ed2ef netdev: add queue stat for alloc failures
Rx alloc failures are commonly counted by drivers.
Support reporting those via netdev-genl queue stats.

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 21:58:26 -07:00
Jakub Kicinski
5e80833e50 netdev: add per-queue statistics
The ethtool-nl family does a good job exposing various protocol
related and IEEE/IETF statistics which used to get dumped under
ethtool -S, with creative names. Queue stats don't have a netlink
API, yet, and remain a lion's share of ethtool -S output for new
drivers. Not only is that bad because the names differ driver to
driver but it's also bug-prone. Intuitively drivers try to report
only the stats for active queues, but querying ethtool stats
involves multiple system calls, and the number of stats is
read separately from the stats themselves. Worse still when user
space asks for values of the stats, it doesn't inform the kernel
how big the buffer is. If number of stats increases in the meantime
kernel will overflow user buffer.

Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there.
Support per-queue and sum-for-device dumps. Latter will be useful
when subsequent patches add more interesting common stats than
just bytes and packets.

The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 21:58:26 -07:00
Andrii Nakryiko
2778cbce60 ci: add xdp_bonding fixes from bpf/master
bpf tree has fixes for xdp_bonding selftests which are not yet in
bpf-next, so add them as temporary CI-only patches.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-06 15:22:13 -08:00
Andrii Nakryiko
4f875865b7 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   2ab256e93249f5ac1da665861aa0f03fb4208d9c
Checkpoint bpf-next commit: 7d763bc4a44a51e48dde406d6c6c8a26a60ec647
Baseline bpf commit:        dced881ead78e4d6add3735d02a9186ba2415630
Checkpoint bpf commit:      2487007aa3b9fafbd2cb14068f49791ce1d7ede5

Aahil Awatramani (1):
  bonding: Add independent control state machine

Alexei Starovoitov (1):
  bpf: Introduce may_goto instruction

Chen Shen (1):
  libbpf: Correct debug message in btf__load_vmlinux_btf

Eduard Zingerman (7):
  libbpf: Allow version suffixes (___smth) for struct_ops types
  libbpf: Tie struct_ops programs to kernel BTF ids, not to local ids
  libbpf: Honor autocreate flag for struct_ops maps
  libbpf: Sync progs autoload with maps autocreate for struct_ops maps
  libbpf: Replace elf_state->st_ops_* fields with SEC_ST_OPS sec_type
  libbpf: Struct_ops in SEC("?.struct_ops") / SEC("?.struct_ops.link")
  libbpf: Rewrite btf datasec names starting from '?'

Kees Cook (1):
  bpf: Replace bpf_lpm_trie_key 0-length array with flexible array

Kui-Feng Lee (2):
  libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
  libbpf: Convert st_ops->data to shadow type.

 include/uapi/linux/bpf.h     |  24 +++-
 include/uapi/linux/if_link.h |   1 +
 src/btf.c                    |   2 +-
 src/features.c               |  22 +++
 src/libbpf.c                 | 252 +++++++++++++++++++++++++++--------
 src/libbpf_internal.h        |   2 +
 6 files changed, 242 insertions(+), 61 deletions(-)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-06 15:22:13 -08:00
Eduard Zingerman
438adf417d libbpf: Rewrite btf datasec names starting from '?'
Optional struct_ops maps are defined using question mark at the start
of the section name, e.g.:

    SEC("?.struct_ops")
    struct test_ops optional_map = { ... };

This commit teaches libbpf to detect if kernel allows '?' prefix
in datasec names, and if it doesn't then to rewrite such names
by replacing '?' with '_', e.g.:

    DATASEC ?.struct_ops -> DATASEC _.struct_ops

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-13-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
fc8b86bda2 libbpf: Struct_ops in SEC("?.struct_ops") / SEC("?.struct_ops.link")
Allow using two new section names for struct_ops maps:
- SEC("?.struct_ops")
- SEC("?.struct_ops.link")

To specify maps that have bpf_map->autocreate == false after open.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-12-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
d5d0b6e920 libbpf: Replace elf_state->st_ops_* fields with SEC_ST_OPS sec_type
The next patch would add two new section names for struct_ops maps.
To make working with multiple struct_ops sections more convenient:
- remove fields like elf_state->st_ops_{shndx,link_shndx};
- mark section descriptions hosting struct_ops as
  elf_sec_desc->sec_type == SEC_ST_OPS;

After these changes struct_ops sections could be processed uniformly
by iterating bpf_object->efile.secs entries.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-11-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
060c604db8 libbpf: Sync progs autoload with maps autocreate for struct_ops maps
Automatically select which struct_ops programs to load depending on
which struct_ops maps are selected for automatic creation.
E.g. for the BPF code below:

    SEC("struct_ops/test_1") int BPF_PROG(foo) { ... }
    SEC("struct_ops/test_2") int BPF_PROG(bar) { ... }

    SEC(".struct_ops.link")
    struct test_ops___v1 A = {
        .foo = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 B = {
        .foo = (void *)foo,
        .bar = (void *)bar,
    };

And the following libbpf API calls:

    bpf_map__set_autocreate(skel->maps.A, true);
    bpf_map__set_autocreate(skel->maps.B, false);

The autoload would be enabled for program 'foo' and disabled for
program 'bar'.

During load, for each struct_ops program P, referenced from some
struct_ops map M:
- set P.autoload = true if M.autocreate is true for some M;
- set P.autoload = false if M.autocreate is false for all M;
- don't change P.autoload, if P is not referenced from any map.

Do this after bpf_object__init_kern_struct_ops_maps()
to make sure that shadow vars assignment is done.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-9-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
cb426140d0 libbpf: Honor autocreate flag for struct_ops maps
Skip load steps for struct_ops maps not marked for automatic creation.
This should allow to load bpf object in situations like below:

    SEC("struct_ops/foo") int BPF_PROG(foo) { ... }
    SEC("struct_ops/bar") int BPF_PROG(bar) { ... }

    struct test_ops___v1 {
    	int (*foo)(void);
    };

    struct test_ops___v2 {
    	int (*foo)(void);
    	int (*does_not_exist)(void);
    };

    SEC(".struct_ops.link")
    struct test_ops___v1 map_for_old = {
    	.test_1 = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 map_for_new = {
    	.test_1 = (void *)foo,
    	.does_not_exist = (void *)bar
    };

Suppose program is loaded on old kernel that does not have definition
for 'does_not_exist' struct_ops member. After this commit it would be
possible to load such object file after the following tweaks:

    bpf_program__set_autoload(skel->progs.bar, false);
    bpf_map__set_autocreate(skel->maps.map_for_new, false);

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240306104529.6453-4-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
aded62b120 libbpf: Tie struct_ops programs to kernel BTF ids, not to local ids
Enforce the following existing limitation on struct_ops programs based
on kernel BTF id instead of program-local BTF id:

    struct_ops BPF prog can be re-used between multiple .struct_ops &
    .struct_ops.link as long as it's the same struct_ops struct
    definition and the same function pointer field

This allows reusing same BPF program for versioned struct_ops map
definitions, e.g.:

    SEC("struct_ops/test")
    int BPF_PROG(foo) { ... }

    struct some_ops___v1 { int (*test)(void); };
    struct some_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link") struct some_ops___v1 a = { .test = foo }
    SEC(".struct_ops.link") struct some_ops___v2 b = { .test = foo }

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-3-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Eduard Zingerman
f7fd5dbc07 libbpf: Allow version suffixes (___smth) for struct_ops types
E.g. allow the following struct_ops definitions:

    struct bpf_testmod_ops___v1 { int (*test)(void); };
    struct bpf_testmod_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v1 a = { .test = ... }
    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v2 b = { .test = ... }

Where both bpf_testmod_ops__v1 and bpf_testmod_ops__v2 would be
resolved as 'struct bpf_testmod_ops' from kernel BTF.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-2-eddyz87@gmail.com
2024-03-06 13:58:27 -08:00
Alexei Starovoitov
00b08dceea bpf: Introduce may_goto instruction
Introduce may_goto instruction that from the verifier pov is similar to
open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it
doesn't iterate any objects.
In assembly 'may_goto' is a nop most of the time until bpf runtime has to
terminate the program for whatever reason. In the current implementation
may_goto has a hidden counter, but other mechanisms can be used.
For programs written in C the later patch introduces 'cond_break' macro
that combines 'may_goto' with 'break' statement and has similar semantics:
cond_break is a nop until bpf runtime has to break out of this loop.
It can be used in any normal "for" or "while" loop, like

  for (i = zero; i < cnt; cond_break, i++) {

The verifier recognizes that may_goto is used in the program, reserves
additional 8 bytes of stack, initializes them in subprog prologue, and
replaces may_goto instruction with:
aux_reg = *(u64 *)(fp - 40)
if aux_reg == 0 goto pc+off
aux_reg -= 1
*(u64 *)(fp - 40) = aux_reg

may_goto instruction can be used by LLVM to implement __builtin_memcpy,
__builtin_strcmp.

may_goto is not a full substitute for bpf_for() macro.
bpf_for() doesn't have induction variable that verifiers sees,
so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.

But when the code is written as:
for (i = 0; i < 100; cond_break, i++)
the verifier see 'i' as precise constant zero,
hence cond_break (aka may_goto) doesn't help to converge the loop.
A static or global variable can be used as a workaround:
static int zero = 0;
for (i = zero; i < 100; cond_break, i++) // works!

may_goto works well with arena pointers that don't need to be bounds
checked on access. Load/store from arena returns imprecise unbounded
scalar and loops with may_goto pass the verifier.

Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn.
JCOND stands for conditional pseudo jump.
Since goto_or_nop insn was proposed, it may use the same opcode.
may_goto vs goto_or_nop can be distinguished by src_reg:
code = BPF_JMP | BPF_JCOND
src_reg = 0 - may_goto
src_reg = 1 - goto_or_nop

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com
2024-03-06 13:58:27 -08:00
Chen Shen
bf52494e2b libbpf: Correct debug message in btf__load_vmlinux_btf
In the function btf__load_vmlinux_btf, the debug message incorrectly
refers to 'path' instead of 'sysfs_btf_path'.

Signed-off-by: Chen Shen <peterchenshen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240302062218.3587-1-peterchenshen@gmail.com
2024-03-06 13:58:27 -08:00
Kui-Feng Lee
acfaeffeaa libbpf: Convert st_ops->data to shadow type.
Convert st_ops->data to the shadow type of the struct_ops map. The shadow
type of a struct_ops type is a variant of the original struct type
providing a way to access/change the values in the maps of the struct_ops
type.

bpf_map__initial_value() will return st_ops->data for struct_ops types. The
skeleton is going to use it as the pointer to the shadow type of the
original struct type.

One of the main differences between the original struct type and the shadow
type is that all function pointers of the shadow type are converted to
pointers of struct bpf_program. Users can replace these bpf_program
pointers with other BPF programs. The st_ops->progs[] will be updated
before updating the value of a map to reflect the changes made by users.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-3-thinker.li@gmail.com
2024-03-06 13:58:27 -08:00