Compare commits

..

197 Commits
v0.2 ... v0.4.0

Author SHA1 Message Date
Yonghong Song
db9614b6bd libbpf: Add support for new llvm bpf relocations
LLVM patch https://reviews.llvm.org/D102712
narrowed the scope of existing R_BPF_64_64
and R_BPF_64_32 relocations, and added three
new relocations, R_BPF_64_ABS64, R_BPF_64_ABS32
and R_BPF_64_NODYLD32. The main motivation is
to make relocations linker friendly.

This change, unfortunately, breaks libbpf build,
and we will see errors like below:
  libbpf: ELF relo #0 in section #6 has unexpected type 2 in
     /home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o
  Error: failed to link
     '/home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o':
     Unknown error -22 (-22)
The new relocation R_BPF_64_ABS64 is generated
and libbpf linker sanity check doesn't understand it.
Relocation section '.rel.struct_ops' at offset 0x1410 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000018  0000000700000002 R_BPF_64_ABS64         0000000000000000 nogpltcp_init

Look at the selftests/bpf/bpf_tcp_nogpl.c,
  void BPF_STRUCT_OPS(nogpltcp_init, struct sock *sk)
  {
  }

  SEC(".struct_ops")
  struct tcp_congestion_ops bpf_nogpltcp = {
          .init           = (void *)nogpltcp_init,
          .name           = "bpf_nogpltcp",
  };
The new llvm relocation scheme categorizes 'nogpltcp_init' reference
as R_BPF_64_ABS64 instead of R_BPF_64_64 which is used to specify
ld_imm64 relocation in the new scheme.

Let us fix the linker sanity checking by including
R_BPF_64_ABS64 and R_BPF_64_ABS32. There is no need to
check R_BPF_64_NODYLD32 which is used for .BTF and .BTF.ext.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210522162341.3687617-1-yhs@fb.com
2021-05-24 21:24:56 -07:00
Andrii Nakryiko
57375504c6 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c1cccec9c63637c4c5ee0aa2da2850d983c19e88
Checkpoint bpf-next commit: f18ba26da88a89db9b50cb4ff47fadb159f2810b
Baseline bpf commit:        c9a7c013569d73881199a7a3011c03336f592cc8
Checkpoint bpf commit:      d0c0fe10ce6d87734b65c18dc8f4bcae3f4dbea4

Andrii Nakryiko (1):
  libbpf: Reject static entry-point BPF programs

Kumar Kartikeya Dwivedi (2):
  libbpf: Add various netlink helpers
  libbpf: Add low level TC-BPF management API

 src/libbpf.c   |   5 +
 src/libbpf.h   |  44 ++++
 src/libbpf.map |   5 +
 src/netlink.c  | 568 +++++++++++++++++++++++++++++++++++++++++--------
 src/nlattr.h   |  48 +++++
 5 files changed, 587 insertions(+), 83 deletions(-)

--
2.30.2
2021-05-17 17:12:06 -07:00
Kumar Kartikeya Dwivedi
d71ff87a2d libbpf: Add low level TC-BPF management API
This adds functions that wrap the netlink API used for adding, manipulating,
and removing traffic control filters.

The API summary:

A bpf_tc_hook represents a location where a TC-BPF filter can be attached.
This means that creating a hook leads to creation of the backing qdisc,
while destruction either removes all filters attached to a hook, or destroys
qdisc if requested explicitly (as discussed below).

The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
query, and detach tc filters. All functions return 0 on success, and a
negative error code on failure.

bpf_tc_hook_create - Create a hook
Parameters:
	@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
		proper enum constant. Note that parent must be unset when
		attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
		that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
		valid value for attach_point.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

bpf_tc_hook_destroy - Destroy a hook
Parameters:
	@hook - Cannot be NULL. The behaviour depends on value of
		attach_point. If BPF_TC_INGRESS, all filters attached to
		the ingress hook will be detached. If BPF_TC_EGRESS, all
		filters attached to the egress hook will be detached. If
		BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
		deleted, also detaching all filters. As before, parent must
		be unset for these attach_points, and set for BPF_TC_CUSTOM.

		It is advised that if the qdisc is operated on by many programs,
		then the program at least check that there are no other existing
		filters before deleting the clsact qdisc. An example is shown
		below:

		DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
				    .attach_point = BPF_TC_INGRESS);
		/* set opts as NULL, as we're not really interested in
		 * getting any info for a particular filter, but just
	 	 * detecting its presence.
		 */
		r = bpf_tc_query(&hook, NULL);
		if (r == -ENOENT) {
			/* no filters */
			hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
			return bpf_tc_hook_destroy(&hook);
		} else {
			/* failed or r == 0, the latter means filters do exist */
			return r;
		}

		Note that there is a small race between checking for no
		filters and deleting the qdisc. This is currently unavoidable.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

bpf_tc_attach - Attach a filter to a hook
Parameters:
	@hook - Cannot be NULL. Represents the hook the filter will be
		attached to. Requirements for ifindex and attach_point are
		same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
		is also supported.  In that case, parent must be set to the
		handle where the filter will be attached (using BPF_TC_PARENT).
		E.g. to set parent to 1:16 like in tc command line, the
		equivalent would be BPF_TC_PARENT(1, 16).

	@opts - Cannot be NULL. The following opts are optional:
		* handle   - The handle of the filter
		* priority - The priority of the filter
			     Must be >= 0 and <= UINT16_MAX
		Note that when left unset, they will be auto-allocated by
		the kernel. The following opts must be set:
		* prog_fd - The fd of the loaded SCHED_CLS prog
		The following opts must be unset:
		* prog_id - The ID of the BPF prog
		The following opts are optional:
		* flags - Currently only BPF_TC_F_REPLACE is allowed. It
			  allows replacing an existing filter instead of
			  failing with -EEXIST.
		The following opts will be filled by bpf_tc_attach on a
		successful attach operation if they are unset:
		* handle   - The handle of the attached filter
		* priority - The priority of the attached filter
		* prog_id  - The ID of the attached SCHED_CLS prog
		This way, the user can know what the auto allocated values
		for optional opts like handle and priority are for the newly
		attached filter, if they were unset.

		Note that some other attributes are set to fixed default
		values listed below (this holds for all bpf_tc_* APIs):
		protocol as ETH_P_ALL, direct action mode, chain index of 0,
		and class ID of 0 (this can be set by writing to the
		skb->tc_classid field from the BPF program).

bpf_tc_detach
Parameters:
	@hook - Cannot be NULL. Represents the hook the filter will be
		detached from. Requirements are same as described above
		in bpf_tc_attach.

	@opts - Cannot be NULL. The following opts must be set:
		* handle, priority
		The following opts must be unset:
		* prog_fd, prog_id, flags

bpf_tc_query
Parameters:
	@hook - Cannot be NULL. Represents the hook where the filter lookup will
		be performed. Requirements are same as described above in
		bpf_tc_attach().

	@opts - Cannot be NULL. The following opts must be set:
		* handle, priority
		The following opts must be unset:
		* prog_fd, prog_id, flags
		The following fields will be filled by bpf_tc_query upon a
		successful lookup:
		* prog_id

Some usage examples (using BPF skeleton infrastructure):

BPF program (test_tc_bpf.c):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("classifier")
	int cls(struct __sk_buff *skb)
	{
		return 0;
	}

Userspace loader:

	struct test_tc_bpf *skel = NULL;
	int fd, r;

	skel = test_tc_bpf__open_and_load();
	if (!skel)
		return -ENOMEM;

	fd = bpf_program__fd(skel->progs.cls);

	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
			    if_nametoindex("lo"), .attach_point =
			    BPF_TC_INGRESS);
	/* Create clsact qdisc */
	r = bpf_tc_hook_create(&hook);
	if (r < 0)
		goto end;

	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
	r = bpf_tc_attach(&hook, &opts);
	if (r < 0)
		goto end;
	/* Print the auto allocated handle and priority */
	printf("Handle=%u", opts.handle);
	printf("Priority=%u", opts.priority);

	opts.prog_fd = opts.prog_id = 0;
	bpf_tc_detach(&hook, &opts);
end:
	test_tc_bpf__destroy(skel);

This is equivalent to doing the following using tc command line:
  # tc qdisc add dev lo clsact
  # tc filter add dev lo ingress bpf obj foo.o sec classifier da
  # tc filter del dev lo ingress handle <h> prio <p> bpf
... where the handle and priority can be found using:
  # tc filter show dev lo ingress

Another example replacing a filter (extending prior example):

	/* We can also choose both (or one), let's try replacing an
	 * existing filter.
	 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
			    opts.handle, .priority = opts.priority,
			    .prog_fd = fd);
	r = bpf_tc_attach(&hook, &replace_opts);
	if (r == -EEXIST) {
		/* Expected, now use BPF_TC_F_REPLACE to replace it */
		replace_opts.flags = BPF_TC_F_REPLACE;
		return bpf_tc_attach(&hook, &replace_opts);
	} else if (r < 0) {
		return r;
	}
	/* There must be no existing filter with these
	 * attributes, so cleanup and return an error.
	 */
	replace_opts.prog_fd = replace_opts.prog_id = 0;
	bpf_tc_detach(&hook, &replace_opts);
	return -1;

To obtain info of a particular filter:

	/* Find info for filter with handle 1 and priority 50 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
			    .priority = 50);
	r = bpf_tc_query(&hook, &info_opts);
	if (r == -ENOENT)
		printf("Filter not found");
	else if (r < 0)
		return r;
	printf("Prog ID: %u", info_opts.prog_id);
	return 0;

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design
[ Daniel: also did major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
2021-05-17 17:12:06 -07:00
Kumar Kartikeya Dwivedi
01515b8f05 libbpf: Add various netlink helpers
This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c. It also adds a libbpf_netlink_send_recv() that
is useful to wrap send + recv handling in a generic way. Subsequent patch
will also use this function for sending and receiving a netlink response.
The libbpf_nl_get_link() helper has been removed instead, moving socket
creation into the newly named libbpf_netlink_send_recv().

Every nested attribute's closure must happen using the helper
nlattr_end_nested(), which sets its length properly. NLA_F_NESTED is
enforced using nlattr_begin_nested() helper. Other simple attributes
can be added directly.

The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:

  struct {
	struct nlmsghdr nh;
	struct tcmsg t;
	char buf[4096];
  } req;

Then, maxsz should be sizeof(req).

This change also converts the open coded attribute preparation with these
helpers. Note that the only failure the internal call to nlattr_add()
could result in the nested helper would be -EMSGSIZE, hence that is what
we return to our caller.

The libbpf_netlink_send_recv() call takes care of opening the socket,
sending the netlink message, receiving the response, potentially invoking
callbacks, and return errors if any, and then finally close the socket.
This allows users to avoid identical socket setup code in different places.
The only user of libbpf_nl_get_link() has been converted to make use of it.
__bpf_set_link_xdp_fd_replace() has also been refactored to use it.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
[ Daniel: major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-2-memxor@gmail.com
2021-05-17 17:12:06 -07:00
Andrii Nakryiko
6028cec50c libbpf: Reject static entry-point BPF programs
Detect use of static entry-point BPF programs (those with SEC() markings) and
emit error message. This is similar to
c1cccec9c636 ("libbpf: Reject static maps") but for BPF programs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210514195534.1440970-1-andrii@kernel.org
2021-05-17 17:12:06 -07:00
Andrii Nakryiko
68695d0173 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   9d31d2338950293ec19d9b095fbaa9030899dcb4
Checkpoint bpf-next commit: c1cccec9c63637c4c5ee0aa2da2850d983c19e88
Baseline bpf commit:        9683e5775c75097c46bd24e65411b16ac6c6cbb3
Checkpoint bpf commit:      c9a7c013569d73881199a7a3011c03336f592cc8

Andrii Nakryiko (4):
  libbpf: Add per-file linker opts
  libbpf: Fix ELF symbol visibility update logic
  libbpf: Treat STV_INTERNAL same as STV_HIDDEN for functions
  libbpf: Reject static maps

Arnaldo Carvalho de Melo (1):
  libbpf: Provide GELF_ST_VISIBILITY() define for older libelf

 src/libbpf.c          | 35 +++++++++++++++++++++++++----------
 src/libbpf.h          | 10 +++++++++-
 src/libbpf_internal.h |  5 +++++
 src/linker.c          | 18 +++++++++++++-----
 4 files changed, 52 insertions(+), 16 deletions(-)

--
2.30.2
2021-05-17 14:36:22 -07:00
Arnaldo Carvalho de Melo
72cdd6ed42 libbpf: Provide GELF_ST_VISIBILITY() define for older libelf
Where that macro isn't available.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/YJaspEh0qZr4LYOc@kernel.org
2021-05-17 14:36:22 -07:00
Andrii Nakryiko
b5bfbab488 libbpf: Reject static maps
Static maps never really worked with libbpf, because all such maps were always
silently resolved to the very first map. Detect static maps (both legacy and
BTF-defined) and report user-friendly error.

Tested locally by switching few maps (legacy and BTF-defined) in selftests to
static ones and verifying that now libbpf rejects them loudly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210513233643.194711-2-andrii@kernel.org
2021-05-17 14:36:22 -07:00
Andrii Nakryiko
1cf1c245d1 libbpf: Treat STV_INTERNAL same as STV_HIDDEN for functions
Do the same global -> static BTF update for global functions with STV_INTERNAL
visibility to turn on static BPF verification mode.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-7-andrii@kernel.org
2021-05-17 14:36:22 -07:00
Andrii Nakryiko
076dd5dadb libbpf: Fix ELF symbol visibility update logic
Fix silly bug in updating ELF symbol's visibility.

Fixes: a46349227cd8 ("libbpf: Add linker extern resolution support for functions and global variables")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-6-andrii@kernel.org
2021-05-17 14:36:22 -07:00
Andrii Nakryiko
6f585ab88f libbpf: Add per-file linker opts
For better future extensibility add per-file linker options. Currently
the set of available options is empty. This changes bpf_linker__add_file()
API, but it's not a breaking change as bpf_linker APIs hasn't been released
yet.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-3-andrii@kernel.org
2021-05-17 14:36:22 -07:00
Andrii Nakryiko
c5389a965b sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   87bd9e602e39585c5280556a2b6a6363bb334257
Checkpoint bpf-next commit: 9d31d2338950293ec19d9b095fbaa9030899dcb4
Baseline bpf commit:        b02265429681c9c827c45978a61a9f00be5ea9aa
Checkpoint bpf commit:      9683e5775c75097c46bd24e65411b16ac6c6cbb3

Andrii Nakryiko (2):
  libbpf: Support BTF_KIND_FLOAT during type compatibility checks in
    CO-RE
  selftests/bpf: Fix BPF_CORE_READ_BITFIELD() macro

Brendan Jackman (1):
  libbpf: Fix signed overflow in ringbuf_process_ring

Ian Rogers (1):
  libbpf: Add NULL check to add_dummy_ksym_var

 src/bpf_core_read.h | 16 ++++++++++++----
 src/libbpf.c        |  9 +++++++--
 src/ringbuf.c       | 30 +++++++++++++++++++++---------
 3 files changed, 40 insertions(+), 15 deletions(-)

--
2.30.2
2021-05-05 16:39:05 -07:00
Ian Rogers
a58b8ca93e libbpf: Add NULL check to add_dummy_ksym_var
Avoids a segv if btf isn't present. Seen on the call path
__bpf_object__open calling bpf_object__collect_externs.

Fixes: 5bd022ec01f0 (libbpf: Support extern kernel function)
Suggested-by: Stanislav Fomichev <sdf@google.com>
Suggested-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210504234910.976501-1-irogers@google.com
2021-05-05 16:39:05 -07:00
Brendan Jackman
1691c37b39 libbpf: Fix signed overflow in ringbuf_process_ring
One of our benchmarks running in (Google-internal) CI pushes data
through the ringbuf faster htan than userspace is able to consume
it. In this case it seems we're actually able to get >INT_MAX entries
in a single ring_buffer__consume() call. ASAN detected that cnt
overflows in this case.

Fix by using 64-bit counter internally and then capping the result to
INT_MAX before converting to the int return type. Do the same for
the ring_buffer__poll().

Fixes: bf99c936f947 (libbpf: Add BPF ring buffer support)
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210429130510.1621665-1-jackmanb@google.com
2021-05-05 16:39:05 -07:00
Andrii Nakryiko
242842b34c selftests/bpf: Fix BPF_CORE_READ_BITFIELD() macro
Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable
bitfields. Missing breaks in a switch caused 8-byte reads always. This can
confuse libbpf because it does strict checks that memory load size corresponds
to the original size of the field, which in this case quite often would be
wrong.

After fixing that, we run into another problem, which quite subtle, so worth
documenting here. The issue is in Clang optimization and CO-RE relocation
interactions. Without that asm volatile construct (also known as
barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and
will apply BYTE_OFFSET 4 times for each switch case arm. This will result in
the same error from libbpf about mismatch of memory load size and original
field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *),
*(u32 *), and *(u64 *) memory loads, three of which will fail. Using
barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to
calculate p, after which value of p is used without relocation in each of
switch case arms, doing appropiately-sized memory load.

Here's the list of relevant relocations and pieces of generated BPF code
before and after this patch for test_core_reloc_bitfields_direct selftests.

BEFORE
=====
 #45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32
 #46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
 #47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
 #48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
 #49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32

     157:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
     159:       7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1
     160:       b7 02 00 00 04 00 00 00 r2 = 4
; BYTE_SIZE relocation here                 ^^^
     161:       66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63>
     162:       16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65>
     163:       16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66>
     164:       05 00 12 00 00 00 00 00 goto +18 <LBB0_69>

0000000000000528 <LBB0_66>:
     165:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
     167:       69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
     168:       05 00 0e 00 00 00 00 00 goto +14 <LBB0_69>

0000000000000548 <LBB0_63>:
     169:       16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67>
     170:       16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68>
     171:       05 00 0b 00 00 00 00 00 goto +11 <LBB0_69>

0000000000000560 <LBB0_68>:
     172:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
     174:       79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
     175:       05 00 07 00 00 00 00 00 goto +7 <LBB0_69>

0000000000000580 <LBB0_65>:
     176:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
     178:       71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
     179:       05 00 03 00 00 00 00 00 goto +3 <LBB0_69>

00000000000005a0 <LBB0_67>:
     180:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
     182:       61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8)
; BYTE_OFFSET relo here w/ RIGHT size        ^^^^^^^^^^^^^^^^

00000000000005b8 <LBB0_69>:
     183:       67 01 00 00 20 00 00 00 r1 <<= 32
     184:       b7 02 00 00 00 00 00 00 r2 = 0
     185:       16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71>
     186:       c7 01 00 00 20 00 00 00 r1 s>>= 32
     187:       05 00 01 00 00 00 00 00 goto +1 <LBB0_72>

00000000000005e0 <LBB0_71>:
     188:       77 01 00 00 20 00 00 00 r1 >>= 32

AFTER
=====

 #30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
 #31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32

     129:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
     131:       7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1
     132:       b7 01 00 00 08 00 00 00 r1 = 8
; BYTE_OFFSET relo here                     ^^^
; no size check for non-memory dereferencing instructions
     133:       0f 12 00 00 00 00 00 00 r2 += r1
     134:       b7 03 00 00 04 00 00 00 r3 = 4
; BYTE_SIZE relocation here                 ^^^
     135:       66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63>
     136:       16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65>
     137:       16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66>
     138:       05 00 0a 00 00 00 00 00 goto +10 <LBB0_69>

0000000000000458 <LBB0_66>:
     139:       69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0)
; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
     140:       05 00 08 00 00 00 00 00 goto +8 <LBB0_69>

0000000000000468 <LBB0_63>:
     141:       16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67>
     142:       16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68>
     143:       05 00 05 00 00 00 00 00 goto +5 <LBB0_69>

0000000000000480 <LBB0_68>:
     144:       79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0)
; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
     145:       05 00 03 00 00 00 00 00 goto +3 <LBB0_69>

0000000000000490 <LBB0_65>:
     146:       71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0)
; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
     147:       05 00 01 00 00 00 00 00 goto +1 <LBB0_69>

00000000000004a0 <LBB0_67>:
     148:       61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0)
; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^

00000000000004a8 <LBB0_69>:
     149:       67 01 00 00 20 00 00 00 r1 <<= 32
     150:       b7 02 00 00 00 00 00 00 r2 = 0
     151:       16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71>
     152:       c7 01 00 00 20 00 00 00 r1 s>>= 32
     153:       05 00 01 00 00 00 00 00 goto +1 <LBB0_72>

00000000000004d0 <LBB0_71>:
     154:       77 01 00 00 20 00 00 00 r1 >>= 323

Fixes: ee26dade0e3b ("libbpf: Add support for relocatable bitfields")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-4-andrii@kernel.org
2021-05-05 16:39:05 -07:00
Andrii Nakryiko
6b14cfa56e libbpf: Support BTF_KIND_FLOAT during type compatibility checks in CO-RE
Add BTF_KIND_FLOAT support when doing CO-RE field type compatibility check.
Without this, relocations against float/double fields will fail.

Also adjust one error message to emit instruction index instead of less
convenient instruction byte offset.

Fixes: 22541a9eeb0d ("libbpf: Add BTF_KIND_FLOAT support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-3-andrii@kernel.org
2021-05-05 16:39:05 -07:00
Andrii Nakryiko
b8b1faa3d4 travis: fix libelf-dev build-dep issues by using aptitude instead of apt-get
Use aptitude to actually see what's wrong with the dependencies. And it
actually magically resolves whatever minor version conflicts there are.

The big surprise came from the apparent difference in build-dep command
behavior. Aptitude's build-dep doesn't seem to install the libpfelf-dev
package itself. Adding explicit `aptitude install libelf-dev` after build-dep
solves the issue for now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-05-05 13:57:07 -07:00
Andrii Nakryiko
9e123fa5d2 vmtests: fix libc6 dependency and remove explicit libelf-dev install
Force libc6 dependency version.

Drop explicit libelf-dev install command, as it should be pre-installed by
Travis CI already, according to .travis.yaml.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-04-30 09:39:10 -07:00
Andrii Nakryiko
4ccc1f0b9f vmtest: blacklist 2 tests on 5.5
Blacklist linked_vars, which expects typed ksym support. Blacklist snprintf,
which expectes new bpf_snprintf() helper.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
b9278634aa sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   1e1032b0c4afaed7739a6681ff6b4cb120b82994
Checkpoint bpf-next commit: 87bd9e602e39585c5280556a2b6a6363bb334257
Baseline bpf commit:        a14d273ba15968495896a38b7b3399dba66d0270
Checkpoint bpf commit:      b02265429681c9c827c45978a61a9f00be5ea9aa

Alexei Starovoitov (1):
  libbpf: Remove unused field.

Andrii Nakryiko (11):
  libbpf: Add bpf_map__inner_map API
  libbpf: Suppress compiler warning when using SEC() macro with externs
  libbpf: Mark BPF subprogs with hidden visibility as static for BPF
    verifier
  libbpf: Allow gaps in BPF program sections to support overriden weak
    functions
  libbpf: Refactor BTF map definition parsing
  libbpf: Factor out symtab and relos sanity checks
  libbpf: Make few internal helpers available outside of libbpf.c
  libbpf: Extend sanity checking ELF symbols with externs validation
  libbpf: Tighten BTF type ID rewriting with error checking
  libbpf: Add linker extern resolution support for functions and global
    variables
  libbpf: Support extern resolution for BTF-defined maps in .maps
    section

Ciara Loftus (1):
  libbpf: Fix potential NULL pointer dereference

Daniel Borkmann (1):
  bpf: Sync bpf headers in tooling infrastucture

Florent Revest (3):
  bpf: Add a bpf_snprintf helper
  libbpf: Initialize the bpf_seq_printf parameters array field by field
  libbpf: Introduce a BPF_SNPRINTF helper macro

Pedro Tammela (1):
  libbpf: Clarify flags in ringbuf helpers

Toke Høiland-Jørgensen (1):
  bpf: Return target info when a tracing bpf_link is queried

 include/uapi/linux/bpf.h |   83 ++-
 src/bpf_helpers.h        |   19 +-
 src/bpf_tracing.h        |   58 +-
 src/btf.c                |    5 -
 src/libbpf.c             |  396 +++++++-----
 src/libbpf.h             |    1 +
 src/libbpf.map           |    1 +
 src/libbpf_internal.h    |   45 ++
 src/linker.c             | 1270 ++++++++++++++++++++++++++++++++------
 src/xsk.c                |    3 +
 10 files changed, 1512 insertions(+), 369 deletions(-)

--
2.30.2
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
af47e6c199 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
b2c06aec99 libbpf: Support extern resolution for BTF-defined maps in .maps section
Add extra logic to handle map externs (only BTF-defined maps are supported for
linking). Re-use the map parsing logic used during bpf_object__open(). Map
externs are currently restricted to always match complete map definition. So
all the specified attributes will be compared (down to pining, map_flags,
numa_node, etc). In the future this restriction might be relaxed with no
backwards compatibility issues. If any attribute is mismatched between extern
and actual map definition, linker will report an error, pointing out which one
mismatches.

The original intent was to allow for extern to specify attributes that matters
(to user) to enforce. E.g., if you specify just key information and omit
value, then any value fits. Similarly, it should have been possible to enforce
map_flags, pinning, and any other possible map attribute. Unfortunately, that
means that multiple externs can be only partially overlapping with each other,
which means linker would need to combine their type definitions to end up with
the most restrictive and fullest map definition. This requires an extra amount
of BTF manipulation which at this time was deemed unnecessary and would
require further extending generic BTF writer APIs. So that is left for future
follow ups, if there will be demand for that. But the idea seems intresting
and useful, so I want to document it here.

Weak definitions are also supported, but are pretty strict as well, just
like externs: all weak map definitions have to match exactly. In the follow up
patches this most probably will be relaxed, with __weak map definitions being
able to differ between each other (with non-weak definition always winning, of
course).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-13-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
29e4840915 libbpf: Add linker extern resolution support for functions and global variables
Add BPF static linker logic to resolve extern variables and functions across
multiple linked together BPF object files.

For that, linker maintains a separate list of struct glob_sym structures,
which keeps track of few pieces of metadata (is it extern or resolved global,
is it a weak symbol, which ELF section it belongs to, etc) and ties together
BTF type info and ELF symbol information and keeps them in sync.

With adding support for extern variables/funcs, it's now possible for some
sections to contain both extern and non-extern definitions. This means that
some sections may start out as ephemeral (if only externs are present and thus
there is not corresponding ELF section), but will be "upgraded" to actual ELF
section as symbols are resolved or new non-extern definitions are appended.

Additional care is taken to not duplicate extern entries in sections like
.kconfig and .ksyms.

Given libbpf requires BTF type to always be present for .kconfig/.ksym
externs, linker extends this requirement to all the externs, even those that
are supposed to be resolved during static linking and which won't be visible
to libbpf. With BTF information always present, static linker will check not
just ELF symbol matches, but entire BTF type signature match as well. That
logic is stricter that BPF CO-RE checks. It probably should be re-used by
.ksym resolution logic in libbpf as well, but that's left for follow up
patches.

To make it unnecessary to rewrite ELF symbols and minimize BTF type
rewriting/removal, ELF symbols that correspond to externs initially will be
updated in place once they are resolved. Similarly for BTF type info, VAR/FUNC
and var_secinfo's (sec_vars in struct bpf_linker) are staying stable, but
types they point to might get replaced when extern is resolved. This might
leave some left-over types (even though we try to minimize this for common
cases of having extern funcs with not argument names vs concrete function with
names properly specified). That can be addresses later with a generic BTF
garbage collection. That's left for a follow up as well.

Given BTF type appending phase is separate from ELF symbol
appending/resolution, special struct glob_sym->underlying_btf_id variable is
used to communicate resolution and rewrite decisions. 0 means
underlying_btf_id needs to be appended (it's not yet in final linker->btf), <0
values are used for temporary storage of source BTF type ID (not yet
rewritten), so -glob_sym->underlying_btf_id is BTF type id in obj-btf. But by
the end of linker_append_btf() phase, that underlying_btf_id will be remapped
and will always be > 0. This is the uglies part of the whole process, but
keeps the other parts much simpler due to stability of sec_var and VAR/FUNC
types, as well as ELF symbol, so please keep that in mind while reviewing.

BTF-defined maps require some extra custom logic and is addressed separate in
the next patch, so that to keep this one smaller and easier to review.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-12-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
7078c5eae4 libbpf: Tighten BTF type ID rewriting with error checking
It should never fail, but if it does, it's better to know about this rather
than end up with nonsensical type IDs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-11-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
692ae888bc libbpf: Extend sanity checking ELF symbols with externs validation
Add logic to validate extern symbols, plus some other minor extra checks, like
ELF symbol #0 validation, general symbol visibility and binding validations.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-10-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
24b5d82967 libbpf: Make few internal helpers available outside of libbpf.c
Make skip_mods_and_typedefs(), btf_kind_str(), and btf_func_linkage() helpers
available outside of libbpf.c, to be used by static linker code.

Also do few cleanups (error code fixes, comment clean up, etc) that don't
deserve their own commit.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-9-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
4dcf439178 libbpf: Factor out symtab and relos sanity checks
Factor out logic for sanity checking SHT_SYMTAB and SHT_REL sections into
separate sections. They are already quite extensive and are suffering from too
deep indentation. Subsequent changes will extend SYMTAB sanity checking
further, so it's better to factor each into a separate function.

No functional changes are intended.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-8-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
d9b9d4a43a libbpf: Refactor BTF map definition parsing
Refactor BTF-defined maps parsing logic to allow it to be nicely reused by BPF
static linker. Further, at least for BPF static linker, it's important to know
which attributes of a BPF map were defined explicitly, so provide a bit set
for each known portion of BTF map definition. This allows BPF static linker to
do a simple check when dealing with extern map declarations.

The same capabilities allow to distinguish attributes explicitly set to zero
(e.g., __uint(max_entries, 0)) vs the case of not specifying it at all (no
max_entries attribute at all). Libbpf is currently not utilizing that, but it
could be useful for backwards compatibility reasons later.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-7-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
7b106ea4b1 libbpf: Allow gaps in BPF program sections to support overriden weak functions
Currently libbpf is very strict about parsing BPF program instruction
sections. No gaps are allowed between sequential BPF programs within a given
ELF section. Libbpf enforced that by keeping track of the next section offset
that should start a new BPF (sub)program and cross-checks that by searching
for a corresponding STT_FUNC ELF symbol.

But this is too restrictive once we allow to have weak BPF programs and link
together two or more BPF object files. In such case, some weak BPF programs
might be "overridden" by either non-weak BPF program with the same name and
signature, or even by another weak BPF program that just happened to be linked
first. That, in turn, leaves BPF instructions of the "lost" BPF (sub)program
intact, but there is no corresponding ELF symbol, because no one is going to
be referencing it.

Libbpf already correctly handles such cases in the sense that it won't append
such dead code to actual BPF programs loaded into kernel. So the only change
that needs to be done is to relax the logic of parsing BPF instruction
sections. Instead of assuming next BPF (sub)program section offset, iterate
available STT_FUNC ELF symbols to discover all available BPF subprograms and
programs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-6-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
3319982d34 libbpf: Mark BPF subprogs with hidden visibility as static for BPF verifier
Define __hidden helper macro in bpf_helpers.h, which is a short-hand for
__attribute__((visibility("hidden"))). Add libbpf support to mark BPF
subprograms marked with __hidden as static in BTF information to enforce BPF
verifier's static function validation algorithm, which takes more information
(caller's context) into account during a subprogram validation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-5-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
2e430712f5 libbpf: Suppress compiler warning when using SEC() macro with externs
When used on externs SEC() macro will trigger compilation warning about
inapplicable `__attribute__((used))`. That's expected for extern declarations,
so suppress it with the corresponding _Pragma.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-4-andrii@kernel.org
2021-04-26 16:30:18 -07:00
Florent Revest
120a21852b libbpf: Introduce a BPF_SNPRINTF helper macro
Similarly to BPF_SEQ_PRINTF, this macro turns variadic arguments into an
array of u64, making it more natural to call the bpf_snprintf helper.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-6-revest@chromium.org
2021-04-26 16:30:18 -07:00
Florent Revest
f4da689d90 libbpf: Initialize the bpf_seq_printf parameters array field by field
When initializing the __param array with a one liner, if all args are
const, the initial array value will be placed in the rodata section but
because libbpf does not support relocation in the rodata section, any
pointer in this array will stay NULL.

Fixes: c09add2fbc5a ("tools/libbpf: Add bpf_iter support")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-5-revest@chromium.org
2021-04-26 16:30:18 -07:00
Florent Revest
30755d3a1c bpf: Add a bpf_snprintf helper
The implementation takes inspiration from the existing bpf_trace_printk
helper but there are a few differences:

To allow for a large number of format-specifiers, parameters are
provided in an array, like in bpf_seq_printf.

Because the output string takes two arguments and the array of
parameters also takes two arguments, the format string needs to fit in
one argument. Thankfully, ARG_PTR_TO_CONST_STR is guaranteed to point to
a zero-terminated read-only map so we don't need a format string length
arg.

Because the format-string is known at verification time, we also do
a first pass of format string validation in the verifier logic. This
makes debugging easier.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-4-revest@chromium.org
2021-04-26 16:30:18 -07:00
Alexei Starovoitov
dda0dd6a87 libbpf: Remove unused field.
relo->processed is set, but not used. Remove it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210415141817.53136-1-alexei.starovoitov@gmail.com
2021-04-26 16:30:18 -07:00
Toke Høiland-Jørgensen
c21f91bd35 bpf: Return target info when a tracing bpf_link is queried
There is currently no way to discover the target of a tracing program
attachment after the fact. Add this information to bpf_link_info and return
it when querying the bpf_link fd.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413091607.58945-1-toke@redhat.com
2021-04-26 16:30:18 -07:00
Pedro Tammela
552dec12dc libbpf: Clarify flags in ringbuf helpers
In 'bpf_ringbuf_reserve()' we require the flag to '0' at the moment.

For 'bpf_ringbuf_{discard,submit,output}' a flag of '0' might send a
notification to the process if needed.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210412192434.944343-1-pctammela@mojatatu.com
2021-04-26 16:30:18 -07:00
Daniel Borkmann
678e8c8e49 bpf: Sync bpf headers in tooling infrastucture
Synchronize tools/include/uapi/linux/bpf.h which was missing changes
from various commits:

  - f3c45326ee71 ("bpf: Document PROG_TEST_RUN limitations")
  - e5e35e754c28 ("bpf: BPF-helper for MTU checking add length input")

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
f6de59dc3e libbpf: Add bpf_map__inner_map API
The API gives access to inner map for map in map types (array or
hash of map). It will be used to dynamically set max_entries in it.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-7-yauheni.kaliuta@redhat.com
2021-04-26 16:30:18 -07:00
Ciara Loftus
823648416c libbpf: Fix potential NULL pointer dereference
Wait until after the UMEM is checked for null to dereference it.

Fixes: 43f1bc1efff1 ("libbpf: Restore umem state after socket create failure")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210408052009.7844-1-ciara.loftus@intel.com
2021-04-26 16:30:18 -07:00
Andrii Nakryiko
02dbcbea28 travis-ci: use default docker from Focal
Stop trying to update Docker version. Focal seems to have recent enough
Docker. This fixes Debian builds.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-04-26 08:30:13 -07:00
Ilya Leoshkevich
a7502f2707 vmtest: fix error reporting
S50-run-tests uses -e, which means that it immediately exits on test
failures without writing /exitcode. Fix by temporarily turning -e off.

Another issue is that $? in S50-run-tests is not quoted, which causes
the random value from the host to be taken (in practice always 0), so
fix that as well.

Finally, this fix has a positive side effect - QEMU no longer hangs
when tests fail. This is because rcS (generated by mkrootfs.sh) also
uses -e and immediately exits, if one of the scripts that it calls
fails, without calling S99-poweroff.

Example output after the fix:

Summary: 53/184 PASSED, 5 SKIPPED, 1 FAILED
+ exitstatus=1
+ set -e
+ echo 1
+ chmod 644 /exitstatus
+ for path in /etc/rcS.d/S*
+ '[' -x /etc/rcS.d/S99-poweroff ']'
+ /etc/rcS.d/S99-poweroff
travis_fold:start:shutdown
Shutdown

starting pid 232, tty '': '/sbin/swapoff -a'
starting pid 233, tty '': '/bin/umount -a -r'
[   45.909033] EXT4-fs (vda): re-mounted. Opts: (null)
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system poweroff
[   48.932007] ACPI: Preparing to enter system sleep state S5
[   48.932785] reboot: Power down

Tests exit status: 1
travis_fold🔚shutdown

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-04-06 21:57:01 -07:00
Andrii Nakryiko
bab780e6f9 travis-ci: update to Ubuntu Focal
Update to Ubuntu Focal 20.04.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-04-06 21:18:31 -07:00
Ilya Leoshkevich
915f3abe94 travis-ci: prohibit uninitialized variables in managers/
The scripts in this directory rely on certain environment variables, so
fail if they are not set in order to improve the debugging experience.
The vmtest/ scripts already do it.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-04-06 21:18:31 -07:00
Ilya Leoshkevich
0c248143d4 travis-ci: disable GCC's -Wstringop-truncation noisy error on Ubuntu
This is the same as commit 4d86cae4f0 ("ci: disable GCC's
-Wstringop-truncation noisy error"), but for Ubuntu. Without this,
there are false positives in bpf_object__new() on Ubuntu 20.04:
this function calls strncpy() with the correct bounds, but still
triggers the warning.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-04-06 21:18:31 -07:00
Ilya Leoshkevich
9f0e42b512 vmtest: blacklist stacktrace_build_id selftest
It requires v5.9+ kernel when the test code is built with a newer
toolchain. The support was added by commit b33164f2bd1c ("bpf:
Iterate through all PT_NOTE sections when looking for build id").

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-04-06 21:18:31 -07:00
Andrii Nakryiko
174d0b7b49 vmtest: blacklist kfunc_call selftests
kfunc_call requires 5.13+ kernel.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-04-05 11:30:05 -07:00
Andrii Nakryiko
95f83b8b0c sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   155f556d64b1a48710f01305e14bb860734ed1e3
Checkpoint bpf-next commit: 1e1032b0c4afaed7739a6681ff6b4cb120b82994
Baseline bpf commit:        002322402dafd846c424ffa9240a937f49b48c42
Checkpoint bpf commit:      a14d273ba15968495896a38b7b3399dba66d0270

Andrii Nakryiko (2):
  libbpf: Preserve empty DATASEC BTFs during static linking
  libbpf: Fix memory leak when emitting final btf_ext

Ciara Loftus (3):
  libbpf: Ensure umem pointer is non-NULL before dereferencing
  libbpf: Restore umem state after socket create failure
  libbpf: Only create rx and tx XDP rings when necessary

Cong Wang (1):
  sock_map: Introduce BPF_SK_SKB_VERDICT

Hengqi Chen (1):
  libbpf: Fix KERNEL_VERSION macro

Maciej Fijalkowski (1):
  libbpf: xsk: Use bpf_link

Martin KaFai Lau (6):
  bpf: Support bpf program calling kernel function
  libbpf: Refactor bpf_object__resolve_ksyms_btf_id
  libbpf: Refactor codes for finding btf id of a kernel symbol
  libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR
  libbpf: Record extern sym relocation first
  libbpf: Support extern kernel function

Pedro Tammela (1):
  libbpf: Fix bail out from 'ringbuf_process_ring()' on error

Yang Yingliang (1):
  libbpf: Remove redundant semi-colon

 include/uapi/linux/bpf.h |   5 +
 src/bpf_helpers.h        |   2 +-
 src/libbpf.c             | 389 +++++++++++++++++++++++++++++----------
 src/linker.c             |  39 +++-
 src/ringbuf.c            |   2 +-
 src/xsk.c                | 315 ++++++++++++++++++++++++-------
 6 files changed, 574 insertions(+), 178 deletions(-)

--
2.30.2
2021-04-05 11:30:05 -07:00
Ciara Loftus
416343d95c libbpf: Only create rx and tx XDP rings when necessary
Prior to this commit xsk_socket__create(_shared) always attempted to create
the rx and tx rings for the socket. However this causes an issue when the
socket being setup is that which shares the fd with the UMEM. If a
previous call to this function failed with this socket after the rings were
set up, a subsequent call would always fail because the rings are not torn
down after the first call and when we try to set them up again we encounter
an error because they already exist. Solve this by remembering whether the
rings were set up by introducing new bools to struct xsk_umem which
represent the ring setup status and using them to determine whether or
not to set up the rings.

Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331061218.1647-4-ciara.loftus@intel.com
2021-04-05 11:30:05 -07:00
Ciara Loftus
4f932c1ee9 libbpf: Restore umem state after socket create failure
If the call to xsk_socket__create fails, the user may want to retry the
socket creation using the same umem. Ensure that the umem is in the
same state on exit if the call fails by:
1. ensuring the umem _save pointers are unmodified.
2. not unmapping the set of umem rings that were set up with the umem
during xsk_umem__create, since those maps existed before the call to
xsk_socket__create and should remain in tact even in the event of
failure.

Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331061218.1647-3-ciara.loftus@intel.com
2021-04-05 11:30:05 -07:00
Ciara Loftus
37e838f959 libbpf: Ensure umem pointer is non-NULL before dereferencing
Calls to xsk_socket__create dereference the umem to access the
fill_save and comp_save pointers. Make sure the umem is non-NULL
before doing this.

Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20210331061218.1647-2-ciara.loftus@intel.com
2021-04-05 11:30:05 -07:00
Pedro Tammela
d98e968707 libbpf: Fix bail out from 'ringbuf_process_ring()' on error
The current code bails out with negative and positive returns.
If the callback returns a positive return code, 'ring_buffer__consume()'
and 'ring_buffer__poll()' will return a spurious number of records
consumed, but mostly important will continue the processing loop.

This patch makes positive returns from the callback a no-op.

Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325150115.138750-1-pctammela@mojatatu.com
2021-04-05 11:30:05 -07:00
Hengqi Chen
d4beac571a libbpf: Fix KERNEL_VERSION macro
Add missing ')' for KERNEL_VERSION macro.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210405040119.802188-1-hengqi.chen@gmail.com
2021-04-05 11:30:05 -07:00
Yang Yingliang
bea42d49f8 libbpf: Remove redundant semi-colon
Remove redundant semi-colon in finalize_btf_ext().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210402012634.1965453-1-yangyingliang@huawei.com
2021-04-05 11:30:05 -07:00
Cong Wang
4e8d8d5cd2 sock_map: Introduce BPF_SK_SKB_VERDICT
Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is
confusing and more importantly we still want to distinguish them
from user-space. So we can just reuse the stream verdict code but
introduce a new type of eBPF program, skb_verdict. Users are not
allowed to attach stream_verdict and skb_verdict programs to the
same map.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-10-xiyou.wangcong@gmail.com
2021-04-05 11:30:05 -07:00
Maciej Fijalkowski
8628610c32 libbpf: xsk: Use bpf_link
Currently, if there are multiple xdpsock instances running on a single
interface and in case one of the instances is terminated, the rest of
them are left in an inoperable state due to the fact of unloaded XDP
prog from interface.

Consider the scenario below:

// load xdp prog and xskmap and add entry to xskmap at idx 10
$ sudo ./xdpsock -i ens801f0 -t -q 10

// add entry to xskmap at idx 11
$ sudo ./xdpsock -i ens801f0 -t -q 11

terminate one of the processes and another one is unable to work due to
the fact that the XDP prog was unloaded from interface.

To address that, step away from setting bpf prog in favour of bpf_link.
This means that refcounting of BPF resources will be done automatically
by bpf_link itself.

Provide backward compatibility by checking if underlying system is
bpf_link capable. Do this by looking up/creating bpf_link on loopback
device. If it failed in any way, stick with netlink-based XDP prog.
therwise, use bpf_link-based logic.

When setting up BPF resources during xsk socket creation, check whether
bpf_link for a given ifindex already exists via set of calls to
bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
and comparing the ifindexes from bpf_link and xsk socket.

For case where resources exist but they are not AF_XDP related, bail out
and ask user to remove existing prog and then retry.

Lastly, do a bit of refactoring within __xsk_setup_xdp_prog and pull out
existing code branches based on prog_id value onto separate functions
that are responsible for resource initialization if prog_id was 0 and
for lookup existing resources for non-zero prog_id as that implies that
XDP program is present on the underlying net device. This in turn makes
it easier to follow, especially the teardown part of both branches.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-7-maciej.fijalkowski@intel.com
2021-04-05 11:30:05 -07:00
Andrii Nakryiko
2e51adc9bc libbpf: Fix memory leak when emitting final btf_ext
Free temporary allocated memory used to construct finalized .BTF.ext data.
Found by Coverity static analysis on libbpf's Github repo.

Fixes: 8fd27bf69b86 ("libbpf: Add BPF static linker BTF and BTF.ext support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210327042502.969745-1-andrii@kernel.org
2021-04-05 11:30:05 -07:00
Martin KaFai Lau
1ccb9d99d6 libbpf: Support extern kernel function
This patch is to make libbpf able to handle the following extern
kernel function declaration and do the needed relocations before
loading the bpf program to the kernel.

extern int foo(struct sock *) __attribute__((section(".ksyms")))

In the collect extern phase, needed changes is made to
bpf_object__collect_externs() and find_extern_btf_id() to collect
extern function in ".ksyms" section.  The func in the BTF datasec also
needs to be replaced by an int var.  The idea is similar to the existing
handling in extern var.  In case the BTF may not have a var, a dummy ksym
var is added at the beginning of bpf_object__collect_externs()
if there is func under ksyms datasec.  It will also change the
func linkage from extern to global which the kernel can support.
It also assigns a param name if it does not have one.

In the collect relo phase, it will record the kernel function
call as RELO_EXTERN_FUNC.

bpf_object__resolve_ksym_func_btf_id() is added to find the func
btf_id of the running kernel.

During actual relocation, it will patch the BPF_CALL instruction with
src_reg = BPF_PSEUDO_FUNC_CALL and insn->imm set to the running
kernel func's btf_id.

The required LLVM patch: https://reviews.llvm.org/D93563

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015234.1548923-1-kafai@fb.com
2021-04-05 11:30:05 -07:00
Martin KaFai Lau
90e052e6dd libbpf: Record extern sym relocation first
This patch records the extern sym relocs first before recording
subprog relocs.  The later patch will have relocs for extern
kernel function call which is also using BPF_JMP | BPF_CALL.
It will be easier to handle the extern symbols first in
the later patch.

is_call_insn() helper is added.  The existing is_ldimm64() helper
is renamed to is_ldimm64_insn() for consistency.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015227.1548623-1-kafai@fb.com
2021-04-05 11:30:05 -07:00
Martin KaFai Lau
7ef7ed2a5d libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR
This patch renames RELO_EXTERN to RELO_EXTERN_VAR.
It is to avoid the confusion with a later patch adding
RELO_EXTERN_FUNC.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015221.1547722-1-kafai@fb.com
2021-04-05 11:30:05 -07:00
Martin KaFai Lau
7036f3356e libbpf: Refactor codes for finding btf id of a kernel symbol
This patch refactors code, that finds kernel btf_id by kind
and symbol name, to a new function find_ksym_btf_id().

It also adds a new helper __btf_kind_str() to return
a string by the numeric kind value.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015214.1547069-1-kafai@fb.com
2021-04-05 11:30:05 -07:00
Martin KaFai Lau
e5d7cbe15a libbpf: Refactor bpf_object__resolve_ksyms_btf_id
This patch refactors most of the logic from
bpf_object__resolve_ksyms_btf_id() into a new function
bpf_object__resolve_ksym_var_btf_id().
It is to get ready for a later patch adding
bpf_object__resolve_ksym_func_btf_id() which resolves
a kernel function to the running kernel btf_id.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015207.1546749-1-kafai@fb.com
2021-04-05 11:30:05 -07:00
Martin KaFai Lau
e35afcb289 bpf: Support bpf program calling kernel function
This patch adds support to BPF verifier to allow bpf program calling
kernel function directly.

The use case included in this set is to allow bpf-tcp-cc to directly
call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
functions have already been used by some kernel tcp-cc implementations.

This set will also allow the bpf-tcp-cc program to directly call the
kernel tcp-cc implementation,  For example, a bpf_dctcp may only want to
implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
from the kernel tcp_dctcp.c instead of reimplementing (or
copy-and-pasting) them.

The tcp-cc kernel functions mentioned above will be white listed
for the struct_ops bpf-tcp-cc programs to use in a later patch.
The white listed functions are not bounded to a fixed ABI contract.
Those functions have already been used by the existing kernel tcp-cc.
If any of them has changed, both in-tree and out-of-tree kernel tcp-cc
implementations have to be changed.  The same goes for the struct_ops
bpf-tcp-cc programs which have to be adjusted accordingly.

This patch is to make the required changes in the bpf verifier.

First change is in btf.c, it adds a case in "btf_check_func_arg_match()".
When the passed in "btf->kernel_btf == true", it means matching the
verifier regs' states with a kernel function.  This will handle the
PTR_TO_BTF_ID reg.  It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
and PTR_TO_TCP_SOCK to its kernel's btf_id.

In the later libbpf patch, the insn calling a kernel function will
look like:

insn->code == (BPF_JMP | BPF_CALL)
insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
insn->imm == func_btf_id /* btf_id of the running kernel */

[ For the future calling function-in-kernel-module support, an array
  of module btf_fds can be passed at the load time and insn->off
  can be used to index into this array. ]

At the early stage of verifier, the verifier will collect all kernel
function calls into "struct bpf_kfunc_desc".  Those
descriptors are stored in "prog->aux->kfunc_tab" and will
be available to the JIT.  Since this "add" operation is similar
to the current "add_subprog()" and looking for the same insn->code,
they are done together in the new "add_subprog_and_kfunc()".

In the "do_check()" stage, the new "check_kfunc_call()" is added
to verify the kernel function call instruction:
1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
   A new bpf_verifier_ops "check_kfunc_call" is added to do that.
   The bpf-tcp-cc struct_ops program will implement this function in
   a later patch.
2. Call "btf_check_kfunc_args_match()" to ensure the regs can be
   used as the args of a kernel function.
3. Mark the regs' type, subreg_def, and zext_dst.

At the later do_misc_fixups() stage, the new fixup_kfunc_call()
will replace the insn->imm with the function address (relative
to __bpf_call_base).  If needed, the jit can find the btf_func_model
by calling the new bpf_jit_find_kfunc_model(prog, insn).
With the imm set to the function address, "bpftool prog dump xlated"
will be able to display the kernel function calls the same way as
it displays other bpf helper calls.

gpl_compatible program is required to call kernel function.

This feature currently requires JIT.

The verifier selftests are adjusted because of the changes in
the verbose log in add_subprog_and_kfunc().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015142.1544736-1-kafai@fb.com
2021-04-05 11:30:05 -07:00
Andrii Nakryiko
a18b72b920 libbpf: Preserve empty DATASEC BTFs during static linking
Ensure that BPF static linker preserves all DATASEC BTF types, even if some of
them might not have any variable information at all. This may happen if the
compiler promotes local initialized variable contents into .rodata section and
there are no global or static functions in the program.

For example,

  $ cat t.c
  struct t { char a; char b; char c; };
  void bar(struct t*);
  void find() {
     struct t tmp = {1, 2, 3};
     bar(&tmp);
  }

  $ clang -target bpf -O2 -g -S t.c
         .long   104                             # BTF_KIND_DATASEC(id = 8)
         .long   251658240                       # 0xf000000
         .long   0

         .ascii  ".rodata"                       # string offset=104

  $ clang -target bpf -O2 -g -c t.c
  $ readelf -S t.o | grep data
     [ 4] .rodata           PROGBITS         0000000000000000  00000090

Fixes: 8fd27bf69b86 ("libbpf: Add BPF static linker BTF and BTF.ext support")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210326043036.3081011-1-andrii@kernel.org
2021-04-05 11:30:05 -07:00
Adam Jensen
2bd682d23e README: Mention Alpine in list of packaging distros 2021-04-04 17:50:11 -07:00
Andrii Nakryiko
99bc176337 README: update links to more up-to-date CO-RE articles
Update links to point to blog posts that have some new updates and are
generally kept more up-to-date.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-03-28 13:36:49 -07:00
Andrii Nakryiko
ea5752c641 Makefile: add strset.o and linker.o to build
Fix libbpf build by linking strset.o and linker.o into libbpf.a.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-03-25 23:40:47 -07:00
Andrii Nakryiko
1d2d2d0034 vmtests: blacklist fexit_sleep on 5.5
Blacklist fexit_sleep selftest that relies on bpf_trampoline fix in 5.12.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
582b8fe21b sync: remove libbpf_util.h from the list of headers
libbpf_util.h was removed in 7e8bbe24cb8b ("libbpf: xsk: Move barriers from
libbpf_util.h to xsk.h") upstream, so remove it from the list of installable
headers.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
3ea10e46cb sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   22541a9eeb0d968c133aaebd95fa59da3208e705
Checkpoint bpf-next commit: 155f556d64b1a48710f01305e14bb860734ed1e3
Baseline bpf commit:        6185266c5a853bb0f2a459e3ff594546f277609b
Checkpoint bpf commit:      002322402dafd846c424ffa9240a937f49b48c42

Andrii Nakryiko (11):
  libbpf: Add explicit padding to bpf_xdp_set_link_opts
  libbpf: provide NULL and KERNEL_VERSION macros in bpf_helpers.h
  libbpf: Expose btf_type_by_id() internally
  libbpf: Generalize BTF and BTF.ext type ID and strings iteration
  libbpf: Rename internal memory-management helpers
  libbpf: Extract internal set-of-strings datastructure APIs
  libbpf: Add generic BTF type shallow copy API
  libbpf: Add BPF static linker APIs
  libbpf: Add BPF static linker BTF and BTF.ext support
  libbpf: Skip BTF fixup if object file has no BTF
  libbpf: Constify few bpf_program getters

Björn Töpel (3):
  libbpf, xsk: Add libbpf_smp_store_release libbpf_smp_load_acquire
  libbpf: xsk: Remove linux/compiler.h header
  libbpf: xsk: Move barriers from libbpf_util.h to xsk.h

Jean-Philippe Brucker (2):
  libbpf: Fix arm64 build
  libbpf: Fix BTF dump of pointer-to-array-of-struct

Joe Stringer (2):
  scripts/bpf: Abstract eBPF API target parameter
  tools: Sync uapi bpf.h header with latest changes

KP Singh (1):
  libbpf: Add explicit padding to btf_dump_emit_type_decl_opts

Kumar Kartikeya Dwivedi (1):
  libbpf: Use SOCK_CLOEXEC when opening the netlink socket

Lorenz Bauer (1):
  bpf: Add PROG_TEST_RUN support for sk_lookup programs

Maciej Fijalkowski (1):
  libbpf: Clear map_info before each bpf_obj_get_info_by_fd

Namhyung Kim (1):
  libbpf: Fix error path in bpf_object__elf_init()

Pedro Tammela (1):
  libbpf: Avoid inline hint definition from 'linux/stddef.h'

Rafael David Tinoco (1):
  libbpf: Add bpf object kern_version attribute setter

Xuesen Huang (1):
  bpf: Add bpf_skb_adjust_room flag BPF_F_ADJ_ROOM_ENCAP_L2_ETH

 include/uapi/linux/bpf.h |  724 +++++++++++++-
 src/bpf_helpers.h        |   21 +-
 src/btf.c                |  714 +++++++-------
 src/btf.h                |    3 +
 src/btf_dump.c           |   10 +-
 src/libbpf.c             |   32 +-
 src/libbpf.h             |   19 +-
 src/libbpf.map           |    6 +
 src/libbpf_internal.h    |   38 +-
 src/libbpf_util.h        |   47 -
 src/linker.c             | 1944 ++++++++++++++++++++++++++++++++++++++
 src/netlink.c            |    2 +-
 src/strset.c             |  176 ++++
 src/strset.h             |   21 +
 src/xsk.c                |    5 +-
 src/xsk.h                |   87 +-
 16 files changed, 3379 insertions(+), 470 deletions(-)
 delete mode 100644 src/libbpf_util.h
 create mode 100644 src/linker.c
 create mode 100644 src/strset.c
 create mode 100644 src/strset.h

--
2.30.2
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
3118d38a2e sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2021-03-25 23:31:23 -07:00
Rafael David Tinoco
b09a4999d9 libbpf: Add bpf object kern_version attribute setter
Unfortunately some distros don't have their kernel version defined
accurately in <linux/version.h> due to different long term support
reasons.

It is important to have a way to override the bpf kern_version
attribute during runtime: some old kernels might still check for
kern_version attribute during bpf_prog_load().

Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210323040952.2118241-1-rafaeldtinoco@ubuntu.com
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
53f0e7d8ec libbpf: Constify few bpf_program getters
bpf_program__get_type() and bpf_program__get_expected_attach_type() shouldn't
modify given bpf_program, so mark input parameter as const struct bpf_program.
This eliminates unnecessary compilation warnings or explicit casts in user
programs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210324172941.2609884-1-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
d4d3a88b5a libbpf: Skip BTF fixup if object file has no BTF
Skip BTF fixup step when input object file is missing BTF altogether.

Fixes: 8fd27bf69b86 ("libbpf: Add BPF static linker BTF and BTF.ext support")
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20210319205909.1748642-3-andrii@kernel.org
2021-03-25 23:31:23 -07:00
KP Singh
7b1f3e310b libbpf: Add explicit padding to btf_dump_emit_type_decl_opts
Similar to
https://lore.kernel.org/bpf/20210313210920.1959628-2-andrii@kernel.org/

When DECLARE_LIBBPF_OPTS is used with inline field initialization, e.g:

  DECLARE_LIBBPF_OPTS(btf_dump_emit_type_decl_opts, opts,
    .field_name = var_ident,
    .indent_level = 2,
    .strip_mods = strip_mods,
  );

and compiled in debug mode, the compiler generates code which
leaves the padding uninitialized and triggers errors within libbpf APIs
which require strict zero initialization of OPTS structs.

Adding anonymous padding field fixes the issue.

Fixes: 9f81654eebe8 ("libbpf: Expose BTF-to-C type declaration emitting API")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210319192117.2310658-1-kpsingh@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
b156979d19 libbpf: Add BPF static linker BTF and BTF.ext support
Add .BTF and .BTF.ext static linking logic.

When multiple BPF object files are linked together, their respective .BTF and
.BTF.ext sections are merged together. BTF types are not just concatenated,
but also deduplicated. .BTF.ext data is grouped by type (func info, line info,
core_relos) and target section names, and then all the records are
concatenated together, preserving their relative order. All the BTF type ID
references and string offsets are updated as necessary, to take into account
possibly deduplicated strings and types.

BTF DATASEC types are handled specially. Their respective var_secinfos are
accumulated separately in special per-section data and then final DATASEC
types are emitted at the very end during bpf_linker__finalize() operation,
just before emitting final ELF output file.

BTF data can also provide "section annotations" for some extern variables.
Such concept is missing in ELF, but BTF will have DATASEC types for such
special extern datasections (e.g., .kconfig, .ksyms). Such sections are called
"ephemeral" internally. Internally linker will keep metadata for each such
section, collecting variables information, but those sections won't be emitted
into the final ELF file.

Also, given LLVM/Clang during compilation emits BTF DATASECS that are
incomplete, missing section size and variable offsets for static variables,
BPF static linker will initially fix up such DATASECs, using ELF symbols data.
The final DATASECs will preserve section sizes and all variable offsets. This
is handled correctly by libbpf already, so won't cause any new issues. On the
other hand, it's actually a nice property to have a complete BTF data without
runtime adjustments done during bpf_object__open() by libbpf. In that sense,
BPF static linker is also a BTF normalizer.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-8-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
bd81770e10 libbpf: Add BPF static linker APIs
Introduce BPF static linker APIs to libbpf. BPF static linker allows to
perform static linking of multiple BPF object files into a single combined
resulting object file, preserving all the BPF programs, maps, global
variables, etc.

Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are
concatenated together. Similarly, code sections are also concatenated. All the
symbols and ELF relocations are also concatenated in their respective ELF
sections and are adjusted accordingly to the new object file layout.

Static variables and functions are handled correctly as well, adjusting BPF
instructions offsets to reflect new variable/function offset within the
combined ELF section. Such relocations are referencing STT_SECTION symbols and
that stays intact.

Data sections in different files can have different alignment requirements, so
that is taken care of as well, adjusting sizes and offsets as necessary to
satisfy both old and new alignment requirements.

DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG
section, which is ignored by libbpf in bpf_object__open() anyways. So, in
a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty
nice property, especially if resulting .o file is then used to generate BPF
skeleton.

Original string sections are ignored and instead we construct our own set of
unique strings using libbpf-internal `struct strset` API.

To reduce the size of the patch, all the .BTF and .BTF.ext processing was
moved into a separate patch.

The high-level API consists of just 4 functions:
  - bpf_linker__new() creates an instance of BPF static linker. It accepts
    output filename and (currently empty) options struct;
  - bpf_linker__add_file() takes input filename and appends it to the already
    processed ELF data; it can be called multiple times, one for each BPF
    ELF object file that needs to be linked in;
  - bpf_linker__finalize() needs to be called to dump final ELF contents into
    the output file, specified when bpf_linker was created; after
    bpf_linker__finalize() is called, no more bpf_linker__add_file() and
    bpf_linker__finalize() calls are allowed, they will return error;
  - regardless of whether bpf_linker__finalize() was called or not,
    bpf_linker__free() will free up all the used resources.

Currently, BPF static linker doesn't resolve cross-object file references
(extern variables and/or functions). This will be added in the follow up patch
set.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
4fdc36418d libbpf: Add generic BTF type shallow copy API
Add btf__add_type() API that performs shallow copy of a given BTF type from
the source BTF into the destination BTF. All the information and type IDs are
preserved, but all the strings encountered are added into the destination BTF
and corresponding offsets are rewritten. BTF type IDs are assumed to be
correct or such that will be (somehow) modified afterwards.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-6-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
861ad35ceb libbpf: Extract internal set-of-strings datastructure APIs
Extract BTF logic for maintaining a set of strings data structure, used for
BTF strings section construction in writable mode, into separate re-usable
API. This data structure is going to be used by bpf_linker to maintains ELF
STRTAB section, which has the same layout as BTF strings section.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-5-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
7fc514acf1 libbpf: Rename internal memory-management helpers
Rename btf_add_mem() and btf_ensure_mem() helpers that abstract away details
of dynamically resizable memory to use libbpf_ prefix, as they are not
BTF-specific. No functional changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-4-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
74e94c40fe libbpf: Generalize BTF and BTF.ext type ID and strings iteration
Extract and generalize the logic to iterate BTF type ID and string offset
fields within BTF types and .BTF.ext data. Expose this internally in libbpf
for re-use by bpf_linker.

Additionally, complete strings deduplication handling for BTF.ext (e.g., CO-RE
access strings), which was previously missing. There previously was no
case of deduplicating .BTF.ext data, but bpf_linker is going to use it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-3-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
082a5c6020 libbpf: Expose btf_type_by_id() internally
btf_type_by_id() is internal-only convenience API returning non-const pointer
to struct btf_type. Expose it outside of btf.c for re-use.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-2-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
75a2e3bda8 libbpf: provide NULL and KERNEL_VERSION macros in bpf_helpers.h
Given that vmlinux.h is not compatible with headers like stddef.h, NULL poses
an annoying problem: it is defined as #define, so is not captured in BTF, so
is not emitted into vmlinux.h. This leads to users either sticking to explicit
0, or defining their own NULL (as progs/skb_pkt_end.c does).

But it's easy for bpf_helpers.h to provide (conditionally) NULL definition.
Similarly, KERNEL_VERSION is another commonly missed macro that came up
multiple times. So this patch adds both of them, along with offsetof(), that
also is typically defined in stddef.h, just like NULL.

This might cause compilation warning for existing BPF applications defining
their own NULL and/or KERNEL_VERSION already:

  progs/skb_pkt_end.c:7:9: warning: 'NULL' macro redefined [-Wmacro-redefined]
  #define NULL 0
          ^
  /tmp/linux/tools/testing/selftests/bpf/tools/include/vmlinux.h:4:9: note: previous definition is here
  #define NULL ((void *)0)
	  ^

It is trivial to fix, though, so long-term benefits outweight temporary
inconveniences.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20210317200510.1354627-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
c8bfeae778 libbpf: Add explicit padding to bpf_xdp_set_link_opts
Adding such anonymous padding fixes the issue with uninitialized portions of
bpf_xdp_set_link_opts when using LIBBPF_DECLARE_OPTS macro with inline field
initialization:

DECLARE_LIBBPF_OPTS(bpf_xdp_set_link_opts, opts, .old_fd = -1);

When such code is compiled in debug mode, compiler is generating code that
leaves padding bytes uninitialized, which triggers error inside libbpf APIs
that do strict zero initialization checks for OPTS structs.

Adding anonymous padding field fixes the issue.

Fixes: bd5ca3ef93cd ("libbpf: Add function to set link XDP fd while specifying old program")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210313210920.1959628-2-andrii@kernel.org
2021-03-25 23:31:23 -07:00
Pedro Tammela
a0ad81d9c4 libbpf: Avoid inline hint definition from 'linux/stddef.h'
Linux headers might pull 'linux/stddef.h' which defines
'__always_inline' as the following:

   #ifndef __always_inline
   #define __always_inline inline
   #endif

This becomes an issue if the program picks up the 'linux/stddef.h'
definition as the macro now just hints inline to clang.

This change now enforces the proper definition for BPF programs
regardless of the include order.

Signed-off-by: Pedro Tammela <pctammela@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210314173839.457768-1-pctammela@gmail.com
2021-03-25 23:31:23 -07:00
Björn Töpel
b727e2deca libbpf: xsk: Move barriers from libbpf_util.h to xsk.h
The only user of libbpf_util.h is xsk.h. Move the barriers to xsk.h,
and remove libbpf_util.h. The barriers are used as an implementation
detail, and should not be considered part of the stable API.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210310080929.641212-3-bjorn.topel@gmail.com
2021-03-25 23:31:23 -07:00
Björn Töpel
27db7104d5 libbpf: xsk: Remove linux/compiler.h header
In commit 291471dd1559 ("libbpf, xsk: Add libbpf_smp_store_release
libbpf_smp_load_acquire") linux/compiler.h was added as a dependency
to xsk.h, which is the user-facing API. This makes it harder for
userspace application to consume the library. Here the header
inclusion is removed, and instead {READ,WRITE}_ONCE() is added
explicitly.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210310080929.641212-2-bjorn.topel@gmail.com
2021-03-25 23:31:23 -07:00
Jean-Philippe Brucker
c14f7e5dcf libbpf: Fix BTF dump of pointer-to-array-of-struct
The vmlinux.h generated from BTF is invalid when building
drivers/phy/ti/phy-gmii-sel.c with clang:

vmlinux.h:61702:27: error: array type has incomplete element type ‘struct reg_field’
61702 |  const struct reg_field (*regfields)[3];
      |                           ^~~~~~~~~

bpftool generates a forward declaration for this struct regfield, which
compilers aren't happy about. Here's a simplified reproducer:

	struct inner {
		int val;
	};
	struct outer {
		struct inner (*ptr_to_array)[2];
	} A;

After build with clang -> bpftool btf dump c -> clang/gcc:
./def-clang.h:11:23: error: array has incomplete element type 'struct inner'
        struct inner (*ptr_to_array)[2];

Member ptr_to_array of struct outer is a pointer to an array of struct
inner. In the DWARF generated by clang, struct outer appears before
struct inner, so when converting BTF of struct outer into C, bpftool
issues a forward declaration to struct inner. With GCC the DWARF info is
reversed so struct inner gets fully defined.

That forward declaration is not sufficient when compilers handle an
array of the struct, even when it's only used through a pointer. Note
that we can trigger the same issue with an intermediate typedef:

	struct inner {
	        int val;
	};
	typedef struct inner inner2_t[2];
	struct outer {
	        inner2_t *ptr_to_array;
	} A;

Becomes:

	struct inner;
	typedef struct inner inner2_t[2];

And causes:

./def-clang.h:10:30: error: array has incomplete element type 'struct inner'
	typedef struct inner inner2_t[2];

To fix this, clear through_ptr whenever we encounter an intermediate
array, to make the inner struct part of a strong link and force full
declaration.

Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210319112554.794552-2-jean-philippe@linaro.org
2021-03-25 23:31:23 -07:00
Kumar Kartikeya Dwivedi
bbc65156d7 libbpf: Use SOCK_CLOEXEC when opening the netlink socket
Otherwise, there exists a small window between the opening and closing
of the socket fd where it may leak into processes launched by some other
thread.

Fixes: 949abbe88436 ("libbpf: add function to setup XDP")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210317115857.6536-1-memxor@gmail.com
2021-03-25 23:31:23 -07:00
Namhyung Kim
c903b3ab70 libbpf: Fix error path in bpf_object__elf_init()
When it failed to get section names, it should call into
bpf_object__elf_finish() like others.

Fixes: 88a82120282b ("libbpf: Factor out common ELF operations and improve logging")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210317145414.884817-1-namhyung@kernel.org
2021-03-25 23:31:23 -07:00
Jean-Philippe Brucker
186ffbe0b5 libbpf: Fix arm64 build
The macro for libbpf_smp_store_release() doesn't build on arm64, fix it.

Fixes: 291471dd1559 ("libbpf, xsk: Add libbpf_smp_store_release libbpf_smp_load_acquire")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210308182521.155536-1-jean-philippe@linaro.org
2021-03-25 23:31:23 -07:00
Björn Töpel
1d483b45fc libbpf, xsk: Add libbpf_smp_store_release libbpf_smp_load_acquire
Now that the AF_XDP rings have load-acquire/store-release semantics,
move libbpf to that as well.

The library-internal libbpf_smp_{load_acquire,store_release} are only
valid for 32-bit words on ARM64.

Also, remove the barriers that are no longer in use.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210305094113.413544-3-bjorn.topel@gmail.com
2021-03-25 23:31:23 -07:00
Xuesen Huang
d64f8d3207 bpf: Add bpf_skb_adjust_room flag BPF_F_ADJ_ROOM_ENCAP_L2_ETH
bpf_skb_adjust_room sets the inner_protocol as skb->protocol for packets
encapsulation. But that is not appropriate when pushing Ethernet header.

Add an option to further specify encap L2 type and set the inner_protocol
as ETH_P_TEB.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xuesen Huang <huangxuesen@kuaishou.com>
Signed-off-by: Zhiyong Cheng <chengzhiyong@kuaishou.com>
Signed-off-by: Li Wang <wangli09@kuaishou.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/20210304064046.6232-1-hxseverything@gmail.com
2021-03-25 23:31:23 -07:00
Lorenz Bauer
4f2e1ecbd9 bpf: Add PROG_TEST_RUN support for sk_lookup programs
Allow to pass sk_lookup programs to PROG_TEST_RUN. User space
provides the full bpf_sk_lookup struct as context. Since the
context includes a socket pointer that can't be exposed
to user space we define that PROG_TEST_RUN returns the cookie
of the selected socket or zero in place of the socket pointer.

We don't support testing programs that select a reuseport socket,
since this would mean running another (unrelated) BPF program
from the sk_lookup test handler.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com
2021-03-25 23:31:23 -07:00
Joe Stringer
21f523f235 tools: Sync uapi bpf.h header with latest changes
Synchronize the header after all of the recent changes.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-16-joe@cilium.io
2021-03-25 23:31:23 -07:00
Joe Stringer
18c0f03e2d scripts/bpf: Abstract eBPF API target parameter
Abstract out the target parameter so that upcoming commits, more than
just the existing "helpers" target can be called to generate specific
portions of docs from the eBPF UAPI headers.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-10-joe@cilium.io
2021-03-25 23:31:23 -07:00
Maciej Fijalkowski
fade1c32e6 libbpf: Clear map_info before each bpf_obj_get_info_by_fd
xsk_lookup_bpf_maps, based on prog_fd, looks whether current prog has a
reference to XSKMAP. BPF prog can include insns that work on various BPF
maps and this is covered by iterating through map_ids.

The bpf_map_info that is passed to bpf_obj_get_info_by_fd for filling
needs to be cleared at each iteration, so that it doesn't contain any
outdated fields and that is currently missing in the function of
interest.

To fix that, zero-init map_info via memset before each
bpf_obj_get_info_by_fd call.

Also, since the area of this code is touched, in general strcmp is
considered harmful, so let's convert it to strncmp and provide the
size of the array name for current map_info.

While at it, do s/continue/break/ once we have found the xsks_map to
terminate the search.

Fixes: 5750902a6e9b ("libbpf: proper XSKMAP cleanup")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210303185636.18070-4-maciej.fijalkowski@intel.com
2021-03-25 23:31:23 -07:00
Ilya Leoshkevich
8c2c7e5bcf sync: use bpf_doc.py
In the latest bpf-next bpf_helpers_doc.py has been renamed to
bpf_doc.py.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2021-03-25 23:31:23 -07:00
Andrii Nakryiko
092a606856 Makefile: fix install flags order
Having -m flag between source and destination breaks install on MacOS, as
reported in [0]. Fix it by moving the flag to the front.

  [0] https://github.com/openwrt/openwrt/pull/3959

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-03-05 17:48:04 -08:00
Ilya Leoshkevich
986962fade sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   303dcc25b5c782547eb13b9f29426de843dd6f34
Checkpoint bpf-next commit: 22541a9eeb0d968c133aaebd95fa59da3208e705
Baseline bpf commit:        6185266c5a853bb0f2a459e3ff594546f277609b
Checkpoint bpf commit:      22541a9eeb0d968c133aaebd95fa59da3208e705

Ilya Leoshkevich (3):
  bpf: Add BTF_KIND_FLOAT to uapi
  libbpf: Fix whitespace in btf_add_composite() comment
  libbpf: Add BTF_KIND_FLOAT support

 include/uapi/linux/btf.h |  5 ++--
 src/btf.c                | 51 +++++++++++++++++++++++++++++++++++++++-
 src/btf.h                |  6 +++++
 src/btf_dump.c           |  4 ++++
 src/libbpf.c             | 29 ++++++++++++++++++++++-
 src/libbpf.map           |  5 ++++
 src/libbpf_internal.h    |  2 ++
 7 files changed, 98 insertions(+), 4 deletions(-)

--
2.25.1
2021-03-05 14:15:26 -08:00
Ilya Leoshkevich
471e7c241d libbpf: Add BTF_KIND_FLOAT support
The logic follows that of BTF_KIND_INT most of the time. Sanitization
replaces BTF_KIND_FLOATs with equally-sized empty BTF_KIND_STRUCTs on
older kernels, for example, the following:

    [4] FLOAT 'float' size=4

becomes the following:

    [4] STRUCT '(anon)' size=4 vlen=0

With dwarves patch [1] and this patch, the older kernels, which were
failing with the floating-point-related errors, will now start working
correctly.

[1] https://github.com/iii-i/dwarves/commit/btf-kind-float-v2

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226202256.116518-4-iii@linux.ibm.com
2021-03-05 14:15:26 -08:00
Ilya Leoshkevich
617f781804 libbpf: Fix whitespace in btf_add_composite() comment
Remove trailing space.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-3-iii@linux.ibm.com
2021-03-05 14:15:26 -08:00
Ilya Leoshkevich
473899d4f7 bpf: Add BTF_KIND_FLOAT to uapi
Add a new kind value and expand the kind bitfield.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-2-iii@linux.ibm.com
2021-03-05 14:15:26 -08:00
Andrii Nakryiko
7e03685b8d vmtests: blacklist new tests on 5.5 kernel
Extend 5.5 blacklist.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-03-03 08:35:13 -08:00
Andrii Nakryiko
7065a809fc sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   86ce322d21eb032ed8fdd294d0fb095d2debb430
Checkpoint bpf-next commit: 303dcc25b5c782547eb13b9f29426de843dd6f34
Baseline bpf commit:        78031381ae9c88f4f914d66154f4745122149c58
Checkpoint bpf commit:      6185266c5a853bb0f2a459e3ff594546f277609b

Alexei Starovoitov (1):
  bpf: Count the number of times recursion was prevented

Florent Revest (2):
  bpf: Be less specific about socket cookies guarantees
  bpf: Expose bpf_get_socket_cookie to tracing programs

Hangbin Liu (1):
  bpf: Remove blank line in bpf helper description comment

Jesper Dangaard Brouer (2):
  bpf: bpf_fib_lookup return MTU value as output when looked up
  bpf: Add BPF-helper for MTU checking

Jonas Bonn (1):
  Revert "GTP: add support for flow based tunneling API"

Martin KaFai Lau (1):
  libbpf: Ignore non function pointer member in struct_ops

Stanislav Fomichev (1):
  libbpf: Use AF_LOCAL instead of AF_INET in xsk.c

Yonghong Song (3):
  bpf: Add bpf_for_each_map_elem() helper
  libbpf: Move function is_ldimm64() earlier in libbpf.c
  libbpf: Support subprog address relocation

 include/uapi/linux/bpf.h     | 140 +++++++++++++++++++++++++++++++++--
 include/uapi/linux/if_link.h |   1 -
 src/libbpf.c                 |  98 +++++++++++++++++++-----
 src/xsk.c                    |   2 +-
 4 files changed, 213 insertions(+), 28 deletions(-)

--
2.24.1
2021-03-03 08:35:13 -08:00
Andrii Nakryiko
18b55bc136 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2021-03-03 08:35:13 -08:00
Yonghong Song
712f6587c9 libbpf: Support subprog address relocation
A new relocation RELO_SUBPROG_ADDR is added to capture
subprog addresses loaded with ld_imm64 insns. Such ld_imm64
insns are marked with BPF_PSEUDO_FUNC and will be passed to
kernel. For bpf_for_each_map_elem() case, kernel will
check that the to-be-used subprog address must be a static
function and replace it with proper actual jited func address.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204930.3885367-1-yhs@fb.com
2021-03-03 08:35:13 -08:00
Yonghong Song
60aa32b17a libbpf: Move function is_ldimm64() earlier in libbpf.c
Move function is_ldimm64() close to the beginning of libbpf.c,
so it can be reused by later code and the next patch as well.
There is no functionality change.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204929.3885295-1-yhs@fb.com
2021-03-03 08:35:13 -08:00
Yonghong Song
f3612e4117 bpf: Add bpf_for_each_map_elem() helper
The bpf_for_each_map_elem() helper is introduced which
iterates all map elements with a callback function. The
helper signature looks like
  long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags)
and for each map element, the callback_fn will be called. For example,
like hashmap, the callback signature may look like
  long callback_fn(map, key, val, callback_ctx)

There are two known use cases for this. One is from upstream ([1]) where
a for_each_map_elem helper may help implement a timeout mechanism
in a more generic way. Another is from our internal discussion
for a firewall use case where a map contains all the rules. The packet
data can be compared to all these rules to decide allow or deny
the packet.

For array maps, users can already use a bounded loop to traverse
elements. Using this helper can avoid using bounded loop. For other
type of maps (e.g., hash maps) where bounded loop is hard or
impossible to use, this helper provides a convenient way to
operate on all elements.

For callback_fn, besides map and map element, a callback_ctx,
allocated on caller stack, is also passed to the callback
function. This callback_ctx argument can provide additional
input and allow to write to caller stack for output.

If the callback_fn returns 0, the helper will iterate through next
element if available. If the callback_fn returns 1, the helper
will stop iterating and returns to the bpf program. Other return
values are not used for now.

Currently, this helper is only available with jit. It is possible
to make it work with interpreter with so effort but I leave it
as the future work.

[1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@gmail.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204925.3884923-1-yhs@fb.com
2021-03-03 08:35:13 -08:00
Hangbin Liu
587d2ab628 bpf: Remove blank line in bpf helper description comment
Commit 34b2021cc616 ("bpf: Add BPF-helper for MTU checking") added an extra
blank line in bpf helper description. This will make bpf_helpers_doc.py stop
building bpf_helper_defs.h immediately after bpf_check_mtu(), which will
affect future added functions.

Fixes: 34b2021cc616 ("bpf: Add BPF-helper for MTU checking")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-03-03 08:35:13 -08:00
Jesper Dangaard Brouer
6cc16d6401 bpf: Add BPF-helper for MTU checking
This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs.

The SKB object is complex and the skb->len value (accessible from
BPF-prog) also include the length of any extra GRO/GSO segments, but
without taking into account that these GRO/GSO segments get added
transport (L4) and network (L3) headers before being transmitted. Thus,
this BPF-helper is created such that the BPF-programmer don't need to
handle these details in the BPF-prog.

The API is designed to help the BPF-programmer, that want to do packet
context size changes, which involves other helpers. These other helpers
usually does a delta size adjustment. This helper also support a delta
size (len_diff), which allow BPF-programmer to reuse arguments needed by
these other helpers, and perform the MTU check prior to doing any actual
size adjustment of the packet context.

It is on purpose, that we allow the len adjustment to become a negative
result, that will pass the MTU check. This might seem weird, but it's not
this helpers responsibility to "catch" wrong len_diff adjustments. Other
helpers will take care of these checks, if BPF-programmer chooses to do
actual size adjustment.

V14:
 - Improve man-page desc of len_diff.

V13:
 - Enforce flag BPF_MTU_CHK_SEGS cannot use len_diff.

V12:
 - Simplify segment check that calls skb_gso_validate_network_len.
 - Helpers should return long

V9:
- Use dev->hard_header_len (instead of ETH_HLEN)
- Annotate with unlikely req from Daniel
- Fix logic error using skb_gso_validate_network_len from Daniel

V6:
- Took John's advice and dropped BPF_MTU_CHK_RELAX
- Returned MTU is kept at L3-level (like fib_lookup)

V4: Lot of changes
 - ifindex 0 now use current netdev for MTU lookup
 - rename helper from bpf_mtu_check to bpf_check_mtu
 - fix bug for GSO pkt length (as skb->len is total len)
 - remove __bpf_len_adj_positive, simply allow negative len adj

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287790461.790810.3429728639563297353.stgit@firesoul
2021-03-03 08:35:13 -08:00
Jesper Dangaard Brouer
f0753b5259 bpf: bpf_fib_lookup return MTU value as output when looked up
The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The BPF-prog
don't know the MTU value that caused this rejection.

If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it
need to know this MTU value for the ICMP packet.

Patch change lookup and result struct bpf_fib_lookup, to contain this MTU
value as output via a union with 'tot_len' as this is the value used for
the MTU lookup.

V5:
 - Fixed uninit value spotted by Dan Carpenter.
 - Name struct output member mtu_result

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287789952.790810.13134700381067698781.stgit@firesoul
2021-03-03 08:35:13 -08:00
Martin KaFai Lau
642655629b libbpf: Ignore non function pointer member in struct_ops
When libbpf initializes the kernel's struct_ops in
"bpf_map__init_kern_struct_ops()", it enforces all
pointer types must be a function pointer and rejects
others.  It turns out to be too strict.  For example,
when directly using "struct tcp_congestion_ops" from vmlinux.h,
it has a "struct module *owner" member and it is set to NULL
in a bpf_tcp_cc.o.

Instead, it only needs to ensure the member is a function
pointer if it has been set (relocated) to a bpf-prog.
This patch moves the "btf_is_func_proto(kern_mtype)" check
after the existing "if (!prog) { continue; }".  The original debug
message in "if (!prog) { continue; }" is also removed since it is
no longer valid.  Beside, there is a later debug message to tell
which function pointer is set.

The "btf_is_func_proto(mtype)" has already been guaranteed
in "bpf_object__collect_st_ops_relos()" which has been run
before "bpf_map__init_kern_struct_ops()".  Thus, this check
is removed.

v2:
- Remove outdated debug message (Andrii)
  Remove because there is a later debug message to tell
  which function pointer is set.
- Following mtype->type is no longer needed. Remove:
  "skip_mods_and_typedefs(btf, mtype->type, &mtype_id)"
- Do "if (!prog)" test before skip_mods_and_typedefs.

Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212021030.266932-1-kafai@fb.com
2021-03-03 08:35:13 -08:00
Stanislav Fomichev
14a61e86f0 libbpf: Use AF_LOCAL instead of AF_INET in xsk.c
We have the environments where usage of AF_INET is prohibited
(cgroup/sock_create returns EPERM for AF_INET). Let's use
AF_LOCAL instead of AF_INET, it should perfectly work with SIOCETHTOOL.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210209221826.922940-1-sdf@google.com
2021-03-03 08:35:13 -08:00
Florent Revest
d142d4a382 bpf: Expose bpf_get_socket_cookie to tracing programs
This needs a new helper that:
- can work in a sleepable context (using sock_gen_cookie)
- takes a struct sock pointer and checks that it's not NULL

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-2-revest@chromium.org
2021-03-03 08:35:13 -08:00
Florent Revest
99e6a464b8 bpf: Be less specific about socket cookies guarantees
Since "92acdc58ab11 bpf, net: Rework cookie generator as per-cpu one"
socket cookies are not guaranteed to be non-decreasing. The
bpf_get_socket_cookie helper descriptions are currently specifying that
cookies are non-decreasing but we don't want users to rely on that.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-1-revest@chromium.org
2021-03-03 08:35:13 -08:00
Alexei Starovoitov
1015d47c2b bpf: Count the number of times recursion was prevented
Add per-program counter for number of times recursion prevention mechanism
was triggered and expose it via show_fdinfo and bpf_prog_info.
Teach bpftool to print it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-7-alexei.starovoitov@gmail.com
2021-03-03 08:35:13 -08:00
Jonas Bonn
06ee116fb1 Revert "GTP: add support for flow based tunneling API"
This reverts commit 9ab7e76aefc97a9aa664accb59d6e8dc5e52514a.

This patch was committed without maintainer approval and despite a number
of unaddressed concerns from review.  There are several issues that
impede the acceptance of this patch and that make a reversion of this
particular instance of these changes the best way forward:

i)  the patch contains several logically separate changes that would be
better served as smaller patches (for review purposes)
ii) functionality like the handling of end markers has been introduced
without further explanation
iii) symmetry between the handling of GTPv0 and GTPv1 has been
unnecessarily broken
iv) the patchset produces 'broken' packets when extension headers are
included
v) there are no available userspace tools to allow for testing this
functionality
vi) there is an unaddressed Coverity report against the patch concering
memory leakage
vii) most importantly, the patch contains a large amount of superfluous
churn that impedes other ongoing work with this driver

This patch will be reworked into a series that aligns with other
ongoing work and facilitates review.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
Acked-by: Harald Welte <laforge@gnumonks.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-03-03 08:35:13 -08:00
Andrii Nakryiko
e1a90f3768 travis-ci: switch from GCC8 to GCC10
GCC 8 builds started failing due to missing gcc-8 package. Let's switch to GCC
10 instead.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-02-22 14:00:53 -08:00
Andrii Nakryiko
f2a926ba46 Revert "vmtests: revert to Clang/LLVM 12 until Clang 13 regression is fixed"
Clang 13 now has the fix for the original regression, time to get back to
using nightly versions.

This reverts commit adaf538bca.
2021-02-22 11:36:12 -08:00
Matteo Croce
b0b5ec0006 fix typo in license name
LGPL was incorrectly named LPGL

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
2021-02-22 11:35:49 -08:00
Andrii Nakryiko
adaf538bca vmtests: revert to Clang/LLVM 12 until Clang 13 regression is fixed
Clang 13 regressed BPF code generation causing some of BPF selftests to fail.
Until that is mitigated, stick to version 12.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-02-08 20:37:11 -08:00
Andrii Nakryiko
f35e87ddc4 vmtest: switch to Clang/LLVM 13
Clang 13 became the new nightly version, so switch to it. Also do vmlinux
compilation with a bit more parallelism. And account python-docutils
installation as part of selftests build.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-02-03 14:53:19 -08:00
Matteo Croce
767d82caab install: don't preserve file owner
'cp -p' preserve file ownership, this may leave files owned by the
current in user in /lib .

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
2021-01-26 19:33:48 -08:00
Andrii Nakryiko
a199b85415 vmtest: blacklist atomics selftest for 5.5
5.5 doesn't support atomics.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-01-26 19:32:13 -08:00
Andrii Nakryiko
649f9dc746 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   3db1a3fa98808aa90f95ec3e0fa2fc7abf28f5c9
Checkpoint bpf-next commit: 86ce322d21eb032ed8fdd294d0fb095d2debb430
Baseline bpf commit:        1a3449c19407a28f7019a887cdf0d6ba2444751a
Checkpoint bpf commit:      78031381ae9c88f4f914d66154f4745122149c58

Andrii Nakryiko (5):
  libbpf: Add user-space variants of BPF_CORE_READ() family of macros
  libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family
  libbpf: Clarify kernel type use with USER variants of CORE reading
    macros
  libbpf: Support kernel module ksym externs
  libbpf: Allow loading empty BTFs

Björn Töpel (1):
  libbpf, xsk: Select AF_XDP BPF program based on kernel version

Brendan Jackman (4):
  bpf: Clarify return value of probe str helpers
  bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
  bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
  bpf: Add instructions for atomic_[cmp]xchg

Ian Rogers (1):
  bpf, libbpf: Avoid unused function warning on bpf_tail_call_static

Jiri Olsa (1):
  libbpf: Use string table index from index table if needed

Pravin B Shelar (1):
  GTP: add support for flow based tunneling API

 include/uapi/linux/bpf.h     |  20 +++--
 include/uapi/linux/if_link.h |   1 +
 src/bpf_core_read.h          | 169 +++++++++++++++++++++++++++--------
 src/bpf_helpers.h            |   2 +-
 src/btf.c                    |  17 ++--
 src/libbpf.c                 |  50 +++++++----
 src/xsk.c                    |  81 ++++++++++++++++-
 7 files changed, 265 insertions(+), 75 deletions(-)

--
2.24.1
2021-01-26 19:32:13 -08:00
Andrii Nakryiko
eb56f8fb12 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2021-01-26 19:32:13 -08:00
Björn Töpel
6e01a23cf6 libbpf, xsk: Select AF_XDP BPF program based on kernel version
Add detection for kernel version, and adapt the BPF program based on
kernel support. This way, users will get the best possible performance
from the BPF program.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Marek Majtyka  <alardam@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210122105351.11751-4-bjorn.topel@gmail.com
2021-01-26 19:32:13 -08:00
Jiri Olsa
16d7f413e2 libbpf: Use string table index from index table if needed
For very large ELF objects (with many sections), we could
get special value SHN_XINDEX (65535) for elf object's string
table index - e_shstrndx.

Call elf_getshdrstrndx to get the proper string table index,
instead of reading it directly from ELF header.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210121202203.9346-4-jolsa@kernel.org
2021-01-26 19:32:13 -08:00
Andrii Nakryiko
f037b92465 libbpf: Allow loading empty BTFs
Empty BTFs do come up (e.g., simple kernel modules with no new types and
strings, compared to the vmlinux BTF) and there is nothing technically wrong
with them. So remove unnecessary check preventing loading empty BTFs.

Fixes: d8123624506c ("libbpf: Fix BTF data layout checks and allow empty BTF")
Reported-by: Christopher William Snowhill <chris@kode54.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210110070341.1380086-2-andrii@kernel.org
2021-01-26 19:32:13 -08:00
Pravin B Shelar
4adbb7b2c7 GTP: add support for flow based tunneling API
Following patch add support for flow based tunneling API
to send and recv GTP tunnel packet over tunnel metadata API.
This would allow this device integration with OVS or eBPF using
flow based tunneling APIs.

Signed-off-by: Pravin B Shelar <pbshelar@fb.com>
Link: https://lore.kernel.org/r/20210110070021.26822-1-pbshelar@fb.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26 19:32:13 -08:00
Brendan Jackman
d2b784d370 bpf: Add instructions for atomic_[cmp]xchg
This adds two atomic opcodes, both of which include the BPF_FETCH
flag. XCHG without the BPF_FETCH flag would naturally encode
atomic_set. This is not supported because it would be of limited
value to userspace (it doesn't imply any barriers). CMPXCHG without
BPF_FETCH woulud be an atomic compare-and-write. We don't have such
an operation in the kernel so it isn't provided to BPF either.

There are two significant design decisions made for the CMPXCHG
instruction:

 - To solve the issue that this operation fundamentally has 3
   operands, but we only have two register fields. Therefore the
   operand we compare against (the kernel's API calls it 'old') is
   hard-coded to be R0. x86 has similar design (and A64 doesn't
   have this problem).

   A potential alternative might be to encode the other operand's
   register number in the immediate field.

 - The kernel's atomic_cmpxchg returns the old value, while the C11
   userspace APIs return a boolean indicating the comparison
   result. Which should BPF do? A64 returns the old value. x86 returns
   the old value in the hard-coded register (and also sets a
   flag). That means return-old-value is easier to JIT, so that's
   what we use.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
2021-01-26 19:32:13 -08:00
Brendan Jackman
ac86f42e4a bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
instructions, in order to have the previous value of the
atomically-modified memory location loaded into the src register
after an atomic op is carried out.

Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com
2021-01-26 19:32:13 -08:00
Brendan Jackman
03fbe22a59 bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.

In preparation, rename the instruction mode BPF_ATOMIC and start
calling the zero immediate BPF_ADD.

This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved MBZ and BPF_ADD is zero.

All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
2021-01-26 19:32:13 -08:00
Ian Rogers
f15814c93a bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
Add inline to __always_inline making it match the linux/compiler.h.
Adding this avoids an unused function warning on bpf_tail_call_static
when compining with -Wall.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210113223609.3358812-1-irogers@google.com
2021-01-26 19:32:13 -08:00
Andrii Nakryiko
0db7da9a4a libbpf: Support kernel module ksym externs
Add support for searching for ksym externs not just in vmlinux BTF, but across
all module BTFs, similarly to how it's done for CO-RE relocations. Kernels
that expose module BTFs through sysfs are assumed to support new ldimm64
instruction extension with BTF FD provided in insn[1].imm field, so no extra
feature detection is performed.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-7-andrii@kernel.org
2021-01-26 19:32:13 -08:00
Brendan Jackman
0de8b9a906 bpf: Clarify return value of probe str helpers
When the buffer is too small to contain the input string, these helpers
return the length of the buffer, not the length of the original string.
This tries to make the docs totally clear about that, since "the length
of the [copied ]string" could also refer to the length of the input.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112123422.2011234-1-jackmanb@google.com
2021-01-26 19:32:13 -08:00
Andrii Nakryiko
d52e5f5f88 libbpf: Clarify kernel type use with USER variants of CORE reading macros
Add comments clarifying that USER variants of CO-RE reading macro are still
only going to work with kernel types, defined in kernel or kernel module BTF.
This should help preventing invalid use of those macro to read user-defined
types (which doesn't work with CO-RE).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210108194408.3468860-1-andrii@kernel.org
2021-01-26 19:32:13 -08:00
Andrii Nakryiko
a26ae1b254 libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family
BPF_CORE_READ(), in addition to handling CO-RE relocations, also allows much
nicer way to read data structures with nested pointers. Instead of writing
a sequence of bpf_probe_read() calls to follow links, one can just write
BPF_CORE_READ(a, b, c, d) to effectively do a->b->c->d read. This is a welcome
ability when porting BCC code, which (in most cases) allows exactly the
intuitive a->b->c->d variant.

This patch adds non-CO-RE variants of BPF_CORE_READ() family of macros for
cases where CO-RE is not supported (e.g., old kernels). In such cases, the
property of shortening a sequence of bpf_probe_read()s to a simple
BPF_PROBE_READ(a, b, c, d) invocation is still desirable, especially when
porting BCC code to libbpf. Yet, no CO-RE relocation is going to be emitted.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201218235614.2284956-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-26 19:32:13 -08:00
Andrii Nakryiko
c1d4bbb8c7 libbpf: Add user-space variants of BPF_CORE_READ() family of macros
Add BPF_CORE_READ_USER(), BPF_CORE_READ_USER_STR() and their _INTO()
variations to allow reading CO-RE-relocatable kernel data structures from the
user-space. One of such cases is reading input arguments of syscalls, while
reaping the benefits of CO-RE relocations w.r.t. handling 32/64 bit
conversions and handling missing/new fields in UAPI data structs.

Suggested-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201218235614.2284956-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-26 19:32:13 -08:00
Luca Boccassi
7c2a94f4f8 README: mention that Debian 11 ships with BTF support 2021-01-22 12:27:35 -08:00
Luca Boccassi
051a4009f9 pkgconfig: use literal ${prefix} to allow override
Various workflows (--define-prefix, --define-variable=prefix) require variables in
the pc file to use a literal  so that it is overridden. Change the Makefile
so that, by default and unless  is specified, it is set as expected.

Signed-off-by: Luca Boccassi <bluca@debian.org>
2021-01-03 10:41:31 -08:00
Luca Boccassi
a3a5e9688a README: point to Debian source package rather than binary
For consistency with other links

Signed-off-by: Luca Boccassi <bluca@debian.org>
2021-01-03 10:41:31 -08:00
Luca Boccassi
5569404346 README: note that Debian 11 (will) ship LLVM 11
Signed-off-by: Luca Boccassi <bluca@debian.org>
2021-01-03 10:41:31 -08:00
Andrii Nakryiko
e05f9be4f4 vmtests: temporarily disable test_maps
Disable test_maps test until it's debugged why they started failing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-12-20 17:00:58 -08:00
Andrii Nakryiko
4d3535ff7b vmtests: test_maps needs more memory, so bump to 4G
Memory is cheap.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-12-20 17:00:58 -08:00
Andrii Nakryiko
c66a9770e3 vmtests: fix up bpf_testmod.ko generation for 5.5 and 4.9
Selftests makefile deletes local bpf_testmod.ko, so that invalidates current
approach of faking bpf_testmod.ko "generation". Instead, generate a fake
Makefile that will create an empty bpf_testmod/bpf_testmod.ko.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-12-20 17:00:58 -08:00
Andrii Nakryiko
8262be6034 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   5c667dca71095abec90420eb09503f35f66c9585
Checkpoint bpf-next commit: 3db1a3fa98808aa90f95ec3e0fa2fc7abf28f5c9
Baseline bpf commit:        12c8a8ca117f3d734babc3fba131fdaa329d2163
Checkpoint bpf commit:      1a3449c19407a28f7019a887cdf0d6ba2444751a

Andrii Nakryiko (2):
  bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr()
    helpers
  libbpf: Support modules in bpf_program__set_attach_target() API

Brendan Jackman (1):
  libbpf: Expose libbpf ring_buffer epoll_fd

Florent Revest (1):
  bpf: Add a bpf_sock_from_file helper

 include/uapi/linux/bpf.h | 13 ++++++--
 src/libbpf.c             | 64 +++++++++++++++++++++++++---------------
 src/libbpf.h             |  1 +
 src/libbpf.map           |  1 +
 src/ringbuf.c            |  6 ++++
 5 files changed, 59 insertions(+), 26 deletions(-)

--
2.24.1
2020-12-20 17:00:58 -08:00
Andrii Nakryiko
182e9dde0d sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2020-12-20 17:00:58 -08:00
Brendan Jackman
30e2c16571 libbpf: Expose libbpf ring_buffer epoll_fd
This provides a convenient perf ringbuf -> libbpf ringbuf migration
path for users of external polling systems. It is analogous to
perf_buffer__epoll_fd.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201214113812.305274-1-jackmanb@google.com
2020-12-20 17:00:58 -08:00
Andrii Nakryiko
ebcae62e7e libbpf: Support modules in bpf_program__set_attach_target() API
Support finding kernel targets in kernel modules when using
bpf_program__set_attach_target() API. This brings it up to par with what
libbpf supports when doing declarative SEC()-based target determination.

Some minor internal refactoring was needed to make sure vmlinux BTF can be
loaded before bpf_object's load phase.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201211215825.3646154-2-andrii@kernel.org
2020-12-20 17:00:58 -08:00
Florent Revest
252ad1f3eb bpf: Add a bpf_sock_from_file helper
While eBPF programs can check whether a file is a socket by file->f_op
== &socket_file_ops, they cannot convert the void private_data pointer
to a struct socket BTF pointer. In order to do this a new helper
wrapping sock_from_file is added.

This is useful to tracing programs but also other program types
inheriting this set of helpers such as iterators or LSM programs.

Signed-off-by: Florent Revest <revest@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201204113609.1850150-2-revest@google.com
2020-12-20 17:00:58 -08:00
Andrii Nakryiko
3e68c60659 bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers
Remove bpf_ prefix, which causes these helpers to be reported in verifier
dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets
fix it as long as it is still possible before UAPI freezes on these helpers.

Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-20 17:00:58 -08:00
Andrii Nakryiko
42baefba71 vmtests: update blacklist for 5.5
Blacklist selftests relying on newer kernel's features.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
46ecf7aef3 vmtest: omit building bpf_testmod.ko on non-latest kernels
Non-latest kernel versions don't build kernel from sources, so module buliding
fails, despite using `make prepare`. For now, just make sure no module is
built by overwriting bpf_testmod/Makefile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
2981bb8d26 vmtests: update vmlinux.h to latest version
Update vmlinux.h to get some of BPF UAPI constants needed for the compilation
of new selftests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
2042df2fed sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   c6bde958a62b8ca5ee8d2c1fe429aec4ad54efad
Checkpoint bpf-next commit: 5c667dca71095abec90420eb09503f35f66c9585
Baseline bpf commit:        d3bec0138bfbe58606fc1d6f57a4cdc1a20218db
Checkpoint bpf commit:      12c8a8ca117f3d734babc3fba131fdaa329d2163

Alan Maguire (1):
  libbpf: bpf__find_by_name[_kind] should use btf__get_nr_types()

Andrei Matei (1):
  libbpf: Fail early when loading programs with unspecified type

Andrii Nakryiko (11):
  bpf: Assign ID to vmlinux BTF and return extra info for BTF in
    GET_OBJ_INFO
  libbpf: Don't attempt to load unused subprog as an entry-point BPF
    program
  libbpf: Add base BTF accessor
  libbpf: Add internal helper to load BTF data by FD
  libbpf: Refactor CO-RE relocs to not assume a single BTF object
  libbpf: Add kernel module BTF support for CO-RE relocations
  bpf: Allow to specify kernel module BTFs when attaching BPF programs
  libbpf: Factor out low-level BPF program loading helper
  libbpf: Support attachment of BPF tracing programs to kernel modules
  libbpf: Use memcpy instead of strncpy to please GCC
  libbpf: Fix ring_buffer__poll() to return number of consumed samples

Dmitrii Banshchikov (1):
  bpf: Add bpf_ktime_get_coarse_ns helper

KP Singh (5):
  bpf: Implement task local storage
  libbpf: Add support for task local storage
  bpf: Implement get_current_task_btf and RET_PTR_TO_BTF_ID
  bpf: Add bpf_bprm_opts_set helper
  bpf: Add a BPF helper for getting the IMA hash of an inode

Li RongQing (1):
  libbpf: Add support for canceling cached_cons advance

Magnus Karlsson (1):
  libbpf: Replace size_t with __u32 in xsk interfaces

Mariusz Dudek (1):
  libbpf: Separate XDP program load with xsk socket creation

Stanislav Fomichev (1):
  libbpf: Cap retries in sys_bpf_prog_load

Thomas Karlsson (1):
  macvlan: Support for high multicast packet rate

Toke Høiland-Jørgensen (1):
  libbpf: Sanitise map names before pinning

 include/uapi/linux/bpf.h     |  96 +++++-
 include/uapi/linux/if_link.h |   2 +
 src/bpf.c                    | 104 +++++--
 src/btf.c                    |  74 +++--
 src/btf.h                    |   1 +
 src/libbpf.c                 | 550 +++++++++++++++++++++++++++--------
 src/libbpf.map               |   3 +
 src/libbpf_internal.h        |  31 ++
 src/libbpf_probes.c          |   1 +
 src/ringbuf.c                |   2 +-
 src/xsk.c                    |  92 +++++-
 src/xsk.h                    |  22 +-
 12 files changed, 771 insertions(+), 207 deletions(-)

--
2.24.1
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
8c2c4c3451 sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
21ae7bb113 libbpf: Fix ring_buffer__poll() to return number of consumed samples
Fix ring_buffer__poll() to return the number of non-discarded records
consumed, just like its documentation states. It's also consistent with
ring_buffer__consume() return. Fix up selftests with wrong expected results.

Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Fixes: cb1c9ddd5525 ("selftests/bpf: Add BPF ringbuf selftests")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201130223336.904192-1-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
b2a34784b2 libbpf: Use memcpy instead of strncpy to please GCC
Some versions of GCC are really nit-picky about strncpy() use. Use memcpy(),
as they are pretty much equivalent for the case of fixed length strings.

Fixes: e459f49b4394 ("libbpf: Separate XDP program load with xsk socket creation")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203235440.2302137-1-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
d95b12da56 libbpf: Support attachment of BPF tracing programs to kernel modules
Teach libbpf to search for BTF types in kernel modules for tracing BPF
programs. This allows attachment of raw_tp/fentry/fexit/fmod_ret/etc BPF
program types to tracepoints and functions in kernel modules.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-13-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
a1fd6dab54 libbpf: Factor out low-level BPF program loading helper
Refactor low-level API for BPF program loading to not rely on public API
types. This allows painless extension without constant efforts to cleverly not
break backwards compatibility.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-12-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
fde1be5a9c bpf: Allow to specify kernel module BTFs when attaching BPF programs
Add ability for user-space programs to specify non-vmlinux BTF when attaching
BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this,
attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD
of a module or vmlinux BTF object. For backwards compatibility reasons,
0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-11-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
6b08519a69 libbpf: Add kernel module BTF support for CO-RE relocations
Teach libbpf to search for candidate types for CO-RE relocations across kernel
modules BTFs, in addition to vmlinux BTF. If at least one candidate type is
found in vmlinux BTF, kernel module BTFs are not iterated. If vmlinux BTF has
no matching candidates, then find all kernel module BTFs and search for all
matching candidates across all of them.

Kernel's support for module BTFs are inferred from the support for BTF name
pointer in BPF UAPI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-6-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
aff8028b6e libbpf: Refactor CO-RE relocs to not assume a single BTF object
Refactor CO-RE relocation candidate search to not expect a single BTF, rather
return all candidate types with their corresponding BTF objects. This will
allow to extend CO-RE relocations to accommodate kernel module BTFs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-5-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
10e321f100 libbpf: Add internal helper to load BTF data by FD
Add a btf_get_from_fd() helper, which constructs struct btf from in-kernel BTF
data by FD. This is used for loading module BTFs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-4-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Stanislav Fomichev
8051a539d8 libbpf: Cap retries in sys_bpf_prog_load
I've seen a situation, where a process that's under pprof constantly
generates SIGPROF which prevents program loading indefinitely.
The right thing to do probably is to disable signals in the upper
layers while loading, but it still would be nice to get some error from
libbpf instead of an endless loop.

Let's add some small retry limit to the program loading:
try loading the program 5 (arbitrary) times and give up.

v2:
* 10 -> 5 retires (Andrii Nakryiko)

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201202231332.3923644-1-sdf@google.com
2020-12-04 20:04:52 -08:00
Toke Høiland-Jørgensen
691c22dc0c libbpf: Sanitise map names before pinning
When we added sanitising of map names before loading programs to libbpf, we
still allowed periods in the name. While the kernel will accept these for
the map names themselves, they are not allowed in file names when pinning
maps. This means that bpf_object__pin_maps() will fail if called on an
object that contains internal maps (such as sections .rodata).

Fix this by replacing periods with underscores when constructing map pin
paths. This only affects the paths generated by libbpf when
bpf_object__pin_maps() is called with a path argument. Any pin paths set
by bpf_map__set_pin_path() are unaffected, and it will still be up to the
caller to avoid invalid characters in those.

Fixes: 113e6b7e15e2 ("libbpf: Sanitise internal map names so they are not rejected by the kernel")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201203093306.107676-1-toke@redhat.com
2020-12-04 20:04:52 -08:00
Andrei Matei
5fe9c1217a libbpf: Fail early when loading programs with unspecified type
Before this patch, a program with unspecified type
(BPF_PROG_TYPE_UNSPEC) would be passed to the BPF syscall, only to have
the kernel reject it with an opaque invalid argument error. This patch
makes libbpf reject such programs with a nicer error message - in
particular libbpf now tries to diagnose bad ELF section names at both
open time and load time.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201203043410.59699-1-andreimatei1@gmail.com
2020-12-04 20:04:52 -08:00
Mariusz Dudek
78c76a1015 libbpf: Separate XDP program load with xsk socket creation
Add support for separation of eBPF program load and xsk socket
creation.

This is needed for use-case when you want to privide as little
privileges as possible to the data plane application that will
handle xsk socket creation and incoming traffic.

With this patch the data entity container can be run with only
CAP_NET_RAW capability to fulfill its purpose of creating xsk
socket and handling packages. In case your umem is larger or
equal process limit for MEMLOCK you need either increase the
limit or CAP_IPC_LOCK capability.

To resolve privileges issue two APIs are introduced:

- xsk_setup_xdp_prog - loads the built in XDP program. It can
also return xsks_map_fd which is needed by unprivileged process
to update xsks_map with AF_XDP socket "fd"

- xsk_socket__update_xskmap - inserts an AF_XDP socket into an xskmap
for a particular xsk_socket

Signed-off-by: Mariusz Dudek <mariuszx.dudek@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20201203090546.11976-2-mariuszx.dudek@intel.com
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
a741bc6479 libbpf: Add base BTF accessor
Add ability to get base BTF. It can be also used to check if BTF is split BTF.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201202065244.530571-3-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Magnus Karlsson
65e4be6f5d libbpf: Replace size_t with __u32 in xsk interfaces
Replace size_t with __u32 in the xsk interfaces that contain this.
There is no reason to have size_t since the internal variable that
is manipulated is a __u32. The following APIs are affected:

__u32 xsk_ring_prod__reserve(struct xsk_ring_prod *prod, __u32 nb, __u32 *idx)
void xsk_ring_prod__submit(struct xsk_ring_prod *prod, __u32 nb)
__u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __u32 *idx)
void xsk_ring_cons__cancel(struct xsk_ring_cons *cons, __u32 nb)
void xsk_ring_cons__release(struct xsk_ring_cons *cons, __u32 nb)

The "nb" variable and the return values have been changed from size_t
to __u32.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1606383455-8243-1-git-send-email-magnus.karlsson@gmail.com
2020-12-04 20:04:52 -08:00
KP Singh
3a2739aa8a bpf: Add a BPF helper for getting the IMA hash of an inode
Provide a wrapper function to get the IMA hash of an inode. This helper
is useful in fingerprinting files (e.g executables on execution) and
using these fingerprints in detections like an executable unlinking
itself.

Since the ima_inode_hash can sleep, it's only allowed for sleepable
LSM hooks.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201124151210.1081188-3-kpsingh@chromium.org
2020-12-04 20:04:52 -08:00
Li RongQing
dd2369d2a8 libbpf: Add support for canceling cached_cons advance
Add a new function for returning descriptors the user received
after an xsk_ring_cons__peek call. After the application has
gotten a number of descriptors from a ring, it might not be able
to or want to process them all for various reasons. Therefore,
it would be useful to have an interface for returning or
cancelling a number of them so that they are returned to the ring.

This patch adds a new function called xsk_ring_cons__cancel that
performs this operation on nb descriptors counted from the end of
the batch of descriptors that was received through the peek call.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ Magnus Karlsson: rewrote changelog ]
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/1606202474-8119-1-git-send-email-lirongqing@baidu.com
2020-12-04 20:04:52 -08:00
Dmitrii Banshchikov
39f5b2e75e bpf: Add bpf_ktime_get_coarse_ns helper
The helper uses CLOCK_MONOTONIC_COARSE source of time that is less
accurate but more performant.

We have a BPF CGROUP_SKB firewall that supports event logging through
bpf_perf_event_output(). Each event has a timestamp and currently we use
bpf_ktime_get_ns() for it. Use of bpf_ktime_get_coarse_ns() saves ~15-20
ns in time required for event logging.

bpf_ktime_get_ns():
EgressLogByRemoteEndpoint                              113.82ns    8.79M

bpf_ktime_get_coarse_ns():
EgressLogByRemoteEndpoint                               95.40ns   10.48M

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201117184549.257280-1-me@ubique.spb.ru
2020-12-04 20:04:52 -08:00
KP Singh
6969a44914 bpf: Add bpf_bprm_opts_set helper
The helper allows modification of certain bits on the linux_binprm
struct starting with the secureexec bit which can be updated using the
BPF_F_BPRM_SECUREEXEC flag.

secureexec can be set by the LSM for privilege gaining executions to set
the AT_SECURE auxv for glibc.  When set, the dynamic linker disables the
use of certain environment variables (like LD_PRELOAD).

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201117232929.2156341-1-kpsingh@chromium.org
2020-12-04 20:04:52 -08:00
Alan Maguire
de2edae80d libbpf: bpf__find_by_name[_kind] should use btf__get_nr_types()
When operating on split BTF, btf__find_by_name[_kind] will not
iterate over all types since they use btf->nr_types to show
the number of types to iterate over. For split BTF this is
the number of types _on top of base BTF_, so it will
underestimate the number of types to iterate over, especially
for vmlinux + module BTF, where the latter is much smaller.

Use btf__get_nr_types() instead.

Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1605437195-2175-1-git-send-email-alan.maguire@oracle.com
2020-12-04 20:04:52 -08:00
Thomas Karlsson
2ea4ba9c96 macvlan: Support for high multicast packet rate
Background:
Broadcast and multicast packages are enqueued for later processing.
This queue was previously hardcoded to 1000.

This proved insufficient for handling very high packet rates.
This resulted in packet drops for multicast.
While at the same time unicast worked fine.

The change:
This patch make the queue length adjustable to accommodate
for environments with very high multicast packet rate.
But still keeps the default value of 1000 unless specified.

The queue length is specified as a request per macvlan
using the IFLA_MACVLAN_BC_QUEUE_LEN parameter.

The actual used queue length will then be the maximum of
any macvlan connected to the same port. The actual used
queue length for the port can be retrieved (read only)
by the IFLA_MACVLAN_BC_QUEUE_LEN_USED parameter for verification.

This will be followed up by a patch to iproute2
in order to adjust the parameter from userspace.

Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se>
Link: https://lore.kernel.org/r/dd4673b2-7eab-edda-6815-85c67ce87f63@paneda.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
2dd5965052 libbpf: Don't attempt to load unused subprog as an entry-point BPF program
If BPF code contains unused BPF subprogram and there are no other subprogram
calls (which can realistically happen in real-world applications given
sufficiently smart Clang code optimizations), libbpf will erroneously assume
that subprograms are entry-point programs and will attempt to load them with
UNSPEC program type.

Fix by not relying on subcall instructions and rather detect it based on the
structure of BPF object's sections.

Fixes: 9a94f277c4fb ("tools: libbpf: restore the ability to load programs from .text section")
Reported-by: Dmitrii Banshchikov <dbanschikov@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201107000251.256821-1-andrii@kernel.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
ef8820fea8 bpf: Assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO
Allocate ID for vmlinux BTF. This makes it visible when iterating over all BTF
objects in the system. To allow distinguishing vmlinux BTF (and later kernel
module BTF) from user-provided BTFs, expose extra kernel_btf flag, as well as
BTF name ("vmlinux" for vmlinux BTF, will equal to module's name for module
BTF).  We might want to later allow specifying BTF name for user-provided BTFs
as well, if that makes sense. But currently this is reserved only for
in-kernel BTFs.

Having in-kernel BTFs exposed IDs will allow to extend BPF APIs that require
in-kernel BTF type with ability to specify BTF types from kernel modules, not
just vmlinux BTF. This will be implemented in a follow up patch set for
fentry/fexit/fmod_ret/lsm/etc.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-3-andrii@kernel.org
2020-12-04 20:04:52 -08:00
KP Singh
eae38a781c bpf: Implement get_current_task_btf and RET_PTR_TO_BTF_ID
The currently available bpf_get_current_task returns an unsigned integer
which can be used along with BPF_CORE_READ to read data from
the task_struct but still cannot be used as an input argument to a
helper that accepts an ARG_PTR_TO_BTF_ID of type task_struct.

In order to implement this helper a new return type, RET_PTR_TO_BTF_ID,
is added. This is similar to RET_PTR_TO_BTF_ID_OR_NULL but does not
require checking the nullness of returned pointer.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-6-kpsingh@chromium.org
2020-12-04 20:04:52 -08:00
KP Singh
83c2c20acb libbpf: Add support for task local storage
Updates the bpf_probe_map_type API to also support
BPF_MAP_TYPE_TASK_STORAGE similar to other local storage maps.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-4-kpsingh@chromium.org
2020-12-04 20:04:52 -08:00
KP Singh
00ae5bac8f bpf: Implement task local storage
Similar to bpf_local_storage for sockets and inodes add local storage
for task_struct.

The life-cycle of storage is managed with the life-cycle of the
task_struct.  i.e. the storage is destroyed along with the owning task
with a callback to the bpf_task_storage_free from the task_free LSM
hook.

The BPF LSM allocates an __rcu pointer to the bpf_local_storage in
the security blob which are now stackable and can co-exist with other
LSMs.

The userspace map operations can be done by using a pid fd as a key
passed to the lookup, update and delete operations.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-3-kpsingh@chromium.org
2020-12-04 20:04:52 -08:00
Andrii Nakryiko
f99c252cbc vmtest: update Kconfig to accommodate IMA test config
test_progs's IMA selftests requires extra Kconfig values, so update
latest.config to accommodate those.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-12-03 12:27:49 -08:00
Andrii Nakryiko
5ae2a2621c readme: move gory sync details down and add libbpf-bootstrap references
Move gory details about libbpf mirror and sync into a
separate section at the bottom of README.

Also add references to libbpf-bootstrap and blog about it,
as well as libbpf-tools reference.
2020-11-29 13:34:03 -08:00
Andrii Nakryiko
5af3d86b5a vmtests: blacklist two more tests on 5.5
tcpbpf_user uses cgroup bpf_link, not available in 5.5. hash_large_key is
testing a more permissive verifier check, implemented in 5.11. So blacklist
both.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-11-05 21:20:45 -08:00
Andrii Nakryiko
c55abf0752 sync: latest libbpf changes from kernel
Syncing latest libbpf commits from kernel repository.
Baseline bpf-next commit:   3cb12d27ff655e57e8efe3486dca2a22f4e30578
Checkpoint bpf-next commit: c6bde958a62b8ca5ee8d2c1fe429aec4ad54efad
Baseline bpf commit:        c66dca98a24cb5f3493dd08d40bcfa94a220fa92
Checkpoint bpf commit:      d3bec0138bfbe58606fc1d6f57a4cdc1a20218db

Andrii Nakryiko (6):
  libbpf: Factor out common operations in BTF writing APIs
  libbpf: Unify and speed up BTF string deduplication
  libbpf: Implement basic split BTF support
  libbpf: Fix BTF data layout checks and allow empty BTF
  libbpf: Support BTF dedup of split BTFs
  libbpf: Accomodate DWARF/compiler bug with duplicated identical arrays

Ian Rogers (1):
  libbpf, hashmap: Fix undefined behavior in hash_bits

Magnus Karlsson (2):
  libbpf: Fix null dereference in xsk_socket__delete
  libbpf: Fix possible use after free in xsk_socket__delete

 src/btf.c      | 807 +++++++++++++++++++++++++++++--------------------
 src/btf.h      |   8 +
 src/hashmap.h  |  15 +-
 src/libbpf.map |   9 +
 src/xsk.c      |   9 +-
 5 files changed, 504 insertions(+), 344 deletions(-)

--
2.24.1
2020-11-05 21:20:45 -08:00
Andrii Nakryiko
e30f758aab sync: auto-generate latest BPF helpers
Latest changes to BPF helper definitions.
2020-11-05 21:20:45 -08:00
Magnus Karlsson
8caff995c7 libbpf: Fix possible use after free in xsk_socket__delete
Fix a possible use after free in xsk_socket__delete that will happen
if xsk_put_ctx() frees the ctx. To fix, save the umem reference taken
from the context and just use that instead.

Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1604396490-12129-3-git-send-email-magnus.karlsson@gmail.com
2020-11-05 21:20:45 -08:00
Magnus Karlsson
539aa6bea5 libbpf: Fix null dereference in xsk_socket__delete
Fix a possible null pointer dereference in xsk_socket__delete that
will occur if a null pointer is fed into the function.

Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1604396490-12129-2-git-send-email-magnus.karlsson@gmail.com
2020-11-05 21:20:45 -08:00
Ian Rogers
224db2db07 libbpf, hashmap: Fix undefined behavior in hash_bits
If bits is 0, the case when the map is empty, then the >> is the size of
the register which is undefined behavior - on x86 it is the same as a
shift by 0.

Fix by handling the 0 case explicitly and guarding calls to hash_bits for
empty maps in hashmap__for_each_key_entry and hashmap__for_each_entry_safe.

Fixes: e3b924224028 ("libbpf: add resizable non-thread safe internal hashmap")
Suggested-by: Andrii Nakryiko <andriin@fb.com>,
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201029223707.494059-1-irogers@google.com
2020-11-05 21:20:45 -08:00
Andrii Nakryiko
e6725d2467 libbpf: Accomodate DWARF/compiler bug with duplicated identical arrays
In some cases compiler seems to generate distinct DWARF types for identical
arrays within the same CU. That seems like a bug, but it's already out there
and breaks type graph equivalence checks, so accommodate it anyway by checking
for identical arrays, regardless of their type ID.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-10-andrii@kernel.org
2020-11-05 21:20:45 -08:00
Andrii Nakryiko
658ac1ec19 libbpf: Support BTF dedup of split BTFs
Add support for deduplication split BTFs. When deduplicating split BTF, base
BTF is considered to be immutable and can't be modified or adjusted. 99% of
BTF deduplication logic is left intact (module some type numbering adjustments).
There are only two differences.

First, each type in base BTF gets hashed (expect VAR and DATASEC, of course,
those are always considered to be self-canonical instances) and added into
a table of canonical table candidates. Hashing is a shallow, fast operation,
so mostly eliminates the overhead of having entire base BTF to be a part of
BTF dedup.

Second difference is very critical and subtle. While deduplicating split BTF
types, it is possible to discover that one of immutable base BTF BTF_KIND_FWD
types can and should be resolved to a full STRUCT/UNION type from the split
BTF part.  This is, obviously, can't happen because we can't modify the base
BTF types anymore. So because of that, any type in split BTF that directly or
indirectly references that newly-to-be-resolved FWD type can't be considered
to be equivalent to the corresponding canonical types in base BTF, because
that would result in a loss of type resolution information. So in such case,
split BTF types will be deduplicated separately and will cause some
duplication of type information, which is unavoidable.

With those two changes, the rest of the algorithm manages to deduplicate split
BTF correctly, pointing all the duplicates to their canonical counter-parts in
base BTF, but also is deduplicating whatever unique types are present in split
BTF on their own.

Also, theoretically, split BTF after deduplication could end up with either
empty type section or empty string section. This is handled by libbpf
correctly in one of previous patches in the series.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-9-andrii@kernel.org
2020-11-05 21:20:45 -08:00
Andrii Nakryiko
dd36215834 libbpf: Fix BTF data layout checks and allow empty BTF
Make data section layout checks stricter, disallowing overlap of types and
strings data.

Additionally, allow BTFs with no type data. There is nothing inherently wrong
with having BTF with no types (put potentially with some strings). This could
be a situation with kernel module BTFs, if module doesn't introduce any new
type information.

Also fix invalid offset alignment check for btf->hdr->type_off.

Fixes: 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-8-andrii@kernel.org
2020-11-05 21:20:45 -08:00
Andrii Nakryiko
2811d54f8b libbpf: Implement basic split BTF support
Support split BTF operation, in which one BTF (base BTF) provides basic set of
types and strings, while another one (split BTF) builds on top of base's types
and strings and adds its own new types and strings. From API standpoint, the
fact that the split BTF is built on top of the base BTF is transparent.

Type numeration is transparent. If the base BTF had last type ID #N, then all
types in the split BTF start at type ID N+1. Any type in split BTF can
reference base BTF types, but not vice versa. Programmatically construction of
a split BTF on top of a base BTF is supported: one can create an empty split
BTF with btf__new_empty_split() and pass base BTF as an input, or pass raw
binary data to btf__new_split(), or use btf__parse_xxx_split() variants to get
initial set of split types/strings from the ELF file with .BTF section.

String offsets are similarly transparent and are a logical continuation of
base BTF's strings. When building BTF programmatically and adding a new string
(explicitly with btf__add_str() or implicitly through appending new
types/members), string-to-be-added would first be looked up from the base
BTF's string section and re-used if it's there. If not, it will be looked up
and/or added to the split BTF string section. Similarly to type IDs, types in
split BTF can refer to strings from base BTF absolutely transparently (but not
vice versa, of course, because base BTF doesn't "know" about existence of
split BTF).

Internal type index is slightly adjusted to be zero-indexed, ignoring a fake
[0] VOID type. This allows to handle split/base BTF type lookups transparently
by using btf->start_id type ID offset, which is always 1 for base/non-split
BTF and equals btf__get_nr_types(base_btf) + 1 for the split BTF.

BTF deduplication is not yet supported for split BTF and support for it will
be added in separate patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-5-andrii@kernel.org
2020-11-05 21:20:45 -08:00
Andrii Nakryiko
be2dc73ee2 libbpf: Unify and speed up BTF string deduplication
Revamp BTF dedup's string deduplication to match the approach of writable BTF
string management. This allows to transfer deduplicated strings index back to
BTF object after deduplication without expensive extra memory copying and hash
map re-construction. It also simplifies the code and speeds it up, because
hashmap-based string deduplication is faster than sort + unique approach.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-4-andrii@kernel.org
2020-11-05 21:20:45 -08:00
Andrii Nakryiko
4953827790 libbpf: Factor out common operations in BTF writing APIs
Factor out commiting of appended type data. Also extract fetching the very
last type in the BTF (to append members to). These two operations are common
across many APIs and will be easier to refactor with split BTF, if they are
extracted into a single place.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-2-andrii@kernel.org
2020-11-05 21:20:45 -08:00
44 changed files with 87168 additions and 24628 deletions

View File

@@ -1,6 +1,6 @@
sudo: required
language: bash
dist: bionic
dist: focal
services:
- docker
@@ -36,7 +36,12 @@ stages:
jobs:
include:
- stage: Builds & Tests
name: Kernel LATEST + selftests
name: Kernel 5.5.0 + selftests
language: bash
env: KERNEL=5.5.0
script: $CI_ROOT/vmtest/run_vmtest.sh || travis_terminate 1
- name: Kernel LATEST + selftests
language: bash
env: KERNEL=LATEST
script: $CI_ROOT/vmtest/run_vmtest.sh || travis_terminate 1
@@ -46,11 +51,6 @@ jobs:
env: KERNEL=4.9.0
script: $CI_ROOT/vmtest/run_vmtest.sh || travis_terminate 1
- name: Kernel 5.5.0 + selftests
language: bash
env: KERNEL=5.5.0
script: $CI_ROOT/vmtest/run_vmtest.sh || travis_terminate 1
- name: Debian Build
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
@@ -75,33 +75,33 @@ jobs:
script: $CI_ROOT/managers/debian.sh RUN_CLANG_ASAN || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Debian Build (gcc-8)
- name: Debian Build (gcc-10)
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
script: $CI_ROOT/managers/debian.sh RUN_GCC8 || travis_terminate 1
script: $CI_ROOT/managers/debian.sh RUN_GCC10 || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Debian Build (gcc-8 ASan+UBSan)
- name: Debian Build (gcc-10 ASan+UBSan)
language: bash
install: $CI_ROOT/managers/debian.sh SETUP
script: $CI_ROOT/managers/debian.sh RUN_GCC8_ASAN || travis_terminate 1
script: $CI_ROOT/managers/debian.sh RUN_GCC10_ASAN || travis_terminate 1
after_script: $CI_ROOT/managers/debian.sh CLEANUP
- name: Ubuntu Bionic Build
- name: Ubuntu Focal Build
language: bash
script: sudo $CI_ROOT/managers/ubuntu.sh || travis_terminate 1
- name: Ubuntu Bionic Build (arm)
- name: Ubuntu Focal Build (arm)
arch: arm64
language: bash
script: sudo $CI_ROOT/managers/ubuntu.sh || travis_terminate 1
- name: Ubuntu Bionic Build (s390x)
- name: Ubuntu Focal Build (s390x)
arch: s390x
language: bash
script: sudo $CI_ROOT/managers/ubuntu.sh || travis_terminate 1
- name: Ubuntu Bionic Build (ppc64le)
- name: Ubuntu Focal Build (ppc64le)
arch: ppc64le
language: bash
script: sudo $CI_ROOT/managers/ubuntu.sh || travis_terminate 1
@@ -120,7 +120,7 @@ jobs:
- COVERITY_SCAN_BUILD_COMMAND_PREPEND="cd src/"
- COVERITY_SCAN_BUILD_COMMAND="make"
install:
- sudo echo 'deb-src http://archive.ubuntu.com/ubuntu/ bionic main restricted universe multiverse' >>/etc/apt/sources.list
- sudo echo 'deb-src http://archive.ubuntu.com/ubuntu/ focal main restricted universe multiverse' >>/etc/apt/sources.list
- sudo apt-get update
- sudo apt-get -y build-dep libelf-dev
- sudo apt-get install -y libelf-dev pkg-config

View File

@@ -1 +1 @@
c66dca98a24cb5f3493dd08d40bcfa94a220fa92
d0c0fe10ce6d87734b65c18dc8f4bcae3f4dbea4

View File

@@ -1 +1 @@
3cb12d27ff655e57e8efe3486dca2a22f4e30578
f18ba26da88a89db9b50cb4ff47fadb159f2810b

View File

@@ -1,17 +1,11 @@
This is a mirror of [bpf-next Linux source
tree](https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next)'s
`tools/lib/bpf` directory plus its supporting header files.
BPF/libbpf usage and questions
==============================
All the gory details of syncing can be found in `scripts/sync-kernel.sh`
script.
Some header files in this repo (`include/linux/*.h`) are reduced versions of
their counterpart files at
[bpf-next](https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/)'s
`tools/include/linux/*.h` to make compilation successful.
BPF questions
=============
Please check out [libbpf-bootstrap](https://github.com/libbpf/libbpf-bootstrap)
and [the companion blog post](https://nakryiko.com/posts/libbpf-bootstrap/) for
the examples of building BPF applications with libbpf.
[libbpf-tools](https://github.com/iovisor/bcc/tree/master/libbpf-tools) are also
a good source of the real-world libbpf-based tracing tools.
All general BPF questions, including kernel functionality, libbpf APIs and
their application, should be sent to bpf@vger.kernel.org mailing list. You can
@@ -67,9 +61,10 @@ Distributions
Distributions packaging libbpf from this mirror:
- [Fedora](https://src.fedoraproject.org/rpms/libbpf)
- [Gentoo](https://packages.gentoo.org/packages/dev-libs/libbpf)
- [Debian](https://packages.debian.org/sid/libbpf-dev)
- [Debian](https://packages.debian.org/source/sid/libbpf)
- [Arch](https://www.archlinux.org/packages/extra/x86_64/libbpf/)
- [Ubuntu](https://packages.ubuntu.com/source/groovy/libbpf)
- [Alpine](https://pkgs.alpinelinux.org/packages?name=libbpf)
Benefits of packaging from the mirror over packaging from kernel sources:
- Consistent versioning across distributions.
@@ -104,6 +99,7 @@ Some major Linux distributions come with kernel BTF already built in:
- OpenSUSE Tumbleweed (in the next release, as of 2020-06-04)
- Arch Linux (from kernel 5.7.1.arch1-1)
- Ubuntu 20.10
- Debian 11 (amd64/arm64)
If your kernel doesn't come with BTF built-in, you'll need to build custom
kernel. You'll need:
@@ -124,17 +120,33 @@ distributions have Clang/LLVM 10+ packaged by default:
- Ubuntu 20.04+
- Arch Linux
- Ubuntu 20.10 (LLVM 11)
- Debian 11 (LLVM 11)
- Alpine 3.13+
Otherwise, please make sure to update it on your system.
The following resources are useful to understand what BPF CO-RE is and how to
use it:
- [BPF Portability and CO-RE](https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html)
- [HOWTO: BCC to libbpf conversion](https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc-to-libbpf-howto-guide.html)
- [BPF Portability and CO-RE](https://nakryiko.com/posts/bpf-portability-and-co-re/)
- [HOWTO: BCC to libbpf conversion](https://nakryiko.com/posts/bcc-to-libbpf-howto-guide/)
- [libbpf-tools in BCC repo](https://github.com/iovisor/bcc/tree/master/libbpf-tools)
contain lots of real-world tools converted from BCC to BPF CO-RE. Consider
converting some more to both contribute to the BPF community and gain some
more experience with it.
Details
=======
This is a mirror of [bpf-next Linux source
tree](https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next)'s
`tools/lib/bpf` directory plus its supporting header files.
All the gory details of syncing can be found in `scripts/sync-kernel.sh`
script.
Some header files in this repo (`include/linux/*.h`) are reduced versions of
their counterpart files at
[bpf-next](https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/)'s
`tools/include/linux/*.h` to make compilation successful.
License
=======

File diff suppressed because it is too large Load Diff

View File

@@ -52,7 +52,7 @@ struct btf_type {
};
};
#define BTF_INFO_KIND(info) (((info) >> 24) & 0x0f)
#define BTF_INFO_KIND(info) (((info) >> 24) & 0x1f)
#define BTF_INFO_VLEN(info) ((info) & 0xffff)
#define BTF_INFO_KFLAG(info) ((info) >> 31)
@@ -72,7 +72,8 @@ struct btf_type {
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
#define BTF_KIND_VAR 14 /* Variable */
#define BTF_KIND_DATASEC 15 /* Section */
#define BTF_KIND_MAX BTF_KIND_DATASEC
#define BTF_KIND_FLOAT 16 /* Floating point */
#define BTF_KIND_MAX BTF_KIND_FLOAT
#define NR_BTF_KINDS (BTF_KIND_MAX + 1)
/* For some specific BTF_KIND, "struct btf_type" is immediately

View File

@@ -409,6 +409,8 @@ enum {
IFLA_MACVLAN_MACADDR,
IFLA_MACVLAN_MACADDR_DATA,
IFLA_MACVLAN_MACADDR_COUNT,
IFLA_MACVLAN_BC_QUEUE_LEN,
IFLA_MACVLAN_BC_QUEUE_LEN_USED,
__IFLA_MACVLAN_MAX,
};

View File

@@ -266,12 +266,12 @@ for patch in $(ls -1 ${TMP_DIR}/patches | tail -n +2); do
done
# Generate bpf_helper_defs.h and commit, if anything changed
# restore Linux tip to use bpf_helpers_doc.py
# restore Linux tip to use bpf_doc.py
cd_to ${LINUX_REPO}
git checkout ${TIP_TAG}
# re-generate bpf_helper_defs.h
cd_to ${LIBBPF_REPO}
"${LINUX_ABS_DIR}/scripts/bpf_helpers_doc.py" --header \
"${LINUX_ABS_DIR}/scripts/bpf_doc.py" --header \
--file include/uapi/linux/bpf.h > src/bpf_helper_defs.h
# if anything changed, commit it
helpers_changes=$(git status --porcelain src/bpf_helper_defs.h | wc -l)

View File

@@ -36,7 +36,7 @@ SHARED_OBJDIR := $(OBJDIR)/sharedobjs
STATIC_OBJDIR := $(OBJDIR)/staticobjs
OBJS := bpf.o btf.o libbpf.o libbpf_errno.o netlink.o \
nlattr.o str_error.o libbpf_probes.o bpf_prog_linfo.o xsk.o \
btf_dump.o hashmap.o ringbuf.o
btf_dump.o hashmap.o ringbuf.o strset.o linker.o
SHARED_OBJS := $(addprefix $(SHARED_OBJDIR)/,$(OBJS))
STATIC_OBJS := $(addprefix $(STATIC_OBJDIR)/,$(OBJS))
@@ -48,7 +48,7 @@ ifndef BUILD_STATIC_ONLY
VERSION_SCRIPT := libbpf.map
endif
HEADERS := bpf.h libbpf.h btf.h xsk.h libbpf_util.h \
HEADERS := bpf.h libbpf.h btf.h xsk.h \
bpf_helpers.h bpf_helper_defs.h bpf_tracing.h \
bpf_endian.h bpf_core_read.h libbpf_common.h
UAPI_HEADERS := $(addprefix $(TOPDIR)/include/uapi/linux/,\
@@ -66,6 +66,13 @@ else
LIBSUBDIR := lib
endif
# By default let the pc file itself use ${prefix} in includedir/libdir so that
# the prefix can be overridden at runtime (eg: --define-prefix)
ifndef LIBDIR
LIBDIR_PC := $$\{prefix\}/$(LIBSUBDIR)
else
LIBDIR_PC := $(LIBDIR)
endif
PREFIX ?= /usr
LIBDIR ?= $(PREFIX)/$(LIBSUBDIR)
INCLUDEDIR ?= $(PREFIX)/include
@@ -93,7 +100,7 @@ $(OBJDIR)/libbpf.so.$(LIBBPF_VERSION): $(SHARED_OBJS)
$(OBJDIR)/libbpf.pc:
$(Q)sed -e "s|@PREFIX@|$(PREFIX)|" \
-e "s|@LIBDIR@|$(LIBDIR)|" \
-e "s|@LIBDIR@|$(LIBDIR_PC)|" \
-e "s|@VERSION@|$(LIBBPF_VERSION)|" \
< libbpf.pc.template > $@
@@ -114,7 +121,7 @@ define do_install
$(Q)if [ ! -d '$(DESTDIR)$2' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR)$2'; \
fi;
$(Q)$(INSTALL) $1 $(if $3,-m $3,) '$(DESTDIR)$2'
$(Q)$(INSTALL) $(if $3,-m $3,) $1 '$(DESTDIR)$2'
endef
# Preserve symlinks at installation.
@@ -123,7 +130,7 @@ define do_s_install
$(Q)if [ ! -d '$(DESTDIR)$2' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR)$2'; \
fi;
$(Q)cp -fpR $1 '$(DESTDIR)$2'
$(Q)cp -fR $1 '$(DESTDIR)$2'
endef
install: all install_headers install_pkgconfig

104
src/bpf.c
View File

@@ -67,11 +67,12 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
static inline int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size)
{
int retries = 5;
int fd;
do {
fd = sys_bpf(BPF_PROG_LOAD, attr, size);
} while (fd < 0 && errno == EAGAIN);
} while (fd < 0 && errno == EAGAIN && retries-- > 0);
return fd;
}
@@ -214,59 +215,55 @@ alloc_zero_tailing_info(const void *orecord, __u32 cnt,
return info;
}
int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
char *log_buf, size_t log_buf_sz)
int libbpf__bpf_prog_load(const struct bpf_prog_load_params *load_attr)
{
void *finfo = NULL, *linfo = NULL;
union bpf_attr attr;
__u32 log_level;
int fd;
if (!load_attr || !log_buf != !log_buf_sz)
if (!load_attr->log_buf != !load_attr->log_buf_sz)
return -EINVAL;
log_level = load_attr->log_level;
if (log_level > (4 | 2 | 1) || (log_level && !log_buf))
if (load_attr->log_level > (4 | 2 | 1) || (load_attr->log_level && !load_attr->log_buf))
return -EINVAL;
memset(&attr, 0, sizeof(attr));
attr.prog_type = load_attr->prog_type;
attr.expected_attach_type = load_attr->expected_attach_type;
if (attr.prog_type == BPF_PROG_TYPE_STRUCT_OPS ||
attr.prog_type == BPF_PROG_TYPE_LSM) {
attr.attach_btf_id = load_attr->attach_btf_id;
} else if (attr.prog_type == BPF_PROG_TYPE_TRACING ||
attr.prog_type == BPF_PROG_TYPE_EXT) {
attr.attach_btf_id = load_attr->attach_btf_id;
if (load_attr->attach_prog_fd)
attr.attach_prog_fd = load_attr->attach_prog_fd;
} else {
attr.prog_ifindex = load_attr->prog_ifindex;
attr.kern_version = load_attr->kern_version;
}
attr.insn_cnt = (__u32)load_attr->insns_cnt;
else
attr.attach_btf_obj_fd = load_attr->attach_btf_obj_fd;
attr.attach_btf_id = load_attr->attach_btf_id;
attr.prog_ifindex = load_attr->prog_ifindex;
attr.kern_version = load_attr->kern_version;
attr.insn_cnt = (__u32)load_attr->insn_cnt;
attr.insns = ptr_to_u64(load_attr->insns);
attr.license = ptr_to_u64(load_attr->license);
attr.log_level = log_level;
if (log_level) {
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_buf_sz;
} else {
attr.log_buf = ptr_to_u64(NULL);
attr.log_size = 0;
attr.log_level = load_attr->log_level;
if (attr.log_level) {
attr.log_buf = ptr_to_u64(load_attr->log_buf);
attr.log_size = load_attr->log_buf_sz;
}
attr.prog_btf_fd = load_attr->prog_btf_fd;
attr.prog_flags = load_attr->prog_flags;
attr.func_info_rec_size = load_attr->func_info_rec_size;
attr.func_info_cnt = load_attr->func_info_cnt;
attr.func_info = ptr_to_u64(load_attr->func_info);
attr.line_info_rec_size = load_attr->line_info_rec_size;
attr.line_info_cnt = load_attr->line_info_cnt;
attr.line_info = ptr_to_u64(load_attr->line_info);
if (load_attr->name)
memcpy(attr.prog_name, load_attr->name,
min(strlen(load_attr->name), BPF_OBJ_NAME_LEN - 1));
attr.prog_flags = load_attr->prog_flags;
min(strlen(load_attr->name), (size_t)BPF_OBJ_NAME_LEN - 1));
fd = sys_bpf_prog_load(&attr, sizeof(attr));
if (fd >= 0)
@@ -306,19 +303,19 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
}
fd = sys_bpf_prog_load(&attr, sizeof(attr));
if (fd >= 0)
goto done;
}
if (log_level || !log_buf)
if (load_attr->log_level || !load_attr->log_buf)
goto done;
/* Try again with log */
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_buf_sz;
attr.log_buf = ptr_to_u64(load_attr->log_buf);
attr.log_size = load_attr->log_buf_sz;
attr.log_level = 1;
log_buf[0] = 0;
load_attr->log_buf[0] = 0;
fd = sys_bpf_prog_load(&attr, sizeof(attr));
done:
free(finfo);
@@ -326,6 +323,49 @@ done:
return fd;
}
int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
char *log_buf, size_t log_buf_sz)
{
struct bpf_prog_load_params p = {};
if (!load_attr || !log_buf != !log_buf_sz)
return -EINVAL;
p.prog_type = load_attr->prog_type;
p.expected_attach_type = load_attr->expected_attach_type;
switch (p.prog_type) {
case BPF_PROG_TYPE_STRUCT_OPS:
case BPF_PROG_TYPE_LSM:
p.attach_btf_id = load_attr->attach_btf_id;
break;
case BPF_PROG_TYPE_TRACING:
case BPF_PROG_TYPE_EXT:
p.attach_btf_id = load_attr->attach_btf_id;
p.attach_prog_fd = load_attr->attach_prog_fd;
break;
default:
p.prog_ifindex = load_attr->prog_ifindex;
p.kern_version = load_attr->kern_version;
}
p.insn_cnt = load_attr->insns_cnt;
p.insns = load_attr->insns;
p.license = load_attr->license;
p.log_level = load_attr->log_level;
p.log_buf = log_buf;
p.log_buf_sz = log_buf_sz;
p.prog_btf_fd = load_attr->prog_btf_fd;
p.func_info_rec_size = load_attr->func_info_rec_size;
p.func_info_cnt = load_attr->func_info_cnt;
p.func_info = load_attr->func_info;
p.line_info_rec_size = load_attr->line_info_rec_size;
p.line_info_cnt = load_attr->line_info_cnt;
p.line_info = load_attr->line_info;
p.name = load_attr->name;
p.prog_flags = load_attr->prog_flags;
return libbpf__bpf_prog_load(&p);
}
int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t insns_cnt, const char *license,
__u32 kern_version, char *log_buf,

View File

@@ -88,11 +88,19 @@ enum bpf_enum_value_kind {
const void *p = (const void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \
unsigned long long val; \
\
/* This is a so-called barrier_var() operation that makes specified \
* variable "a black box" for optimizing compiler. \
* It forces compiler to perform BYTE_OFFSET relocation on p and use \
* its calculated value in the switch below, instead of applying \
* the same relocation 4 times for each individual memory load. \
*/ \
asm volatile("" : "=r"(p) : "0"(p)); \
\
switch (__CORE_RELO(s, field, BYTE_SIZE)) { \
case 1: val = *(const unsigned char *)p; \
case 2: val = *(const unsigned short *)p; \
case 4: val = *(const unsigned int *)p; \
case 8: val = *(const unsigned long long *)p; \
case 1: val = *(const unsigned char *)p; break; \
case 2: val = *(const unsigned short *)p; break; \
case 4: val = *(const unsigned int *)p; break; \
case 8: val = *(const unsigned long long *)p; break; \
} \
val <<= __CORE_RELO(s, field, LSHIFT_U64); \
if (__CORE_RELO(s, field, SIGNED)) \
@@ -195,17 +203,22 @@ enum bpf_enum_value_kind {
* (local) BTF, used to record relocation.
*/
#define bpf_core_read(dst, sz, src) \
bpf_probe_read_kernel(dst, sz, \
(const void *)__builtin_preserve_access_index(src))
bpf_probe_read_kernel(dst, sz, (const void *)__builtin_preserve_access_index(src))
/* NOTE: see comments for BPF_CORE_READ_USER() about the proper types use. */
#define bpf_core_read_user(dst, sz, src) \
bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src))
/*
* bpf_core_read_str() is a thin wrapper around bpf_probe_read_str()
* additionally emitting BPF CO-RE field relocation for specified source
* argument.
*/
#define bpf_core_read_str(dst, sz, src) \
bpf_probe_read_kernel_str(dst, sz, \
(const void *)__builtin_preserve_access_index(src))
bpf_probe_read_kernel_str(dst, sz, (const void *)__builtin_preserve_access_index(src))
/* NOTE: see comments for BPF_CORE_READ_USER() about the proper types use. */
#define bpf_core_read_user_str(dst, sz, src) \
bpf_probe_read_user_str(dst, sz, (const void *)__builtin_preserve_access_index(src))
#define ___concat(a, b) a ## b
#define ___apply(fn, n) ___concat(fn, n)
@@ -264,30 +277,29 @@ enum bpf_enum_value_kind {
read_fn((void *)(dst), sizeof(*(dst)), &((src_type)(src))->accessor)
/* "recursively" read a sequence of inner pointers using local __t var */
#define ___rd_first(src, a) ___read(bpf_core_read, &__t, ___type(src), src, a);
#define ___rd_last(...) \
___read(bpf_core_read, &__t, \
___type(___nolast(__VA_ARGS__)), __t, ___last(__VA_ARGS__));
#define ___rd_p1(...) const void *__t; ___rd_first(__VA_ARGS__)
#define ___rd_p2(...) ___rd_p1(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
#define ___rd_p3(...) ___rd_p2(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
#define ___rd_p4(...) ___rd_p3(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
#define ___rd_p5(...) ___rd_p4(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
#define ___rd_p6(...) ___rd_p5(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
#define ___rd_p7(...) ___rd_p6(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
#define ___rd_p8(...) ___rd_p7(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
#define ___rd_p9(...) ___rd_p8(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
#define ___read_ptrs(src, ...) \
___apply(___rd_p, ___narg(__VA_ARGS__))(src, __VA_ARGS__)
#define ___rd_first(fn, src, a) ___read(fn, &__t, ___type(src), src, a);
#define ___rd_last(fn, ...) \
___read(fn, &__t, ___type(___nolast(__VA_ARGS__)), __t, ___last(__VA_ARGS__));
#define ___rd_p1(fn, ...) const void *__t; ___rd_first(fn, __VA_ARGS__)
#define ___rd_p2(fn, ...) ___rd_p1(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
#define ___rd_p3(fn, ...) ___rd_p2(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
#define ___rd_p4(fn, ...) ___rd_p3(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
#define ___rd_p5(fn, ...) ___rd_p4(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
#define ___rd_p6(fn, ...) ___rd_p5(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
#define ___rd_p7(fn, ...) ___rd_p6(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
#define ___rd_p8(fn, ...) ___rd_p7(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
#define ___rd_p9(fn, ...) ___rd_p8(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
#define ___read_ptrs(fn, src, ...) \
___apply(___rd_p, ___narg(__VA_ARGS__))(fn, src, __VA_ARGS__)
#define ___core_read0(fn, dst, src, a) \
#define ___core_read0(fn, fn_ptr, dst, src, a) \
___read(fn, dst, ___type(src), src, a);
#define ___core_readN(fn, dst, src, ...) \
___read_ptrs(src, ___nolast(__VA_ARGS__)) \
#define ___core_readN(fn, fn_ptr, dst, src, ...) \
___read_ptrs(fn_ptr, src, ___nolast(__VA_ARGS__)) \
___read(fn, dst, ___type(src, ___nolast(__VA_ARGS__)), __t, \
___last(__VA_ARGS__));
#define ___core_read(fn, dst, src, a, ...) \
___apply(___core_read, ___empty(__VA_ARGS__))(fn, dst, \
#define ___core_read(fn, fn_ptr, dst, src, a, ...) \
___apply(___core_read, ___empty(__VA_ARGS__))(fn, fn_ptr, dst, \
src, a, ##__VA_ARGS__)
/*
@@ -295,20 +307,73 @@ enum bpf_enum_value_kind {
* BPF_CORE_READ(), in which final field is read into user-provided storage.
* See BPF_CORE_READ() below for more details on general usage.
*/
#define BPF_CORE_READ_INTO(dst, src, a, ...) \
({ \
___core_read(bpf_core_read, dst, (src), a, ##__VA_ARGS__) \
})
#define BPF_CORE_READ_INTO(dst, src, a, ...) ({ \
___core_read(bpf_core_read, bpf_core_read, \
dst, (src), a, ##__VA_ARGS__) \
})
/*
* Variant of BPF_CORE_READ_INTO() for reading from user-space memory.
*
* NOTE: see comments for BPF_CORE_READ_USER() about the proper types use.
*/
#define BPF_CORE_READ_USER_INTO(dst, src, a, ...) ({ \
___core_read(bpf_core_read_user, bpf_core_read_user, \
dst, (src), a, ##__VA_ARGS__) \
})
/* Non-CO-RE variant of BPF_CORE_READ_INTO() */
#define BPF_PROBE_READ_INTO(dst, src, a, ...) ({ \
___core_read(bpf_probe_read, bpf_probe_read, \
dst, (src), a, ##__VA_ARGS__) \
})
/* Non-CO-RE variant of BPF_CORE_READ_USER_INTO().
*
* As no CO-RE relocations are emitted, source types can be arbitrary and are
* not restricted to kernel types only.
*/
#define BPF_PROBE_READ_USER_INTO(dst, src, a, ...) ({ \
___core_read(bpf_probe_read_user, bpf_probe_read_user, \
dst, (src), a, ##__VA_ARGS__) \
})
/*
* BPF_CORE_READ_STR_INTO() does same "pointer chasing" as
* BPF_CORE_READ() for intermediate pointers, but then executes (and returns
* corresponding error code) bpf_core_read_str() for final string read.
*/
#define BPF_CORE_READ_STR_INTO(dst, src, a, ...) \
({ \
___core_read(bpf_core_read_str, dst, (src), a, ##__VA_ARGS__)\
})
#define BPF_CORE_READ_STR_INTO(dst, src, a, ...) ({ \
___core_read(bpf_core_read_str, bpf_core_read, \
dst, (src), a, ##__VA_ARGS__) \
})
/*
* Variant of BPF_CORE_READ_STR_INTO() for reading from user-space memory.
*
* NOTE: see comments for BPF_CORE_READ_USER() about the proper types use.
*/
#define BPF_CORE_READ_USER_STR_INTO(dst, src, a, ...) ({ \
___core_read(bpf_core_read_user_str, bpf_core_read_user, \
dst, (src), a, ##__VA_ARGS__) \
})
/* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */
#define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({ \
___core_read(bpf_probe_read_str, bpf_probe_read, \
dst, (src), a, ##__VA_ARGS__) \
})
/*
* Non-CO-RE variant of BPF_CORE_READ_USER_STR_INTO().
*
* As no CO-RE relocations are emitted, source types can be arbitrary and are
* not restricted to kernel types only.
*/
#define BPF_PROBE_READ_USER_STR_INTO(dst, src, a, ...) ({ \
___core_read(bpf_probe_read_user_str, bpf_probe_read_user, \
dst, (src), a, ##__VA_ARGS__) \
})
/*
* BPF_CORE_READ() is used to simplify BPF CO-RE relocatable read, especially
@@ -334,12 +399,46 @@ enum bpf_enum_value_kind {
* N.B. Only up to 9 "field accessors" are supported, which should be more
* than enough for any practical purpose.
*/
#define BPF_CORE_READ(src, a, ...) \
({ \
___type((src), a, ##__VA_ARGS__) __r; \
BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__); \
__r; \
})
#define BPF_CORE_READ(src, a, ...) ({ \
___type((src), a, ##__VA_ARGS__) __r; \
BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__); \
__r; \
})
/*
* Variant of BPF_CORE_READ() for reading from user-space memory.
*
* NOTE: all the source types involved are still *kernel types* and need to
* exist in kernel (or kernel module) BTF, otherwise CO-RE relocation will
* fail. Custom user types are not relocatable with CO-RE.
* The typical situation in which BPF_CORE_READ_USER() might be used is to
* read kernel UAPI types from the user-space memory passed in as a syscall
* input argument.
*/
#define BPF_CORE_READ_USER(src, a, ...) ({ \
___type((src), a, ##__VA_ARGS__) __r; \
BPF_CORE_READ_USER_INTO(&__r, (src), a, ##__VA_ARGS__); \
__r; \
})
/* Non-CO-RE variant of BPF_CORE_READ() */
#define BPF_PROBE_READ(src, a, ...) ({ \
___type((src), a, ##__VA_ARGS__) __r; \
BPF_PROBE_READ_INTO(&__r, (src), a, ##__VA_ARGS__); \
__r; \
})
/*
* Non-CO-RE variant of BPF_CORE_READ_USER().
*
* As no CO-RE relocations are emitted, source types can be arbitrary and are
* not restricted to kernel types only.
*/
#define BPF_PROBE_READ_USER(src, a, ...) ({ \
___type((src), a, ##__VA_ARGS__) __r; \
BPF_PROBE_READ_USER_INTO(&__r, (src), a, ##__VA_ARGS__); \
__r; \
})
#endif

View File

@@ -1,4 +1,4 @@
/* This is auto-generated file. See bpf_helpers_doc.py for details. */
/* This is auto-generated file. See bpf_doc.py for details. */
/* Forward declarations of BPF structs */
struct bpf_fib_lookup;
@@ -16,6 +16,7 @@ struct bpf_sysctl;
struct bpf_tcp_sock;
struct bpf_tunnel_key;
struct bpf_xfrm_state;
struct linux_binprm;
struct pt_regs;
struct sk_reuseport_md;
struct sockaddr;
@@ -32,6 +33,9 @@ struct sk_msg_md;
struct xdp_md;
struct path;
struct btf_ptr;
struct inode;
struct socket;
struct file;
/*
* bpf_map_lookup_elem
@@ -1142,8 +1146,8 @@ static long (*bpf_probe_read_str)(void *dst, __u32 size, const void *unsafe_ptr)
* identifier that can be assumed unique.
*
* Returns
* A 8-byte long non-decreasing number on success, or 0 if the
* socket field is missing inside *skb*.
* A 8-byte long unique number on success, or 0 if the socket
* field is missing inside *skb*.
*/
static __u64 (*bpf_get_socket_cookie)(void *ctx) = (void *) 46;
@@ -1245,6 +1249,10 @@ static long (*bpf_setsockopt)(void *bpf_socket, int level, int optname, void *op
* Use with ENCAP_L3/L4 flags to further specify the tunnel
* type; *len* is the length of the inner MAC header.
*
* * **BPF_F_ADJ_ROOM_ENCAP_L2_ETH**:
* Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the
* L2 type as Ethernet.
*
* A call to this helper is susceptible to change the underlying
* packet buffer. Therefore, at load time, all checks on pointers
* previously done by the verifier are invalidated and must be
@@ -1795,6 +1803,9 @@ static long (*bpf_skb_load_bytes_relative)(const void *skb, __u32 offset, void *
* * 0 on success (packet is forwarded, nexthop neighbor exists)
* * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
* packet is not forwarded or needs assist from full stack
*
* If lookup fails with BPF_FIB_LKUP_RET_FRAG_NEEDED, then the MTU
* was exceeded and output params->mtu_result contains the MTU.
*/
static long (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params, int plen, __u32 flags) = (void *) 69;
@@ -2067,7 +2078,7 @@ static __u64 (*bpf_get_current_cgroup_id)(void) = (void *) 80;
* running simultaneously.
*
* A user should care about the synchronization by himself.
* For example, by using the **BPF_STX_XADD** instruction to alter
* For example, by using the **BPF_ATOMIC** instructions to alter
* the shared data.
*
* Returns
@@ -2744,10 +2755,10 @@ static long (*bpf_probe_read_kernel)(void *dst, __u32 size, const void *unsafe_p
* string length is larger than *size*, just *size*-1 bytes are
* copied and the last byte is set to NUL.
*
* On success, the length of the copied string is returned. This
* makes this helper useful in tracing programs for reading
* strings, and more importantly to get its length at runtime. See
* the following snippet:
* On success, returns the number of bytes that were written,
* including the terminal NUL. This makes this helper useful in
* tracing programs for reading strings, and more importantly to
* get its length at runtime. See the following snippet:
*
* ::
*
@@ -2776,7 +2787,7 @@ static long (*bpf_probe_read_kernel)(void *dst, __u32 size, const void *unsafe_p
* one can quickly iterate at the right offset of the memory area.
*
* Returns
* On success, the strictly positive length of the string,
* On success, the strictly positive length of the output string,
* including the trailing NUL character. On error, a negative
* value.
*/
@@ -3081,6 +3092,13 @@ static __u64 (*bpf_sk_ancestor_cgroup_id)(void *sk, int ancestor_level) = (void
* of new data availability is sent.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* If **0** is specified in *flags*, an adaptive notification
* of new data availability is sent.
*
* An adaptive notification is a notification sent whenever the user-space
* process has caught up and consumed all available payloads. In case the user-space
* process is still processing a previous payload, then no notification is needed
* as it will process the newly added payload automatically.
*
* Returns
* 0 on success, or a negative error in case of failure.
@@ -3091,6 +3109,7 @@ static long (*bpf_ringbuf_output)(void *ringbuf, void *data, __u64 size, __u64 f
* bpf_ringbuf_reserve
*
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
* *flags* must be 0.
*
* Returns
* Valid pointer with *size* bytes of memory available; NULL,
@@ -3106,6 +3125,10 @@ static void *(*bpf_ringbuf_reserve)(void *ringbuf, __u64 size, __u64 flags) = (v
* of new data availability is sent.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* If **0** is specified in *flags*, an adaptive notification
* of new data availability is sent.
*
* See 'bpf_ringbuf_output()' for the definition of adaptive notification.
*
* Returns
* Nothing. Always succeeds.
@@ -3120,6 +3143,10 @@ static void (*bpf_ringbuf_submit)(void *data, __u64 flags) = (void *) 132;
* of new data availability is sent.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* If **0** is specified in *flags*, an adaptive notification
* of new data availability is sent.
*
* See 'bpf_ringbuf_output()' for the definition of adaptive notification.
*
* Returns
* Nothing. Always succeeds.
@@ -3617,4 +3644,249 @@ static void *(*bpf_this_cpu_ptr)(const void *percpu_ptr) = (void *) 154;
*/
static long (*bpf_redirect_peer)(__u32 ifindex, __u64 flags) = (void *) 155;
/*
* bpf_task_storage_get
*
* Get a bpf_local_storage from the *task*.
*
* Logically, it could be thought of as getting the value from
* a *map* with *task* as the **key**. From this
* perspective, the usage is not much different from
* **bpf_map_lookup_elem**\ (*map*, **&**\ *task*) except this
* helper enforces the key must be an task_struct and the map must also
* be a **BPF_MAP_TYPE_TASK_STORAGE**.
*
* Underneath, the value is stored locally at *task* instead of
* the *map*. The *map* is used as the bpf-local-storage
* "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf_local_storage residing at *task*.
*
* An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf_local_storage will be
* created if one does not exist. *value* can be used
* together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
* the initial value of a bpf_local_storage. If *value* is
* **NULL**, the new bpf_local_storage will be zero initialized.
*
* Returns
* A bpf_local_storage pointer is returned on success.
*
* **NULL** if not found or there was an error in adding
* a new bpf_local_storage.
*/
static void *(*bpf_task_storage_get)(void *map, struct task_struct *task, void *value, __u64 flags) = (void *) 156;
/*
* bpf_task_storage_delete
*
* Delete a bpf_local_storage from a *task*.
*
* Returns
* 0 on success.
*
* **-ENOENT** if the bpf_local_storage cannot be found.
*/
static long (*bpf_task_storage_delete)(void *map, struct task_struct *task) = (void *) 157;
/*
* bpf_get_current_task_btf
*
* Return a BTF pointer to the "current" task.
* This pointer can also be used in helpers that accept an
* *ARG_PTR_TO_BTF_ID* of type *task_struct*.
*
* Returns
* Pointer to the current task.
*/
static struct task_struct *(*bpf_get_current_task_btf)(void) = (void *) 158;
/*
* bpf_bprm_opts_set
*
* Set or clear certain options on *bprm*:
*
* **BPF_F_BPRM_SECUREEXEC** Set the secureexec bit
* which sets the **AT_SECURE** auxv for glibc. The bit
* is cleared if the flag is not specified.
*
* Returns
* **-EINVAL** if invalid *flags* are passed, zero otherwise.
*/
static long (*bpf_bprm_opts_set)(struct linux_binprm *bprm, __u64 flags) = (void *) 159;
/*
* bpf_ktime_get_coarse_ns
*
* Return a coarse-grained version of the time elapsed since
* system boot, in nanoseconds. Does not include time the system
* was suspended.
*
* See: **clock_gettime**\ (**CLOCK_MONOTONIC_COARSE**)
*
* Returns
* Current *ktime*.
*/
static __u64 (*bpf_ktime_get_coarse_ns)(void) = (void *) 160;
/*
* bpf_ima_inode_hash
*
* Returns the stored IMA hash of the *inode* (if it's avaialable).
* If the hash is larger than *size*, then only *size*
* bytes will be copied to *dst*
*
* Returns
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
* invalid arguments are passed.
*/
static long (*bpf_ima_inode_hash)(struct inode *inode, void *dst, __u32 size) = (void *) 161;
/*
* bpf_sock_from_file
*
* If the given file represents a socket, returns the associated
* socket.
*
* Returns
* A pointer to a struct socket on success or NULL if the file is
* not a socket.
*/
static struct socket *(*bpf_sock_from_file)(struct file *file) = (void *) 162;
/*
* bpf_check_mtu
*
* Check packet size against exceeding MTU of net device (based
* on *ifindex*). This helper will likely be used in combination
* with helpers that adjust/change the packet size.
*
* The argument *len_diff* can be used for querying with a planned
* size change. This allows to check MTU prior to changing packet
* ctx. Providing an *len_diff* adjustment that is larger than the
* actual packet size (resulting in negative packet size) will in
* principle not exceed the MTU, why it is not considered a
* failure. Other BPF-helpers are needed for performing the
* planned size change, why the responsability for catch a negative
* packet size belong in those helpers.
*
* Specifying *ifindex* zero means the MTU check is performed
* against the current net device. This is practical if this isn't
* used prior to redirect.
*
* On input *mtu_len* must be a valid pointer, else verifier will
* reject BPF program. If the value *mtu_len* is initialized to
* zero then the ctx packet size is use. When value *mtu_len* is
* provided as input this specify the L3 length that the MTU check
* is done against. Remember XDP and TC length operate at L2, but
* this value is L3 as this correlate to MTU and IP-header tot_len
* values which are L3 (similar behavior as bpf_fib_lookup).
*
* The Linux kernel route table can configure MTUs on a more
* specific per route level, which is not provided by this helper.
* For route level MTU checks use the **bpf_fib_lookup**\ ()
* helper.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** for tc cls_act programs.
*
* The *flags* argument can be a combination of one or more of the
* following values:
*
* **BPF_MTU_CHK_SEGS**
* This flag will only works for *ctx* **struct sk_buff**.
* If packet context contains extra packet segment buffers
* (often knows as GSO skb), then MTU check is harder to
* check at this point, because in transmit path it is
* possible for the skb packet to get re-segmented
* (depending on net device features). This could still be
* a MTU violation, so this flag enables performing MTU
* check against segments, with a different violation
* return code to tell it apart. Check cannot use len_diff.
*
* On return *mtu_len* pointer contains the MTU value of the net
* device. Remember the net device configured MTU is the L3 size,
* which is returned here and XDP and TC length operate at L2.
* Helper take this into account for you, but remember when using
* MTU value in your BPF-code.
*
*
* Returns
* * 0 on success, and populate MTU value in *mtu_len* pointer.
*
* * < 0 if any input argument is invalid (*mtu_len* not updated)
*
* MTU violations return positive values, but also populate MTU
* value in *mtu_len* pointer, as this can be needed for
* implementing PMTU handing:
*
* * **BPF_MTU_CHK_RET_FRAG_NEEDED**
* * **BPF_MTU_CHK_RET_SEGS_TOOBIG**
*/
static long (*bpf_check_mtu)(void *ctx, __u32 ifindex, __u32 *mtu_len, __s32 len_diff, __u64 flags) = (void *) 163;
/*
* bpf_for_each_map_elem
*
* For each element in **map**, call **callback_fn** function with
* **map**, **callback_ctx** and other map-specific parameters.
* The **callback_fn** should be a static function and
* the **callback_ctx** should be a pointer to the stack.
* The **flags** is used to control certain aspects of the helper.
* Currently, the **flags** must be 0.
*
* The following are a list of supported map types and their
* respective expected callback signatures:
*
* BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_PERCPU_HASH,
* BPF_MAP_TYPE_LRU_HASH, BPF_MAP_TYPE_LRU_PERCPU_HASH,
* BPF_MAP_TYPE_ARRAY, BPF_MAP_TYPE_PERCPU_ARRAY
*
* long (\*callback_fn)(struct bpf_map \*map, const void \*key, void \*value, void \*ctx);
*
* For per_cpu maps, the map_value is the value on the cpu where the
* bpf_prog is running.
*
* If **callback_fn** return 0, the helper will continue to the next
* element. If return value is 1, the helper will skip the rest of
* elements and return. Other return values are not used now.
*
*
* Returns
* The number of traversed map elements for success, **-EINVAL** for
* invalid **flags**.
*/
static long (*bpf_for_each_map_elem)(void *map, void *callback_fn, void *callback_ctx, __u64 flags) = (void *) 164;
/*
* bpf_snprintf
*
* Outputs a string into the **str** buffer of size **str_size**
* based on a format string stored in a read-only map pointed by
* **fmt**.
*
* Each format specifier in **fmt** corresponds to one u64 element
* in the **data** array. For strings and pointers where pointees
* are accessed, only the pointer values are stored in the *data*
* array. The *data_len* is the size of *data* in bytes.
*
* Formats **%s** and **%p{i,I}{4,6}** require to read kernel
* memory. Reading kernel memory may fail due to either invalid
* address or valid address but requiring a major memory fault. If
* reading kernel memory fails, the string for **%s** will be an
* empty string, and the ip address for **%p{i,I}{4,6}** will be 0.
* Not returning error to bpf program is consistent with what
* **bpf_trace_printk**\ () does for now.
*
*
* Returns
* The strictly positive length of the formatted string, including
* the trailing zero character. If the return value is greater than
* **str_size**, **str** contains a truncated string, guaranteed to
* be zero-terminated except when **str_size** is 0.
*
* Or **-EBUSY** if the per-CPU memory copy buffer is busy.
*/
static long (*bpf_snprintf)(char *str, __u32 str_size, const char *fmt, __u64 *data, __u32 data_len) = (void *) 165;

View File

@@ -25,13 +25,21 @@
/*
* Helper macro to place programs, maps, license in
* different sections in elf_bpf file. Section names
* are interpreted by elf_bpf loader
* are interpreted by libbpf depending on the context (BPF programs, BPF maps,
* extern variables, etc).
* To allow use of SEC() with externs (e.g., for extern .maps declarations),
* make sure __attribute__((unused)) doesn't trigger compilation warning.
*/
#define SEC(NAME) __attribute__((section(NAME), used))
#define SEC(name) \
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wignored-attributes\"") \
__attribute__((section(name), used)) \
_Pragma("GCC diagnostic pop") \
/* Avoid 'linux/stddef.h' definition of '__always_inline'. */
#undef __always_inline
#define __always_inline inline __attribute__((always_inline))
#ifndef __always_inline
#define __always_inline __attribute__((always_inline))
#endif
#ifndef __noinline
#define __noinline __attribute__((noinline))
#endif
@@ -40,7 +48,29 @@
#endif
/*
* Helper macro to manipulate data structures
* Use __hidden attribute to mark a non-static BPF subprogram effectively
* static for BPF verifier's verification algorithm purposes, allowing more
* extensive and permissive BPF verification process, taking into account
* subprogram's caller context.
*/
#define __hidden __attribute__((visibility("hidden")))
/* When utilizing vmlinux.h with BPF CO-RE, user BPF programs can't include
* any system-level headers (such as stddef.h, linux/version.h, etc), and
* commonly-used macros like NULL and KERNEL_VERSION aren't available through
* vmlinux.h. This just adds unnecessary hurdles and forces users to re-define
* them on their own. So as a convenience, provide such definitions here.
*/
#ifndef NULL
#define NULL ((void *)0)
#endif
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))
#endif
/*
* Helper macros to manipulate data structures
*/
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((unsigned long)&((TYPE *)0)->MEMBER)

View File

@@ -413,20 +413,56 @@ typeof(name(0)) name(struct pt_regs *ctx) \
} \
static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
#define ___bpf_fill0(arr, p, x) do {} while (0)
#define ___bpf_fill1(arr, p, x) arr[p] = x
#define ___bpf_fill2(arr, p, x, args...) arr[p] = x; ___bpf_fill1(arr, p + 1, args)
#define ___bpf_fill3(arr, p, x, args...) arr[p] = x; ___bpf_fill2(arr, p + 1, args)
#define ___bpf_fill4(arr, p, x, args...) arr[p] = x; ___bpf_fill3(arr, p + 1, args)
#define ___bpf_fill5(arr, p, x, args...) arr[p] = x; ___bpf_fill4(arr, p + 1, args)
#define ___bpf_fill6(arr, p, x, args...) arr[p] = x; ___bpf_fill5(arr, p + 1, args)
#define ___bpf_fill7(arr, p, x, args...) arr[p] = x; ___bpf_fill6(arr, p + 1, args)
#define ___bpf_fill8(arr, p, x, args...) arr[p] = x; ___bpf_fill7(arr, p + 1, args)
#define ___bpf_fill9(arr, p, x, args...) arr[p] = x; ___bpf_fill8(arr, p + 1, args)
#define ___bpf_fill10(arr, p, x, args...) arr[p] = x; ___bpf_fill9(arr, p + 1, args)
#define ___bpf_fill11(arr, p, x, args...) arr[p] = x; ___bpf_fill10(arr, p + 1, args)
#define ___bpf_fill12(arr, p, x, args...) arr[p] = x; ___bpf_fill11(arr, p + 1, args)
#define ___bpf_fill(arr, args...) \
___bpf_apply(___bpf_fill, ___bpf_narg(args))(arr, 0, args)
/*
* BPF_SEQ_PRINTF to wrap bpf_seq_printf to-be-printed values
* in a structure.
*/
#define BPF_SEQ_PRINTF(seq, fmt, args...) \
({ \
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
static const char ___fmt[] = fmt; \
unsigned long long ___param[] = { args }; \
_Pragma("GCC diagnostic pop") \
int ___ret = bpf_seq_printf(seq, ___fmt, sizeof(___fmt), \
___param, sizeof(___param)); \
___ret; \
})
#define BPF_SEQ_PRINTF(seq, fmt, args...) \
({ \
static const char ___fmt[] = fmt; \
unsigned long long ___param[___bpf_narg(args)]; \
\
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
___bpf_fill(___param, args); \
_Pragma("GCC diagnostic pop") \
\
bpf_seq_printf(seq, ___fmt, sizeof(___fmt), \
___param, sizeof(___param)); \
})
/*
* BPF_SNPRINTF wraps the bpf_snprintf helper with variadic arguments instead of
* an array of u64.
*/
#define BPF_SNPRINTF(out, out_size, fmt, args...) \
({ \
static const char ___fmt[] = fmt; \
unsigned long long ___param[___bpf_narg(args)]; \
\
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
___bpf_fill(___param, args); \
_Pragma("GCC diagnostic pop") \
\
bpf_snprintf(out, out_size, ___fmt, \
___param, sizeof(___param)); \
})
#endif

1476
src/btf.c

File diff suppressed because it is too large Load Diff

View File

@@ -31,11 +31,19 @@ enum btf_endianness {
};
LIBBPF_API void btf__free(struct btf *btf);
LIBBPF_API struct btf *btf__new(const void *data, __u32 size);
LIBBPF_API struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf);
LIBBPF_API struct btf *btf__new_empty(void);
LIBBPF_API struct btf *btf__new_empty_split(struct btf *base_btf);
LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext);
LIBBPF_API struct btf *btf__parse_split(const char *path, struct btf *base_btf);
LIBBPF_API struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext);
LIBBPF_API struct btf *btf__parse_elf_split(const char *path, struct btf *base_btf);
LIBBPF_API struct btf *btf__parse_raw(const char *path);
LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf);
LIBBPF_API int btf__finalize_data(struct bpf_object *obj, struct btf *btf);
LIBBPF_API int btf__load(struct btf *btf);
LIBBPF_API __s32 btf__find_by_name(const struct btf *btf,
@@ -43,6 +51,7 @@ LIBBPF_API __s32 btf__find_by_name(const struct btf *btf,
LIBBPF_API __s32 btf__find_by_name_kind(const struct btf *btf,
const char *type_name, __u32 kind);
LIBBPF_API __u32 btf__get_nr_types(const struct btf *btf);
LIBBPF_API const struct btf *btf__base_btf(const struct btf *btf);
LIBBPF_API const struct btf_type *btf__type_by_id(const struct btf *btf,
__u32 id);
LIBBPF_API size_t btf__pointer_size(const struct btf *btf);
@@ -84,8 +93,11 @@ LIBBPF_API struct btf *libbpf_find_kernel_btf(void);
LIBBPF_API int btf__find_str(struct btf *btf, const char *s);
LIBBPF_API int btf__add_str(struct btf *btf, const char *s);
LIBBPF_API int btf__add_type(struct btf *btf, const struct btf *src_btf,
const struct btf_type *src_type);
LIBBPF_API int btf__add_int(struct btf *btf, const char *name, size_t byte_sz, int encoding);
LIBBPF_API int btf__add_float(struct btf *btf, const char *name, size_t byte_sz);
LIBBPF_API int btf__add_ptr(struct btf *btf, int ref_type_id);
LIBBPF_API int btf__add_array(struct btf *btf,
int index_type_id, int elem_type_id, __u32 nr_elems);
@@ -164,6 +176,7 @@ struct btf_dump_emit_type_decl_opts {
int indent_level;
/* strip all the const/volatile/restrict mods */
bool strip_mods;
size_t :0;
};
#define btf_dump_emit_type_decl_opts__last_field strip_mods
@@ -285,6 +298,11 @@ static inline bool btf_is_datasec(const struct btf_type *t)
return btf_kind(t) == BTF_KIND_DATASEC;
}
static inline bool btf_is_float(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_FLOAT;
}
static inline __u8 btf_int_encoding(const struct btf_type *t)
{
return BTF_INT_ENCODING(*(__u32 *)(t + 1));

View File

@@ -166,11 +166,11 @@ static int btf_dump_resize(struct btf_dump *d)
if (last_id <= d->last_id)
return 0;
if (btf_ensure_mem((void **)&d->type_states, &d->type_states_cap,
sizeof(*d->type_states), last_id + 1))
if (libbpf_ensure_mem((void **)&d->type_states, &d->type_states_cap,
sizeof(*d->type_states), last_id + 1))
return -ENOMEM;
if (btf_ensure_mem((void **)&d->cached_names, &d->cached_names_cap,
sizeof(*d->cached_names), last_id + 1))
if (libbpf_ensure_mem((void **)&d->cached_names, &d->cached_names_cap,
sizeof(*d->cached_names), last_id + 1))
return -ENOMEM;
if (d->last_id == 0) {
@@ -279,6 +279,7 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
case BTF_KIND_INT:
case BTF_KIND_ENUM:
case BTF_KIND_FWD:
case BTF_KIND_FLOAT:
break;
case BTF_KIND_VOLATILE:
@@ -453,6 +454,7 @@ static int btf_dump_order_type(struct btf_dump *d, __u32 id, bool through_ptr)
switch (btf_kind(t)) {
case BTF_KIND_INT:
case BTF_KIND_FLOAT:
tstate->order_state = ORDERED;
return 0;
@@ -462,7 +464,7 @@ static int btf_dump_order_type(struct btf_dump *d, __u32 id, bool through_ptr)
return err;
case BTF_KIND_ARRAY:
return btf_dump_order_type(d, btf_array(t)->type, through_ptr);
return btf_dump_order_type(d, btf_array(t)->type, false);
case BTF_KIND_STRUCT:
case BTF_KIND_UNION: {
@@ -1133,6 +1135,7 @@ skip_mod:
case BTF_KIND_STRUCT:
case BTF_KIND_UNION:
case BTF_KIND_TYPEDEF:
case BTF_KIND_FLOAT:
goto done;
default:
pr_warn("unexpected type in decl chain, kind:%u, id:[%u]\n",
@@ -1247,6 +1250,7 @@ static void btf_dump_emit_type_chain(struct btf_dump *d,
switch (kind) {
case BTF_KIND_INT:
case BTF_KIND_FLOAT:
btf_dump_emit_mods(d, decls);
name = btf_name_of(d, t->name_off);
btf_dump_printf(d, "%s", name);

View File

@@ -15,6 +15,9 @@
static inline size_t hash_bits(size_t h, int bits)
{
/* shuffle bits and return requested number of upper bits */
if (bits == 0)
return 0;
#if (__SIZEOF_SIZE_T__ == __SIZEOF_LONG_LONG__)
/* LP64 case */
return (h * 11400714819323198485llu) >> (__SIZEOF_LONG_LONG__ * 8 - bits);
@@ -174,17 +177,17 @@ bool hashmap__find(const struct hashmap *map, const void *key, void **value);
* @key: key to iterate entries for
*/
#define hashmap__for_each_key_entry(map, cur, _key) \
for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
map->cap_bits); \
map->buckets ? map->buckets[bkt] : NULL; }); \
for (cur = map->buckets \
? map->buckets[hash_bits(map->hash_fn((_key), map->ctx), map->cap_bits)] \
: NULL; \
cur; \
cur = cur->next) \
if (map->equal_fn(cur->key, (_key), map->ctx))
#define hashmap__for_each_key_entry_safe(map, cur, tmp, _key) \
for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
map->cap_bits); \
cur = map->buckets ? map->buckets[bkt] : NULL; }); \
for (cur = map->buckets \
? map->buckets[hash_bits(map->hash_fn((_key), map->ctx), map->cap_bits)] \
: NULL; \
cur && ({ tmp = cur->next; true; }); \
cur = tmp) \
if (map->equal_fn(cur->key, (_key), map->ctx))

File diff suppressed because it is too large Load Diff

View File

@@ -143,6 +143,7 @@ LIBBPF_API int bpf_object__unload(struct bpf_object *obj);
LIBBPF_API const char *bpf_object__name(const struct bpf_object *obj);
LIBBPF_API unsigned int bpf_object__kversion(const struct bpf_object *obj);
LIBBPF_API int bpf_object__set_kversion(struct bpf_object *obj, __u32 kern_version);
struct btf;
LIBBPF_API struct btf *bpf_object__btf(const struct bpf_object *obj);
@@ -361,12 +362,12 @@ LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog);
LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog);
LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog);
LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
LIBBPF_API enum bpf_prog_type bpf_program__get_type(const struct bpf_program *prog);
LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
enum bpf_prog_type type);
LIBBPF_API enum bpf_attach_type
bpf_program__get_expected_attach_type(struct bpf_program *prog);
bpf_program__get_expected_attach_type(const struct bpf_program *prog);
LIBBPF_API void
bpf_program__set_expected_attach_type(struct bpf_program *prog,
enum bpf_attach_type type);
@@ -479,6 +480,7 @@ LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
LIBBPF_API int bpf_map__set_inner_map_fd(struct bpf_map *map, int fd);
LIBBPF_API struct bpf_map *bpf_map__inner_map(struct bpf_map *map);
LIBBPF_API long libbpf_get_error(const void *ptr);
@@ -496,6 +498,7 @@ LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
LIBBPF_API int bpf_prog_load(const char *file, enum bpf_prog_type type,
struct bpf_object **pobj, int *prog_fd);
/* XDP related API */
struct xdp_link_info {
__u32 prog_id;
__u32 drv_prog_id;
@@ -507,6 +510,7 @@ struct xdp_link_info {
struct bpf_xdp_set_link_opts {
size_t sz;
int old_fd;
size_t :0;
};
#define bpf_xdp_set_link_opts__last_field old_fd
@@ -517,6 +521,49 @@ LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
size_t info_size, __u32 flags);
/* TC related API */
enum bpf_tc_attach_point {
BPF_TC_INGRESS = 1 << 0,
BPF_TC_EGRESS = 1 << 1,
BPF_TC_CUSTOM = 1 << 2,
};
#define BPF_TC_PARENT(a, b) \
((((a) << 16) & 0xFFFF0000U) | ((b) & 0x0000FFFFU))
enum bpf_tc_flags {
BPF_TC_F_REPLACE = 1 << 0,
};
struct bpf_tc_hook {
size_t sz;
int ifindex;
enum bpf_tc_attach_point attach_point;
__u32 parent;
size_t :0;
};
#define bpf_tc_hook__last_field parent
struct bpf_tc_opts {
size_t sz;
int prog_fd;
__u32 flags;
__u32 prog_id;
__u32 handle;
__u32 priority;
size_t :0;
};
#define bpf_tc_opts__last_field priority
LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook);
LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook,
struct bpf_tc_opts *opts);
LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook,
const struct bpf_tc_opts *opts);
LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
struct bpf_tc_opts *opts);
/* Ring buffer APIs */
struct ring_buffer;
@@ -536,6 +583,7 @@ LIBBPF_API int ring_buffer__add(struct ring_buffer *rb, int map_fd,
ring_buffer_sample_fn sample_cb, void *ctx);
LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
/* Perf buffer APIs */
struct perf_buffer;
@@ -758,6 +806,27 @@ enum libbpf_tristate {
TRI_MODULE = 2,
};
struct bpf_linker_opts {
/* size of this struct, for forward/backward compatiblity */
size_t sz;
};
#define bpf_linker_opts__last_field sz
struct bpf_linker_file_opts {
/* size of this struct, for forward/backward compatiblity */
size_t sz;
};
#define bpf_linker_file_opts__last_field sz
struct bpf_linker;
LIBBPF_API struct bpf_linker *bpf_linker__new(const char *filename, struct bpf_linker_opts *opts);
LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker,
const char *filename,
const struct bpf_linker_file_opts *opts);
LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
#ifdef __cplusplus
} /* extern "C" */
#endif

View File

@@ -337,3 +337,33 @@ LIBBPF_0.2.0 {
perf_buffer__consume_buffer;
xsk_socket__create_shared;
} LIBBPF_0.1.0;
LIBBPF_0.3.0 {
global:
btf__base_btf;
btf__parse_elf_split;
btf__parse_raw_split;
btf__parse_split;
btf__new_empty_split;
btf__new_split;
ring_buffer__epoll_fd;
xsk_setup_xdp_prog;
xsk_socket__update_xskmap;
} LIBBPF_0.2.0;
LIBBPF_0.4.0 {
global:
btf__add_float;
btf__add_type;
bpf_linker__add_file;
bpf_linker__finalize;
bpf_linker__free;
bpf_linker__new;
bpf_map__inner_map;
bpf_object__set_kversion;
bpf_tc_attach;
bpf_tc_detach;
bpf_tc_hook_create;
bpf_tc_hook_destroy;
bpf_tc_query;
} LIBBPF_0.3.0;

View File

@@ -19,6 +19,38 @@
#pragma GCC poison reallocarray
#include "libbpf.h"
#include "btf.h"
#ifndef EM_BPF
#define EM_BPF 247
#endif
#ifndef R_BPF_64_64
#define R_BPF_64_64 1
#endif
#ifndef R_BPF_64_ABS64
#define R_BPF_64_ABS64 2
#endif
#ifndef R_BPF_64_ABS32
#define R_BPF_64_ABS32 3
#endif
#ifndef R_BPF_64_32
#define R_BPF_64_32 10
#endif
#ifndef SHT_LLVM_ADDRSIG
#define SHT_LLVM_ADDRSIG 0x6FFF4C03
#endif
/* if libelf is old and doesn't support mmap(), fall back to read() */
#ifndef ELF_C_READ_MMAP
#define ELF_C_READ_MMAP ELF_C_READ
#endif
/* Older libelf all end up in this expression, for both 32 and 64 bit */
#ifndef GELF_ST_VISIBILITY
#define GELF_ST_VISIBILITY(o) ((o) & 0x03)
#endif
#define BTF_INFO_ENC(kind, kind_flag, vlen) \
((!!(kind_flag) << 31) | ((kind) << 24) | ((vlen) & BTF_MAX_VLEN))
@@ -31,6 +63,8 @@
#define BTF_MEMBER_ENC(name, type, bits_offset) (name), (type), (bits_offset)
#define BTF_PARAM_ENC(name, type) (name), (type)
#define BTF_VAR_SECINFO_ENC(type, offset, size) (type), (offset), (size)
#define BTF_TYPE_FLOAT_ENC(name, sz) \
BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_FLOAT, 0, 0), sz)
#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
@@ -105,9 +139,58 @@ static inline void *libbpf_reallocarray(void *ptr, size_t nmemb, size_t size)
return realloc(ptr, total);
}
void *btf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz,
size_t cur_cnt, size_t max_cnt, size_t add_cnt);
int btf_ensure_mem(void **data, size_t *cap_cnt, size_t elem_sz, size_t need_cnt);
struct btf;
struct btf_type;
struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id);
const char *btf_kind_str(const struct btf_type *t);
const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
{
return (enum btf_func_linkage)(int)btf_vlen(t);
}
static inline __u32 btf_type_info(int kind, int vlen, int kflag)
{
return (kflag << 31) | (kind << 24) | vlen;
}
enum map_def_parts {
MAP_DEF_MAP_TYPE = 0x001,
MAP_DEF_KEY_TYPE = 0x002,
MAP_DEF_KEY_SIZE = 0x004,
MAP_DEF_VALUE_TYPE = 0x008,
MAP_DEF_VALUE_SIZE = 0x010,
MAP_DEF_MAX_ENTRIES = 0x020,
MAP_DEF_MAP_FLAGS = 0x040,
MAP_DEF_NUMA_NODE = 0x080,
MAP_DEF_PINNING = 0x100,
MAP_DEF_INNER_MAP = 0x200,
MAP_DEF_ALL = 0x3ff, /* combination of all above */
};
struct btf_map_def {
enum map_def_parts parts;
__u32 map_type;
__u32 key_type_id;
__u32 key_size;
__u32 value_type_id;
__u32 value_size;
__u32 max_entries;
__u32 map_flags;
__u32 numa_node;
__u32 pinning;
};
int parse_btf_map_def(const char *map_name, struct btf *btf,
const struct btf_type *def_t, bool strict,
struct btf_map_def *map_def, struct btf_map_def *inner_def);
void *libbpf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz,
size_t cur_cnt, size_t max_cnt, size_t add_cnt);
int libbpf_ensure_mem(void **data, size_t *cap_cnt, size_t elem_sz, size_t need_cnt);
static inline bool libbpf_validate_opts(const char *opts,
size_t opts_sz, size_t user_sz,
@@ -151,10 +234,41 @@ int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len);
struct bpf_prog_load_params {
enum bpf_prog_type prog_type;
enum bpf_attach_type expected_attach_type;
const char *name;
const struct bpf_insn *insns;
size_t insn_cnt;
const char *license;
__u32 kern_version;
__u32 attach_prog_fd;
__u32 attach_btf_obj_fd;
__u32 attach_btf_id;
__u32 prog_ifindex;
__u32 prog_btf_fd;
__u32 prog_flags;
__u32 func_info_rec_size;
const void *func_info;
__u32 func_info_cnt;
__u32 line_info_rec_size;
const void *line_info;
__u32 line_info_cnt;
__u32 log_level;
char *log_buf;
size_t log_buf_sz;
};
int libbpf__bpf_prog_load(const struct bpf_prog_load_params *load_attr);
int bpf_object__section_size(const struct bpf_object *obj, const char *name,
__u32 *size);
int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
__u32 *off);
struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
struct btf_ext_info {
/*
@@ -318,4 +432,11 @@ struct bpf_core_relo {
enum bpf_core_relo_kind kind;
};
typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx);
typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx);
int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx);
int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx);
int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx);
int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void *ctx);
#endif /* __LIBBPF_LIBBPF_INTERNAL_H */

View File

@@ -230,6 +230,7 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
break;
case BPF_MAP_TYPE_SK_STORAGE:
case BPF_MAP_TYPE_INODE_STORAGE:
case BPF_MAP_TYPE_TASK_STORAGE:
btf_key_type_id = 1;
btf_value_type_id = 3;
value_size = 8;

View File

@@ -1,47 +0,0 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* Copyright (c) 2019 Facebook */
#ifndef __LIBBPF_LIBBPF_UTIL_H
#define __LIBBPF_LIBBPF_UTIL_H
#include <stdbool.h>
#ifdef __cplusplus
extern "C" {
#endif
/* Use these barrier functions instead of smp_[rw]mb() when they are
* used in a libbpf header file. That way they can be built into the
* application that uses libbpf.
*/
#if defined(__i386__) || defined(__x86_64__)
# define libbpf_smp_rmb() asm volatile("" : : : "memory")
# define libbpf_smp_wmb() asm volatile("" : : : "memory")
# define libbpf_smp_mb() \
asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
/* Hinders stores to be observed before older loads. */
# define libbpf_smp_rwmb() asm volatile("" : : : "memory")
#elif defined(__aarch64__)
# define libbpf_smp_rmb() asm volatile("dmb ishld" : : : "memory")
# define libbpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
# define libbpf_smp_mb() asm volatile("dmb ish" : : : "memory")
# define libbpf_smp_rwmb() libbpf_smp_mb()
#elif defined(__arm__)
/* These are only valid for armv7 and above */
# define libbpf_smp_rmb() asm volatile("dmb ish" : : : "memory")
# define libbpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
# define libbpf_smp_mb() asm volatile("dmb ish" : : : "memory")
# define libbpf_smp_rwmb() libbpf_smp_mb()
#else
/* Architecture missing native barrier functions. */
# define libbpf_smp_rmb() __sync_synchronize()
# define libbpf_smp_wmb() __sync_synchronize()
# define libbpf_smp_mb() __sync_synchronize()
# define libbpf_smp_rwmb() __sync_synchronize()
#endif
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif

2892
src/linker.c Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -4,7 +4,10 @@
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <linux/rtnetlink.h>
#include <sys/socket.h>
#include <errno.h>
@@ -40,7 +43,7 @@ static int libbpf_netlink_open(__u32 *nl_pid)
memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
if (sock < 0)
return -errno;
@@ -73,9 +76,20 @@ cleanup:
return ret;
}
static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
__dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
void *cookie)
static void libbpf_netlink_close(int sock)
{
close(sock);
}
enum {
NL_CONT,
NL_NEXT,
NL_DONE,
};
static int libbpf_netlink_recv(int sock, __u32 nl_pid, int seq,
__dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
void *cookie)
{
bool multipart = true;
struct nlmsgerr *err;
@@ -84,6 +98,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
int len, ret;
while (multipart) {
start:
multipart = false;
len = recv(sock, buf, sizeof(buf), 0);
if (len < 0) {
@@ -121,8 +136,16 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
}
if (_fn) {
ret = _fn(nh, fn, cookie);
if (ret)
switch (ret) {
case NL_CONT:
break;
case NL_NEXT:
goto start;
case NL_DONE:
return 0;
default:
return ret;
}
}
}
}
@@ -131,74 +154,74 @@ done:
return ret;
}
static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
__u32 flags)
static int libbpf_netlink_send_recv(struct nlmsghdr *nh,
__dump_nlmsg_t parse_msg,
libbpf_dump_nlmsg_t parse_attr,
void *cookie)
{
int sock, seq = 0, ret;
struct nlattr *nla, *nla_xdp;
struct {
struct nlmsghdr nh;
struct ifinfomsg ifinfo;
char attrbuf[64];
} req;
__u32 nl_pid = 0;
int sock, ret;
sock = libbpf_netlink_open(&nl_pid);
if (sock < 0)
return sock;
memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
req.nh.nlmsg_type = RTM_SETLINK;
req.nh.nlmsg_pid = 0;
req.nh.nlmsg_seq = ++seq;
req.ifinfo.ifi_family = AF_UNSPEC;
req.ifinfo.ifi_index = ifindex;
nh->nlmsg_pid = 0;
nh->nlmsg_seq = time(NULL);
/* started nested attribute for XDP */
nla = (struct nlattr *)(((char *)&req)
+ NLMSG_ALIGN(req.nh.nlmsg_len));
nla->nla_type = NLA_F_NESTED | IFLA_XDP;
nla->nla_len = NLA_HDRLEN;
/* add XDP fd */
nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
nla_xdp->nla_type = IFLA_XDP_FD;
nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
nla->nla_len += nla_xdp->nla_len;
/* if user passed in any flags, add those too */
if (flags) {
nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
nla_xdp->nla_type = IFLA_XDP_FLAGS;
nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
nla->nla_len += nla_xdp->nla_len;
}
if (flags & XDP_FLAGS_REPLACE) {
nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
nla->nla_len += nla_xdp->nla_len;
}
req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
if (send(sock, nh, nh->nlmsg_len, 0) < 0) {
ret = -errno;
goto cleanup;
goto out;
}
ret = bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
cleanup:
close(sock);
ret = libbpf_netlink_recv(sock, nl_pid, nh->nlmsg_seq,
parse_msg, parse_attr, cookie);
out:
libbpf_netlink_close(sock);
return ret;
}
static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
__u32 flags)
{
struct nlattr *nla;
int ret;
struct {
struct nlmsghdr nh;
struct ifinfomsg ifinfo;
char attrbuf[64];
} req;
memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
req.nh.nlmsg_type = RTM_SETLINK;
req.ifinfo.ifi_family = AF_UNSPEC;
req.ifinfo.ifi_index = ifindex;
nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
if (!nla)
return -EMSGSIZE;
ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
if (ret < 0)
return ret;
if (flags) {
ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags,
sizeof(flags));
if (ret < 0)
return ret;
}
if (flags & XDP_FLAGS_REPLACE) {
ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD,
&old_fd, sizeof(old_fd));
if (ret < 0)
return ret;
}
nlattr_end_nested(&req.nh, nla);
return libbpf_netlink_send_recv(&req.nh, NULL, NULL, NULL);
}
int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
const struct bpf_xdp_set_link_opts *opts)
{
@@ -212,9 +235,7 @@ int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
flags |= XDP_FLAGS_REPLACE;
}
return __bpf_set_link_xdp_fd_replace(ifindex, fd,
old_fd,
flags);
return __bpf_set_link_xdp_fd_replace(ifindex, fd, old_fd, flags);
}
int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
@@ -231,6 +252,7 @@ static int __dump_link_nlmsg(struct nlmsghdr *nlh,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
attr = (struct nlattr *) ((void *) ifi + NLMSG_ALIGN(sizeof(*ifi)));
if (libbpf_nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE;
@@ -282,16 +304,21 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
return 0;
}
static int libbpf_nl_get_link(int sock, unsigned int nl_pid,
libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie);
int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
size_t info_size, __u32 flags)
{
struct xdp_id_md xdp_id = {};
int sock, ret;
__u32 nl_pid = 0;
__u32 mask;
int ret;
struct {
struct nlmsghdr nh;
struct ifinfomsg ifm;
} req = {
.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
.nh.nlmsg_type = RTM_GETLINK,
.nh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
.ifm.ifi_family = AF_PACKET,
};
if (flags & ~XDP_FLAGS_MASK || !info_size)
return -EINVAL;
@@ -302,14 +329,11 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
if (flags && flags & mask)
return -EINVAL;
sock = libbpf_netlink_open(&nl_pid);
if (sock < 0)
return sock;
xdp_id.ifindex = ifindex;
xdp_id.flags = flags;
ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
ret = libbpf_netlink_send_recv(&req.nh, __dump_link_nlmsg,
get_xdp_info, &xdp_id);
if (!ret) {
size_t sz = min(info_size, sizeof(xdp_id.info));
@@ -317,7 +341,6 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
memset((void *) info + sz, 0, info_size - sz);
}
close(sock);
return ret;
}
@@ -349,24 +372,403 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
return ret;
}
int libbpf_nl_get_link(int sock, unsigned int nl_pid,
libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
typedef int (*qdisc_config_t)(struct nlmsghdr *nh, struct tcmsg *t,
size_t maxsz);
static int clsact_config(struct nlmsghdr *nh, struct tcmsg *t, size_t maxsz)
{
struct {
struct nlmsghdr nlh;
struct ifinfomsg ifm;
} req = {
.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
.nlh.nlmsg_type = RTM_GETLINK,
.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
.ifm.ifi_family = AF_PACKET,
};
int seq = time(NULL);
t->tcm_parent = TC_H_CLSACT;
t->tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
req.nlh.nlmsg_seq = seq;
if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
return -errno;
return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
dump_link_nlmsg, cookie);
return nlattr_add(nh, maxsz, TCA_KIND, "clsact", sizeof("clsact"));
}
static int attach_point_to_config(struct bpf_tc_hook *hook,
qdisc_config_t *config)
{
switch (OPTS_GET(hook, attach_point, 0)) {
case BPF_TC_INGRESS:
case BPF_TC_EGRESS:
case BPF_TC_INGRESS | BPF_TC_EGRESS:
if (OPTS_GET(hook, parent, 0))
return -EINVAL;
*config = &clsact_config;
return 0;
case BPF_TC_CUSTOM:
return -EOPNOTSUPP;
default:
return -EINVAL;
}
}
static int tc_get_tcm_parent(enum bpf_tc_attach_point attach_point,
__u32 *parent)
{
switch (attach_point) {
case BPF_TC_INGRESS:
case BPF_TC_EGRESS:
if (*parent)
return -EINVAL;
*parent = TC_H_MAKE(TC_H_CLSACT,
attach_point == BPF_TC_INGRESS ?
TC_H_MIN_INGRESS : TC_H_MIN_EGRESS);
break;
case BPF_TC_CUSTOM:
if (!*parent)
return -EINVAL;
break;
default:
return -EINVAL;
}
return 0;
}
static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags)
{
qdisc_config_t config;
int ret;
struct {
struct nlmsghdr nh;
struct tcmsg tc;
char buf[256];
} req;
ret = attach_point_to_config(hook, &config);
if (ret < 0)
return ret;
memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
req.nh.nlmsg_type = cmd;
req.tc.tcm_family = AF_UNSPEC;
req.tc.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
ret = config(&req.nh, &req.tc, sizeof(req));
if (ret < 0)
return ret;
return libbpf_netlink_send_recv(&req.nh, NULL, NULL, NULL);
}
static int tc_qdisc_create_excl(struct bpf_tc_hook *hook)
{
return tc_qdisc_modify(hook, RTM_NEWQDISC, NLM_F_CREATE);
}
static int tc_qdisc_delete(struct bpf_tc_hook *hook)
{
return tc_qdisc_modify(hook, RTM_DELQDISC, 0);
}
int bpf_tc_hook_create(struct bpf_tc_hook *hook)
{
if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
OPTS_GET(hook, ifindex, 0) <= 0)
return -EINVAL;
return tc_qdisc_create_excl(hook);
}
static int __bpf_tc_detach(const struct bpf_tc_hook *hook,
const struct bpf_tc_opts *opts,
const bool flush);
int bpf_tc_hook_destroy(struct bpf_tc_hook *hook)
{
if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
OPTS_GET(hook, ifindex, 0) <= 0)
return -EINVAL;
switch (OPTS_GET(hook, attach_point, 0)) {
case BPF_TC_INGRESS:
case BPF_TC_EGRESS:
return __bpf_tc_detach(hook, NULL, true);
case BPF_TC_INGRESS | BPF_TC_EGRESS:
return tc_qdisc_delete(hook);
case BPF_TC_CUSTOM:
return -EOPNOTSUPP;
default:
return -EINVAL;
}
}
struct bpf_cb_ctx {
struct bpf_tc_opts *opts;
bool processed;
};
static int __get_tc_info(void *cookie, struct tcmsg *tc, struct nlattr **tb,
bool unicast)
{
struct nlattr *tbb[TCA_BPF_MAX + 1];
struct bpf_cb_ctx *info = cookie;
if (!info || !info->opts)
return -EINVAL;
if (unicast && info->processed)
return -EINVAL;
if (!tb[TCA_OPTIONS])
return NL_CONT;
libbpf_nla_parse_nested(tbb, TCA_BPF_MAX, tb[TCA_OPTIONS], NULL);
if (!tbb[TCA_BPF_ID])
return -EINVAL;
OPTS_SET(info->opts, prog_id, libbpf_nla_getattr_u32(tbb[TCA_BPF_ID]));
OPTS_SET(info->opts, handle, tc->tcm_handle);
OPTS_SET(info->opts, priority, TC_H_MAJ(tc->tcm_info) >> 16);
info->processed = true;
return unicast ? NL_NEXT : NL_DONE;
}
static int get_tc_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
void *cookie)
{
struct tcmsg *tc = NLMSG_DATA(nh);
struct nlattr *tb[TCA_MAX + 1];
libbpf_nla_parse(tb, TCA_MAX,
(struct nlattr *)((char *)tc + NLMSG_ALIGN(sizeof(*tc))),
NLMSG_PAYLOAD(nh, sizeof(*tc)), NULL);
if (!tb[TCA_KIND])
return NL_CONT;
return __get_tc_info(cookie, tc, tb, nh->nlmsg_flags & NLM_F_ECHO);
}
static int tc_add_fd_and_name(struct nlmsghdr *nh, size_t maxsz, int fd)
{
struct bpf_prog_info info = {};
__u32 info_len = sizeof(info);
char name[256];
int len, ret;
ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
if (ret < 0)
return ret;
ret = nlattr_add(nh, maxsz, TCA_BPF_FD, &fd, sizeof(fd));
if (ret < 0)
return ret;
len = snprintf(name, sizeof(name), "%s:[%u]", info.name, info.id);
if (len < 0)
return -errno;
if (len >= sizeof(name))
return -ENAMETOOLONG;
return nlattr_add(nh, maxsz, TCA_BPF_NAME, name, len + 1);
}
int bpf_tc_attach(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts)
{
__u32 protocol, bpf_flags, handle, priority, parent, prog_id, flags;
int ret, ifindex, attach_point, prog_fd;
struct bpf_cb_ctx info = {};
struct nlattr *nla;
struct {
struct nlmsghdr nh;
struct tcmsg tc;
char buf[256];
} req;
if (!hook || !opts ||
!OPTS_VALID(hook, bpf_tc_hook) ||
!OPTS_VALID(opts, bpf_tc_opts))
return -EINVAL;
ifindex = OPTS_GET(hook, ifindex, 0);
parent = OPTS_GET(hook, parent, 0);
attach_point = OPTS_GET(hook, attach_point, 0);
handle = OPTS_GET(opts, handle, 0);
priority = OPTS_GET(opts, priority, 0);
prog_fd = OPTS_GET(opts, prog_fd, 0);
prog_id = OPTS_GET(opts, prog_id, 0);
flags = OPTS_GET(opts, flags, 0);
if (ifindex <= 0 || !prog_fd || prog_id)
return -EINVAL;
if (priority > UINT16_MAX)
return -EINVAL;
if (flags & ~BPF_TC_F_REPLACE)
return -EINVAL;
flags = (flags & BPF_TC_F_REPLACE) ? NLM_F_REPLACE : NLM_F_EXCL;
protocol = ETH_P_ALL;
memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE |
NLM_F_ECHO | flags;
req.nh.nlmsg_type = RTM_NEWTFILTER;
req.tc.tcm_family = AF_UNSPEC;
req.tc.tcm_ifindex = ifindex;
req.tc.tcm_handle = handle;
req.tc.tcm_info = TC_H_MAKE(priority << 16, htons(protocol));
ret = tc_get_tcm_parent(attach_point, &parent);
if (ret < 0)
return ret;
req.tc.tcm_parent = parent;
ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
if (ret < 0)
return ret;
nla = nlattr_begin_nested(&req.nh, sizeof(req), TCA_OPTIONS);
if (!nla)
return -EMSGSIZE;
ret = tc_add_fd_and_name(&req.nh, sizeof(req), prog_fd);
if (ret < 0)
return ret;
bpf_flags = TCA_BPF_FLAG_ACT_DIRECT;
ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_FLAGS, &bpf_flags,
sizeof(bpf_flags));
if (ret < 0)
return ret;
nlattr_end_nested(&req.nh, nla);
info.opts = opts;
ret = libbpf_netlink_send_recv(&req.nh, get_tc_info, NULL, &info);
if (ret < 0)
return ret;
if (!info.processed)
return -ENOENT;
return ret;
}
static int __bpf_tc_detach(const struct bpf_tc_hook *hook,
const struct bpf_tc_opts *opts,
const bool flush)
{
__u32 protocol = 0, handle, priority, parent, prog_id, flags;
int ret, ifindex, attach_point, prog_fd;
struct {
struct nlmsghdr nh;
struct tcmsg tc;
char buf[256];
} req;
if (!hook ||
!OPTS_VALID(hook, bpf_tc_hook) ||
!OPTS_VALID(opts, bpf_tc_opts))
return -EINVAL;
ifindex = OPTS_GET(hook, ifindex, 0);
parent = OPTS_GET(hook, parent, 0);
attach_point = OPTS_GET(hook, attach_point, 0);
handle = OPTS_GET(opts, handle, 0);
priority = OPTS_GET(opts, priority, 0);
prog_fd = OPTS_GET(opts, prog_fd, 0);
prog_id = OPTS_GET(opts, prog_id, 0);
flags = OPTS_GET(opts, flags, 0);
if (ifindex <= 0 || flags || prog_fd || prog_id)
return -EINVAL;
if (priority > UINT16_MAX)
return -EINVAL;
if (flags & ~BPF_TC_F_REPLACE)
return -EINVAL;
if (!flush) {
if (!handle || !priority)
return -EINVAL;
protocol = ETH_P_ALL;
} else {
if (handle || priority)
return -EINVAL;
}
memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
req.nh.nlmsg_type = RTM_DELTFILTER;
req.tc.tcm_family = AF_UNSPEC;
req.tc.tcm_ifindex = ifindex;
if (!flush) {
req.tc.tcm_handle = handle;
req.tc.tcm_info = TC_H_MAKE(priority << 16, htons(protocol));
}
ret = tc_get_tcm_parent(attach_point, &parent);
if (ret < 0)
return ret;
req.tc.tcm_parent = parent;
if (!flush) {
ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND,
"bpf", sizeof("bpf"));
if (ret < 0)
return ret;
}
return libbpf_netlink_send_recv(&req.nh, NULL, NULL, NULL);
}
int bpf_tc_detach(const struct bpf_tc_hook *hook,
const struct bpf_tc_opts *opts)
{
return !opts ? -EINVAL : __bpf_tc_detach(hook, opts, false);
}
int bpf_tc_query(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts)
{
__u32 protocol, handle, priority, parent, prog_id, flags;
int ret, ifindex, attach_point, prog_fd;
struct bpf_cb_ctx info = {};
struct {
struct nlmsghdr nh;
struct tcmsg tc;
char buf[256];
} req;
if (!hook || !opts ||
!OPTS_VALID(hook, bpf_tc_hook) ||
!OPTS_VALID(opts, bpf_tc_opts))
return -EINVAL;
ifindex = OPTS_GET(hook, ifindex, 0);
parent = OPTS_GET(hook, parent, 0);
attach_point = OPTS_GET(hook, attach_point, 0);
handle = OPTS_GET(opts, handle, 0);
priority = OPTS_GET(opts, priority, 0);
prog_fd = OPTS_GET(opts, prog_fd, 0);
prog_id = OPTS_GET(opts, prog_id, 0);
flags = OPTS_GET(opts, flags, 0);
if (ifindex <= 0 || flags || prog_fd || prog_id ||
!handle || !priority)
return -EINVAL;
if (priority > UINT16_MAX)
return -EINVAL;
protocol = ETH_P_ALL;
memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
req.nh.nlmsg_flags = NLM_F_REQUEST;
req.nh.nlmsg_type = RTM_GETTFILTER;
req.tc.tcm_family = AF_UNSPEC;
req.tc.tcm_ifindex = ifindex;
req.tc.tcm_handle = handle;
req.tc.tcm_info = TC_H_MAKE(priority << 16, htons(protocol));
ret = tc_get_tcm_parent(attach_point, &parent);
if (ret < 0)
return ret;
req.tc.tcm_parent = parent;
ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
if (ret < 0)
return ret;
info.opts = opts;
ret = libbpf_netlink_send_recv(&req.nh, get_tc_info, NULL, &info);
if (ret < 0)
return ret;
if (!info.processed)
return -ENOENT;
return ret;
}

View File

@@ -10,7 +10,10 @@
#define __LIBBPF_NLATTR_H
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <linux/netlink.h>
/* avoid multiple definition of netlink features */
#define __LINUX_NETLINK_H
@@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
static inline struct nlattr *nla_data(struct nlattr *nla)
{
return (struct nlattr *)((char *)nla + NLA_HDRLEN);
}
static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
{
return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
}
static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
const void *data, int len)
{
struct nlattr *nla;
if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
return -EMSGSIZE;
if (!!data != !!len)
return -EINVAL;
nla = nh_tail(nh);
nla->nla_type = type;
nla->nla_len = NLA_HDRLEN + len;
if (data)
memcpy(nla_data(nla), data, len);
nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
return 0;
}
static inline struct nlattr *nlattr_begin_nested(struct nlmsghdr *nh,
size_t maxsz, int type)
{
struct nlattr *tail;
tail = nh_tail(nh);
if (nlattr_add(nh, maxsz, type | NLA_F_NESTED, NULL, 0))
return NULL;
return tail;
}
static inline void nlattr_end_nested(struct nlmsghdr *nh, struct nlattr *tail)
{
tail->nla_len = (char *)nh_tail(nh) - (char *)tail;
}
#endif /* __LIBBPF_NLATTR_H */

View File

@@ -202,9 +202,11 @@ static inline int roundup_len(__u32 len)
return (len + 7) / 8 * 8;
}
static int ringbuf_process_ring(struct ring* r)
static int64_t ringbuf_process_ring(struct ring* r)
{
int *len_ptr, len, err, cnt = 0;
int *len_ptr, len, err;
/* 64-bit to avoid overflow in case of extreme application behavior */
int64_t cnt = 0;
unsigned long cons_pos, prod_pos;
bool got_new_data;
void *sample;
@@ -227,7 +229,7 @@ static int ringbuf_process_ring(struct ring* r)
if ((len & BPF_RINGBUF_DISCARD_BIT) == 0) {
sample = (void *)len_ptr + BPF_RINGBUF_HDR_SZ;
err = r->sample_cb(r->ctx, sample, len);
if (err) {
if (err < 0) {
/* update consumer pos and bail out */
smp_store_release(r->consumer_pos,
cons_pos);
@@ -244,12 +246,14 @@ done:
}
/* Consume available ring buffer(s) data without event polling.
* Returns number of records consumed across all registered ring buffers, or
* negative number if any of the callbacks return error.
* Returns number of records consumed across all registered ring buffers (or
* INT_MAX, whichever is less), or negative number if any of the callbacks
* return error.
*/
int ring_buffer__consume(struct ring_buffer *rb)
{
int i, err, res = 0;
int64_t err, res = 0;
int i;
for (i = 0; i < rb->ring_cnt; i++) {
struct ring *ring = &rb->rings[i];
@@ -259,18 +263,24 @@ int ring_buffer__consume(struct ring_buffer *rb)
return err;
res += err;
}
if (res > INT_MAX)
return INT_MAX;
return res;
}
/* Poll for available data and consume records, if any are available.
* Returns number of records consumed, or negative number, if any of the
* registered callbacks returned error.
* Returns number of records consumed (or INT_MAX, whichever is less), or
* negative number, if any of the registered callbacks returned error.
*/
int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
{
int i, cnt, err, res = 0;
int i, cnt;
int64_t err, res = 0;
cnt = epoll_wait(rb->epoll_fd, rb->events, rb->ring_cnt, timeout_ms);
if (cnt < 0)
return -errno;
for (i = 0; i < cnt; i++) {
__u32 ring_id = rb->events[i].data.fd;
struct ring *ring = &rb->rings[ring_id];
@@ -278,7 +288,15 @@ int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
err = ringbuf_process_ring(ring);
if (err < 0)
return err;
res += cnt;
res += err;
}
return cnt < 0 ? -errno : res;
if (res > INT_MAX)
return INT_MAX;
return res;
}
/* Get an fd that can be used to sleep until data is available in the ring(s) */
int ring_buffer__epoll_fd(const struct ring_buffer *rb)
{
return rb->epoll_fd;
}

176
src/strset.c Normal file
View File

@@ -0,0 +1,176 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2021 Facebook */
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <linux/err.h>
#include "hashmap.h"
#include "libbpf_internal.h"
#include "strset.h"
struct strset {
void *strs_data;
size_t strs_data_len;
size_t strs_data_cap;
size_t strs_data_max_len;
/* lookup index for each unique string in strings set */
struct hashmap *strs_hash;
};
static size_t strset_hash_fn(const void *key, void *ctx)
{
const struct strset *s = ctx;
const char *str = s->strs_data + (long)key;
return str_hash(str);
}
static bool strset_equal_fn(const void *key1, const void *key2, void *ctx)
{
const struct strset *s = ctx;
const char *str1 = s->strs_data + (long)key1;
const char *str2 = s->strs_data + (long)key2;
return strcmp(str1, str2) == 0;
}
struct strset *strset__new(size_t max_data_sz, const char *init_data, size_t init_data_sz)
{
struct strset *set = calloc(1, sizeof(*set));
struct hashmap *hash;
int err = -ENOMEM;
if (!set)
return ERR_PTR(-ENOMEM);
hash = hashmap__new(strset_hash_fn, strset_equal_fn, set);
if (IS_ERR(hash))
goto err_out;
set->strs_data_max_len = max_data_sz;
set->strs_hash = hash;
if (init_data) {
long off;
set->strs_data = malloc(init_data_sz);
if (!set->strs_data)
goto err_out;
memcpy(set->strs_data, init_data, init_data_sz);
set->strs_data_len = init_data_sz;
set->strs_data_cap = init_data_sz;
for (off = 0; off < set->strs_data_len; off += strlen(set->strs_data + off) + 1) {
/* hashmap__add() returns EEXIST if string with the same
* content already is in the hash map
*/
err = hashmap__add(hash, (void *)off, (void *)off);
if (err == -EEXIST)
continue; /* duplicate */
if (err)
goto err_out;
}
}
return set;
err_out:
strset__free(set);
return ERR_PTR(err);
}
void strset__free(struct strset *set)
{
if (IS_ERR_OR_NULL(set))
return;
hashmap__free(set->strs_hash);
free(set->strs_data);
}
size_t strset__data_size(const struct strset *set)
{
return set->strs_data_len;
}
const char *strset__data(const struct strset *set)
{
return set->strs_data;
}
static void *strset_add_str_mem(struct strset *set, size_t add_sz)
{
return libbpf_add_mem(&set->strs_data, &set->strs_data_cap, 1,
set->strs_data_len, set->strs_data_max_len, add_sz);
}
/* Find string offset that corresponds to a given string *s*.
* Returns:
* - >0 offset into string data, if string is found;
* - -ENOENT, if string is not in the string data;
* - <0, on any other error.
*/
int strset__find_str(struct strset *set, const char *s)
{
long old_off, new_off, len;
void *p;
/* see strset__add_str() for why we do this */
len = strlen(s) + 1;
p = strset_add_str_mem(set, len);
if (!p)
return -ENOMEM;
new_off = set->strs_data_len;
memcpy(p, s, len);
if (hashmap__find(set->strs_hash, (void *)new_off, (void **)&old_off))
return old_off;
return -ENOENT;
}
/* Add a string s to the string data. If the string already exists, return its
* offset within string data.
* Returns:
* - > 0 offset into string data, on success;
* - < 0, on error.
*/
int strset__add_str(struct strset *set, const char *s)
{
long old_off, new_off, len;
void *p;
int err;
/* Hashmap keys are always offsets within set->strs_data, so to even
* look up some string from the "outside", we need to first append it
* at the end, so that it can be addressed with an offset. Luckily,
* until set->strs_data_len is incremented, that string is just a piece
* of garbage for the rest of the code, so no harm, no foul. On the
* other hand, if the string is unique, it's already appended and
* ready to be used, only a simple set->strs_data_len increment away.
*/
len = strlen(s) + 1;
p = strset_add_str_mem(set, len);
if (!p)
return -ENOMEM;
new_off = set->strs_data_len;
memcpy(p, s, len);
/* Now attempt to add the string, but only if the string with the same
* contents doesn't exist already (HASHMAP_ADD strategy). If such
* string exists, we'll get its offset in old_off (that's old_key).
*/
err = hashmap__insert(set->strs_hash, (void *)new_off, (void *)new_off,
HASHMAP_ADD, (const void **)&old_off, NULL);
if (err == -EEXIST)
return old_off; /* duplicated string, return existing offset */
if (err)
return err;
set->strs_data_len += len; /* new unique string, adjust data length */
return new_off;
}

21
src/strset.h Normal file
View File

@@ -0,0 +1,21 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* Copyright (c) 2021 Facebook */
#ifndef __LIBBPF_STRSET_H
#define __LIBBPF_STRSET_H
#include <stdbool.h>
#include <stddef.h>
struct strset;
struct strset *strset__new(size_t max_data_sz, const char *init_data, size_t init_data_sz);
void strset__free(struct strset *set);
const char *strset__data(const struct strset *set);
size_t strset__data_size(const struct strset *set);
int strset__find_str(struct strset *set, const char *s);
int strset__add_str(struct strset *set, const char *s);
#endif /* __LIBBPF_STRSET_H */

473
src/xsk.c
View File

@@ -28,6 +28,7 @@
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/if_link.h>
#include "bpf.h"
#include "libbpf.h"
@@ -46,6 +47,11 @@
#define PF_XDP AF_XDP
#endif
enum xsk_prog {
XSK_PROG_FALLBACK,
XSK_PROG_REDIRECT_FLAGS,
};
struct xsk_umem {
struct xsk_ring_prod *fill_save;
struct xsk_ring_cons *comp_save;
@@ -54,6 +60,8 @@ struct xsk_umem {
int fd;
int refcount;
struct list_head ctx_list;
bool rx_ring_setup_done;
bool tx_ring_setup_done;
};
struct xsk_ctx {
@@ -65,8 +73,10 @@ struct xsk_ctx {
int ifindex;
struct list_head list;
int prog_fd;
int link_fd;
int xsks_map_fd;
char ifname[IFNAMSIZ];
bool has_bpf_link;
};
struct xsk_socket {
@@ -351,14 +361,62 @@ int xsk_umem__create_v0_0_2(struct xsk_umem **umem_ptr, void *umem_area,
COMPAT_VERSION(xsk_umem__create_v0_0_2, xsk_umem__create, LIBBPF_0.0.2)
DEFAULT_VERSION(xsk_umem__create_v0_0_4, xsk_umem__create, LIBBPF_0.0.4)
static enum xsk_prog get_xsk_prog(void)
{
enum xsk_prog detected = XSK_PROG_FALLBACK;
struct bpf_load_program_attr prog_attr;
struct bpf_create_map_attr map_attr;
__u32 size_out, retval, duration;
char data_in = 0, data_out;
struct bpf_insn insns[] = {
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_MOV64_IMM(BPF_REG_3, XDP_PASS),
BPF_EMIT_CALL(BPF_FUNC_redirect_map),
BPF_EXIT_INSN(),
};
int prog_fd, map_fd, ret;
memset(&map_attr, 0, sizeof(map_attr));
map_attr.map_type = BPF_MAP_TYPE_XSKMAP;
map_attr.key_size = sizeof(int);
map_attr.value_size = sizeof(int);
map_attr.max_entries = 1;
map_fd = bpf_create_map_xattr(&map_attr);
if (map_fd < 0)
return detected;
insns[0].imm = map_fd;
memset(&prog_attr, 0, sizeof(prog_attr));
prog_attr.prog_type = BPF_PROG_TYPE_XDP;
prog_attr.insns = insns;
prog_attr.insns_cnt = ARRAY_SIZE(insns);
prog_attr.license = "GPL";
prog_fd = bpf_load_program_xattr(&prog_attr, NULL, 0);
if (prog_fd < 0) {
close(map_fd);
return detected;
}
ret = bpf_prog_test_run(prog_fd, 0, &data_in, 1, &data_out, &size_out, &retval, &duration);
if (!ret && retval == XDP_PASS)
detected = XSK_PROG_REDIRECT_FLAGS;
close(prog_fd);
close(map_fd);
return detected;
}
static int xsk_load_xdp_prog(struct xsk_socket *xsk)
{
static const int log_buf_size = 16 * 1024;
struct xsk_ctx *ctx = xsk->ctx;
char log_buf[log_buf_size];
int err, prog_fd;
int prog_fd;
/* This is the C-program:
/* This is the fallback C-program:
* SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
* {
* int ret, index = ctx->rx_queue_index;
@@ -414,9 +472,31 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
/* The jumps are to this instruction */
BPF_EXIT_INSN(),
};
size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
/* This is the post-5.3 kernel C-program:
* SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
* {
* return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
* }
*/
struct bpf_insn prog_redirect_flags[] = {
/* r2 = *(u32 *)(r1 + 16) */
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 16),
/* r1 = xskmap[] */
BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
/* r3 = XDP_PASS */
BPF_MOV64_IMM(BPF_REG_3, 2),
/* call bpf_redirect_map */
BPF_EMIT_CALL(BPF_FUNC_redirect_map),
BPF_EXIT_INSN(),
};
size_t insns_cnt[] = {sizeof(prog) / sizeof(struct bpf_insn),
sizeof(prog_redirect_flags) / sizeof(struct bpf_insn),
};
struct bpf_insn *progs[] = {prog, prog_redirect_flags};
enum xsk_prog option = get_xsk_prog();
prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, progs[option], insns_cnt[option],
"LGPL-2.1 or BSD-2-Clause", 0, log_buf,
log_buf_size);
if (prog_fd < 0) {
@@ -424,14 +504,41 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
return prog_fd;
}
err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, prog_fd,
xsk->config.xdp_flags);
ctx->prog_fd = prog_fd;
return 0;
}
static int xsk_create_bpf_link(struct xsk_socket *xsk)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
struct xsk_ctx *ctx = xsk->ctx;
__u32 prog_id = 0;
int link_fd;
int err;
err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
if (err) {
close(prog_fd);
pr_warn("getting XDP prog id failed\n");
return err;
}
ctx->prog_fd = prog_fd;
/* if there's a netlink-based XDP prog loaded on interface, bail out
* and ask user to do the removal by himself
*/
if (prog_id) {
pr_warn("Netlink-based XDP prog detected, please unload it in order to launch AF_XDP prog\n");
return -EINVAL;
}
opts.flags = xsk->config.xdp_flags & ~(XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_REPLACE);
link_fd = bpf_link_create(ctx->prog_fd, ctx->ifindex, BPF_XDP, &opts);
if (link_fd < 0) {
pr_warn("bpf_link_create failed: %s\n", strerror(errno));
return link_fd;
}
ctx->link_fd = link_fd;
return 0;
}
@@ -442,7 +549,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
struct ifreq ifr = {};
int fd, err, ret;
fd = socket(AF_INET, SOCK_DGRAM, 0);
fd = socket(AF_LOCAL, SOCK_DGRAM, 0);
if (fd < 0)
return -errno;
@@ -535,21 +642,21 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
if (fd < 0)
continue;
memset(&map_info, 0, map_len);
err = bpf_obj_get_info_by_fd(fd, &map_info, &map_len);
if (err) {
close(fd);
continue;
}
if (!strcmp(map_info.name, "xsks_map")) {
if (!strncmp(map_info.name, "xsks_map", sizeof(map_info.name))) {
ctx->xsks_map_fd = fd;
continue;
break;
}
close(fd);
}
err = 0;
if (ctx->xsks_map_fd == -1)
err = -ENOENT;
@@ -566,47 +673,223 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
&xsk->fd, 0);
}
static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
static int xsk_link_lookup(int ifindex, __u32 *prog_id, int *link_fd)
{
struct bpf_link_info link_info;
__u32 link_len;
__u32 id = 0;
int err;
int fd;
while (true) {
err = bpf_link_get_next_id(id, &id);
if (err) {
if (errno == ENOENT) {
err = 0;
break;
}
pr_warn("can't get next link: %s\n", strerror(errno));
break;
}
fd = bpf_link_get_fd_by_id(id);
if (fd < 0) {
if (errno == ENOENT)
continue;
pr_warn("can't get link by id (%u): %s\n", id, strerror(errno));
err = -errno;
break;
}
link_len = sizeof(struct bpf_link_info);
memset(&link_info, 0, link_len);
err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len);
if (err) {
pr_warn("can't get link info: %s\n", strerror(errno));
close(fd);
break;
}
if (link_info.type == BPF_LINK_TYPE_XDP) {
if (link_info.xdp.ifindex == ifindex) {
*link_fd = fd;
if (prog_id)
*prog_id = link_info.prog_id;
break;
}
}
close(fd);
}
return err;
}
static bool xsk_probe_bpf_link(void)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
.flags = XDP_FLAGS_SKB_MODE);
struct bpf_load_program_attr prog_attr;
struct bpf_insn insns[2] = {
BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
BPF_EXIT_INSN()
};
int prog_fd, link_fd = -1;
int ifindex_lo = 1;
bool ret = false;
int err;
err = xsk_link_lookup(ifindex_lo, NULL, &link_fd);
if (err)
return ret;
if (link_fd >= 0)
return true;
memset(&prog_attr, 0, sizeof(prog_attr));
prog_attr.prog_type = BPF_PROG_TYPE_XDP;
prog_attr.insns = insns;
prog_attr.insns_cnt = ARRAY_SIZE(insns);
prog_attr.license = "GPL";
prog_fd = bpf_load_program_xattr(&prog_attr, NULL, 0);
if (prog_fd < 0)
return ret;
link_fd = bpf_link_create(prog_fd, ifindex_lo, BPF_XDP, &opts);
close(prog_fd);
if (link_fd >= 0) {
ret = true;
close(link_fd);
}
return ret;
}
static int xsk_create_xsk_struct(int ifindex, struct xsk_socket *xsk)
{
char ifname[IFNAMSIZ];
struct xsk_ctx *ctx;
char *interface;
ctx = calloc(1, sizeof(*ctx));
if (!ctx)
return -ENOMEM;
interface = if_indextoname(ifindex, &ifname[0]);
if (!interface) {
free(ctx);
return -errno;
}
ctx->ifindex = ifindex;
memcpy(ctx->ifname, ifname, IFNAMSIZ -1);
ctx->ifname[IFNAMSIZ - 1] = 0;
xsk->ctx = ctx;
xsk->ctx->has_bpf_link = xsk_probe_bpf_link();
return 0;
}
static int xsk_init_xdp_res(struct xsk_socket *xsk,
int *xsks_map_fd)
{
struct xsk_ctx *ctx = xsk->ctx;
int err;
err = xsk_create_bpf_maps(xsk);
if (err)
return err;
err = xsk_load_xdp_prog(xsk);
if (err)
goto err_load_xdp_prog;
if (ctx->has_bpf_link)
err = xsk_create_bpf_link(xsk);
else
err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, ctx->prog_fd,
xsk->config.xdp_flags);
if (err)
goto err_attach_xdp_prog;
if (!xsk->rx)
return err;
err = xsk_set_bpf_maps(xsk);
if (err)
goto err_set_bpf_maps;
return err;
err_set_bpf_maps:
if (ctx->has_bpf_link)
close(ctx->link_fd);
else
bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
err_attach_xdp_prog:
close(ctx->prog_fd);
err_load_xdp_prog:
xsk_delete_bpf_maps(xsk);
return err;
}
static int xsk_lookup_xdp_res(struct xsk_socket *xsk, int *xsks_map_fd, int prog_id)
{
struct xsk_ctx *ctx = xsk->ctx;
int err;
ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id);
if (ctx->prog_fd < 0) {
err = -errno;
goto err_prog_fd;
}
err = xsk_lookup_bpf_maps(xsk);
if (err)
goto err_lookup_maps;
if (!xsk->rx)
return err;
err = xsk_set_bpf_maps(xsk);
if (err)
goto err_set_maps;
return err;
err_set_maps:
close(ctx->xsks_map_fd);
err_lookup_maps:
close(ctx->prog_fd);
err_prog_fd:
if (ctx->has_bpf_link)
close(ctx->link_fd);
return err;
}
static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp, int *xsks_map_fd)
{
struct xsk_socket *xsk = _xdp;
struct xsk_ctx *ctx = xsk->ctx;
__u32 prog_id = 0;
int err;
err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id,
xsk->config.xdp_flags);
if (ctx->has_bpf_link)
err = xsk_link_lookup(ctx->ifindex, &prog_id, &ctx->link_fd);
else
err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
if (err)
return err;
if (!prog_id) {
err = xsk_create_bpf_maps(xsk);
if (err)
return err;
err = !prog_id ? xsk_init_xdp_res(xsk, xsks_map_fd) :
xsk_lookup_xdp_res(xsk, xsks_map_fd, prog_id);
err = xsk_load_xdp_prog(xsk);
if (err) {
xsk_delete_bpf_maps(xsk);
return err;
}
} else {
ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id);
if (ctx->prog_fd < 0)
return -errno;
err = xsk_lookup_bpf_maps(xsk);
if (err) {
close(ctx->prog_fd);
return err;
}
}
if (!err && xsks_map_fd)
*xsks_map_fd = ctx->xsks_map_fd;
if (xsk->rx)
err = xsk_set_bpf_maps(xsk);
if (err) {
xsk_delete_bpf_maps(xsk);
close(ctx->prog_fd);
return err;
}
return 0;
return err;
}
static struct xsk_ctx *xsk_get_ctx(struct xsk_umem *umem, int ifindex,
@@ -627,26 +910,30 @@ static struct xsk_ctx *xsk_get_ctx(struct xsk_umem *umem, int ifindex,
return NULL;
}
static void xsk_put_ctx(struct xsk_ctx *ctx)
static void xsk_put_ctx(struct xsk_ctx *ctx, bool unmap)
{
struct xsk_umem *umem = ctx->umem;
struct xdp_mmap_offsets off;
int err;
if (--ctx->refcount == 0) {
err = xsk_get_mmap_offsets(umem->fd, &off);
if (!err) {
munmap(ctx->fill->ring - off.fr.desc,
off.fr.desc + umem->config.fill_size *
sizeof(__u64));
munmap(ctx->comp->ring - off.cr.desc,
off.cr.desc + umem->config.comp_size *
sizeof(__u64));
}
if (--ctx->refcount)
return;
list_del(&ctx->list);
free(ctx);
}
if (!unmap)
goto out_free;
err = xsk_get_mmap_offsets(umem->fd, &off);
if (err)
goto out_free;
munmap(ctx->fill->ring - off.fr.desc, off.fr.desc + umem->config.fill_size *
sizeof(__u64));
munmap(ctx->comp->ring - off.cr.desc, off.cr.desc + umem->config.comp_size *
sizeof(__u64));
out_free:
list_del(&ctx->list);
free(ctx);
}
static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
@@ -681,14 +968,46 @@ static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
memcpy(ctx->ifname, ifname, IFNAMSIZ - 1);
ctx->ifname[IFNAMSIZ - 1] = '\0';
umem->fill_save = NULL;
umem->comp_save = NULL;
ctx->fill = fill;
ctx->comp = comp;
list_add(&ctx->list, &umem->ctx_list);
return ctx;
}
static void xsk_destroy_xsk_struct(struct xsk_socket *xsk)
{
free(xsk->ctx);
free(xsk);
}
int xsk_socket__update_xskmap(struct xsk_socket *xsk, int fd)
{
xsk->ctx->xsks_map_fd = fd;
return xsk_set_bpf_maps(xsk);
}
int xsk_setup_xdp_prog(int ifindex, int *xsks_map_fd)
{
struct xsk_socket *xsk;
int res;
xsk = calloc(1, sizeof(*xsk));
if (!xsk)
return -ENOMEM;
res = xsk_create_xsk_struct(ifindex, xsk);
if (res) {
free(xsk);
return -EINVAL;
}
res = __xsk_setup_xdp_prog(xsk, xsks_map_fd);
xsk_destroy_xsk_struct(xsk);
return res;
}
int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
const char *ifname,
__u32 queue_id, struct xsk_umem *umem,
@@ -698,6 +1017,7 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
struct xsk_ring_cons *comp,
const struct xsk_socket_config *usr_config)
{
bool unmap, rx_setup_done = false, tx_setup_done = false;
void *rx_map = NULL, *tx_map = NULL;
struct sockaddr_xdp sxdp = {};
struct xdp_mmap_offsets off;
@@ -708,6 +1028,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
if (!umem || !xsk_ptr || !(rx || tx))
return -EFAULT;
unmap = umem->fill_save != fill;
xsk = calloc(1, sizeof(*xsk));
if (!xsk)
return -ENOMEM;
@@ -731,6 +1053,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
}
} else {
xsk->fd = umem->fd;
rx_setup_done = umem->rx_ring_setup_done;
tx_setup_done = umem->tx_ring_setup_done;
}
ctx = xsk_get_ctx(umem, ifindex, queue_id);
@@ -748,8 +1072,9 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
}
}
xsk->ctx = ctx;
xsk->ctx->has_bpf_link = xsk_probe_bpf_link();
if (rx) {
if (rx && !rx_setup_done) {
err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING,
&xsk->config.rx_size,
sizeof(xsk->config.rx_size));
@@ -757,8 +1082,10 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
err = -errno;
goto out_put_ctx;
}
if (xsk->fd == umem->fd)
umem->rx_ring_setup_done = true;
}
if (tx) {
if (tx && !tx_setup_done) {
err = setsockopt(xsk->fd, SOL_XDP, XDP_TX_RING,
&xsk->config.tx_size,
sizeof(xsk->config.tx_size));
@@ -766,6 +1093,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
err = -errno;
goto out_put_ctx;
}
if (xsk->fd == umem->fd)
umem->rx_ring_setup_done = true;
}
err = xsk_get_mmap_offsets(xsk->fd, &off);
@@ -838,12 +1167,14 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
ctx->prog_fd = -1;
if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
err = xsk_setup_xdp_prog(xsk);
err = __xsk_setup_xdp_prog(xsk, NULL);
if (err)
goto out_mmap_tx;
}
*xsk_ptr = xsk;
umem->fill_save = NULL;
umem->comp_save = NULL;
return 0;
out_mmap_tx:
@@ -855,7 +1186,7 @@ out_mmap_rx:
munmap(rx_map, off.rx.desc +
xsk->config.rx_size * sizeof(struct xdp_desc));
out_put_ctx:
xsk_put_ctx(ctx);
xsk_put_ctx(ctx, unmap);
out_socket:
if (--umem->refcount)
close(xsk->fd);
@@ -869,6 +1200,9 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
const struct xsk_socket_config *usr_config)
{
if (!umem)
return -EFAULT;
return xsk_socket__create_shared(xsk_ptr, ifname, queue_id, umem,
rx, tx, umem->fill_save,
umem->comp_save, usr_config);
@@ -891,16 +1225,21 @@ int xsk_umem__delete(struct xsk_umem *umem)
void xsk_socket__delete(struct xsk_socket *xsk)
{
size_t desc_sz = sizeof(struct xdp_desc);
struct xsk_ctx *ctx = xsk->ctx;
struct xdp_mmap_offsets off;
struct xsk_umem *umem;
struct xsk_ctx *ctx;
int err;
if (!xsk)
return;
ctx = xsk->ctx;
umem = ctx->umem;
if (ctx->prog_fd != -1) {
xsk_delete_bpf_maps(xsk);
close(ctx->prog_fd);
if (ctx->has_bpf_link)
close(ctx->link_fd);
}
err = xsk_get_mmap_offsets(xsk->fd, &off);
@@ -915,13 +1254,13 @@ void xsk_socket__delete(struct xsk_socket *xsk)
}
}
xsk_put_ctx(ctx);
xsk_put_ctx(ctx, true);
ctx->umem->refcount--;
umem->refcount--;
/* Do not close an fd that also has an associated umem connected
* to it.
*/
if (xsk->fd != ctx->umem->fd)
if (xsk->fd != umem->fd)
close(xsk->fd);
free(xsk);
}

109
src/xsk.h
View File

@@ -3,7 +3,8 @@
/*
* AF_XDP user-space access library.
*
* Copyright(c) 2018 - 2019 Intel Corporation.
* Copyright (c) 2018 - 2019 Intel Corporation.
* Copyright (c) 2019 Facebook
*
* Author(s): Magnus Karlsson <magnus.karlsson@intel.com>
*/
@@ -13,15 +14,80 @@
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <linux/if_xdp.h>
#include "libbpf.h"
#include "libbpf_util.h"
#ifdef __cplusplus
extern "C" {
#endif
/* Load-Acquire Store-Release barriers used by the XDP socket
* library. The following macros should *NOT* be considered part of
* the xsk.h API, and is subject to change anytime.
*
* LIBRARY INTERNAL
*/
#define __XSK_READ_ONCE(x) (*(volatile typeof(x) *)&x)
#define __XSK_WRITE_ONCE(x, v) (*(volatile typeof(x) *)&x) = (v)
#if defined(__i386__) || defined(__x86_64__)
# define libbpf_smp_store_release(p, v) \
do { \
asm volatile("" : : : "memory"); \
__XSK_WRITE_ONCE(*p, v); \
} while (0)
# define libbpf_smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = __XSK_READ_ONCE(*p); \
asm volatile("" : : : "memory"); \
___p1; \
})
#elif defined(__aarch64__)
# define libbpf_smp_store_release(p, v) \
asm volatile ("stlr %w1, %0" : "=Q" (*p) : "r" (v) : "memory")
# define libbpf_smp_load_acquire(p) \
({ \
typeof(*p) ___p1; \
asm volatile ("ldar %w0, %1" \
: "=r" (___p1) : "Q" (*p) : "memory"); \
___p1; \
})
#elif defined(__riscv)
# define libbpf_smp_store_release(p, v) \
do { \
asm volatile ("fence rw,w" : : : "memory"); \
__XSK_WRITE_ONCE(*p, v); \
} while (0)
# define libbpf_smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = __XSK_READ_ONCE(*p); \
asm volatile ("fence r,rw" : : : "memory"); \
___p1; \
})
#endif
#ifndef libbpf_smp_store_release
#define libbpf_smp_store_release(p, v) \
do { \
__sync_synchronize(); \
__XSK_WRITE_ONCE(*p, v); \
} while (0)
#endif
#ifndef libbpf_smp_load_acquire
#define libbpf_smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = __XSK_READ_ONCE(*p); \
__sync_synchronize(); \
___p1; \
})
#endif
/* LIBRARY INTERNAL -- END */
/* Do not access these members directly. Use the functions below. */
#define DEFINE_XSK_RING(name) \
struct name { \
@@ -96,7 +162,8 @@ static inline __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
* this function. Without this optimization it whould have been
* free_entries = r->cached_prod - r->cached_cons + r->size.
*/
r->cached_cons = *r->consumer + r->size;
r->cached_cons = libbpf_smp_load_acquire(r->consumer);
r->cached_cons += r->size;
return r->cached_cons - r->cached_prod;
}
@@ -106,15 +173,14 @@ static inline __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
__u32 entries = r->cached_prod - r->cached_cons;
if (entries == 0) {
r->cached_prod = *r->producer;
r->cached_prod = libbpf_smp_load_acquire(r->producer);
entries = r->cached_prod - r->cached_cons;
}
return (entries > nb) ? nb : entries;
}
static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
size_t nb, __u32 *idx)
static inline __u32 xsk_ring_prod__reserve(struct xsk_ring_prod *prod, __u32 nb, __u32 *idx)
{
if (xsk_prod_nb_free(prod, nb) < nb)
return 0;
@@ -125,27 +191,19 @@ static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
return nb;
}
static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, __u32 nb)
{
/* Make sure everything has been written to the ring before indicating
* this to the kernel by writing the producer pointer.
*/
libbpf_smp_wmb();
*prod->producer += nb;
libbpf_smp_store_release(prod->producer, *prod->producer + nb);
}
static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
size_t nb, __u32 *idx)
static inline __u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __u32 *idx)
{
size_t entries = xsk_cons_nb_avail(cons, nb);
__u32 entries = xsk_cons_nb_avail(cons, nb);
if (entries > 0) {
/* Make sure we do not speculatively read the data before
* we have received the packet buffers from the ring.
*/
libbpf_smp_rmb();
*idx = cons->cached_cons;
cons->cached_cons += entries;
}
@@ -153,14 +211,18 @@ static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
return entries;
}
static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
static inline void xsk_ring_cons__cancel(struct xsk_ring_cons *cons, __u32 nb)
{
cons->cached_cons -= nb;
}
static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons, __u32 nb)
{
/* Make sure data has been read before indicating we are done
* with the entries by updating the consumer pointer.
*/
libbpf_smp_rwmb();
libbpf_smp_store_release(cons->consumer, *cons->consumer + nb);
*cons->consumer += nb;
}
static inline void *xsk_umem__get_data(void *umem_area, __u64 addr)
@@ -201,6 +263,11 @@ struct xsk_umem_config {
__u32 flags;
};
LIBBPF_API int xsk_setup_xdp_prog(int ifindex,
int *xsks_map_fd);
LIBBPF_API int xsk_socket__update_xskmap(struct xsk_socket *xsk,
int xsks_map_fd);
/* Flags for the libbpf_flags field. */
#define XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD (1 << 0)

View File

@@ -6,7 +6,7 @@ CONT_NAME="${CONT_NAME:-libbpf-debian-$DEBIAN_RELEASE}"
ENV_VARS="${ENV_VARS:-}"
DOCKER_RUN="${DOCKER_RUN:-docker run}"
REPO_ROOT="${REPO_ROOT:-$PWD}"
ADDITIONAL_DEPS=(clang pkg-config gcc-8)
ADDITIONAL_DEPS=(clang pkg-config gcc-10)
CFLAGS="-g -O2 -Werror -Wall"
function info() {
@@ -21,7 +21,7 @@ function docker_exec() {
docker exec $ENV_VARS -it $CONT_NAME "$@"
}
set -e
set -eu
source "$(dirname $0)/travis_wait.bash"
@@ -31,7 +31,6 @@ for phase in "${PHASES[@]}"; do
info "Setup phase"
info "Using Debian $DEBIAN_RELEASE"
sudo apt-get -y -o Dpkg::Options::="--force-confnew" install docker-ce
docker --version
docker pull debian:$DEBIAN_RELEASE
@@ -41,17 +40,18 @@ for phase in "${PHASES[@]}"; do
-dit --net=host debian:$DEBIAN_RELEASE /bin/bash
docker_exec bash -c "echo deb-src http://deb.debian.org/debian $DEBIAN_RELEASE main >>/etc/apt/sources.list"
docker_exec apt-get -y update
docker_exec apt-get -y build-dep libelf-dev
docker_exec apt-get -y install libelf-dev
docker_exec apt-get -y install "${ADDITIONAL_DEPS[@]}"
docker_exec apt-get -y install aptitude
docker_exec aptitude -y build-dep libelf-dev
docker_exec aptitude -y install libelf-dev
docker_exec aptitude -y install "${ADDITIONAL_DEPS[@]}"
;;
RUN|RUN_CLANG|RUN_GCC8|RUN_ASAN|RUN_CLANG_ASAN|RUN_GCC8_ASAN)
RUN|RUN_CLANG|RUN_GCC10|RUN_ASAN|RUN_CLANG_ASAN|RUN_GCC10_ASAN)
if [[ "$phase" = *"CLANG"* ]]; then
ENV_VARS="-e CC=clang -e CXX=clang++"
CC="clang"
elif [[ "$phase" = *"GCC8"* ]]; then
ENV_VARS="-e CC=gcc-8 -e CXX=g++-8"
CC="gcc-8"
elif [[ "$phase" = *"GCC10"* ]]; then
ENV_VARS="-e CC=gcc-10 -e CXX=g++-10"
CC="gcc-10"
else
CFLAGS="${CFLAGS} -Wno-stringop-truncation"
fi

View File

@@ -1,20 +1,16 @@
#!/bin/bash
set -e
set -x
set -eux
RELEASE="bionic"
echo "deb-src http://archive.ubuntu.com/ubuntu/ $RELEASE main restricted universe multiverse" >>/etc/apt/sources.list
RELEASE="focal"
apt-get update
apt-get -y build-dep libelf-dev
apt-get install -y libelf-dev pkg-config
apt-get install -y pkg-config
source "$(dirname $0)/travis_wait.bash"
cd $REPO_ROOT
CFLAGS="-g -O2 -Werror -Wall -fsanitize=address,undefined"
CFLAGS="-g -O2 -Werror -Wall -fsanitize=address,undefined -Wno-stringop-truncation"
mkdir build install
cc --version
make -j$((4*$(nproc))) CFLAGS="${CFLAGS}" -C ./src -B OBJDIR=../build

View File

@@ -6,7 +6,9 @@ source $(cd $(dirname $0) && pwd)/helpers.sh
travis_fold start prepare_selftests "Building selftests"
LLVM_VER=12
sudo apt-get -y install python-docutils # for rst2man
LLVM_VER=13
LIBBPF_PATH="${REPO_ROOT}"
REPO_PATH="travis-ci/vmtest/bpf-next"
@@ -28,7 +30,7 @@ make \
VMLINUX_BTF="${VMLINUX_BTF}" \
VMLINUX_H=${VMLINUX_H} \
-C "${REPO_ROOT}/${REPO_PATH}/tools/testing/selftests/bpf" \
-j $((2*$(nproc)))
-j $((4*$(nproc)))
mkdir ${LIBBPF_PATH}/selftests
cp -R "${REPO_ROOT}/${REPO_PATH}/tools/testing/selftests/bpf" \
${LIBBPF_PATH}/selftests

View File

@@ -1,5 +1,8 @@
# PERMANENTLY DISABLED
align # verifier output format changed
atomics # new atomic operations (v5.12+)
atomic_bounds # new atomic operations (v5.12+)
bind_perm # changed semantics of return values (v5.12+)
bpf_iter # bpf_iter support is missing
bpf_obj_id # bpf_link support missing for GET_OBJ_INFO, GET_FD_BY_ID, etc
bpf_tcp_ca # STRUCT_OPS is missing
@@ -9,6 +12,7 @@ cg_storage_multi # v5.9+ functionality
cgroup_attach_multi # BPF_F_REPLACE_PROG missing
cgroup_link # LINK_CREATE is missing
cgroup_skb_sk_lookup # bpf_sk_lookup_tcp() helper is missing
check_mtu # missing BPF helper (v5.12+)
cls_redirect # bpf_csum_level() helper is missing
connect_force_port # cgroup/get{peer,sock}name{4,6} support is missing
d_path # v5.10+ feature
@@ -16,24 +20,34 @@ enable_stats # BPF_ENABLE_STATS support is missing
fentry_fexit # bpf_prog_test_tracing missing
fentry_test # bpf_prog_test_tracing missing
fexit_bpf2bpf # freplace is missing
fexit_sleep # relies on bpf_trampoline fix in 5.12+
fexit_test # bpf_prog_test_tracing missing
flow_dissector # bpf_link-based flow dissector is in 5.8+
flow_dissector_reattach
for_each # v5.12+
get_stack_raw_tp # exercising BPF verifier bug causing infinite loop
hash_large_key # v5.11+
ima # v5.11+
kfree_skb # 32-bit pointer arith in test_pkt_access
ksyms # __start_BTF has different name
kfunc_call # v5.13+
link_pinning # bpf_link is missing
linked_vars # v5.13+
load_bytes_relative # new functionality in 5.8
map_init # per-CPU LRU missing
map_ptr # test uses BPF_MAP_TYPE_RINGBUF, added in 5.8
metadata # v5.10+
mmap # 5.5 kernel is too permissive with re-mmaping
modify_return # fmod_ret support is missing
module_attach # module BTF support missing (v5.11+)
ns_current_pid_tgid # bpf_get_ns_current_pid_tgid() helper is missing
pe_preserve_elems # v5.10+
perf_branches # bpf_read_branch_records() helper is missing
pkt_access # 32-bit pointer arith in test_pkt_access
probe_read_user_str # kernel bug with garbage bytes at the end
prog_run_xattr # 32-bit pointer arith in test_pkt_access
raw_tp_test_run # v5.10+
recursion # v5.12+
ringbuf # BPF_MAP_TYPE_RINGBUF is supported in 5.8+
# bug in verifier w/ tracking references
@@ -43,22 +57,32 @@ reference_tracking
select_reuseport # UDP support is missing
send_signal # bpf_send_signal_thread() helper is missing
sk_assign # bpf_sk_assign helper missing
sk_lookup # v5.9+
sk_storage_tracing # missing bpf_sk_storage_get() helper
skb_ctx # ctx_{size, }_{in, out} in BPF_PROG_TEST_RUN is missing
skb_helpers # helpers added in 5.8+
snprintf # v5.13+
snprintf_btf # v5.10+
sock_fields # v5.10+
socket_cookie # v5.12+
sockmap_basic # uses new socket fields, 5.8+
sockmap_listen # no listen socket supportin SOCKMAP
sockopt_sk
sk_lookup # v5.9+
skb_ctx # ctx_{size, }_{in, out} in BPF_PROG_TEST_RUN is missing
stacktrace_build_id # v5.9+
stack_var_off # v5.12+
task_local_storage # v5.12+
tcp_hdr_options # v5.10+, new TCP header options feature in BPF
tcpbpf_user # LINK_CREATE is missing
test_bpffs # v5.10+, new CONFIG_BPF_PRELOAD=y and CONFIG_BPF_PRELOAD_UMG=y|m
test_bprm_opts # v5.11+
test_global_funcs # kernel doesn't support BTF linkage=global on FUNCs
test_local_storage # v5.10+ feature
test_lsm # no BPF_LSM support
test_overhead # no fmod_ret support
test_profiler # needs verifier logic improvements from v5.10+
test_skb_pkt_end # v5.11+
trace_ext # v5.10+
trampoline_count # v5.12+ have lower allowed limits
udp_limit # no cgroup/sock_release BPF program type (5.9+)
varlen # verifier bug fixed in later kernels
vmlinux # hrtimer_nanosleep() signature changed incompatibly

View File

@@ -1388,7 +1388,7 @@ CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_LOOP is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SKD is not set
@@ -2394,8 +2394,8 @@ CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_WRITE_POLICY=y
CONFIG_IMA_READ_POLICY=y
# CONFIG_IMA_APPRAISE is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
@@ -2403,7 +2403,7 @@ CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_EVM is not set
# CONFIG_DEFAULT_SECURITY_SELINUX is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="selinux,bpf"
CONFIG_LSM="selinux,bpf,integrity"
#
# Kernel hardening options

View File

@@ -0,0 +1,3 @@
#!/bin/bash
printf "all:\n\ttouch bpf_testmod.ko\n\nclean:\n" > bpf_testmod/Makefile

View File

@@ -0,0 +1,3 @@
#!/bin/bash
printf "all:\n\ttouch bpf_testmod.ko\n\nclean:\n" > bpf_testmod/Makefile

View File

@@ -415,8 +415,8 @@ set -eux
echo 'Running setup commands'
%s
%s
echo $? > /exitstatus
set +e; %s; exitstatus=\$?; set -e
echo \$exitstatus > /exitstatus
chmod 644 /exitstatus" "${setup_envvars}" "${setup_cmd}")
fi
@@ -434,8 +434,10 @@ sudo chmod 755 "$mnt/etc/rcS.d/S99-poweroff"
sudo umount "$mnt"
echo "Starting VM with $(nproc) CPUs..."
qemu-system-x86_64 -nodefaults -display none -serial mon:stdio \
-cpu kvm64 -enable-kvm -smp "$(nproc)" -m 2G \
-cpu kvm64 -enable-kvm -smp "$(nproc)" -m 4G \
-drive file="$IMG",format=raw,index=1,media=disk,if=virtio,cache=none \
-kernel "$vmlinuz" -append "root=/dev/vda rw console=ttyS0,115200$APPEND"

View File

@@ -46,6 +46,6 @@ cd libbpf/selftests/bpf
test_progs
if [[ "${KERNEL}" == 'latest' ]]; then
test_maps
#test_maps
test_verifier
fi

View File

@@ -14,12 +14,12 @@ ${VMTEST_ROOT}/build_pahole.sh travis-ci/vmtest/pahole
travis_fold start install_clang "Installing Clang/LLVM"
# Install required packages
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
echo "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" | sudo tee -a /etc/apt/sources.list
sudo add-apt-repository "deb http://apt.llvm.org/focal/ llvm-toolchain-focal main"
sudo apt-get update
sudo apt-get -y install clang-12 lld-12 llvm-12
sudo apt-get -y install python-docutils # for rst2man
sudo apt-get install --allow-downgrades -y libc6=2.31-0ubuntu9.2
sudo aptitude install -y g++ libelf-dev
sudo aptitude install -y clang-13 lld-13 llvm-13
travis_fold end install_clang

File diff suppressed because it is too large Load Diff