Daniel Borkmann b064c40d94 bpf: Add fd-based tcx multi-prog infra with link support
This work refactors and adds a lightweight extension ("tcx") to the tc BPF
ingress and egress data path side to allow BPF program management based
on fds via the bpf() syscall through the newly added generic multi-prog API.
The main goal behind this work, which we presented at LPC [0] last year with
a recent update at LSF/MM/BPF this year [3], is to support the long-awaited
BPF link functionality for tc BPF programs, which allows for a model of safe
ownership and program detachment.

Given the rise in tc BPF users in cloud native environments, this becomes
necessary to avoid hard-to-debug incidents caused either by stale leftover
programs or by 3rd party applications accidentally stepping on each other's toes.
As a recap, a BPF link represents the attachment of a BPF program to a BPF
hook point. The BPF link holds a single reference to keep the BPF program alive.
Moreover, hook points do not reference a BPF link, only the application's
fd or pinning does. A BPF link holds meta-data specific to the attachment and
implements operations for link creation, (atomic) BPF program update,
detachment and introspection. The motivation for BPF links for tc BPF programs
is multi-fold, for example:

  - From Meta: "It's especially important for applications that are deployed
    fleet-wide and that don't "control" hosts they are deployed to. If such
    application crashes and no one notices and does anything about that, BPF
    program will keep running draining resources or even just, say, dropping
    packets. We at FB had outages due to such permanent BPF attachment
    semantics. With fd-based BPF link we are getting a framework, which allows
    safe, auto-detachable behavior by default, unless application explicitly
    opts in by pinning the BPF link." [1]

  - From the Cilium side, the tc BPF programs we attach to host-facing veth
    devices and physical devices build the core datapath for Kubernetes Pods,
    implementing forwarding, load-balancing, policy, EDT-management, etc, within
    BPF. Currently there is no concept of 'safe' ownership; e.g. we recently
    experienced hard-to-debug issues in a user's staging environment where
    another Kubernetes application using tc BPF attached to the same prio/handle
    of cls_bpf and accidentally wiped all Cilium-based BPF programs from
    underneath it. The goal is to establish a clear/safe ownership model via
    links which cannot accidentally be overridden. [0,2]

BPF links for tc can co-exist with non-link attachments, and the semantics are
in line also with XDP links: BPF links cannot replace other BPF links, BPF
links cannot replace non-BPF links, non-BPF links cannot replace BPF links and,
lastly, only non-BPF links can replace non-BPF links. In the case of Cilium,
this would solve the mentioned issue of a safe ownership model, as 3rd party
applications would not be able to accidentally wipe Cilium programs, even if
they are not BPF link aware.

Earlier attempts [4] have tried to integrate BPF links into the core tc
machinery to solve this for cls_bpf, which was intrusive to the generic tc
kernel API with extensions only specific to cls_bpf, and suboptimal/complex
since cls_bpf could still be wiped from the qdisc. Locking a tc BPF program in
place this way gets into layering hacks, given the two object models are vastly
different.

We instead implemented the tcx (tc 'express') layer, which is an fd-based tc BPF
attach API, so that the BPF link implementation blends in naturally, similar to
other link types which are fd-based, and without the need for changing core tc
internal APIs. BPF programs for tc can then be successively migrated from classic
cls_bpf to the new tc BPF link without needing to change the program's source
code; only the BPF loader mechanics for attaching need to change.
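
For illustration only (not part of this patch), a loader switching to the
fd-based attach could use the later-added libbpf convenience helper
bpf_program__attach_tcx(); the object and program names below (tc_prog.bpf.o,
tc_ingress) are made up, and this is merely a sketch of the changed loader
mechanics:

  #include <net/if.h>
  #include <bpf/libbpf.h>

  static int attach_tcx_ingress(const char *ifname)
  {
      struct bpf_object *obj;
      struct bpf_program *prog;
      struct bpf_link *link;
      int ifindex = if_nametoindex(ifname);

      if (!ifindex)
          return -1;
      obj = bpf_object__open_file("tc_prog.bpf.o", NULL);
      if (!obj || bpf_object__load(obj))
          return -1;
      prog = bpf_object__find_program_by_name(obj, "tc_ingress");
      if (!prog)
          return -1;
      /* fd-based tcx link: detaches automatically once the last fd/pin
       * of the link is gone, so no stale program is left behind. */
      link = bpf_program__attach_tcx(prog, ifindex, NULL);
      if (!link)
          return -1;
      /* Pin the link to keep the attachment beyond the loader's lifetime. */
      return bpf_link__pin(link, "/sys/fs/bpf/tc_ingress_link");
  }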

For the current tc framework there is no change in behavior with this patch,
and it does not touch tc core kernel APIs either. The gist of this patch is
that the ingress and egress hooks get a lightweight, qdisc-less extension for
BPF to attach its tc BPF programs to, in other words, a minimal entry point
for tc BPF. The name tcx has been suggested in discussions of earlier revisions
of this work as a good fit, and to more easily differentiate between the
classic cls_bpf attachment and the fd-based one.

For the ingress and egress tcx points, the device holds a cache-friendly array
with program pointers which is separated from control plane (slow-path) data.
Earlier versions of this work used priority to determine ordering and to express
dependencies, similar to classic tc, but it was challenged that something more
future-proof with a better user experience is required. Hence this resulted in
the design and development of the generic attach/detach/query API for
multi-progs. See the prior patch and its discussion on the API design. tcx is
the first user, and later we plan to integrate others as well; for example, one
candidate is multi-prog support for XDP, which would benefit from the same
'look and feel' from an API perspective.

The goal with tcx is to have maximum compatibility with existing tc BPF programs,
so they don't need to be rewritten. Compatibility to call into classic
tcf_classify() is also provided in order to allow a successive migration, or to
let both cleanly co-exist where needed, given it is all one logical tc layer
where tcx plus the classic tc cls/act build one overall processing pipeline.

tcx supports the simplified return codes TCX_NEXT, which is non-terminating (go
to the next program), and the terminating ones TCX_PASS, TCX_DROP and
TCX_REDIRECT. The fd-based API is behind a static key, so that the code is not
entered when unused. The struct tcx_entry's program array is currently static,
but could be made dynamic at a later point if necessary. The a/b pair swap
design has been chosen so that detachment requires no allocations which
otherwise could fail.
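
For illustration, a minimal tcx BPF program using these return codes could look
as follows; this is a sketch which assumes a uapi linux/bpf.h recent enough to
carry the TCX_* codes and a libbpf that understands the tcx/ingress section
name, while existing SEC("tc") programs continue to work unchanged:

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  SEC("tcx/ingress")
  int tc_ingress(struct __sk_buff *skb)
  {
      void *data_end = (void *)(long)skb->data_end;
      void *data = (void *)(long)skb->data;
      struct ethhdr *eth = data;

      if (data + sizeof(*eth) > data_end)
          return TCX_PASS;        /* terminating: accept the packet */
      if (eth->h_proto == bpf_htons(ETH_P_ARP))
          return TCX_DROP;        /* terminating: drop the packet */
      return TCX_NEXT;            /* non-terminating: go to next program */
  }

  char LICENSE[] SEC("license") = "GPL";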

The work has been tested with the tc-testing selftest suite, which passes in
full, as well as with the tc BPF tests from the BPF CI, and also with Cilium's
L4LB.

Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
of this work.

  [0] https://lpc.events/event/16/contributions/1353/
  [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
  [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
  [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
  [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf

This is the official home of the libbpf library.

Please use this Github repository for building and packaging libbpf and when using it in your projects through a Git submodule.

The authoritative libbpf source code is developed as part of the bpf-next Linux source tree under the tools/lib/bpf subdirectory and is periodically synced to Github. As such, all libbpf changes should be sent to the BPF mailing list; please don't open PRs here unless you are changing Github-specific parts of libbpf (e.g., the Github-specific Makefile).

Libbpf and general BPF usage questions

Libbpf documentation can be found here. It's an ongoing effort and has ways to go, but please take a look and consider contributing as well.

Please check out libbpf-bootstrap and the companion blog post for examples of building BPF applications with libbpf. libbpf-tools are also a good source of real-world libbpf-based tracing tools.

See also "BPF CO-RE reference guide" for the coverage of practical aspects of building BPF CO-RE applications and "BPF CO-RE" for general introduction into BPF portability issues and BPF CO-RE origins.

All general BPF questions, including kernel functionality, libbpf APIs and their application, should be sent to the bpf@vger.kernel.org mailing list. You can subscribe to it here and search its archive here. Please search the archive before asking new questions; it very well might be that this was already addressed or answered before.

bpf@vger.kernel.org is monitored by many more people and they will happily try to help you with whatever issue you have. This repository's PRs and issues should be opened only for dealing with issues pertaining to the specific way this libbpf mirror repo is set up and organized.

Building libbpf

libelf is an internal dependency of libbpf; it is required to link against and must be installed on the system for applications to work. pkg-config is used by default to find libelf, and the program invoked can be overridden with the PKG_CONFIG variable.

If using pkg-config at build time is not desired, it can be disabled by setting NO_PKG_CONFIG=1 when calling make.

To build both static libbpf.a and shared libbpf.so:

$ cd src
$ make

To build only the static libbpf.a library in the directory build/ and install it together with libbpf headers into a staging directory root/:

$ cd src
$ mkdir build root
$ BUILD_STATIC_ONLY=y OBJDIR=build DESTDIR=root make install

To build both static libbpf.a and shared libbpf.so against a custom libelf dependency installed in /build/root/ and install them together with libbpf headers in a build directory /build/root/:

$ cd src
$ PKG_CONFIG_PATH=/build/root/lib64/pkgconfig DESTDIR=/build/root make install

BPF CO-RE (Compile Once Run Everywhere)

Libbpf supports building BPF CO-RE-enabled applications, which, in contrast to BCC, do not require the Clang/LLVM runtime to be deployed to target servers and do not rely on kernel-devel headers being available (a minimal sketch of such a program is shown further below).

It does rely on the kernel being built with BTF type information, though. Some major Linux distributions come with kernel BTF already built in:

  • Fedora 31+
  • RHEL 8.2+
  • OpenSUSE Tumbleweed (in the next release, as of 2020-06-04)
  • Arch Linux (from kernel 5.7.1.arch1-1)
  • Manjaro (from kernel 5.4 if compiled after 2021-06-18)
  • Ubuntu 20.10
  • Debian 11 (amd64/arm64)

If your kernel doesn't come with BTF built in, you'll need to build a custom kernel. You'll need:

  • pahole 1.16+ tool (part of the dwarves package), which performs the DWARF to BTF conversion;
  • a kernel built with the CONFIG_DEBUG_INFO_BTF=y option.

You can check whether your kernel has BTF built in by looking for the /sys/kernel/btf/vmlinux file:
$ ls -la /sys/kernel/btf/vmlinux
-r--r--r--. 1 root root 3541561 Jun  2 18:16 /sys/kernel/btf/vmlinux

To develop and build BPF programs, you'll need Clang/LLVM 10+. The following distributions have Clang/LLVM 10+ packaged by default:

  • Fedora 32+
  • Ubuntu 20.04+
  • Arch Linux
  • Ubuntu 20.10 (LLVM 11)
  • Debian 11 (LLVM 11)
  • Alpine 3.13+

Otherwise, please make sure to update it on your system.
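
As a quick illustration of what a CO-RE-enabled program looks like, a minimal
kprobe example along the lines of libbpf-bootstrap might read as follows. This
is a sketch: the probed function do_unlinkat and the accessed field are purely
illustrative, and vmlinux.h is assumed to have been generated with
bpftool btf dump file /sys/kernel/btf/vmlinux format c:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("kprobe/do_unlinkat")
int BPF_KPROBE(handle_unlinkat, int dfd, struct filename *name)
{
    const char *filename;

    /* CO-RE relocatable read: the offset of name->name is resolved against
     * the running kernel's BTF at load time, not at compile time. */
    filename = BPF_CORE_READ(name, name);
    bpf_printk("unlinkat called for %s", filename);
    return 0;
}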

The following resources are useful to understand what BPF CO-RE is and how to use it:

Distributions

Distributions packaging libbpf from this mirror:

Benefits of packaging from the mirror over packaging from kernel sources:

  • Consistent versioning across distributions.
  • No ties to any specific kernel, transparent handling of older kernels. Libbpf is designed to be kernel-agnostic and to work across a multitude of kernel versions. It has built-in mechanisms to gracefully handle older kernels that are missing some features by working around them or gracefully degrading functionality. Thus libbpf is not tied to a specific kernel version and can/should be packaged and versioned independently.
  • Continuous integration testing via GitHub Actions.
  • Static code analysis via LGTM and Coverity.

Package dependencies of libbpf (package names may vary across distros):

  • zlib
  • libelf

libbpf distro packaging status

bpf-next to Github sync

All the gory details of syncing can be found in the scripts/sync-kernel.sh script. See SYNC.md for instructions.

Some header files in this repo (include/linux/*.h) are reduced versions of their counterpart files at bpf-next's tools/include/linux/*.h to make compilation successful.

License

This work is dual-licensed under BSD 2-clause license and GNU LGPL v2.1 license. You can choose between one of them if you use this work.

SPDX-License-Identifier: BSD-2-Clause OR LGPL-2.1
