This work adds a new, minimal BPF-programmable device called "netkit"
(former PoC code-name "meta") we recently presented at LSF/MM/BPF. The
core idea is that BPF programs are executed within the drivers xmit routine
and therefore e.g. in case of containers/Pods moving BPF processing closer
to the source.
One of the goals was that in case of Pod egress traffic, this allows to
move BPF programs from hostns tcx ingress into the device itself, providing
earlier drop or forward mechanisms, for example, if the BPF program
determines that the skb must be sent out of the node, then a redirect to
the physical device can take place directly without going through per-CPU
backlog queue. This helps to shift processing for such traffic from softirq
to process context, leading to better scheduling decisions/performance (see
measurements in the slides).
In this initial version, the netkit device ships as a pair, but we plan to
extend this further so it can also operate in single device mode. The pair
comes with a primary and a peer device. Only the primary device, typically
residing in hostns, can manage BPF programs for itself and its peer. The
peer device is designated for containers/Pods and cannot attach/detach
BPF programs. Upon the device creation, the user can set the default policy
to 'pass' or 'drop' for the case when no BPF program is attached.
Additionally, the device can be operated in L3 (default) or L2 mode. The
management of BPF programs is done via bpf_mprog, so that multi-attach is
supported right from the beginning with similar API and dependency controls
as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic
attach/detach/query API for multi-progs"). tc BPF compatibility is provided,
so that existing programs can be easily migrated.
Going forward, we plan to use netkit devices in Cilium as the main device
type for connecting Pods. They will be operated in L3 mode in order to
simplify a Pod's neighbor management and the peer will operate in default
drop mode, so that no traffic is leaving between the time when a Pod is
brought up by the CNI plugin and programs attached by the agent.
Additionally, the programs we attach via tcx on the physical devices are
using bpf_redirect_peer() for inbound traffic into netkit device, hence the
latter is also supporting the ndo_get_peer_dev callback. Similarly, we use
bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device
directly. Also, BIG TCP is supported on netkit device. For the follow-up
work in single device mode, we plan to convert Cilium's cilium_host/_net
devices into a single one.
An extensive test suite for checking device operations and the BPF program
and link management API comes as BPF selftests in this series.
Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://github.com/borkmann/iproute2/tree/pr/netkit
Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
libbpf

This is the official home of the libbpf library.
Please use this Github repository for building and packaging libbpf and when using it in your projects through Git submodule.
Libbpf authoritative source code is developed as part of bpf-next Linux source
tree under
tools/lib/bpf subdirectory and is periodically synced to Github. As such, all the
libbpf changes should be sent to BPF mailing list,
please don't open PRs here unless you are changing Github-specific parts of libbpf
(e.g., Github-specific Makefile).
Libbpf and general BPF usage questions
Libbpf documentation can be found here. It's an ongoing effort and has ways to go, but please take a look and consider contributing as well.
Please check out libbpf-bootstrap and the companion blog post for the examples of building BPF applications with libbpf. libbpf-tools are also a good source of the real-world libbpf-based tracing tools.
See also "BPF CO-RE reference guide" for the coverage of practical aspects of building BPF CO-RE applications and "BPF CO-RE" for general introduction into BPF portability issues and BPF CO-RE origins.
All general BPF questions, including kernel functionality, libbpf APIs and their application, should be sent to bpf@vger.kernel.org mailing list. You can subscribe to it here and search its archive here. Please search the archive before asking new questions. It very well might be that this was already addressed or answered before.
bpf@vger.kernel.org is monitored by many more people and they will happily try to help you with whatever issue you have. This repository's PRs and issues should be opened only for dealing with issues pertaining to specific way this libbpf mirror repo is set up and organized.
Building libbpf
libelf is an internal dependency of libbpf and thus it is required to link
against and must be installed on the system for applications to work.
pkg-config is used by default to find libelf, and the program called can be
overridden with PKG_CONFIG.
If using pkg-config at build time is not desired, it can be disabled by
setting NO_PKG_CONFIG=1 when calling make.
To build both static libbpf.a and shared libbpf.so:
$ cd src
$ make
To build only static libbpf.a library in directory build/ and install them together with libbpf headers in a staging directory root/:
$ cd src
$ mkdir build root
$ BUILD_STATIC_ONLY=y OBJDIR=build DESTDIR=root make install
To build both static libbpf.a and shared libbpf.so against a custom libelf dependency installed in /build/root/ and install them together with libbpf headers in a build directory /build/root/:
$ cd src
$ PKG_CONFIG_PATH=/build/root/lib64/pkgconfig DESTDIR=/build/root make install
BPF CO-RE (Compile Once – Run Everywhere)
Libbpf supports building BPF CO-RE-enabled applications, which, in contrast to BCC, do not require Clang/LLVM runtime being deployed to target servers and doesn't rely on kernel-devel headers being available.
It does rely on kernel to be built with BTF type information, though. Some major Linux distributions come with kernel BTF already built in:
- Fedora 31+
- RHEL 8.2+
- OpenSUSE Tumbleweed (in the next release, as of 2020-06-04)
- Arch Linux (from kernel 5.7.1.arch1-1)
- Manjaro (from kernel 5.4 if compiled after 2021-06-18)
- Ubuntu 20.10
- Debian 11 (amd64/arm64)
If your kernel doesn't come with BTF built-in, you'll need to build custom kernel. You'll need:
pahole1.16+ tool (part ofdwarvespackage), which performs DWARF to BTF conversion;- kernel built with
CONFIG_DEBUG_INFO_BTF=yoption; - you can check if your kernel has BTF built-in by looking for
/sys/kernel/btf/vmlinuxfile:
$ ls -la /sys/kernel/btf/vmlinux
-r--r--r--. 1 root root 3541561 Jun 2 18:16 /sys/kernel/btf/vmlinux
To develop and build BPF programs, you'll need Clang/LLVM 10+. The following distributions have Clang/LLVM 10+ packaged by default:
- Fedora 32+
- Ubuntu 20.04+
- Arch Linux
- Ubuntu 20.10 (LLVM 11)
- Debian 11 (LLVM 11)
- Alpine 3.13+
Otherwise, please make sure to update it on your system.
The following resources are useful to understand what BPF CO-RE is and how to use it:
- BPF CO-RE reference guide
- BPF Portability and CO-RE
- HOWTO: BCC to libbpf conversion
- libbpf-tools in BCC repo contain lots of real-world tools converted from BCC to BPF CO-RE. Consider converting some more to both contribute to the BPF community and gain some more experience with it.
Distributions
Distributions packaging libbpf from this mirror:
Benefits of packaging from the mirror over packaging from kernel sources:
- Consistent versioning across distributions.
- No ties to any specific kernel, transparent handling of older kernels. Libbpf is designed to be kernel-agnostic and work across multitude of kernel versions. It has built-in mechanisms to gracefully handle older kernels, that are missing some of the features, by working around or gracefully degrading functionality. Thus libbpf is not tied to a specific kernel version and can/should be packaged and versioned independently.
- Continuous integration testing via GitHub Actions.
- Static code analysis via LGTM and Coverity.
Package dependencies of libbpf, package names may vary across distros:
- zlib
- libelf
bpf-next to Github sync
All the gory details of syncing can be found in scripts/sync-kernel.sh
script. See SYNC.md for instruction.
Some header files in this repo (include/linux/*.h) are reduced versions of
their counterpart files at
bpf-next's
tools/include/linux/*.h to make compilation successful.
License
This work is dual-licensed under BSD 2-clause license and GNU LGPL v2.1 license. You can choose between one of them if you use this work.
SPDX-License-Identifier: BSD-2-Clause OR LGPL-2.1