Update .mailmap based on libbpf's list of contributors and on the latest
.mailmap version in the upstream repository.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Detect exact function names (no wildcards) in
bpf_program__attach_kprobe_multi_opts() and bypass kallsyms parsing,
passing the symbol directly to the kernel via syms[] array. This
benefits all callers, not just kprobe.session.
When the pattern contains no '*' or '?' characters, set syms to point
directly at the pattern string and cnt to 1, skipping the expensive
/proc/kallsyms or available_filter_functions parsing (~150ms per
function).
Error code normalization: the fast path returns ESRCH from kernel's
ftrace_lookup_symbols(), while the slow path returns ENOENT from
userspace kallsyms parsing. Convert ESRCH to ENOENT in the
bpf_link_create error path to maintain API consistency - both paths
now return identical error codes for "symbol not found".
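
The fast-path decision and the error normalization described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the libbpf code: pattern_is_exact() and normalize_link_err() are hypothetical helper names that do not appear in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when the pattern contains neither '*' nor '?',
 * i.e. it names exactly one function and can be passed as syms[0]. */
static bool pattern_is_exact(const char *pattern)
{
	return strpbrk(pattern, "*?") == NULL;
}

/* Hypothetical helper: map the kernel's "symbol not found" (-ESRCH)
 * to the -ENOENT that the userspace kallsyms path reports. All other
 * values are passed through unchanged. */
static int normalize_link_err(int err)
{
	return err == -ESRCH ? -ENOENT : err;
}
```

With helpers like these, a caller would take the fast path only when pattern_is_exact() returns true, and would run every bpf_link_create failure through normalize_link_err() before returning it.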
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260302200837.317907-2-andrey.grodzovsky@crowdstrike.com
btf__add_btf() currently rejects split BTF sources with -ENOTSUP.
This prevents merging types from multiple kernel module BTFs that
are all split against the same vmlinux base.
Extend btf__add_btf() to handle split BTF sources by:
- Replacing the blanket -ENOTSUP with a validation that src and dst
share the same base BTF pointer when both are split, returning
-EOPNOTSUPP on mismatch.
- Computing src_start_id from the source's base to distinguish base
type ID references (which must remain unchanged) from split type
IDs (which must be remapped to new positions in the destination).
- Using src_btf->nr_types instead of btf__type_cnt()-1 for the type
count, which is correct for both split and non-split sources.
- Skipping base string offsets (< start_str_off) during the string
rewrite loop, mirroring the type ID skip pattern. Since src and
dst share the same base BTF, base string offsets are already valid
and need no remapping.
For non-split sources the behavior is identical: src_start_id is 1,
the type_id < 1 guard is never true (VOID is already skipped), and
the remapping formula reduces to the original. start_str_off is 0
so no string offsets are skipped.
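
The ID remapping rule above can be illustrated with a small sketch. It is an assumption-laden model, not the actual btf__add_btf() code: remap_type_id() and the dst_next_id parameter (the first free type ID in the destination) are hypothetical names, and the VOID check is folded into the same comparison for brevity.

```c
#include <stdint.h>

/* Hypothetical helper modelling the remapping rule:
 * - IDs below src_start_id refer to the shared base BTF (or to VOID)
 *   and are left unchanged;
 * - IDs at or above src_start_id are split types and are shifted so
 *   that the first split type lands on dst_next_id. */
static uint32_t remap_type_id(uint32_t type_id, uint32_t src_start_id,
			      uint32_t dst_next_id)
{
	if (type_id < src_start_id)
		return type_id;
	return type_id - src_start_id + dst_next_id;
}
```

For a non-split source, src_start_id is 1, so only VOID (ID 0) is left alone and every other ID is shifted, which matches the pre-existing behavior described above.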
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/c00216ed48cf7897078d9645679059d5ebf42738.1772657690.git.josef@toxicpanda.com
Add support for detecting the nop,nop5 instruction combo for USDT probes
by checking whether a nop5 instruction follows the probe's nop.
When the nop,nop5 combo is detected together with uprobe syscall,
we can place the probe on top of nop5 and get it optimized.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260224103915.1369690-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Netlink requires that the recv buffer used during dumps is at least
min(PAGE_SIZE, 8k) (see the man page). Otherwise the messages will
get truncated. Make sure bpftool follows this requirement to avoid
missing information on systems with large pages.
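
As a rough sketch of the sizing rule quoted above, the minimum buffer size can be computed from the runtime page size. This is only an illustration: netlink_dump_buf_size() is a hypothetical helper, not a bpftool or libbpf function, and the patch itself may simply use a fixed size.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: minimum recv buffer size for netlink dumps,
 * computed as min(page size, 8 KiB) per the requirement quoted above. */
static size_t netlink_dump_buf_size(void)
{
	const size_t cap = 8192;
	long page = sysconf(_SC_PAGESIZE);

	if (page <= 0)
		return cap; /* page size unknown: use the upper bound */
	return (size_t)page < cap ? (size_t)page : cap;
}
```

On a system with 4 KiB pages this yields 4096, while on systems with 16 KiB or 64 KiB pages it yields 8192, which is the case a fixed 4 KiB buffer would get wrong.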
Acked-by: Quentin Monnet <qmo@kernel.org>
Fixes: 7084566a236f ("tools/bpftool: Remove libbpf_internal.h usage in bpftool")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20260217194150.734701-1-kuba@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
github.head_ref is empty for push and schedule events, so all those
runs shared the same concurrency group and cancelled each other. Fall
back to github.run_id to give each non-PR run a unique group.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
The libbpf CI always builds the kernel from source at CHECKPOINT-COMMIT;
there is no prebuilt-kernel matrix entry. Remove the `kernel` input and
the conditional build-vs-download logic, simplifying the workflow.
The run-vmtest action defaults KERNEL to "LATEST" internally when the
env var is unset, so DENYLIST-LATEST is still picked up. The vmlinuz
path is auto-discovered via `make -s image_name` when not passed.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Merge DENYLIST-LATEST into DENYLIST and remove the per-kernel denylist
files. With LATEST being the only kernel mode, there's no need for
separate files. Also remove the s390x denylist (libbpf CI only tests
x86_64) and drop stale entries fixed upstream.
Add map_kptr, test_profiler (kprobes not available in VM kernel), and
sockmap udp multi channels (flaky) based on CI run results.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
All 4 patches either fail to apply (context mismatch) or are already
applied upstream. They produce noise in CI logs.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Commit 728ff167910e ("libbpf: Add gating for arena globals relocation feature")
adds a feature gate check that loads a map and BPF program to
test the running kernel supports large direct offsets for LDIMM64
instructions. This check is currently used to calculate arena symbol
offsets during bpf_object__collect_relos, itself called by
bpf_object_open.
However, the program calling bpf_object_open may not have the permissions to
load maps and programs. This is the case with the BPF selftests, where
bpftool is invoked at compilation time during skeleton generation. This
causes errors as the feature gate unexpectedly fails with -EPERM.
Avoid this by moving all uses of the FEAT_LDIMM64_FULL_RANGE_OFF feature gate
to BPF object preparation time instead.
Fixes: 728ff167910e ("libbpf: Add gating for arena globals relocation feature")
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260217204345.548648-3-emil@etsalapatis.com
Commit 728ff167910e uses a PROG_TYPE_TRACEPOINT BPF test program to
check whether the running kernel supports large LDIMM64 offsets. The
feature gate incorrectly assumes that the program will fail at
verification time with one of two messages, depending on whether the
feature is supported by the running kernel. However,
PROG_TYPE_TRACEPOINT programs may fail to load before verification even
starts, e.g., if the shell does not have the appropriate capabilities.
Use a BPF_PROG_TYPE_SOCKET_FILTER program for the feature gate instead.
Also fix two minor issues. First, ensure the log buffer for the test is
initialized: a program load that failed before verification caused libbpf
to dump uninitialized data to stdout. Second, ensure that close() is only
called for program_fd in the probe if the program load actually succeeded.
Previously, the call failed silently with -EBADF most of the time.
Fixes: 728ff167910e ("libbpf: Add gating for arena globals relocation feature")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260217204345.548648-2-emil@etsalapatis.com
This reverts commit 77b9c4a438fc66e2ab004c411056b3fb71a54f2c, reversing
changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497:
931420a2fc36 ("selftests/net: Add netkit container tests")
ab771c938d9a ("selftests/net: Make NetDrvContEnv support queue leasing")
6be87fbb2776 ("selftests/net: Add env for container based tests")
61d99ce3dfc2 ("selftests/net: Add bpf skb forwarding program")
920da3634194 ("netkit: Add xsk support for af_xdp applications")
eef51113f8af ("netkit: Add netkit notifier to check for unregistering devices")
b5ef109d22d4 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create")
b5c3fa4a0b16 ("netkit: Add single device mode for netkit")
0073d2fd679d ("xsk: Proxy pool management for leased queues")
1ecea95dd3b5 ("xsk: Extend xsk_rcv_check validation")
804bf334d08a ("net: Proxy netdev_queue_get_dma_dev for leased queues")
0caa9a8ddec3 ("net: Proxy net_mp_{open,close}_rxq for leased queues")
ff8889ff9107 ("net, ethtool: Disallow leased real rxqs to be resized")
9e2103f36110 ("net: Add lease info to queue-get response")
31127deddef4 ("net: Implement netdev_nl_queue_create_doit")
a5546e18f77c ("net: Add queue-create operation")
The series will conflict with io_uring work, and the code needs more
polish.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a single device mode for netkit instead of netkit pairs. The primary
target for the paired devices is to connect network namespaces, of course,
and support has been implemented in projects like Cilium [0]. For the rxq
leasing the plan is to support two main scenarios related to single device
mode:
* For the use-case of io_uring zero-copy, the control plane can either
set up a netkit pair where the peer device can perform rxq leasing which
is then tied to the lifetime of the peer device, or the control plane
can use a regular netkit pair to connect the hostns to a Pod/container
and dynamically add/remove rxq leasing through a single device without
having to interrupt the device pair. In the case of io_uring, the memory
pool is used as skb non-linear pages, and thus the skb will go its way
through the regular stack into netkit. Things like the netkit policy when
no BPF is attached or skb scrubbing etc apply as-is in case the paired
devices are used, or if the backend memory is tied to the single device
and traffic goes through a paired device.
* For the use-case of AF_XDP, the control plane needs to use netkit in the
single device mode. The single device mode currently enforces only a
pass policy when no BPF is attached, and does not yet support BPF link
attachments for AF_XDP. skbs sent to that device get dropped at the
moment. Given AF_XDP operates at a lower layer of the stack tying this
to the netkit pair did not make sense. In future, the plan is to allow
BPF at the XDP layer which can: i) process traffic coming from the AF_XDP
application (e.g. QEMU with AF_XDP backend) to filter egress traffic or
to push selected egress traffic up to the single netkit device to the
local stack (e.g. DHCP requests), and ii) vice-versa skbs sent to the
single netkit into the AF_XDP application (e.g. DHCP replies). Also,
the control-plane can dynamically manage rxq leasing for the single
netkit device without having to interrupt (e.g. down/up cycle) the main
netkit pair for the Pod which has traffic going in and out.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://docs.cilium.io/en/stable/operations/performance/tuning/#netkit-device-mode [0]
Link: https://patch.msgid.link/20260115082603.219152-10-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a ynl netdev family operation called queue-create that creates a
new queue on a netdevice:
    name: queue-create
    attribute-set: queue
    flags: [admin-perm]
    do:
      request:
        attributes:
          - ifindex
          - type
          - lease
      reply: &queue-create-op
        attributes:
          - id
This is a generic operation such that it can be extended for various
use cases in future. Right now it is mandatory to specify ifindex,
the queue type which is enforced to rx and a lease. The newly created
queue id is returned to the caller.
A queue from a virtual device can have a lease which refers to another
queue from a physical device. This is useful for memory providers
and AF_XDP operations which take an ifindex and queue id to allow
applications to bind against virtual devices in containers. The lease
couples both queues together and allows operations to be proxied from
a virtual device in a container to the physical device.
In future, the nested lease attribute can be lifted and made optional
for other use-cases such as dynamic queue creation for physical
netdevs. The lack of lease and the specification of the physical
device as an ifindex will imply that we need a real queue to be
allocated. Similarly, the queue type enforcement to rx can then be
lifted as well to support tx.
An early implementation had only driver-specific integration [0], but
in order for other virtual devices to reuse, it makes sense to have
this as a generic API in core net.
For leasing queues, the virtual netdev must have real_num_rx_queues
less than num_rx_queues at the time of calling queue-create. The
queue-type must be rx as only rx queues are supported for leasing
for now. We also enforce that the queue-create ifindex must point
to a virtual device, and that the nested lease attribute's ifindex
must point to a physical device. The nested lease attribute set
contains a netns-id attribute which is currently only intended for
dumping as part of the queue-get operation. Also, it is modeled as
an s32 type, as is done elsewhere in the stack.
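
The preconditions listed above can be summarized in a small standalone model. This is a sketch under loud assumptions, not the kernel implementation: struct dev_model, its is_virtual flag, and queue_create_lease_ok() are hypothetical names invented for illustration, and the real code reports specific errors rather than a boolean.

```c
#include <stdbool.h>

enum queue_type { QUEUE_TYPE_RX, QUEUE_TYPE_TX };

/* Hypothetical, simplified stand-in for the relevant netdev state. */
struct dev_model {
	bool is_virtual;
	unsigned int real_num_rx_queues;
	unsigned int num_rx_queues;
};

/* Hypothetical check mirroring the rules in the text:
 * - only rx queues can be leased for now;
 * - the queue-create ifindex must name a virtual device;
 * - the lease ifindex must name a physical device;
 * - the virtual device must still have an unused rx queue slot. */
static bool queue_create_lease_ok(const struct dev_model *virt,
				  const struct dev_model *phys,
				  enum queue_type type)
{
	if (type != QUEUE_TYPE_RX)
		return false;
	if (!virt->is_virtual || phys->is_virtual)
		return false;
	return virt->real_num_rx_queues < virt->num_rx_queues;
}
```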
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0]
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-2-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This commit adds shared shaper state across the cake instances beneath a
cake_mq qdisc. It works by periodically tracking the number of active
instances, and scaling the configured rate by the number of active
queues.
The scan is lockless and simply reads the qlen and the last_active state
variable of each of the instances configured beneath the parent cake_mq
instance. Locking is not required since the values are only updated by
the owning instance, and eventual consistency is sufficient for the
purpose of estimating the number of active queues.
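
The scan described above can be modelled in plain C along the following lines. This is a simplified sketch, not the sch_cake code: struct instance_state, scaled_rate(), and the window_ns activity window are hypothetical, "scaling" is assumed to mean dividing the configured rate by the active count, and the kernel would use lockless reads rather than plain loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instance state read by the scan. */
struct instance_state {
	uint32_t qlen;           /* packets currently queued */
	uint64_t last_active_ns; /* timestamp of last activity */
};

/* Count instances that have packets queued or were active within
 * window_ns, then divide the configured rate by that count
 * (assumption: this is what "scaling" means here). */
static uint64_t scaled_rate(const struct instance_state *inst, size_t n,
			    uint64_t now_ns, uint64_t window_ns,
			    uint64_t rate_bps)
{
	size_t active = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t last = inst[i].last_active_ns;

		if (inst[i].qlen > 0 ||
		    (now_ns >= last && now_ns - last < window_ns))
			active++;
	}
	if (active == 0)
		active = 1; /* avoid dividing by zero when all are idle */
	return rate_bps / active;
}
```

Because each value is written only by its owning instance, a slightly stale read merely delays the rate adjustment until the next scan, which is why eventual consistency is acceptable here.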
The interval for scanning the number of active queues is set to 200 us.
We found this to be a good tradeoff between overhead and response time.
For a detailed analysis of this aspect see the Netdevconf talk:
https://netdevconf.info/0x19/docs/netdev-0x19-paper16-talk-paper.pdf
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-5-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Update .mailmap based on libbpf's list of contributors and on the latest
.mailmap version in the upstream repository.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add feature gating for the arena globals relocation introduced in
commit c1f61171d44b. The commit depends on a previous commit in the
same patchset that is absent from older kernels
(12a1fe6e12db "bpf/verifier: Do not limit maximum direct offset into arena map").
Without this commit, arena globals relocation with arenas >= 512MiB
fails to load and breaks libbpf's backwards compatibility.
Introduce a libbpf feature to check whether the running kernel allows for
full range ldimm64 offset, and only relocate arena globals if it does.
Fixes: c1f61171d44b ("libbpf: Move arena globals to the end of the arena")
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260210184532.255475-1-emil@etsalapatis.com
With the introduction of the OMR feature, the PEBS memory auxiliary info
field for load and store latency events has been restructured for DMR.
The memory auxiliary info field's bit[8] indicates whether an L2 cache
miss occurred for a memory load or store instruction. If bit[8] is 0,
it signifies no L2 cache miss, and bits[7:0] specify the exact cache data
source (up to the L2 cache level). If bit[8] is 1, bits[7:0] represent
the OMR encoding, indicating the specific L3 cache or memory region
involved in the memory access. A significant enhancement of the OMR
encoding is the ability to provide up to 8 fine-grained memory regions
in addition to the cache region, offering more detailed insights into
memory access regions.
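
The bit layout described above can be decoded as in the following sketch. It only splits the field into its two parts; struct mem_aux_decoded and decode_mem_aux() are hypothetical names, and the meaning of the individual 8-bit codes is defined by the ISE documentation, not by this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the PEBS memory auxiliary info field. */
struct mem_aux_decoded {
	bool l2_miss; /* bit[8]: 1 if the access missed the L2 cache */
	uint8_t code; /* bits[7:0]: cache data source if !l2_miss,
		       * OMR encoding if l2_miss */
};

static struct mem_aux_decoded decode_mem_aux(uint64_t aux)
{
	struct mem_aux_decoded d;

	d.l2_miss = (aux >> 8) & 1;
	d.code = aux & 0xff;
	return d;
}
```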
For detailed information on the memory auxiliary info encoding, please
refer to section 16.2 "PEBS LOAD LATENCY AND STORE LATENCY FACILITY" in
the ISE documentation.
This patch ensures that the PEBS memory auxiliary info field is correctly
interpreted and utilized in DMR.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260114011750.350569-3-dapeng1.mi@linux.intel.com
Introduce __counted_by_ptr(), which works like __counted_by(), but for
pointer struct members.
    struct foo {
            int a, b, c;
            char *buffer __counted_by_ptr(bytes);
            short nr_bars;
            struct bar *bars __counted_by_ptr(nr_bars);
            size_t bytes;
    };
Because "counted_by" can only be applied to pointer members in very
recent compiler versions, its application ends up needing to be distinct
from flexible array "counted_by" annotations, hence a separate macro.
This is a reworking of Kees' previous patch [1].
Link: https://lore.kernel.org/all/20251020220118.1226740-1-kees@kernel.org/ [1]
Co-developed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Bill Wendling <morbo@google.com>
Link: https://patch.msgid.link/20260116005838.2419118-1-morbo@google.com
Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-11 12:04:56 -08:00
26 changed files with 322 additions and 395 deletions
test_ima # All of CI is broken on it following 6.3-rc1 merge
lwt_reroute # crashes kernel after netnext merge from 2ab1efad60ad "net/sched: cls_api: complement tcf_tfilter_dump_policy"
tc_links_ingress # started failing after net-next merge from 2ab1efad60ad "net/sched: cls_api: complement tcf_tfilter_dump_policy"
xdp_bonding/xdp_bonding_features # started failing after net merge from 359e54a93ab4 "l2tp: pass correct message length to ip6_append_data"
tc_redirect/tc_redirect_dtime # uapi breakage after net-next commit 885c36e59f46 ("net: Re-use and set mono_delivery_time bit for userspace tstamp packets")
migrate_reuseport/IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler # flaky, under investigation
migrate_reuseport/IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler # flaky, under investigation
verify_pkcs7_sig # keeps failing
verif_scale_pyperf600 # fails on newer Clangs
decap_sanity # weird failure with decap_sanity_ns netns already existing, TBD
empty_skb # waiting the fix in bpf tree to make it to bpf-next
bpf_nf/tc-bpf-ct # test consistently failing on x86: https://github.com/libbpf/libbpf/pull/698#issuecomment-1590341200
bpf_nf/xdp-ct # test consistently failing on x86: https://github.com/libbpf/libbpf/pull/698#issuecomment-1590341200
find_vma # test consistently fails on latest kernel, see https://github.com/libbpf/libbpf/issues/754 for details