Blocking Copy Fail (CVE-2026-31431) in Kubernetes with Tetragon

On April 29, Xint Code dropped a Linux kernel privilege escalation that makes every other LPE look like it’s trying too hard. It doesn’t need a race condition or kernel-specific offsets, and it doesn’t care which distro you’re running. The PoC is a 732-byte Python script that roots Ubuntu, RHEL, Amazon Linux, and SUSE, unchanged. They named it Copy Fail, and it’s been sitting in the kernel since 2017.

The bug is a logic flaw in authencesn (the kernel’s authenticated encryption with extended sequence numbers implementation). The exploit chains AF_ALG, the userspace interface to the kernel crypto API, with splice() to write 4 bytes into the page cache of any setuid binary. Four bytes, and /usr/bin/su now does whatever you want.

If you’re running multi-tenant Kubernetes, CI runners, or anything that runs user-supplied code on a shared kernel, this is relevant.

The kernel patch exists (mainline commit a664bf3d603d), but most managed Kubernetes services haven’t rolled it into their node images yet. So what do you do in the meantime?

How 732 bytes get you root

The full chain:

socket(AF_ALG)
  → bind("authencesn(...)")
  → splice(file → pipe)
  → splice(pipe → alg_fd)
  → recv()
  → 4-byte page cache write
  → setuid binary hijacked

It starts by creating an AF_ALG socket (address family 38), which is the userspace interface to the kernel crypto API. It binds to a specific vulnerable algorithm, uses splice() to move page-cache pages into the crypto socket, and triggers a 4-byte write back into the page cache of a setuid binary. The PoC is Python but it’s just raw syscalls. C, Go, Rust, anything that can call socket() works.

The whole chain starts with socket(38), and AF_ALG is a niche kernel crypto interface that almost nothing in userspace actually uses. OpenSSL, GnuTLS, libsodium all handle the math themselves without touching it. We ran lsof | grep AF_ALG across our nodes and found exactly zero open sockets.

So the fix is straightforward: block socket(38) from pods. One kprobe at the syscall entry, return EPERM for address family 38, and the exploit can’t even start.

We checked what actually uses the kernel crypto API versus what goes through AF_ALG to make sure we weren’t going to break something. None of the usual suspects touch AF_ALG. Kernel-side consumers like dm-crypt, LUKS, kTLS, and IPsec call the in-kernel crypto API directly. Userspace software like OpenSSL, GnuTLS, NSS, SSH, container runtimes, service meshes, and ingress controllers does its crypto in userspace. The only things that would break are OpenSSL with the afalg engine explicitly enabled (not the default), hardware crypto offload paths that expose accelerators through AF_ALG, or custom applications that bind aead/skcipher/hash sockets directly. You’d know if any of those applied to you.

“We already shipped a modprobe fix”

Some distros have already shipped a modprobe blacklist for algif_aead. When we spun up a fresh AKS cluster to test, Microsoft had already dropped this in the May node image:

/etc/modprobe.d/disable-algif_aead.conf:
  install algif_aead /bin/false
  blacklist algif_aead

If the module can’t load, the exploit can’t bind to the aead algorithm and the chain breaks. For distros where algif_aead is a loadable module, this works. But there are a few gaps. Also worth double checking that your existing nodes actually have the fix, not just new ones. If the blacklist shipped as part of a node image update rather than a package patch, nodes that haven’t been reimaged won’t have it.
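To double check an existing node, something like the following could scan the modprobe.d directories for the two rules shown above. This is a minimal sketch: the function names are ours, the directory list is the usual set of modprobe.d locations, and it only recognizes the exact `install algif_aead /bin/false` and `blacklist algif_aead` forms (an install rule pointing at some other command would be missed).

```python
import glob

MODULE = "algif_aead"

def parse_rules(text, module=MODULE):
    """Return which rule kinds ('install', 'blacklist') for `module` appear in one config file."""
    found = set()
    for raw in text.splitlines():
        # Treat '#' as the start of a comment and split the rest into words.
        words = raw.split("#", 1)[0].split()
        if len(words) >= 2 and words[1] == module:
            if words[0] == "blacklist":
                found.add("blacklist")
            elif words[0] == "install" and words[2:3] == ["/bin/false"]:
                found.add("install")
    return found

def node_rules(dirs=("/etc/modprobe.d", "/run/modprobe.d", "/usr/lib/modprobe.d")):
    """Union of the rules found in every readable *.conf file on this node."""
    found = set()
    for d in dirs:
        for path in glob.glob(f"{d}/*.conf"):
            try:
                with open(path) as f:
                    found |= parse_rules(f.read())
            except OSError:
                continue  # unreadable file, skip it
    return found

if __name__ == "__main__":
    rules = node_rules()
    print("install rule:", "install" in rules, "| blacklist rule:", "blacklist" in rules)
```

Run it on the node itself (or from a privileged debug pod with the host filesystem mounted, adjusting the paths), not from an ordinary pod, since a container has its own /etc.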

The blacklist only prevents loading the algif_aead submodule. The socket family itself (family 38) is still reachable, and Kubernetes RuntimeDefault seccomp does not block AF_ALG. If that config file gets removed or misconfigured, the full chain is immediately exploitable. On RHEL-family distros (RHEL, AlmaLinux, Rocky, some Amazon Linux 2023 configs), CONFIG_CRYPTO_USER_API_AEAD=y means it’s compiled directly into the kernel, so modprobe.d rules don’t apply at all.
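Whether the blacklist can help at all depends on how that one config option was set at build time. Here is a rough sketch of a check, assuming the kernel config is readable at /boot/config-<release> or /proc/config.gz (not every node image ships either); the helper names are made up for this example.

```python
import gzip
import os

OPTION = "CONFIG_CRYPTO_USER_API_AEAD"

def option_state(config_text, option=OPTION):
    """Return 'y' (built-in), 'm' (loadable module), or 'n' (not set or absent)."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "n"  # "# CONFIG_... is not set" lines and missing options land here

def read_kernel_config():
    """Best effort: try /boot/config-<release>, then /proc/config.gz. None if neither works."""
    try:
        with open(f"/boot/config-{os.uname().release}") as f:
            return f.read()
    except OSError:
        pass
    try:
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    except OSError:
        return None

def blacklist_helps(state):
    """A modprobe.d rule can only stop a loadable module, never built-in code."""
    return state == "m"

if __name__ == "__main__":
    text = read_kernel_config()
    if text is None:
        print("kernel config not readable on this node")
    else:
        state = option_state(text)
        print(f"{OPTION}={state}, modprobe blacklist effective: {blacklist_helps(state)}")
```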

Good as a first response. But if you’re running across different providers and distros, you want something at the syscall level that works regardless of how the kernel was compiled.

Tetragon to the rescue

Tetragon is Cilium’s eBPF-based security agent. It runs as a DaemonSet, hooks kernel functions via kprobes, and can override a syscall’s return value before the call ever reaches the kernel’s socket creation path.

There’s a cleaner approach using BPF-LSM, hooking the security_socket_create callback directly, and projects like cozystack/copy-fail-blocker do exactly that. The problem is that BPF-LSM needs bpf in the active lsm= kernel boot parameter. Most managed Kubernetes node images (AKS Ubuntu, EKS AL2023, standard Bottlerocket) compile it in but don’t include it in the boot parameter. You can’t change kernel cmdline on managed nodes. Tetragon hooks the syscall entry via kprobes instead, which doesn’t need BPF-LSM at all. Works on any Kubernetes distribution running kernel 4.19+.
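If you want to see which camp a node falls into, the active LSM list is exposed through securityfs. A small sketch, assuming securityfs is mounted at the usual /sys/kernel/security path; the function names are ours.

```python
def active_lsms(text):
    """Split the comma-separated LSM list the kernel exposes into names."""
    return [name.strip() for name in text.split(",") if name.strip()]

def bpf_lsm_active(path="/sys/kernel/security/lsm"):
    """True/False if the list is readable, None if securityfs is not available."""
    try:
        with open(path) as f:
            return "bpf" in active_lsms(f.read())
    except OSError:
        return None

if __name__ == "__main__":
    print("BPF-LSM active:", bpf_lsm_active())
```

A result of False means the BPF-LSM route is closed on that node unless you can change the boot parameters, which is the case the kprobe approach is meant for.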

Setting up

We used an AKS cluster with Kubernetes 1.34 for this, but it works the same on EKS, GKE, or self-managed clusters. Tetragon just needs to run as a privileged DaemonSet.

Install Tetragon

helm repo add cilium https://helm.cilium.io
helm repo update cilium

helm install tetragon cilium/tetragon \
  --version 1.7.0 \
  --namespace kube-system \
  --set tetragon.exportRateLimit=-1

Tetragon runs as a DaemonSet with two containers per pod: the agent (which loads eBPF programs into the kernel) and an export sidecar. Takes about 20 seconds to roll out.

kubectl rollout status daemonset/tetragon -n kube-system --timeout=120s

Apply the policy

The full TracingPolicy. 27 lines of YAML:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: copyfail-mitigate
spec:
  kprobes:
  - call: "__x64_sys_socket"
    syscall: true
    args:
    - index: 0
      type: "int"
      label: "family"
    tags:
    - "CVE-2026-31431"
    - "copyfail"
    message: "AF_ALG socket creation blocked"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "38"
      matchActions:
      - action: Override
        argError: -1
      - action: Post
        kernelStackTrace: true

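__x64_sys_socket hooks the syscall entry for socket() on x86_64 (on arm64 nodes the kernel symbol is __arm64_sys_socket). The matchArgs selector filters for address family 38 (AF_ALG). When it matches, Override makes the call return -1, which is -EPERM, before the socket is created, and Post emits a Tetragon event with the kernel stack trace, pod context, and CVE tags. Override needs a kernel built with CONFIG_BPF_KPROBE_OVERRIDE, so that is worth confirming on your node image.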
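The two magic numbers in the policy are easy to sanity check from Python. A quick sketch (the helper name is ours): 38 is the AF_ALG constant that Python exposes on Linux, and an argError of -1 is the negated errno for EPERM.

```python
import errno
import os
import socket

# Python defines socket.AF_ALG on Linux; fall back to the literal elsewhere.
AF_ALG = getattr(socket, "AF_ALG", 38)

def override_errno(arg_error):
    """Map a negative argError value to (errno name, human-readable message)."""
    code = -arg_error
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

if __name__ == "__main__":
    print("AF_ALG =", AF_ALG)
    print("argError -1 =", override_errno(-1))
```

If you would rather return a different error to the caller, the same mapping tells you which negative value to put in argError.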

kubectl apply -f block-af-alg.yaml

One kprobe. The socket call never reaches the kernel’s socket creation path.

Moment of truth

AF_ALG blocked

kubectl run test --rm -it --restart=Never --image=python:3.12-slim -- \
  python3 -c 'import socket
try: socket.socket(38, 5, 0)
except OSError as e: print("[OK] AF_ALG blocked:", e)'
[OK] AF_ALG blocked: [Errno 1] Operation not permitted

Normal sockets unaffected

[OK] TCP socket works
[OK] UDP socket works

Only address family 38 is blocked. Everything else passes through.
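For repeatable checks (say, as a post-deploy smoke test in each namespace), the same probes can be wrapped in a small script. This is a sketch with names of our own choosing. It only creates and closes sockets, and it treats EPERM as "blocked", which could in principle also come from a seccomp profile rather than the Tetragon policy.

```python
import errno
import socket

AF_ALG = getattr(socket, "AF_ALG", 38)  # 38 on Linux

def classify(err):
    """Interpret the errno (or None on success) from socket(AF_ALG, ...)."""
    if err is None:
        return "reachable"      # socket was created: the mitigation is not active
    if err == errno.EPERM:
        return "blocked"        # what the Override action returns
    if err == errno.EAFNOSUPPORT:
        return "unsupported"    # kernel has no AF_ALG support at all
    return "error"

def try_socket(family, kind):
    """Create and immediately close a socket. Return None on success, else the errno."""
    try:
        socket.socket(family, kind, 0).close()
        return None
    except OSError as e:
        return e.errno

def report():
    return {
        "af_alg": classify(try_socket(AF_ALG, socket.SOCK_SEQPACKET)),
        "tcp": try_socket(socket.AF_INET, socket.SOCK_STREAM) is None,
        "udp": try_socket(socket.AF_INET, socket.SOCK_DGRAM) is None,
    }

if __name__ == "__main__":
    print(report())
```

On a protected node you would expect af_alg to come back as "blocked" with tcp and udp both True.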

The alert

Every block generates a Tetragon event that flows through the export pipeline:

{
  "process_kprobe": {
    "process": {
      "binary": "/usr/local/bin/python3",
      "pod": {
        "namespace": "default",
        "name": "test"
      }
    },
    "function_name": "__x64_sys_socket",
    "args": [{ "int_arg": 38, "label": "family" }],
    "action": "KPROBE_ACTION_POST",
    "policy_name": "copyfail-mitigate",
    "tags": ["CVE-2026-31431", "copyfail"],
    "message": "AF_ALG socket creation blocked"
  },
  "node_name": "aks-nodepool1-11749143-vmss000000",
  "time": "2026-05-01T18:28:42.527Z"
}

You get the binary, the pod, the node, the syscall args, the kernel stack trace, and the CVE tag. Forward these to Splunk, Sentinel, Datadog, Elastic, whatever you use. Both the block and the alert in one event. If you’re running Prometheus, Tetragon exposes tetragon_policy_events_total with labels for policy name and action.
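Before the events reach a SIEM, it can help to trim them down to the fields you alert on. A minimal sketch that reads Tetragon's JSON export one event per line and keeps only events from this policy; it relies solely on the fields visible in the sample event above, and the function name is ours.

```python
import json

def blocked_events(lines, policy="copyfail-mitigate"):
    """Yield compact dicts for kprobe events emitted by the given policy."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        kp = event.get("process_kprobe")
        if not kp or kp.get("policy_name") != policy:
            continue
        proc = kp.get("process", {})
        pod = proc.get("pod", {})
        yield {
            "time": event.get("time"),
            "node": event.get("node_name"),
            "namespace": pod.get("namespace"),
            "pod": pod.get("name"),
            "binary": proc.get("binary"),
            "tags": kp.get("tags", []),
        }
```

Passing `sys.stdin` as `lines` lets you pipe the export container's log stream straight into it and print one summary per blocked call.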

Belt, suspenders, and a kprobe

The modprobe blacklist and the kprobe protect against different failure modes. The blacklist prevents module loading, the kprobe prevents socket creation. If either one breaks, the other catches it. And if you’re running a mix of managed Kubernetes providers or node OS images, your kernel configs are going to differ. Some have algif_aead as a loadable module, some have it built-in, some might not have the modprobe fix at all yet. The Tetragon policy doesn’t care about any of that. It hooks the syscall, not the module.

Most providers haven’t shipped a node image with the actual kernel fix yet either. Both mitigations are workarounds, and stacked workarounds beat a single one.

Layer | What it does | Limitation
Modprobe blacklist | Prevents algif_aead module from loading | Only works when algif_aead is a loadable module, not built-in
Tetragon kprobe | Blocks socket(AF_ALG) at syscall level + alerts | Requires DaemonSet running; protection active while pod is alive
Kernel patch | Fixes the root cause in authencesn | Requires node image update and reboot

Run the Tetragon policy alongside whatever module-level mitigation your distro shipped. Remove both once your nodes have the patched kernel.

“But what about chain detection?”

You could build a multi-kprobe policy that tracks the whole sequence from socket(AF_ALG) through bind(authencesn(...)) to splice and recv, and only blocks when it sees the complete pattern. We actually built one with five kprobes and socket lifecycle tracking. It worked fine.

But chain detection only makes sense when the individual syscalls are ambiguous. If the exploit started with write or mmap you’d need the full pattern to tell an attack from normal behavior. socket(38) from a pod isn’t ambiguous at all. Nothing legitimate makes that call in a Kubernetes cluster, so adding four more kprobes to confirm what you already know just adds complexity.

Also worth noting that Tetragon’s TracingPolicy can’t do true in-kernel chain correlation anyway. TrackSock creates a BPF map linking sockets to processes, but there’s no matchSock selector that lets a later kprobe like splice query “does this process hold a tracked AF_ALG socket?” The correlation would have to happen in the SIEM, not in the kernel.
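For a sense of what that SIEM-side correlation might look like, here is a rough sketch. It assumes a detection-only policy that posts an event for each hooked call, groups events by the process's exec_id, and checks whether the calls appear in order. The hook names after __x64_sys_socket are hypothetical stand-ins for whatever your policy hooks, and matching on function names alone cannot tell whether a splice actually targeted an AF_ALG socket, which is exactly the limitation described above.

```python
from collections import defaultdict

# Assumed hook names for the chain; only the first one appears in the policy above.
CHAIN = ["__x64_sys_socket", "__x64_sys_bind", "__x64_sys_splice", "__x64_sys_recvfrom"]

def contains_chain(calls, chain=CHAIN):
    """True if `chain` occurs in `calls` as an ordered (not necessarily contiguous) subsequence."""
    it = iter(calls)
    # Each `step in it` consumes the iterator up to the first match,
    # so later steps can only match later calls.
    return all(step in it for step in chain)

def suspicious_processes(events):
    """Group process_kprobe events per exec_id and return the ids that completed the chain."""
    by_proc = defaultdict(list)
    # ISO-8601 timestamps in one format sort correctly as plain strings.
    for ev in sorted(events, key=lambda e: e.get("time", "")):
        kp = ev.get("process_kprobe")
        if kp:
            exec_id = kp.get("process", {}).get("exec_id")
            by_proc[exec_id].append(kp.get("function_name"))
    return [exec_id for exec_id, calls in by_proc.items() if contains_chain(calls)]
```

Compared with a single Override on socket(38), this is more moving parts for a weaker signal, which is why we stuck with the one kprobe.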

About Isala Piyarisi

Builder and platform engineer with a track record of shipping products from scratch and seeing them through to scale. Works across the full stack from kernel to user interface.
