umber-kernel/include/net
Florian Westphal 09d6074995 netfilter: nf_tables: avoid chain re-validation if possible
[ Upstream commit 8e1a1bc4f5a42747c08130b8242ebebd1210b32f ]

Hamza Mahfooz reports cpu soft lock-ups in
nft_chain_validate():

 watchdog: BUG: soft lockup - CPU#1 stuck for 27s! [iptables-nft-re:37547]
[..]
 RIP: 0010:nft_chain_validate+0xcb/0x110 [nf_tables]
[..]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_table_validate+0x6b/0xb0 [nf_tables]
  nf_tables_validate+0x8b/0xa0 [nf_tables]
  nf_tables_commit+0x1df/0x1eb0 [nf_tables]
[..]

Currently nf_tables will traverse the entire table (chain graph), starting
from the entry points (base chains), exploring all possible paths
(chain jumps).  But there are cases where we could avoid revalidation.

Consider:
1  input -> j2 -> j3
2  input -> j2 -> j3
3  input -> j1 -> j2 -> j3

Then the second rule does not need to revalidate j2, and, by extension j3,
because this was already checked during validation of the first rule.
We need to validate it only for rule 3.

This is needed because chain loop detection also ensures we do not exceed
the jump stack: Just because we know that j2 is cycle free, its last jump
might now exceed the allowed stack size.  We also need to update all
reachable chains with the new largest observed call depth.

Care has to be taken to revalidate even if the chain depth won't be an
issue: chain validation also ensures that expressions are not called from
invalid base chains.  For example, the masquerade expression can only be
called from NAT postrouting base chains.

Therefore we also need to keep record of the base chain context (type,
hooknum) and revalidate if the chain becomes reachable from a different
hook location.

Reported-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Closes: https://lore.kernel.org/netfilter-devel/20251118221735.GA5477@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
Tested-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17 16:35:32 +01:00
..
9p
bluetooth Bluetooth: hci_core: lookup hci_conn on RX path on protocol side 2025-11-20 17:01:09 -05:00
caif
iucv
libeth libeth: xdp: Disable generic kCFI pass for libeth_xdp_tx_xmit_bulk() 2025-10-29 20:04:55 -07:00
mana net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency. 2025-08-19 14:42:44 +02:00
netfilter netfilter: nf_tables: avoid chain re-validation if possible 2026-01-17 16:35:32 +01:00
netns tcp: accecn: AccECN option send control 2025-09-18 08:47:52 +02:00
nfc net: nfc: nci: Increase NCI_DATA_TIMEOUT to 3000 ms 2025-09-03 17:02:12 -07:00
page_pool net: add helper to pre-check if PP for an Rx queue will be unreadable 2025-09-04 10:19:17 +02:00
phonet
psp net: psp: don't assume reply skbs will have a socket 2025-10-03 10:23:50 -07:00
sctp sctp: Convert cookie authentication to use HMAC-SHA256 2025-08-19 19:36:26 -07:00
tc_act net_sched: act_skbmod: use RCU in tcf_skbmod_dump() 2025-08-28 16:46:23 -07:00
6lowpan.h
Space.h
act_api.h net_sched: act: remove tcfa_qstats 2025-09-02 15:52:24 -07:00
addrconf.h
af_ieee802154.h
af_rxrpc.h
af_unix.h
af_vsock.h
ah.h
aligned_data.h
amt.h
arp.h
atmclip.h
ax25.h
ax88796.h
bareudp.h
bond_3ad.h bonding: support aggregator selection based on port priority 2025-09-09 10:56:02 +02:00
bond_alb.h
bond_options.h bonding: add support for per-port LACP actor priority 2025-09-09 10:56:02 +02:00
bonding.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-10-01 10:14:49 +02:00
bpf_sk_storage.h
busy_poll.h
calipso.h
cfg80211-wext.h
cfg80211.h wifi: cfg80211: add an hrtimer based delayed work item 2025-10-28 14:56:30 +01:00
cfg802154.h
checksum.h
cipso_ipv4.h
cls_cgroup.h net/cls_cgroup: Fix task_get_classid() during qdisc run 2025-09-14 11:55:04 -07:00
codel.h
codel_impl.h
codel_qdisc.h
compat.h
datalink.h
dcbevent.h
dcbnl.h
devlink.h devlink: Add a 'num_doorbells' driverinit param 2025-09-17 18:30:51 -07:00
dropreason-core.h tcp: add datapath logic for PSP with inline key exchange 2025-09-18 12:32:06 +02:00
dropreason.h
dsa.h net: dsa: properly keep track of conduit reference 2026-01-08 10:16:45 +01:00
dsa_stubs.h
dscp.h
dsfield.h
dst.h net: dst: introduce dst->dev_rcu 2025-08-29 19:36:31 -07:00
dst_cache.h
dst_metadata.h net: dst_metadata: fix IP_DF bit not extracted from tunnel headers 2025-09-14 14:28:12 -07:00
dst_ops.h
eee.h
erspan.h
esp.h
espintcp.h
ethoc.h
failover.h
fib_notifier.h
fib_rules.h
firewire.h
flow.h ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
flow_dissector.h
flow_offload.h
fou.h
fq.h
fq_impl.h
garp.h
gen_stats.h
genetlink.h genetlink: fix typo in comment 2025-09-03 15:16:49 -07:00
geneve.h
gre.h
gro.h net: gro: remove unnecessary df checks 2025-09-25 12:42:49 +02:00
gro_cells.h
gso.h
gtp.h
gue.h
handshake.h
hotdata.h net: add NUMA awareness to skb_attempt_defer_free() 2025-09-30 15:45:53 +02:00
hwbm.h
icmp.h ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send() 2025-09-11 12:22:38 +02:00
ieee8021q.h
ieee80211_radiotap.h
ieee802154_netdev.h
if_inet6.h
ife.h
inet6_connection_sock.h
inet6_hashtables.h tcp: Remove inet6_hash(). 2025-09-22 11:38:43 -07:00
inet_common.h
inet_connection_sock.h tcp: Update bind bucket state on port release 2025-09-23 10:12:15 +02:00
inet_dscp.h ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
inet_ecn.h
inet_frag.h inet: frags: flush pending skbs in fqdir_pre_exit() 2026-01-02 12:56:46 +01:00
inet_hashtables.h tcp: Update bind bucket state on port release 2025-09-23 10:12:15 +02:00
inet_sock.h
inet_timewait_sock.h tcp: Update bind bucket state on port release 2025-09-23 10:12:15 +02:00
inetpeer.h
ioam6.h
ip.h net: snmp: remove SNMP_MIB_SENTINEL 2025-09-08 18:06:21 -07:00
ip6_checksum.h
ip6_fib.h
ip6_route.h ipv6: make ipv6_pinfo.daddr_cache a boolean 2025-09-18 10:17:09 +02:00
ip6_tunnel.h
ip_fib.h ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
ip_tunnels.h net/ip6_tunnel: Prevent perpetual tunnel growth 2025-10-13 17:43:46 -07:00
ip_vs.h ipvs: Fix estimator kthreads preferred affinity 2025-08-13 08:34:33 +02:00
ipcomp.h
ipconfig.h
ipv6.h
ipv6_frag.h inet: frags: flush pending skbs in fqdir_pre_exit() 2026-01-02 12:56:46 +01:00
ipv6_stubs.h
iw_handler.h
kcm.h net: kcm: Fix race condition in kcm_unattach() 2025-08-13 18:18:33 -07:00
l3mdev.h
lag.h
lapb.h
llc.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
lwtunnel.h
mac80211.h wifi: mac80211: Export an API to check if NAN is started 2025-09-19 11:26:23 +02:00
mac802154.h
macsec.h
mctp.h
mctpdevice.h
mip6.h
mld.h
mpls.h
mpls_iptunnel.h
mptcp.h
mrp.h
ncsi.h
ndisc.h
neighbour.h
neighbour_tables.h
net_debug.h
net_failover.h
net_namespace.h namespace-6.18-rc1 2025-09-29 11:20:29 -07:00
net_ratelimit.h
net_shaper.h
net_trackers.h
netdev_lock.h
netdev_netlink.h
netdev_queues.h net: add helper to pre-check if PP for an Rx queue will be unreadable 2025-09-04 10:19:17 +02:00
netdev_rx_queue.h
netevent.h
netkit.h
netlabel.h
netlink.h
netmem.h
netprio_cgroup.h
netrom.h
nexthop.h
nl802154.h
nsh.h
pfcp.h
pie.h
ping.h inet: ping: remove ping_hash() 2025-09-01 13:15:14 -07:00
pkt_cls.h net: sched: fix TCF_LAYER_TRANSPORT handling in tcf_get_base_ptr() 2025-11-24 18:53:14 -08:00
pkt_sched.h
pptp.h
proto_memory.h net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure(). 2025-08-19 19:20:59 -07:00
protocol.h
psample.h
psnap.h
psp.h psp: base PSP device support 2025-09-18 12:32:06 +02:00
raw.h net: use NUMA drop counters for softnet_data.dropped 2025-09-14 11:35:17 -07:00
rawv6.h
red.h
regulatory.h
request_sock.h tcp: reclaim 8 bytes in struct request_sock_queue 2025-09-22 17:55:25 -07:00
rose.h net: rose: convert 'use' field to refcount_t 2025-08-27 07:43:08 -07:00
route.h net: use dst_dev_rcu() in sk_setup_caps() 2025-08-29 19:36:32 -07:00
rpl.h
rps.h net: Add rfs_needed() helper 2025-09-03 15:08:20 -07:00
rsi_91x.h
rstreason.h
rtnetlink.h
rtnh.h
sch_generic.h net/sched: Fix backlog accounting in qdisc_dequeue_internal 2025-08-14 17:52:29 -07:00
scm.h
secure_seq.h
seg6.h
seg6_hmac.h ipv6: sr: Prepare HMAC key ahead of time 2025-08-26 18:11:29 -07:00
seg6_local.h
selftests.h
slhc_vj.h
smc.h dibs: Move event handling to dibs layer 2025-09-23 11:13:22 +02:00
snmp.h net: snmp: remove SNMP_MIB_SENTINEL 2025-09-08 18:06:21 -07:00
sock.h Revert "net: group sk_backlog and sk_receive_queue" 2025-09-29 18:30:32 -07:00
sock_reuseport.h
stp.h
strparser.h
switchdev.h
tc_wrapper.h
tcp.h tcp: add newval parameter to tcp_rcvbuf_grow() 2025-10-29 17:30:19 -07:00
tcp_ao.h tcp: Free TCP-AO/TCP-MD5 info/keys without RCU 2025-09-11 19:05:56 -07:00
tcp_ecn.h tcp: accecn: AccECN option failure handling 2025-09-18 08:47:52 +02:00
tcp_states.h
tcx.h
timewait_sock.h tcp: Remove timewait_sock_ops.twsk_destructor(). 2025-08-25 17:53:35 -07:00
tipc.h
tls.h net: tls: Cancel RX async resync request on rcd_delta overflow 2025-10-29 18:32:18 -07:00
tls_prot.h
tls_toe.h
transp_v6.h
tso.h
tun_proto.h
udp.h udp: remove busylock and add per NUMA queues 2025-09-23 16:38:39 -07:00
udp_tunnel.h
udplite.h
vsock_addr.h
vxlan.h
wext.h
x25.h
x25device.h
xdp.h bpf-next-for-netdev 2025-09-24 10:22:37 -07:00
xdp_priv.h
xdp_sock.h
xdp_sock_drv.h bpf: Allow bpf_xdp_shrink_data to shrink a frag from head and tail 2025-09-23 13:35:12 -07:00
xfrm.h xfrm: Determine inner GSO type from packet inner protocol 2025-10-30 11:52:31 +01:00
xsk_buff_pool.h