umber-kernel/arch
Sean Christopherson eea6f395ca x86/fpu: Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1
commit b45f721775947a84996deb5c661602254ce25ce6 upstream.

When loading guest XSAVE state via KVM_SET_XSAVE, and when updating XFD in
response to a guest WRMSR, clear XFD-disabled features in the saved (or to
be restored) XSTATE_BV to ensure KVM doesn't attempt to load state for
features that are disabled via the guest's XFD.  Because the kernel
executes XRSTOR with the guest's XFD, saving XSTATE_BV[i]=1 with XFD[i]=1
will cause XRSTOR to #NM and panic the kernel.

E.g. if fpu_update_guest_xfd() sets XFD without clearing XSTATE_BV:

  ------------[ cut here ]------------
  WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#29: amx_test/848
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 29 UID: 1000 PID: 848 Comm: amx_test Not tainted 6.19.0-rc2-ffa07f7fd437-x86_amx_nm_xfd_non_init-vm #171 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:exc_device_not_available+0x101/0x110
  Call Trace:
   <TASK>
   asm_exc_device_not_available+0x1a/0x20
  RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
   switch_fpu_return+0x4a/0xb0
   kvm_arch_vcpu_ioctl_run+0x1245/0x1e40 [kvm]
   kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
   __x64_sys_ioctl+0x8f/0xd0
   do_syscall_64+0x62/0x940
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>
  ---[ end trace 0000000000000000 ]---

This can happen if the guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1,
and a host IRQ triggers kernel_fpu_begin() prior to the vmexit handler's
call to fpu_update_guest_xfd().

And if userspace stuffs XSTATE_BV[i]=1 via KVM_SET_XSAVE:

  ------------[ cut here ]------------
  WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#14: amx_test/867
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 14 UID: 1000 PID: 867 Comm: amx_test Not tainted 6.19.0-rc2-2dace9faccd6-x86_amx_nm_xfd_non_init-vm #168 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:exc_device_not_available+0x101/0x110
  Call Trace:
   <TASK>
   asm_exc_device_not_available+0x1a/0x20
  RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
   fpu_swap_kvm_fpstate+0x6b/0x120
   kvm_load_guest_fpu+0x30/0x80 [kvm]
   kvm_arch_vcpu_ioctl_run+0x85/0x1e40 [kvm]
   kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
   __x64_sys_ioctl+0x8f/0xd0
   do_syscall_64+0x62/0x940
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>
  ---[ end trace 0000000000000000 ]---

The new behavior is consistent with the AMX architecture.  Per Intel's SDM,
XSAVE saves XSTATE_BV as '0' for components that are disabled via XFD
(and non-compacted XSAVE saves the initial configuration of the state
component):

  If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
  the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
  instead, it operates as if XINUSE[i] = 0 (and the state component was
  in its initial state): it saves bit i of XSTATE_BV field of the XSAVE
  header as 0; in addition, XSAVE saves the initial configuration of the
  state component (the other instructions do not save state component i).
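
For illustration only, the quoted SDM rule and the matching XRSTOR fault
condition can be modeled with plain bit arithmetic.  This is a simplified
userspace sketch, not kernel code: the model_*() helpers and the
MODEL_XTILE_DATA name are hypothetical, the XSAVE header is assumed to start
out zeroed, and rfbm stands for XCR0 ANDed with the requested-feature bitmap.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 18 is the AMX tile-data component, i.e. the XFD[18] case above. */
#define MODEL_XTILE_DATA (UINT64_C(1) << 18)

/*
 * Hypothetical model of the XSTATE_BV value written by XSAVE into a
 * zeroed header: a component is reported only if it was requested, is
 * in use, and is not disabled via XFD.
 */
static uint64_t model_xsave_xstate_bv(uint64_t rfbm, uint64_t xinuse,
				      uint64_t xfd)
{
	return rfbm & xinuse & ~xfd;
}

/*
 * Hypothetical model of the XRSTOR check: restoring a requested
 * component with XSTATE_BV[i] = XFD[i] = 1 raises #NM.
 */
static int model_xrstor_raises_nm(uint64_t rfbm, uint64_t xstate_bv,
				  uint64_t xfd)
{
	return (rfbm & xstate_bv & xfd) != 0;
}

/* Whatever the XSAVE model produces is safe to XRSTOR with the same XFD. */
static void model_selfcheck(void)
{
	uint64_t rfbm = MODEL_XTILE_DATA | 0x7;
	uint64_t xfd = MODEL_XTILE_DATA;
	uint64_t bv = model_xsave_xstate_bv(rfbm, rfbm, xfd);

	assert(!model_xrstor_raises_nm(rfbm, bv, xfd));
}
```

In this model, an XSTATE_BV produced by XSAVE under a given XFD never trips
the XRSTOR check under that same XFD.  The fault needs XSTATE_BV and XFD to
have been set independently of each other, as in the two splats above.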

Alternatively, KVM could always do XRSTOR with XFD=0, e.g. by using
a constant XFD based on the set of enabled features when XSAVEing for
a struct fpu_guest.  However, having XSTATE_BV[i]=1 for XFD-disabled
features can only happen in the above interrupt case, or in similar
scenarios involving preemption on preemptible kernels, because
fpu_swap_kvm_fpstate()'s call to save_fpregs_to_fpstate() saves the
outgoing FPU state with the current XFD; and that is (on all but the
first WRMSR to XFD) the guest XFD.

Therefore, XFD can only go out of sync with XSTATE_BV in the above
interrupt case, or in similar scenarios involving preemption on
preemptible kernels, and we can consider it (de facto) part of KVM
ABI that KVM_GET_XSAVE returns XSTATE_BV[i]=0 for XFD-disabled features.
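
As a rough sketch of the invariant this patch enforces (XSTATE_BV[i]=0
whenever XFD[i]=1), the two update paths can be modeled as below.  The
struct and function names are hypothetical stand-ins and do not match the
kernel's fpu_update_guest_xfd() or kvm_vcpu_ioctl_x86_set_xsave()
implementations; only the masking step is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 18 is the AMX tile-data component, i.e. the XFD[18] case above. */
#define SKETCH_XTILE_DATA (UINT64_C(1) << 18)

/* Hypothetical stand-in for the guest's saved XSAVE header and XFD. */
struct sketch_guest_fpstate {
	uint64_t xstate_bv;
	uint64_t xfd;
};

/* Guest WRMSR(IA32_XFD) path: record the new XFD and drop disabled bits. */
static void sketch_update_guest_xfd(struct sketch_guest_fpstate *s,
				    uint64_t xfd)
{
	s->xfd = xfd;
	s->xstate_bv &= ~xfd;
}

/* KVM_SET_XSAVE path: never accept XSTATE_BV bits for XFD-disabled features. */
static void sketch_set_xsave(struct sketch_guest_fpstate *s,
			     uint64_t user_xstate_bv)
{
	s->xstate_bv = user_xstate_bv & ~s->xfd;
}

/* The invariant: no bit is set in both XSTATE_BV and XFD. */
static int sketch_state_is_consistent(const struct sketch_guest_fpstate *s)
{
	return (s->xstate_bv & s->xfd) == 0;
}
```

With both paths masking against XFD, the saved state stays consistent no
matter which of the two values is updated last.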

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 820a6ee944 ("kvm: x86: Add emulation for IA32_XFD", 2022-01-14)
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Move clearing of XSTATE_BV from fpu_copy_uabi_to_guest_fpstate
 to kvm_vcpu_ioctl_x86_set_xsave. - Paolo]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23 11:21:12 +01:00
..
alpha alpha: don't reference obsolete termio struct for TC* constants 2026-01-17 16:35:16 +01:00
arc Ext4 bug fixes for 6.18-rc2, including 2025-10-15 07:51:57 -07:00
arm ARM: dts: imx6q-ba16: fix RTC interrupt level 2026-01-17 16:35:20 +01:00
arm64 arm64: dts: mba8mx: Fix Ethernet PHY IRQ support 2026-01-17 16:35:21 +01:00
csky csky: fix csky_cmpxchg_fixup not working 2026-01-17 16:35:16 +01:00
hexagon Remove long-stale ext3 defconfig option 2025-10-15 07:57:28 -07:00
loongarch LoongArch: BPF: Enhance the bpf_arch_text_poke() function 2026-01-08 10:17:21 +01:00
m68k Ext4 bug fixes for 6.18-rc2, including 2025-10-15 07:51:57 -07:00
microblaze Ext4 bug fixes for 6.18-rc2, including 2025-10-15 07:51:57 -07:00
mips MIPS: ftrace: Fix memory corruption when kernel is located beyond 32 bits 2026-01-02 12:57:03 +01:00
nios2 Summary of significant series in this pull request: 2025-10-02 18:18:33 -07:00
openrisc Ext4 bug fixes for 6.18-rc2, including 2025-10-15 07:51:57 -07:00
parisc parisc: entry: set W bit for !compat tasks in syscall_restore_rfi() 2026-01-08 10:17:01 +01:00
powerpc powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION 2026-01-08 10:17:02 +01:00
riscv riscv: pgtable: Cleanup useless VA_USER_XXX definitions 2026-01-17 16:35:25 +01:00
s390 KVM: s390: Fix gmap_helper_zap_one_page() again 2026-01-08 10:16:41 +01:00
sh Remove long-stale ext3 defconfig option 2025-10-15 07:57:28 -07:00
sparc sparc/PCI: Correct 64-bit non-pref -> pref BAR resources 2026-01-17 16:35:29 +01:00
um um: init cpu_tasks[] earlier 2026-01-02 12:56:59 +01:00
x86 x86/fpu: Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1 2026-01-23 11:21:12 +01:00
xtensa Ext4 bug fixes for 6.18-rc2, including 2025-10-15 07:51:57 -07:00
.gitignore
Kconfig compiler_types: Introduce __nocfi_generic 2025-10-29 20:04:55 -07:00