diff options
author | H. Peter Anvin (Intel) <hpa@zytor.com> | 2021-05-18 12:13:01 -0700 |
---|---|---|
committer | Thomas Gleixner <tglx@linutronix.de> | 2021-05-20 15:19:49 +0200 |
commit | 0595494891723a1dcca5eaa8eeca8ab54ad953b9 (patch) | |
tree | 11640da20dbf9e37b38ec809fecc7604f5862e9c /arch/x86/entry | |
parent | 795e2a023b8080b95442811f26f0762184116caa (diff) | |
download | linux-0595494891723a1dcca5eaa8eeca8ab54ad953b9.tar.gz linux-0595494891723a1dcca5eaa8eeca8ab54ad953b9.tar.bz2 linux-0595494891723a1dcca5eaa8eeca8ab54ad953b9.zip |
x86/entry/64: Sign-extend system calls on entry to int
Right now, *some* code will treat e.g. 0x0000000100000001 as a system
call and some will not. Some of the code, notably in ptrace, will
treat 0x000000018000000 as a system call and some will not. Finally,
right now, e.g. 335 for x86-64 will force the exit code to be set to
-ENOSYS even if poked by ptrace, but 548 will not, because there is an
observable difference between an out of range system call and a system
call number that falls outside the range of the table.
This is visible to the user: for example, the syscall_numbering_64
test fails if run under strace, because as strace uses ptrace, it ends
up clobbering the upper half of the 64-bit system call number.
The architecture independent code all assumes that a system call is "int"
that the value -1 specifically and not just any negative value is used for
a non-system call. This is the case on x86 as well when arch-independent
code is involved. The arch-independent API is defined/documented (but not
*implemented*!) in <asm-generic/syscall.h>.
This is an ABI change, but is in fact a revert to the original x86-64
ABI. The original assembly entry code would zero-extend the system call
number;
Use sign extend to be explicit that this is treated as a signed number
(although in practice it makes no difference, of course) and to avoid
people getting the idea of "optimizing" it, as has happened on at least
two(!) separate occasions.
Do not store the extended value into regs->orig_ax, however: on x86-64, the
ABI is that the callee is responsible for extending parameters, so only
examining the lower 32 bits is fully consistent with any "int" argument to
any system call, e.g. regs->di for write(2). The full value of %rax on
entry to the kernel is thus still available.
[ tglx: Add a comment to the ASM code ]
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210518191303.4135296-5-hpa@zytor.com
Diffstat (limited to 'arch/x86/entry')
-rw-r--r-- | arch/x86/entry/entry_64.S | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 1d9db15fdc69..a5f02d03c585 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -108,7 +108,8 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL) /* IRQs are off. */ movq %rsp, %rdi - movq %rax, %rsi + /* Sign extend the lower 32bit as syscall numbers are treated as int */ + movslq %eax, %rsi call do_syscall_64 /* returns with IRQs disabled */ /* |