path: root/arch/mips/lib/memset.S
Commit log (newest first; each entry lists author, date and diffstat):
* MIPS: memset: Limit excessive `noreorder' assembly mode use
  Maciej W. Rozycki, 2018-10-09 (1 file changed, -24/+24)

  Rewrite to use the `reorder' assembly mode and remove manually
  scheduled delay slots, except where GAS cannot schedule a delay-slot
  instruction due to a data dependency or a section switch (as is the
  case with the EX macro).  No change in machine code produced.

  Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
  [paul.burton@mips.com: Fix conflict with commit 932afdeec18b ("MIPS:
  Add Kconfig variable for CPUs with unaligned load/store instructions")]
  Signed-off-by: Paul Burton <paul.burton@mips.com>
  Patchwork: https://patchwork.linux-mips.org/patch/20834/
  Cc: Ralf Baechle <ralf@linux-mips.org>
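  [Illustration, not from the patch: the same return sequence in both
  assembly modes; the instruction choice here is illustrative only.]

      	.set	noreorder
      	jr	ra
      	 PTR_ADDIU	a2, 1	/* delay slot scheduled by hand */
      	.set	reorder

      	PTR_ADDIU	a2, 1
      	jr	ra		/* GAS moves the preceding instruction
      				   into the delay slot, or emits a NOP
      				   if it cannot */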
* MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression
  Maciej W. Rozycki, 2018-10-09 (1 file changed, -1/+3)

  Fix a commit 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for
  small_memset") regression and remove assembly warnings:

      arch/mips/lib/memset.S: Assembler messages:
      arch/mips/lib/memset.S:243: Warning: Macro instruction expanded into multiple instructions in a branch delay slot

  triggering with the CPU_DADDI_WORKAROUNDS option set and this code:

      	PTR_SUBU	a2, t1, a0
      	jr		ra
      	 PTR_ADDIU	a2, 1

  This is because with that option in place the DADDIU instruction,
  which the PTR_ADDIU CPP macro expands to, becomes a GAS macro, which
  in turn expands to an LI/DADDU (or actually ADDIU/DADDU) sequence:

      	13c:	01a4302f	dsubu	a2,t1,a0
      	140:	03e00008	jr	ra
      	144:	24010001	li	at,1
      	148:	00c1302d	daddu	a2,a2,at
      	...

  Correct this by switching off the `noreorder' assembly mode and
  letting GAS schedule this jump's delay slot, as there is nothing
  special about it that would require manual scheduling.  With this
  change in place correct code is produced:

      	13c:	01a4302f	dsubu	a2,t1,a0
      	140:	24010001	li	at,1
      	144:	03e00008	jr	ra
      	148:	00c1302d	daddu	a2,a2,at
      	...

  Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
  Signed-off-by: Paul Burton <paul.burton@mips.com>
  Fixes: 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset")
  Patchwork: https://patchwork.linux-mips.org/patch/20833/
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: stable@vger.kernel.org # 4.17+
* MIPS: Add Kconfig variable for CPUs with unaligned load/store instructions
  Yasha Cherikovsky, 2018-09-26 (1 file changed, -6/+6)

  MIPSR6 CPUs do not support unaligned load/store instructions (LWL,
  LWR, SWL, SWR and LDL, LDR, SDL, SDR for 64bit).

  Currently the MIPS tree has some special cases to avoid these
  instructions, and the code is testing for !CONFIG_CPU_MIPSR6.

  This patch declares a new Kconfig variable,
  CONFIG_CPU_HAS_LOAD_STORE_LR, which indicates that the CPU supports
  these instructions.  Then, the patch does the following:

  - Carefully selects this option on all CPUs except MIPSR6.
  - Switches all the special cases to test for the new variable, and
    inverts the logic: '#ifndef CONFIG_CPU_MIPSR6' turns into
    '#ifdef CONFIG_CPU_HAS_LOAD_STORE_LR' and vice-versa (sketched
    below, after this entry).

  Also, when this variable is NOT selected (e.g. MIPSR6),
  CONFIG_GENERIC_CSUM will default to 'y', to compile generic C
  checksum code (instead of special assembly code that uses the
  unsupported instructions).

  This commit should not affect any existing CPU, and is required for
  future Lexra CPU support, which lacks these instructions too.

  Signed-off-by: Yasha Cherikovsky <yasha.che3@gmail.com>
  Signed-off-by: Paul Burton <paul.burton@mips.com>
  Patchwork: https://patchwork.linux-mips.org/patch/20808/
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: Paul Burton <paul.burton@mips.com>
  Cc: James Hogan <jhogan@kernel.org>
  Cc: linux-mips@linux-mips.org
  Cc: linux-kernel@vger.kernel.org
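  [Illustration, not from the patch: the #ifdef inversion described
  above, with a placeholder body.]

      /* before */
      #ifndef CONFIG_CPU_MIPSR6
      	/* code using LWL/LWR/SWL/SWR (and LDL/LDR/SDL/SDR) */
      #endif

      /* after */
      #ifdef CONFIG_CPU_HAS_LOAD_STORE_LR
      	/* code using LWL/LWR/SWL/SWR (and LDL/LDR/SDL/SDR) */
      #endif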
* MIPS: memset.S: Add comments to fault fixup handlers
  Matt Redfearn, 2018-07-24 (1 file changed, -0/+18)

  It is not immediately obvious what the expected inputs to these fault
  handlers are and how they calculate the number of unset bytes.
  Having stared deeply at this in order to fix some corner cases, add
  some comments to assist those who follow.

  Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
  Signed-off-by: Paul Burton <paul.burton@mips.com>
  Patchwork: https://patchwork.linux-mips.org/patch/19339/
  Cc: James Hogan <jhogan@kernel.org>
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: <linux-mips@linux-mips.org>
  Cc: <linux-kernel@vger.kernel.org>
* MIPS: memset.S: Fix byte_fixup for MIPSr6
  Matt Redfearn, 2018-07-24 (1 file changed, -1/+2)

  The __clear_user function is defined to return the number of bytes
  that could not be cleared.  From the underlying memset / bzero
  implementation this means setting register a2 to that number on
  return.  Currently, if a page fault is triggered within the MIPSr6
  version of the code setting the initial unaligned bytes, the value
  loaded into a2 on return is meaningless.

  During the MIPSr6 version of the initial unaligned bytes block,
  register a2 contains the number of bytes to be set beyond the initial
  unaligned bytes.  The t0 register is initially set to the number of
  unaligned bytes - STORSIZE, effectively a negative version of the
  number of unaligned bytes.  This is then incremented before each byte
  is saved.

  The label .Lbyte_fixup\@ is jumped to on page fault.  Currently the
  value in a2 is incorrectly replaced by 0 - t0 + 1, effectively the
  number of unaligned bytes remaining.  This leads to the failures
  being reported by the following test code:

      static int __init test_clear_user(void)
      {
      	int j, k;

      	pr_info("\n\n\nTesting clear_user\n");
      	for (j = 0; j < 512; j++) {
      		if ((k = clear_user(NULL+3, j)) != j) {
      			pr_err("clear_user (NULL %d) returned %d\n", j, k);
      		}
      	}
      	return 0;
      }
      late_initcall(test_clear_user);

  Which reports:

      [    3.965439] Testing clear_user
      [    3.973169] clear_user (NULL 8) returned 6
      [    3.976782] clear_user (NULL 9) returned 6
      [    3.980390] clear_user (NULL 10) returned 6
      [    3.984052] clear_user (NULL 11) returned 6
      [    3.987524] clear_user (NULL 12) returned 6

  Fix this by subtracting t0 from a2 (rather than $0), effectively
  giving:

      unset_bytes = (#bytes - (#unaligned bytes)) - (-#unaligned bytes remaining + 1) + 1
      a2          = a2 - t0 + 1

  This fixes the value returned from __clear_user when the number of
  bytes to set is > LONGSIZE and the address is invalid and unaligned.

  Unfortunately, this breaks the fixup handling for unaligned bytes
  after the final long, where register a2 still contains the number of
  bytes remaining to be set and the t0 register is set to 0 - the
  number of unaligned bytes remaining.

  Because t0 is now subtracted from a2 rather than 0, the number of
  bytes unset is reported incorrectly:

      static int __init test_clear_user(void)
      {
      	char *test;
      	int j, k;

      	pr_info("\n\n\nTesting clear_user\n");
      	test = vmalloc(PAGE_SIZE);
      	for (j = 256; j < 512; j++) {
      		if ((k = clear_user(test + PAGE_SIZE - 254, j)) != j - 254) {
      			pr_err("clear_user (%px %d) returned %d\n",
      			       test + PAGE_SIZE - 254, j, k);
      		}
      	}
      	return 0;
      }
      late_initcall(test_clear_user);

      [    3.976775] clear_user (c00000000000df02 256) returned 4
      [    3.981957] clear_user (c00000000000df02 257) returned 6
      [    3.986425] clear_user (c00000000000df02 258) returned 8
      [    3.990850] clear_user (c00000000000df02 259) returned 10
      [    3.995332] clear_user (c00000000000df02 260) returned 12
      [    3.999815] clear_user (c00000000000df02 261) returned 14

  Fix this by ensuring that a2 is set to 0 during the set of final
  unaligned bytes.

  Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
  Signed-off-by: Paul Burton <paul.burton@mips.com>
  Fixes: 8c56208aff77 ("MIPS: lib: memset: Add MIPS R6 support")
  Patchwork: https://patchwork.linux-mips.org/patch/19338/
  Cc: James Hogan <jhogan@kernel.org>
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: linux-mips@linux-mips.org
  Cc: linux-kernel@vger.kernel.org
  Cc: stable@vger.kernel.org # v4.0+
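  [Illustration, not the literal patch: the corrected fixup as
  described above; the label and surrounding delay-slot layout follow
  memset.S conventions of this era and are assumptions here.]

      .Lbyte_fixup\@:
      	/* unset_bytes = a2 - t0 + 1 */
      	PTR_SUBU	a2, t0		/* previously: PTR_SUBU a2, $0, t0 */
      	jr		ra
      	 PTR_ADDIU	a2, 1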
* MIPS: memset.S: Reinstate delay slot indentation
  Matt Redfearn, 2018-05-21 (1 file changed, -14/+14)

  Assembly language within the MIPS kernel conventionally indents
  instructions which are in a branch delay slot to make them easier to
  see.  Commit 8483b14aaa81 ("MIPS: lib: memset: Whitespace fixes")
  rather inexplicably removed all of these indentations from memset.S.
  Reinstate the convention for all instructions in a branch delay slot.
  This effectively reverts the above commit, plus other locations
  introduced with MIPSR6 support.

  Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: linux-mips@linux-mips.org
  Patchwork: https://patchwork.linux-mips.org/patch/19111/
  Signed-off-by: James Hogan <jhogan@kernel.org>
* MIPS: memset.S: Fix clobber of v1 in last_fixup
  Matt Redfearn, 2018-04-18 (1 file changed, -1/+1)

  The label .Llast_fixup\@ is jumped to on page fault within the final
  byte set loop of memset (on < MIPSR6 architectures).  For some
  reason, in this fault handler, the v1 register is randomly set to
  a2 & STORMASK.  This clobbers v1 for the calling function.

  This can be observed with the following test code:

      static int __init __attribute__((optimize("O0"))) test_clear_user(void)
      {
      	register int t asm("v1");
      	char *test;
      	int j, k;

      	pr_info("\n\n\nTesting clear_user\n");
      	test = vmalloc(PAGE_SIZE);
      	for (j = 256; j < 512; j++) {
      		t = 0xa5a5a5a5;
      		if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
      			pr_err("clear_user (%px %d) returned %d\n",
      			       test + PAGE_SIZE - 256, j, k);
      		}
      		if (t != 0xa5a5a5a5) {
      			pr_err("v1 was clobbered to 0x%x!\n", t);
      		}
      	}
      	return 0;
      }
      late_initcall(test_clear_user);

  Which demonstrates that v1 is indeed clobbered (MIPS64):

      Testing clear_user
      v1 was clobbered to 0x1!
      v1 was clobbered to 0x2!
      v1 was clobbered to 0x3!
      v1 was clobbered to 0x4!
      v1 was clobbered to 0x5!
      v1 was clobbered to 0x6!
      v1 was clobbered to 0x7!

  Since the number of bytes that could not be set is already contained
  in a2, the andi placing a value in v1 is not necessary and actively
  harmful in clobbering v1.

  Reported-by: James Hogan <jhogan@kernel.org>
  Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: linux-mips@linux-mips.org
  Cc: stable@vger.kernel.org
  Patchwork: https://patchwork.linux-mips.org/patch/19109/
  Signed-off-by: James Hogan <jhogan@kernel.org>
* MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
  Matt Redfearn, 2018-04-17 (1 file changed, -1/+1)

  The __clear_user function is defined to return the number of bytes
  that could not be cleared.  From the underlying memset / bzero
  implementation this means setting register a2 to that number on
  return.  Currently, if a page fault is triggered within the
  memset_partial block, the value loaded into a2 on return is
  meaningless.

  The label .Lpartial_fixup\@ is jumped to on page fault.  In order to
  work out how many bytes failed to copy, the exception handler should
  find how many bytes are left in the partial block (andi a2,
  STORMASK), add that to the partial block end address (a2), and
  subtract the faulting address to get the remainder.  Currently it
  incorrectly subtracts the partial block start address (t1), which has
  additionally been clobbered to generate a jump target in
  memset_partial.  Fix this by adding the block end address instead.

  This issue was found with the following test code:

      int j, k;
      for (j = 0; j < 512; j++) {
      	if ((k = clear_user(NULL, j)) != j) {
      		pr_err("clear_user (NULL %d) returned %d\n", j, k);
      	}
      }

  Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II
  (MIPS64).

  Suggested-by: James Hogan <jhogan@kernel.org>
  Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: linux-mips@linux-mips.org
  Cc: stable@vger.kernel.org
  Patchwork: https://patchwork.linux-mips.org/patch/19108/
  Signed-off-by: James Hogan <jhogan@kernel.org>
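  [Illustration, not the literal patch: the handler shape the message
  describes.  The use of a0 as the partial block end address is an
  assumption; the other names follow memset.S conventions.]

      .Lpartial_fixup\@:
      	PTR_L		t0, TI_TASK($28)
      	andi		a2, STORMASK		/* bytes left in partial block */
      	LONG_L		t0, THREAD_BUADDR(t0)	/* t0 = faulting address */
      	LONG_ADDU	a2, a0			/* + block end (assumed a0;
      						   was t1, the clobbered
      						   block start) */
      	jr		ra
      	 LONG_SUBU	a2, t0			/* - faulting address */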
* MIPS: memset.S: EVA & fault support for small_memset
  Matt Redfearn, 2018-04-16 (1 file changed, -1/+6)

  The MIPS kernel memset / bzero implementation includes a
  small_memset branch which is used when the region to be set is
  smaller than a long (4 bytes on 32bit, 8 bytes on 64bit).  The
  current small_memset implementation uses a simple store byte loop to
  write the destination.  There are 2 issues with this implementation:

  1. When EVA mode is active, user and kernel address spaces may
     overlap.  Currently the use of the sb instruction means kernel
     mode addressing is always used and an intended write to userspace
     may actually overwrite some critical kernel data.

  2. If the write triggers a page fault, for example by calling
     __clear_user(NULL, 2), instead of gracefully handling the fault,
     an OOPS is triggered.

  Fix these issues by replacing the sb instruction with the EX() macro,
  which will emit EVA compatible instructions as required.
  Additionally implement a fault fixup for small_memset which sets a2
  to the number of bytes that could not be cleared (as defined by
  __clear_user).

  Reported-by: Chuanhua Lei <chuanhua.lei@intel.com>
  Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: linux-mips@linux-mips.org
  Cc: stable@vger.kernel.org
  Patchwork: https://patchwork.linux-mips.org/patch/18975/
  Signed-off-by: James Hogan <jhogan@kernel.org>
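  [Illustration, not the literal patch: a byte loop written with the
  EX() macro so that EVA store variants are emitted when required and a
  fault is routed to a fixup handler.  Label names are assumptions in
  memset.S style.]

      .Lsmall_memset\@:
      	beqz		a2, 2f			/* zero length: done */
      	 PTR_ADDU	t1, a0, a2		/* t1 = end address */

      1:	PTR_ADDIU	a0, 1			/* fill bytewise */
      	bne		t1, a0, 1b
      	 EX(sb, a1, -1(a0), .Lsmall_fixup\@)	/* sb or sbe, plus a
      						   fault-fixup entry */

      2:	jr		ra			/* done */
      	 move		a2, zero		/* no bytes left unset */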
* MIPS: Export memcpy & memset functions alongside their definitions
  Paul Burton, 2017-01-03 (1 file changed, -0/+5)

  Now that EXPORT_SYMBOL can be used from assembly source, move the
  EXPORT_SYMBOL invocations for the memcpy & memset functions &
  variants thereof to be alongside their definitions.

  Signed-off-by: Paul Burton <paul.burton@imgtec.com>
  Cc: linux-mips@linux-mips.org
  Patchwork: https://patchwork.linux-mips.org/patch/14514/
  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
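  [Illustration, not from the patch: the general shape of an assembly
  EXPORT_SYMBOL of that era; the header name is an assumption.]

      #include <asm/export.h>

      LEAF(memset)
      	/* ... function body ... */
      END(memset)
      EXPORT_SYMBOL(memset)	/* export right next to the definition */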
* MIPS: memset.S: Disable code unused with non-R6 MIPS configs
  Maciej W. Rozycki, 2016-05-09 (1 file changed, -0/+2)

  This complements commit 8c56208aff77 ("MIPS: lib: memset: Add MIPS R6
  support").

  Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
  Cc: linux-mips@linux-mips.org
  Patchwork: https://patchwork.linux-mips.org/patch/12452/
  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: uaccess: Take EVA into account in [__]clear_user
  James Hogan, 2015-12-22 (1 file changed, -0/+2)

  __clear_user() (and clear_user() which uses it) always accesses the
  user mode address space, which results in EVA store instructions when
  EVA is enabled, even if the current user address limit is KERNEL_DS.

  Fix this by adding a new symbol __bzero_kernel for the normal kernel
  address space bzero in EVA mode, and call that from __clear_user() if
  eva_kernel_access().

  Signed-off-by: James Hogan <james.hogan@imgtec.com>
  Cc: Markos Chandras <markos.chandras@imgtec.com>
  Cc: Paul Burton <paul.burton@imgtec.com>
  Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
  Cc: linux-mips@linux-mips.org
  Patchwork: https://patchwork.linux-mips.org/patch/10844/
  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: lib: memset: Add MIPS R6 support
  Leonid Yegoshin, 2015-02-17 (1 file changed, -0/+47)

  MIPS R6 dropped the unaligned load and store instructions so we need
  to re-write this part of the code for R6 to store one byte at a time.

  Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
  Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
* MIPS: lib: memset: Clean up some MIPS{EL,EB} ifdefery
  Markos Chandras, 2014-11-24 (1 file changed, -4/+2)

  The toolchain defines exactly one of __MIPSEB__ and __MIPSEL__.  As a
  result, simplify the ifdefery a little bit.

  Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
  Cc: linux-mips@linux-mips.org
  Patchwork: https://patchwork.linux-mips.org/patch/8522/
  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
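  [Illustration, not the literal patch: since exactly one of the two
  macros is defined, a second #ifdef can collapse into an #else; the
  instructions shown are assumptions in memset.S style.]

      #ifdef __MIPSEB__
      	EX(LONG_S_L, a1, (a0), .Lfirst_fixup\@)	/* big endian */
      #else
      	EX(LONG_S_R, a1, (a0), .Lfirst_fixup\@)	/* __MIPSEL__ */
      #endif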
* MIPS: lib: memset: Add EVA support for the __bzero function.
  Markos Chandras, 2014-03-26 (1 file changed, -4/+23)

  Build the __bzero function using the EVA load/store instructions when
  operating in the EVA mode.  This function is only used when accessing
  user code so there is no need to build two distinct symbols for user
  and kernel operations respectively.

  Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
* MIPS: lib: memset: Use macro to build the __bzero symbol
  Markos Chandras, 2014-03-26 (1 file changed, -35/+60)

  Build the __bzero symbol using a macro.  In EVA mode we will need to
  use similar code to do the userspace load operations, so it is better
  to use a macro to avoid code duplication.

  Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
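  [Illustration, not the literal patch: the build-by-macro pattern
  described above.  The mode argument and its LEGACY_MODE/EVA_MODE
  values follow the later EVA commits and are assumptions here.]

      	.macro	__BUILD_BZERO mode
      	/* common memset/bzero body; EX() selects normal or EVA
      	   store instructions according to \mode */
      	.endm

      	__BUILD_BZERO LEGACY_MODE	/* kernel address space */
      	__BUILD_BZERO EVA_MODE		/* EVA user address space */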
* MIPS: lib: memset: Whitespace fixes
  Markos Chandras, 2014-03-26 (1 file changed, -15/+15)

  Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
* MIPS: microMIPS: Optimise 'memset' core library function.
  Steven J. Hill, 2013-05-09 (1 file changed, -30/+54)

  Optimise 'memset' to use microMIPS instructions and/or optimisations
  for binary size reduction.  When the microMIPS ISA is not being used,
  the library function compiles to the original binary code.

  Signed-off-by: Steven J. Hill <Steven.Hill@imgtec.com>
* MIPS: Whitespace cleanup.
  Ralf Baechle, 2013-02-01 (1 file changed, -2/+2)

  Having received another series of whitespace patches I decided to do
  this once and for all rather than dealing with these kinds of patches
  trickling in forever.

  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Separate two consecutive loads in memset.S
  Tony Wu, 2010-12-16 (1 file changed, -2/+2)

  partial_fixup is used in a noreorder block.  Separating two
  consecutive loads can save one cycle on processors with a GPR
  interlock and can fix the load-use hazard on processors that need a
  load delay slot.  Also do so for fwd_fixup.

  [Ralf: Only R2000/R3000 class processors are lacking the load-use
  interlock and even some of those got it retrofitted.  With
  R2000/R3000 being fairly uncommon these days the impact of this bug
  should be minor.]

  Signed-off-by: Tony Wu <tung7970@gmail.com>
  To: linux-mips@linux-mips.org
  Patchwork: https://patchwork.linux-mips.org/patch/1768/
  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
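  [Illustration, not the literal patch: the kind of reordering
  described.  The instructions follow memset.S's fixup handlers and
  are assumptions here.]

      	/* before: the second load consumes t0 immediately after it
      	   is loaded, a load-use hazard */
      	PTR_L		t0, TI_TASK($28)
      	LONG_L		t0, THREAD_BUADDR(t0)
      	andi		a2, STORMASK

      	/* after: an independent instruction separates the two loads */
      	PTR_L		t0, TI_TASK($28)
      	andi		a2, STORMASK
      	LONG_L		t0, THREAD_BUADDR(t0)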
* [MIPS] Eliminate local symbols from the symbol table.
  Ralf Baechle, 2008-01-29 (1 file changed, -14/+14)

  These symbols appear in oprofile output, stacktraces and similar but
  only make the output harder to read.  Many identical symbol names
  such as "both_aligned" were also being used in multiple source files,
  making it impossible to see which file actually was meant.  So let's
  get rid of them.

  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
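  [Illustration, not from the patch: GAS treats labels starting with
  `.L' as assembler-local, so they never reach the symbol table.]

      both_aligned:		/* visible in oprofile and stacktraces */
      .Lboth_aligned:		/* local: omitted from the symbol table */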
* [MIPS] IP28: added cache barrier to assembly routines
  Thomas Bogendoerfer, 2008-01-29 (1 file changed, -0/+5)

  IP28 needs special treatment to avoid speculative accesses.  gcc
  takes care of .c code, but for assembly code we need to do it
  manually.  This is taken from Peter Fuerst's IP28 patches.

  Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
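  [Illustration, not the literal patch: memset.S uses the
  R10KCBARRIER() macro, which expands to a cache barrier on IP28
  (speculative R10000) configurations and to nothing elsewhere; the
  surrounding instructions are assumptions in memset.S style.]

      1:	PTR_ADDIU	a0, 64
      	R10KCBARRIER(0(ra))	/* keep stores from being speculated
      				   past this point on IP28 */
      	f_fill64	a0, -64, FILL64RG, .Lfwd_fixup
      	bne		t1, a0, 1b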
* [MIPS] R4000/R4400 daddiu erratum workaround
  Maciej W. Rozycki, 2008-01-29 (1 file changed, -1/+10)

  This complements the generic R4000/R4400 errata workaround code and
  adds bits for the daddiu problem.  In most places it just modifies
  handwritten assembly code so that the assembler is allowed to use a
  temporary register, as daddiu may now be treated as a macro that
  expands to a sequence of li and daddu.  It is the AT register or,
  where AT is unavailable or used explicitly for another purpose, an
  explicitly-named register is selected, using the .set at=<reg>
  feature added recently to gas.  This feature is only used if
  CONFIG_CPU_DADDI_WORKAROUNDS has been set, so if the workaround
  remains disabled, the required version of binutils stays unchanged.

  Similarly, daddiu instructions put in branch delay slots in noreorder
  fragments are now taken out of them and the assembler is allowed to
  reorder them itself where possible (which it does, making the whole
  idea of scheduling them into delay slots manually questionable).
  Also, in the very few places where such a simple conversion was not
  possible, a handcoded longer sequence is implemented.

  Other than that there are changes to code responsible for building
  the TLB fault and page clear/copy handlers to avoid daddiu as
  appropriate.  These are only effective if the erratum is verified to
  be present at run time.

  Finally there is a trivial update to __delay(), because it uses
  daddiu in a branch delay slot.

  Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
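  [Illustration, not from the patch: how the workaround changes what
  the assembler may emit; the choice of v1 as the scratch register is
  an assumption.]

      	.set	at=v1
      	PTR_ADDIU	a0, 64	/* with CONFIG_CPU_DADDI_WORKAROUNDS
      				   this may expand to:
      				   li v1, 64; daddu a0, a0, v1 */
      	.set	at		/* restore the default AT register */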
* [MIPS] Unify memset.S
  Atsushi Nemoto, 2007-02-06 (1 file changed, -0/+166)

  The 32-bit version and 64-bit version are almost equal.  Unify them.
  This makes further improvements (for example, supporting CDEX, etc.)
  easier.

  Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
  Signed-off-by: Ralf Baechle <ralf@linux-mips.org>