summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* btrfs: convert pr_* to btrfs_* where possibleJeff Mahoney2016-09-2620-177/+231
| | | | | | | | | | | | | | For many printks, we want to know which file system issued the message. This patch converts most pr_* calls to use the btrfs_* versions instead. In some cases, this means adding plumbing to allow call sites access to an fs_info pointer. fs/btrfs/check-integrity.c is left alone for another day. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: convert printk(KERN_* to use pr_* callsJeff Mahoney2016-09-2616-275/+205
| | | | | | | | | | This patch converts printk(KERN_* style messages to use the pr_* versions. One side effect is that anything that was KERN_DEBUG is now automatically a dynamic debug message. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: unsplit printed stringsJeff Mahoney2016-09-2627-391/+324
| | | | | | | | | | | CodingStyle chapter 2: "[...] never break user-visible strings such as printk messages, because that breaks the ability to grep for them." This patch unsplits user-visible strings. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: clean the old superblocks before freeing the deviceJeff Mahoney2016-09-261-27/+11
| | | | | | | | | | | | | | | | btrfs_rm_device frees the block device but then re-opens it using the saved device name. A race exists between the close and the re-open that allows the block size to be changed. The result is getting stuck forever in the reclaim loop in __getblk_slow. This patch moves the superblock cleanup before closing the block device, which is also consistent with other callers. We also don't need a private copy of dev_name as the whole routine operates under the uuid_mutex. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: kill BUG_ON in run_delayed_tree_refLiu Bo2016-09-261-1/+7
| | | | | | | | | | In a corrupted btrfs image, we can come across this BUG_ON and get an unreponsive system, but if we return errors instead, its caller can handle everything gracefully by aborting the current transaction. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: don't leak reloc root nodes on errorJosef Bacik2016-09-261-0/+4
| | | | | | | | | | We don't track the reloc roots in any sort of normal way, so the only way the root/commit_root nodes get free'd is if the relocation finishes successfully and the reloc root is deleted. Fix this by free'ing them in free_reloc_roots. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: squash lines for simple wrapper functionsMasahiro Yamada2016-09-265-28/+8
| | | | | | | | Remove unneeded variables and assignments. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: improve check_node to avoid reading corrupted nodesLiu Bo2016-09-261-4/+28
| | | | | | | | | | We need to check items in a node to make sure that we're reading a valid one, otherwise we could get various crashes while processing delayed_refs. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: add error handling for extent buffer in print treeLiu Bo2016-09-261-0/+7
| | | | | | | | | | | Somehow we missed btrfs_print_tree when last time we updated error handling for read_extent_block(). This keeps us from getting a NULL pointer panic when btrfs_print_tree's read_extent_block() fails. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: remove BUG_ON in start_transactionLiu Bo2016-09-261-4/+1
| | | | | | | | | | | | | | | Since we could get errors from the concurrent aborted transaction, the check of this BUG_ON in start_transaction is not true any more. Say, while flushing free space cache inode's dirty pages, btrfs_finish_ordered_io -> btrfs_join_transaction_nolock (the transaction has been aborted.) -> BUG_ON(type == TRANS_JOIN_NOLOCK); Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: memset to avoid stale content in btree node blockLiu Bo2016-09-261-0/+11
| | | | | | | | | | | | | | | | | | | | | During updating btree, we could push items between sibling nodes/leaves, for leaves data sections starts reversely from the end of the block while for nodes we only have key pairs which are stored one by one from the start of the block. So we could do try to push key pairs from one node to the next node right in the tree, and after that, we update the node's nritems to reflect the correct end while leaving the stale content in the node. One may intentionally corrupt the fs image and access the stale content by bumping the nritems and causes various crashes. This takes the in-memory @nritems as the correct one and gets to memset the unused part of a btree node. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: return gracefully from balance if fs tree is corruptedLiu Bo2016-09-261-6/+17
| | | | | | | | | | | | | | | | When relocating tree blocks, we firstly get block information from back references in the extent tree, we then search fs tree to try to find all parents of a block. However, if fs tree is corrupted, eg. if there're some missing items, we could come across these WARN_ONs and BUG_ONs. This makes us print some error messages and return gracefully from balance. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: kill BUG_ON()'s in btrfs_mark_extent_writtenJosef Bacik2016-09-261-8/+33
| | | | | | | | No reason to bug on in here, fs corruption could easily cause these things to happen. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: kill the start argument to read_extent_buffer_pagesJosef Bacik2016-09-263-28/+15
| | | | | | | Nobody uses this, it makes no sense to do partial reads of extent buffers. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: add a flags field to btrfs_fs_infoJosef Bacik2016-09-2616-109/+99
| | | | | | | | | | | We have a lot of random ints in btrfs_fs_info that can be put into flags. This is mostly equivalent with the exception of how we deal with quota going on or off, now instead we set a flag when we are turning it on or off and deal with that appropriately, rather than just having a pending state that the current quota_enabled gets set to. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: extend btrfs_set_extent_delalloc and its friends to support in-band ↵Qu Wenruo2016-09-267-25/+37
| | | | | | | | | | | | | | | dedupe and subpage size patchset Extend btrfs_set_extent_delalloc() and extent_clear_unlock_delalloc() parameters for both in-band dedupe and subpage sector size patchset. This should reduce conflict of both patchset and the effort to rebase them. Cc: Chandan Rajendra <chandan@linux.vnet.ibm.com> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: add dynamic debug supportJeff Mahoney2016-09-261-1/+30
| | | | | | | | | | | | | | We can re-use the dynamic debugging descriptor to make use of the dynamic debugging mechanism but still use our own printk interface. Defining the DEBUG macro works as it did before. When it's defined, all of the messages default to print. We can also enable all debug messages at boot or module-load time using the 'dyndbg' and 'btrfs.dyndbg' options. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: Fix warning "variable ‘gen’ set but not used"Luis Henriques2016-09-261-2/+0
| | | | | | | | | | Variable 'gen' in reada_for_search() is not used since commit 58dc4ce43251 ("btrfs: remove unused parameter from readahead_tree_block"). This patch simply removes this variable. Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: Fix warning "variable ‘blocksize’ set but not used"Luis Henriques2016-09-261-2/+0
| | | | | | | | | | Variable 'blocksize' in reada_walk_down() is not used since commit d3e46fea1b1e ("btrfs: sink blocksize parameter to readahead_tree_block"). This patch simply removes this variable. Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: let btrfs_delete_unused_bgs() to clean relocated bgsNaohiro Aota2016-09-262-15/+11
| | | | | | | | | | | | | | | | | Currently, btrfs_relocate_chunk() is removing relocated BG by itself. But the work can be done by btrfs_delete_unused_bgs() (and it's better since it trim the BG). Let's dedupe the code. While btrfs_delete_unused_bgs() is already hitting the relocated BG, it skip the BG since the BG has "ro" flag set (to keep balancing BG intact). On the other hand, btrfs cannot drop "ro" flag here to prevent additional writes. So this patch make use of "removed" flag. btrfs_delete_unused_bgs() now detect the flag to distinguish whether a read-only BG is relocating or not. Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: bail out if block group has different mixed flagLiu Bo2016-09-261-0/+14
| | | | | | | | | | | | | | | | | | | | Currently we allow inconsistence about mixed flag (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA). We'd get ENOSPC if block group has mixed flag and btrfs doesn't. If that happens, we have one space_info with mixed flag and another space_info only with BTRFS_BLOCK_GROUP_METADATA, and global_block_rsv.space_info points to the latter one, but all bytes from block_group contributes to the mixed space_info, thus all the allocation will fail with ENOSPC. This adds a check for the above case. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> [ updated message ] Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: fix memory leak in reading btree blocksLiu Bo2016-09-261-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So we can read a btree block via readahead or intentional read, and we can end up with a memory leak when something happens as follows, 1) readahead starts to read block A but does not wait for read completion, 2) btree_readpage_end_io_hook finds that block A is corrupted, and it needs to clear all block A's pages' uptodate bit. 3) meanwhile an intentional read kicks in and checks block A's pages' uptodate to decide which page needs to be read. 4) when some pages have the uptodate bit during 3)'s check so 3) doesn't count them for eb->io_pages, but they are later cleared by 2) so we has to readpage on the page, we get the wrong eb->io_pages which results in a memory leak of this block. This fixes the problem by firstly getting all pages's locking and then checking pages' uptodate bit. t1(readahead) t2(readahead endio) t3(the following read) read_extent_buffer_pages end_bio_extent_readpage for pg in eb: for page 0,1,2 in eb: if pg is uptodate: btree_readpage_end_io_hook(pg) num_reads++ if uptodate: eb->io_pages = num_reads SetPageUptodate(pg) _______________ for pg in eb: for page 3 in eb: read_extent_buffer_pages if pg is NOT uptodate: btree_readpage_end_io_hook(pg) for pg in eb: __extent_read_full_page(pg) sanity check reports something wrong if pg is uptodate: clear_extent_buffer_uptodate(eb) num_reads++ for pg in eb: eb->io_pages = num_reads ClearPageUptodate(page) _______________ for pg in eb: if pg is NOT uptodate: __extent_read_full_page(pg) So t3's eb->io_pages is not consistent with the number of pages it's reading, and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative number so that we're not able to free the eb. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: remove BUG() in raid56Liu Bo2016-09-261-1/+4
| | | | | | | | | | | | | This BUG() has been triggered by a fuzz testing image, which contains an invalid chunk type, ie. a single stripe chunk has the raid6 type. Btrfs can handle this gracefully by returning -EIO, so besides using btrfs_warn to give us more debugging information rather than a single BUG(), we can return error properly. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: fix check_shared for fiemap ioctlLu Fengqi2016-09-262-10/+369
| | | | | | | | | | | | | | | | | | | Only in the case of different root_id or different object_id, check_shared identified extent as the shared. However, If a extent was referred by different offset of same file, it should also be identified as shared. In addition, check_shared's loop scale is at least n^3, so if a extent has too many references, even causes soft hang up. First, add all delayed_ref to the ref_tree and calculate the unqiue_refs, if the unique_refs is greater than one, return BACKREF_FOUND_SHARED. Then individually add the on-disk reference(inline/keyed) to the ref_tree and calculate the unique_refs of the ref_tree to check if the unique_refs is greater than one.Because once there are two references to return SHARED, so the time complexity is close to the constant. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: create example debugfs file only in debugging buildDavid Sterba2016-09-261-0/+9
| | | | | Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: fix perms on demonstration debugfs interfaceEric Sandeen2016-09-261-1/+1
| | | | | | | | | | | | | btrfs provides a helpful demonstration of how to export a global variable via debugfs; however, it is unique among other debugfs files in that it is world-writable, which causes some concern to people who are not familiar with its purpose. Fix it so that it is only user-writable. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: fix memory leak of block group cacheLiu Bo2016-09-263-0/+75
| | | | | | | | | | | | | | | | | While processing delayed refs, we may update block group's statistics and attach it to cur_trans->dirty_bgs, and later writing dirty block groups will process the list, which happens during btrfs_commit_transaction(). For whatever reason, the transaction is aborted and dirty_bgs is not processed in cleanup_transaction(), we end up with memory leak of these dirty block group cache. Since btrfs_start_dirty_block_groups() doesn't make it go to the commit critical section, this also adds the cleanup work inside it. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Linux 4.8-rc8v4.8-rc8Linus Torvalds2016-09-251-1/+1
|
* Merge tag 'trace-v4.8-rc7' of ↵Linus Torvalds2016-09-251-13/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracefs fixes from Steven Rostedt: "Al Viro has been looking at the tracefs code, and has pointed out some issues. This contains one fix by me and one by Al. I'm sure that he'll come up with more but for now I tested these patches and they don't appear to have any negative impact on tracing" * tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: fix memory leaks in tracing_buffers_splice_read() tracing: Move mutex to protect against resetting of seq data
| * fix memory leaks in tracing_buffers_splice_read()Al Viro2016-09-251-6/+8
| | | | | | | | | | Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * tracing: Move mutex to protect against resetting of seq dataSteven Rostedt (Red Hat)2016-09-251-7/+8
| | | | | | | | | | | | | | | | | | | | The iter->seq can be reset outside the protection of the mutex. So can reading of user data. Move the mutex up to the beginning of the function. Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants") Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | fault_in_multipages_readable() throws set-but-unused errorDave Chinner2016-09-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building XFS with -Werror, it now fails with: include/linux/pagemap.h: In function 'fault_in_multipages_readable': include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable] volatile char c; ^ This is a regression caused by commit e23d4159b109 ("fix fault_in_multipages_...() on architectures with no-op access_ok()"). Fix it by re-adding the "(void)c" trick taht was previously used to make the compiler think the variable is used. Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm: check VMA flags to avoid invalid PROT_NONE NUMA balancingLorenzo Stoakes2016-09-252-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively as requiring balancing upon a subsequent page fault. User-defined PROT_NONE memory regions which also have this flag set will not normally invoke the NUMA balancing code as do_page_fault() will send a segfault to the process before handle_mm_fault() is even called. However if access_remote_vm() is invoked to access a PROT_NONE region of memory, handle_mm_fault() is called via faultin_page() and __get_user_pages() without any access checks being performed, meaning the NUMA balancing logic is incorrectly invoked on a non-NUMA memory region. A simple means of triggering this problem is to access PROT_NONE mmap'd memory using /proc/self/mem which reliably results in the NUMA handling functions being invoked when CONFIG_NUMA_BALANCING is set. This issue was reported in bugzilla (issue 99101) which includes some simple repro code. There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at commit c0e7cad to avoid accidentally provoking strange behaviour by attempting to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are consistently triggered by the repro. This patch moves the PROT_NONE check into mm/memory.c rather than invoking BUG_ON() as faulting in these pages via faultin_page() is a valid reason for reaching the NUMA check with the PROT_NONE page table flag set and is therefore not always a bug. Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101 Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds2016-09-2517-65/+37
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MIPS fixes from Ralf Baechle: "A round of 4.8 fixes: MIPS generic code: - Add a missing ".set pop" in an early commit - Fix memory regions reaching top of physical - MAAR: Fix address alignment - vDSO: Fix Malta EVA mapping to vDSO page structs - uprobes: fix incorrect uprobe brk handling - uprobes: select HAVE_REGS_AND_STACK_ACCESS_API - Avoid a BUG warning during PR_SET_FP_MODE prctl - SMP: Fix possibility of deadlock when bringing CPUs online - R6: Remove compact branch policy Kconfig entries - Fix size calc when avoiding IPIs for small icache flushes - Fix pre-r6 emulation FPU initialisation - Fix delay slot emulation count in debugfs ATH79: - Fix test for error return of clk_register_fixed_factor. Octeon: - Fix kernel header to work for VDSO build. - Fix initialization of platform device probing. paravirt: - Fix undefined reference to smp_bootstrap" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Fix delay slot emulation count in debugfs MIPS: SMP: Fix possibility of deadlock when bringing CPUs online MIPS: Fix pre-r6 emulation FPU initialisation MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs MIPS: Select HAVE_REGS_AND_STACK_ACCESS_API MIPS: Octeon: Fix platform bus probing MIPS: Octeon: mangle-port: fix build failure with VDSO code MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...) MIPS: c-r4k: Fix size calc when avoiding IPIs for small icache flushes MIPS: Add a missing ".set pop" in an early commit MIPS: paravirt: Fix undefined reference to smp_bootstrap MIPS: Remove compact branch policy Kconfig entries MIPS: MAAR: Fix address alignment MIPS: Fix memory regions reaching top of physical MIPS: uprobes: fix incorrect uprobe brk handling MIPS: ath79: Fix test for error return of clk_register_fixed_factor().
| * | MIPS: Fix delay slot emulation count in debugfsPaul Burton2016-09-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions") accidentally removed use of the MIPS_FPU_EMU_INC_STATS macro from do_dsemulret, leading to the ds_emul file in debugfs always returning zero even though we perform delay slot emulations. Fix this by re-adding the use of the MIPS_FPU_EMU_INC_STATS macro. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14301/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: SMP: Fix possibility of deadlock when bringing CPUs onlineMatt Redfearn2016-09-251-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the possibility of a deadlock when bringing up secondary CPUs. The deadlock occurs because the set_cpu_online() is called before synchronise_count_slave(). This can cause a deadlock if the boot CPU, having scheduled another thread, attempts to send an IPI to the secondary CPU, which it sees has been marked online. The secondary is blocked in synchronise_count_slave() waiting for the boot CPU to enter synchronise_count_master(), but the boot cpu is blocked in smp_call_function_many() waiting for the secondary to respond to it's IPI request. Fix this by marking the CPU online in cpu_callin_map and synchronising counters before declaring the CPU online and calculating the maps for IPIs. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Reported-by: Justin Chen <justinpopo6@gmail.com> Tested-by: Justin Chen <justinpopo6@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: stable@vger.kernel.org # v4.1+ Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14302/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: Fix pre-r6 emulation FPU initialisationPaul Burton2016-09-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the mipsr2_decoder() function, used to emulate pre-MIPSr6 instructions that were removed in MIPSr6, the init_fpu() function is called if a removed pre-MIPSr6 floating point instruction is the first floating point instruction used by the task. However, init_fpu() performs varous actions that rely upon not being migrated. For example in the most basic case it sets the coprocessor 0 Status.CU1 bit to enable the FPU & then loads FP register context into the FPU registers. If the task were to migrate during this time, it may end up attempting to load FP register context on a different CPU where it hasn't set the CU1 bit, leading to errors such as: do_cpu invoked from kernel context![#2]: CPU: 2 PID: 7338 Comm: fp-prctl Tainted: G D 4.7.0-00424-g49b0c82 #2 task: 838e4000 ti: 88d38000 task.ti: 88d38000 $ 0 : 00000000 00000001 ffffffff 88d3fef8 $ 4 : 838e4000 88d38004 00000000 00000001 $ 8 : 3400fc01 801f8020 808e9100 24000000 $12 : dbffffff 807b69d8 807b0000 00000000 $16 : 00000000 80786150 00400fc4 809c0398 $20 : 809c0338 0040273c 88d3ff28 808e9d30 $24 : 808e9d30 00400fb4 $28 : 88d38000 88d3fe88 00000000 8011a2ac Hi : 0040273c Lo : 88d3ff28 epc : 80114178 _restore_fp+0x10/0xa0 ra : 8011a2ac mipsr2_decoder+0xd5c/0x1660 Status: 1400fc03 KERNEL EXL IE Cause : 1080002c (ExcCode 0b) PrId : 0001a920 (MIPS I6400) Modules linked in: Process fp-prctl (pid: 7338, threadinfo=88d38000, task=838e4000, tls=766527d0) Stack : 00000000 00000000 00000000 88d3fe98 00000000 00000000 809c0398 809c0338 808e9100 00000000 88d3ff28 00400fc4 00400fc4 0040273c 7fb69e18 004a0000 004a0000 004a0000 7664add0 8010de18 00000000 00000000 88d3fef8 88d3ff28 808e9100 00000000 766527d0 8010e534 000c0000 85755000 8181d580 00000000 00000000 00000000 004a0000 00000000 766527d0 7fb69e18 004a0000 80105c20 ... Call Trace: [<80114178>] _restore_fp+0x10/0xa0 [<8011a2ac>] mipsr2_decoder+0xd5c/0x1660 [<8010de18>] do_ri+0x90/0x6b8 [<80105c20>] ret_from_exception+0x0/0x10 Fix this by disabling preemption around the call to init_fpu(), ensuring that it starts & completes on one CPU. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: b0a668fb2038 ("MIPS: kernel: mips-r2-to-r6-emul: Add R2 emulator for MIPS R6") Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.0+ Patchwork: https://patchwork.linux-mips.org/patch/14305/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: vDSO: Fix Malta EVA mapping to vDSO page structsJames Hogan2016-09-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The page structures associated with the vDSO pages in the kernel image are calculated using virt_to_page(), which uses __pa() under the hood to find the pfn associated with the virtual address. The vDSO data pointers however point to kernel symbols, so __pa_symbol() should really be used instead. Since there is no equivalent to virt_to_page() which uses __pa_symbol(), fix init_vdso_image() to work directly with pfns, calculated with __phys_to_pfn(__pa_symbol(...)). This issue broke the Malta Enhanced Virtual Addressing (EVA) configuration which has a non-default implementation of __pa_symbol(). This is because it uses a physical alias so that the kernel executes from KSeg0 (VA 0x80000000 -> PA 0x00000000), while RAM is provided to the kernel in the KUSeg range (VA 0x00000000 -> PA 0x80000000) which uses the same underlying RAM. Since there are no page structures associated with the low physical address region, some arbitrary kernel memory would be interpreted as a page structure for the vDSO pages and badness ensues. Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 4.4.x- Patchwork: https://patchwork.linux-mips.org/patch/14229/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: Select HAVE_REGS_AND_STACK_ACCESS_APIMarcin Nowakowski2016-09-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add lost Kconfig symbol. This should have been part of 40e084a506eb ('MIPS: Add uprobes support.'). Fixes: 40e084a506eb ('MIPS: Add uprobes support.') Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: Octeon: Fix platform bus probingAaro Koskinen2016-09-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 44a7185c2ae6 ("of/platform: Add common method to populate default bus") added new arch_initcall of_platform_default_populate_init() that will override device_initcall octeon_publish_devices(). This broke many OCTEON boards as important devices are not getting probed anymore (e.g. on EdgeRouter Lite the USB mass storage/rootfs is missing). Fix by changing octeon_publish_devices() to arch_initcall. Fixes: 44a7185c2ae6 ("of/platform: Add common method to populate default bus") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Rob Herring <robh@kernel.org> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: linux-mips@linux-mips.org Cc: devicetree@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14041/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: Octeon: mangle-port: fix build failure with VDSO codeAaro Koskinen2016-09-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1685ddbe35cd ("MIPS: Octeon: Changes to support readq()/writeq() usage.") added bitwise shift operations that assume that unsigned long is always 64-bits. This broke the build of VDSO code, as it gets compiled also in "faked" 32-bit mode. Althought the failing inline functions are never executed in 32-bit mode, they still need to pass the compilation. Fix by using 64-bit types explicitly. The patch fixes the following build failure: CC arch/mips/vdso/gettimeofday-o32.o In file included from los/git/devel/linux/arch/mips/include/asm/io.h:32:0, from los/git/devel/linux/arch/mips/include/asm/page.h:194, from los/git/devel/linux/arch/mips/vdso/vdso.h:26, from los/git/devel/linux/arch/mips/vdso/gettimeofday.c:11: los/git/devel/linux/arch/mips/include/asm/mach-cavium-octeon/mangle-port.h: In function '__should_swizzle_bits': los/git/devel/linux/arch/mips/include/asm/mach-cavium-octeon/mangle-port.h:19:40: error: right shift count >= width of type [-Werror=shift-count-overflow] unsigned long did = ((unsigned long)a >> 40) & 0xff; ^~ Fixes: 1685ddbe35cd ("MIPS: Octeon: Changes to support readq()/writeq() usage.") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: David Daney <ddaney@caviumnetworks.com> Cc: David Daney <david.daney@cavium.com> Cc: Steven J. Hill <steven.hill@cavium.com> Cc: Alex Smith <alex.smith@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14039/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)Marcin Nowakowski2016-09-191-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_has_fpu macro uses smp_processor_id() and is currently executed with preemption enabled, that triggers the warning at runtime. It is assumed throughout the kernel that if any CPU has an FPU, then all CPUs would have an FPU as well, so it is safe to perform the check with preemption enabled - change the code to use raw_ variant of the check to avoid the warning. Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # 4.0+ Patchwork: https://patchwork.linux-mips.org/patch/14125/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: c-r4k: Fix size calc when avoiding IPIs for small icache flushesPaul Burton2016-09-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f70ddc07b637 ("MIPS: c-r4k: Avoid small flush_icache_range SMP calls") adds checks to force use of hit-type cache ops for small icache flushes where they are globalised & index-type cache ops aren't, in order to avoid the overhead of IPIs in those cases. However it calculated the size of the region being flushed incorrectly, subtracting the end address from the start address rather than the reverse. This would have led to an overflow with size wrapping round to some large value, and likely to the special case for avoiding IPIs not actually being hit. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Fixes: f70ddc07b637 ("MIPS: c-r4k: Avoid small flush_icache_range SMP calls") Reviewed-by: James Hogan <james.hogan@imgtec.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14211/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: Add a missing ".set pop" in an early commitHuacai Chen2016-09-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 842dfc11ea9a21 ("MIPS: Fix build with binutils 2.24.51+") missing a ".set pop" in macro fpu_restore_16even, so add it. Signed-off-by: Huacai Chen <chenhc@lemote.com> Acked-by: Manuel Lauss <manuel.lauss@gmail.com> Cc: Steven J . Hill <Steven.Hill@caviumnetworks.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # 3.18+ Patchwork: https://patchwork.linux-mips.org/patch/14210/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: paravirt: Fix undefined reference to smp_bootstrapMatt Redfearn2016-09-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the paravirt machine is compiles without CONFIG_SMP, the following linker error occurs arch/mips/kernel/head.o: In function `kernel_entry': (.ref.text+0x10): undefined reference to `smp_bootstrap' due to the kernel entry macro always including SMP startup code. Wrap this code in CONFIG_SMP to fix the error. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 3.16+ Patchwork: https://patchwork.linux-mips.org/patch/14212/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: Remove compact branch policy Kconfig entriesPaul Burton2016-09-132-40/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c1a0e9bc885d ("MIPS: Allow compact branch policy to be changed") added Kconfig entries allowing for the compact branch policy used by the compiler for MIPSr6 kernels to be specified. This can be useful for debugging, particularly in systems where compact branches have recently been introduced. Unfortunately mainline gcc 5.x supports MIPSr6 but not the -mcompact-branches compiler flag, leading to MIPSr6 kernels failing to build with gcc 5.x with errors such as: mipsel-linux-gnu-gcc: error: unrecognized command line option '-mcompact-branches=optimal' make[2]: *** [kernel/bounds.s] Error 1 Fixing this by hiding the Kconfig entry behind another seems to be more hassle than it's worth, as MIPSr6 & compact branches have been around for a while now and if policy does need to be set for debug it can be done easily enough with KCFLAGS. Therefore remove the compact branch policy Kconfig entries & their handling in the Makefile. This reverts commit c1a0e9bc885d ("MIPS: Allow compact branch policy to be changed"). Signed-off-by: Paul Burton <paul.burton@imgtec.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: c1a0e9bc885d ("MIPS: Allow compact branch policy to be changed") Cc: stable <stable@vger.kernel.org> # v4.4+ Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14241/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: MAAR: Fix address alignmentJames Hogan2016-09-131-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The alignment of MIPS MAAR region addresses isn't quite right. - It rounds an already 64 KiB aligned start address up to the next 64 KiB boundary, e.g. 0x80000000 is rounded up to 0x80010000. - It assumes the end address is already on a 64 KiB boundary and doesn't round it down. Should that not be the case it will hit the second BUG_ON() in write_maar_pair(). Both cases are addressed by rounding up and down to 64 KiB boundaries in the more traditional way of adding 0xffff (for rounding up) and masking off the low 16 bits. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13858/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: Fix memory regions reaching top of physicalJames Hogan2016-09-131-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory regions added with add_memory_region() at the top of the physical address space will have their end address overflow to 0. This causes them to be rejected as invalid, and would cause various other issues later on. This causes issues on Malta and Boston platforms when wanting to use all 2GB of RAM on a 32-bit kernel, either via highmem (using physical addresses 0x90000000..0xFFFFFFFF), or with the Malta Enhanced Virtual Addressing (EVA) layout which exposes the whole 0x80000000..0xFFFFFFFF physical address range to kernel mode at 0x00000000..0x7FFFFFFF. Due to the abundance of these non-overflow assumptions and the fact that memblock already avoids the arithmetic overflow by limiting the size of new memory regions without the arch code knowing it (in particular mem_init_free_highmem() will trigger a page dump due to nonzero mapcount on the last page), it is simpler and safer to just limit the size of the region in a similar way to memblock but at the arch level to allow most of the RAM to be used without arithmetic overflows. Therefore we detect this case specifically and reduce the size of the region slightly to avoid the arithmetic overflows and cause the last page to be ignored. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13857/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: uprobes: fix incorrect uprobe brk handlingMarcin Nowakowski2016-09-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a uprobe-replacement breakpoint instruction is handled, a notifier is called with DIE_UPROBE argument, but a corresponding exception notify handler for MIPS attempts to handle DIE_BREAK instead. As a result the breakpoint instruction isn't handled by the uprobe code and the probed application terminates with SIGTRAP. Fix this by changing arch_uprobe_exception_notify code to handle DIE_UPROBE as a pre-singlestep condition instead of DIE_BREAK. Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13884/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | MIPS: ath79: Fix test for error return of clk_register_fixed_factor().Amitoj Kaur Chawla2016-09-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clk_register_fixed_factor returns an ERR_PTR in case of an error and should have an IS_ERR check instead of a null check. The Coccinelle semantic patch used to find this issue is as follows: @@ expression e; statement S; @@ *e = clk_register_fixed_factor(...); if (!e) S Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Cc: julia.lawall@lip6.fr Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13894/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>