summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* SUNRPC: Handle EPIPE in xprt_connect_statusTrond Myklebust2014-07-032-0/+2
| | | | | | | | | | The callback handler xs_error_report() can end up propagating an EPIPE error by means of the call to xprt_wake_pending_tasks(). Ensure that xprt_connect_status() does not automatically convert this into an EIO error. Reported-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* SUNRPC: Ensure that we handle ENOBUFS errors correctly.Trond Myklebust2014-06-302-0/+9
| | | | | | | Currently, an ENOBUFS error will result in a fatal error for the RPC call. Normally, we will just want to wait and then retry. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* FS/NFS: replace count*size kzalloc by kcallocFabian Frederick2014-06-252-2/+2
| | | | | | | | | kcalloc manages count*sizeof overflow. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: get rid of duplicate dprintkWeston Andros Adamson2014-06-251-6/+0
| | | | | | | This was introduced by a merge error with my recent pgio patchset. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: Fix unused variable errorAnna Schumaker2014-06-241-3/+2
| | | | | | | inode is unused when CONFIG_SUNRPC_DEBUG=n. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: remove unneeded EXPORTsWeston Andros Adamson2014-06-241-2/+0
| | | | | | | | | EXPORT_GPLs of nfs_pageio_add_request and nfs_pageio_complete aren't needed anymore. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pnfs: clean up *_resend_to_mdsWeston Andros Adamson2014-06-245-68/+47
| | | | | | | | | | Clean up pnfs_read_done_resend_to_mds and pnfs_write_done_resend_to_mds: - instead of passing all arguments from a nfs_pgio_header, just pass the header - share the common code Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: remove pgio_header refcount, related cleanupWeston Andros Adamson2014-06-243-32/+11
| | | | | | | | | | The refcounting on nfs_pgio_header was related to there being (possibly) more than one nfs_pgio_data. Now that nfs_pgio_data has been merged into nfs_pgio_header, there is no reason to do this ref counting. Just call the completion callback on nfs_pgio_release/nfs_pgio_error. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: remove unused writeverf codeWeston Andros Adamson2014-06-247-46/+18
| | | | | | | | | Remove duplicate writeverf structure from merge of nfs_pgio_header and nfs_pgio_data and remove writeverf related flags and logic to handle more than one RPC per nfs_pgio_header. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: merge nfs_pgio_data into _headerWeston Andros Adamson2014-06-2419-482/+460
| | | | | | | | | | | | struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is passed around everywhere, because there used to be multiple _data structs per _header. Many of these functions then use the _data to find a pointer to the _header. This patch cleans this up by merging the nfs_pgio_data structure into nfs_pgio_header and passing nfs_pgio_header around instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: rename members of nfs_pgio_dataWeston Andros Adamson2014-06-245-21/+25
| | | | | | | | | Rename "verf" to "writeverf" and "pages" to "page_array" to prepare for merge of nfs_pgio_data and nfs_pgio_header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: move nfs_pgio_data and remove nfs_rw_headerWeston Andros Adamson2014-06-248-119/+71
| | | | | | | | | | | | nfs_rw_header was used to allocate an nfs_pgio_header along with an nfs_pgio_data, because a _header would need at least one _data. Now there is only ever one nfs_pgio_data for each nfs_pgio_header -- move it to nfs_pgio_header and get rid of nfs_rw_header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for supportAndy Adamson2014-06-244-45/+58
| | | | | | | | | | | | | | Fix nfs4_negotiate_security to create an rpc_clnt used to test each SECINFO returned pseudoflavor. Check credential creation (and gss_context creation) which is important for RPC_AUTH_GSS pseudoflavors which can fail for multiple reasons including mis-configuration. Don't call nfs4_negotiate in nfs4_submount as it was just called by nfs4_proc_lookup_mountpoint (nfs4_proc_lookup_common) Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix corrupt return value from nfs_find_best_sec()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS Return -EPERM if no supported or matching SECINFO flavorAndy Adamson2014-06-241-7/+4
| | | | | | | | | | | | | | | Do not return RPC_AUTH_UNIX if SEINFO reply tests fail. This prevents an infinite loop of NFS4ERR_WRONGSEC for non RPC_AUTH_UNIX mounts. Without this patch, a mount with no sec= option to a server that does not include RPC_AUTH_UNIX in the SECINFO return can be presented with an attemtp to use RPC_AUTH_UNIX which will result in an NFS4ERR_WRONG_SEC which will prompt the SECINFO call which will again try RPC_AUTH_UNIX.... Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS check the return of nfs4_negotiate_security in nfs4_submountAndy Adamson2014-06-241-2/+5
| | | | | | Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Don't mark the data cache as invalid if it has been flushedTrond Myklebust2014-06-241-35/+40
| | | | | | | | | | | | Now that we have functions such as nfs_write_pageuptodate() that use the cache_validity flags to check if the data cache is valid or not, it is a little more important to keep the flags in sync with the state of the data cache. In particular, we'd like to ensure that if the data cache is empty, we don't start marking it as needing revalidation. Reported-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file sizeTrond Myklebust2014-06-241-0/+1
| | | | | | | | | | | In nfs_update_inode(), if the change attribute is seen to change on the server, then we set NFS_INO_REVAL_PAGECACHE in order to make sure that we check the file size. However, if we also update the file size in the same function, we don't need to check it again. So make sure that we clear the NFS_INO_REVAL_PAGECACHE that was set earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: Fix cache_validity check in nfs_write_pageuptodate()Scott Mayhew2014-06-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation. We're still having some problems with data corruption when multiple clients are appending to a file and those clients are being granted write delegations on open. To reproduce: Client A: vi /mnt/`hostname -s` while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done Client B: vi /mnt/`hostname -s` while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done What's happening is that in nfs_update_inode() we're recognizing that the file size has changed and we're setting NFS_INO_INVALID_DATA accordingly, but then we ignore the cache_validity flags in nfs_write_pageuptodate() because we have a delegation. As a result, in nfs_updatepage() we're extending the write to cover the full page even though we've not read in the data to begin with. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: <stable@vger.kernel.org> # v3.11+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Linux 3.16-rc2v3.16-rc2Linus Torvalds2014-06-211-1/+1
|
* Merge branch 'i2c/for-next' of ↵Linus Torvalds2014-06-216-0/+1216
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c new drivers from Wolfram Sang: "Here is a pull request from i2c hoping for the "new driver" rule. Originally, I wanted to send this request during the merge window, but code checkers with very recent additions complained, so a few fixups were needed. So, some more time went by and I merged rc1 to get a stable base" So the "new driver" rule is really about drivers that people absolutely need for the kernel to work on new hardware, which is not so much the case for i2c. So I considered not pulling this, but eventually relented. Just for FYI: the whole (and only) point of "new drivers" is not that new drivers cannot regress things (they can, and they have - by triggering badly tested code on machines that never triggered that code before), but because they can bring to life machines that otherwise wouldn't be useful at all without the drivers. So the new driver rule is for essential things that actual consumers would care about, ie devices like networking or disk drivers that matter to normal people (not server people - they run old kernels anyway, so mainlining new drivers is irrelevant for them). * 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: sun6-p2wi: fix call to snprintf i2c: rk3x: add NULL entry to the end of_device_id array i2c: sun6i-p2wi: use proper return value in probe i2c: sunxi: add P2WI (Push/Pull 2 Wire Interface) controller support i2c: sunxi: add P2WI DT bindings documentation i2c: rk3x: add driver for Rockchip RK3xxx SoC I2C adapter
| * Merge tag 'v3.16-rc1' into i2c/for-nextWolfram Sang2014-06-172435-46522/+105768
| |\ | | | | | | | | | | | | | | | Merge a stable base (Linux 3.16-rc1) Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | i2c: sun6-p2wi: fix call to snprintfBoris BREZILLON2014-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fixes possible issue in case pdev name contains formatting characters. Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | i2c: rk3x: add NULL entry to the end of_device_id arrayDan Carpenter2014-06-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers/i2c/busses/i2c-rk3x.c:610:69-70: rk3x_i2c_match is not NULL terminated at line 610 Make sure of_device_id tables are NULL terminated Generated by: /kbuild/src/linux/scripts/coccinelle/misc/of_table.cocci Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | i2c: sun6i-p2wi: use proper return value in probeWolfram Sang2014-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fixes: >> drivers/i2c/busses/i2c-sun6i-p2wi.c:243:10: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized] Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | i2c: sunxi: add P2WI (Push/Pull 2 Wire Interface) controller supportBoris BREZILLON2014-06-123-0/+359
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The P2WI controller looks like an SMBus controller which only supports byte data transfers. But, it differs from standard SMBus protocol on several aspects: - it supports only one slave device, and thus drop the address field - it adds a parity bit every 8bits of data - only one read access is required to read a byte (instead of a write followed by a read access in standard SMBus protocol) - there's no Ack bit after each byte transfer This means this bus cannot be used to interface with standard SMBus devices (the only known device to support this interface is the AXP221 PMIC). However the P2WI protocol is close enough to SMBus to be integrated in the I2C subsystem (see this thread [1] for detailed reasons that led to integrating this driver in the I2C subsystem). [1] http://www.spinics.net/lists/linux-i2c/msg15066.html Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | i2c: sunxi: add P2WI DT bindings documentationBoris BREZILLON2014-06-121-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | P2WI (Push/Pull 2 Wire Interface) is an SMBus like bus used to communicate with some PMICs (like the AXP221). Document P2WI DT bindings which are pretty much the same as the one defined for the marvell's mv64xxx controller. Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | i2c: rk3x: add driver for Rockchip RK3xxx SoC I2C adapterMax Schwarz2014-06-124-0/+815
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Driver for the native I2C adapter found in Rockchip RK3xxx SoCs. Configuration is only possible through devicetree. The driver is interrupt driven and supports the I2C_M_IGNORE_NAK mangling bit. Signed-off-by: Max Schwarz <max.schwarz@online.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
* | | Merge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linuxLinus Torvalds2014-06-212-1/+7
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull file locking fixes from Jeff Layton: "File locking related bugfixes Nothing too earth-shattering here. A fix for a potential regression due to a patch in pile #1, and the addition of a memory barrier to prevent a race condition between break_deleg and generic_add_lease" * tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux: locks: set fl_owner for leases back to current->files locks: add missing memory barrier in break_deleg
| * | | locks: set fl_owner for leases back to current->filesJeff Layton2014-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression due to commit 130d1f956ab3 (locks: ensure that fl_owner is always initialized properly in flock and lease codepaths). I had mistakenly thought that the fl_owner wasn't used in the lease code, but I missed the place in __break_lease that does use it. The i_have_this_lease check in generic_add_lease uses it. While I'm not sure that check is terribly helpful [1], reset it back to using current->files in order to ensure that there's no behavior change here. [1]: leases are owned by the file description. It's possible that this is a threaded program, and the lease breaker and the task that would handle the signal are different, even if they have the same file table. So, there is the potential for false positives with this check. Fixes: 130d1f956ab3 (locks: ensure that fl_owner is always initialized properly in flock and lease codepaths) Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| * | | locks: add missing memory barrier in break_delegJeff Layton2014-06-101-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | break_deleg is subject to the same potential race as break_lease. Add a memory barrier to prevent it. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
* | | | Merge branch 'rc-fixes' of ↵Linus Torvalds2014-06-214-11/+12
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild fixes from Michal Marek: "There are three fixes for regressions caused by the relative paths series: deb-pkg, tar-pkg and *docs did not work with O=. Plus, there is a fix for the linux-headers deb package and a fixed typo. These are not regression fixes but are safe enough" * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kbuild: fix a typo in a kbuild document builddeb: fix missing headers in linux-headers package Documentation: Fix DocBook build with relative $(srctree) kbuild: Fix tar-pkg with relative $(objtree) deb-pkg: Fix for relative paths
| * | | | kbuild: fix a typo in a kbuild documentMasahiro Yamada2014-06-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
| * | | | builddeb: fix missing headers in linux-headers packageFathi Boudra2014-06-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel headers package (linux-headers) doesn't include several header files required to build out-of-tree modules. It makes the package unusable on e.g. ARM architecture: /usr/src/linux-headers-3.14.0/arch/arm/include/asm/memory.h:24:25: fatal error: mach/memory.h: No such file or directory #include <mach/memory.h> ^ compilation terminated. Signed-off-by: Fathi Boudra <fathi.boudra@linaro.org> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Michal Marek <mmarek@suse.cz>
| * | | | Documentation: Fix DocBook build with relative $(srctree)Michal Marek2014-06-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commits 890676c6 (kbuild: Use relative path when building in the source tree) and 9da0763b (kbuild: Use relative path when building in a subdir of the source tree), the $(srctree) variable can be a relative path. This breaks Documentation/DocBook/media/Makefile, because it tries to create symlinks from a subdirectory of the object tree to the source tree. Fix this by using a full path in this case. Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michal Marek <mmarek@suse.cz>
| * | | | kbuild: Fix tar-pkg with relative $(objtree)Michal Marek2014-06-181-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7e1c0477 (kbuild: Use relative path for $(objtree)) assumes that the build process does not change its working directory. make tar-pkg was a couterexample, fix this by changing directory only for the tar command and not for the whole script, which at one point references the now relative $(objtree). Reported-and-tested-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Michal Marek <mmarek@suse.cz>
| * | | | deb-pkg: Fix for relative pathsMichal Marek2014-06-181-5/+5
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When $srctree or $objtree are relative paths, we cannot change directory and refer to them in the same subshell. Do the redirection outside of the subshell to fix this. Reported-and-tested-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2014-06-2111-172/+359
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This fixes some lockups in btrfs reported with rc1. It probably has some performance impact because it is backing off our spinning locks more often and switching to a blocking lock. I'll be able to nail that down next week, but for now I want to get the lockups taken care of. Otherwise some more stack reduction and assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix wrong error handle when the device is missing or is not writeable Btrfs: fix deadlock when mounting a degraded fs Btrfs: use bio_endio_nodec instead of open code Btrfs: fix NULL pointer crash when running balance and scrub concurrently btrfs: Skip scrubbing removed chunks to avoid -ENOENT. Btrfs: fix broken free space cache after the system crashed Btrfs: make free space cache write out functions more readable Btrfs: remove unused wait queue in struct extent_buffer Btrfs: fix deadlocks with trylock on tree nodes
| * | | | Btrfs: fix wrong error handle when the device is missing or is not writeableMiao Xie2014-06-191-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original bio might be submitted, so we shoud increase bi_remaining to account for it when we deal with the error that the device is missing or is not writeable, or we would skip the endio handle. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | Btrfs: fix deadlock when mounting a degraded fsMiao Xie2014-06-192-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The deadlock happened when we mount degraded filesystem, the reproduced steps are following: # mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1> # echo 1 > /sys/block/`basename <dev0>`/device/delete # mount -o degraded <dev1> <mnt> The reason was that the counter -- bi_remaining was wrong. If the missing or unwriteable device was the last device in the mapping array, we would not submit the original bio, so we shouldn't increase bi_remaining of it in btrfs_end_bio(), or we would skip the final endio handle. Fix this problem by adding a flag into btrfs bio structure. If we submit the original bio, we will set the flag, and we increase bi_remaining counter, or we don't. Though there is another way to fix it -- decrease bi_remaining counter of the original bio when we make sure the original bio is not submitted, this method need add more check and is easy to make mistake. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | Btrfs: use bio_endio_nodec instead of open codeMiao Xie2014-06-191-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | Btrfs: fix NULL pointer crash when running balance and scrub concurrentlyWang Shilong2014-06-193-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While running balance, scrub, fsstress concurrently we hit the following kernel crash: [56561.448845] BTRFS info (device sde): relocating block group 11005853696 flags 132 [56561.524077] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 [56561.524237] IP: [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs] [56561.524297] PGD 9be28067 PUD 7f3dd067 PMD 0 [56561.524325] Oops: 0000 [#1] SMP [....] [56561.527237] Call Trace: [56561.527309] [<ffffffffa038980e>] scrub_enumerate_chunks+0x24e/0x490 [btrfs] [56561.527392] [<ffffffff810abe00>] ? abort_exclusive_wait+0x50/0xb0 [56561.527476] [<ffffffffa038add4>] btrfs_scrub_dev+0x1a4/0x530 [btrfs] [56561.527561] [<ffffffffa0368107>] btrfs_ioctl+0x13f7/0x2a90 [btrfs] [56561.527639] [<ffffffff811c82f0>] do_vfs_ioctl+0x2e0/0x4c0 [56561.527712] [<ffffffff8109c384>] ? vtime_account_user+0x54/0x60 [56561.527788] [<ffffffff810f768c>] ? __audit_syscall_entry+0x9c/0xf0 [56561.527870] [<ffffffff811c8551>] SyS_ioctl+0x81/0xa0 [56561.527941] [<ffffffff815707f7>] tracesys+0xdd/0xe2 [...] [56561.528304] RIP [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs] [56561.528395] RSP <ffff88004c0f5be8> [56561.528454] CR2: 0000000000000078 This is because in btrfs_relocate_chunk(), we will free @bdev directly while scrub may still hold extent mapping, and may access freed memory. Fix this problem by wrapping freeing @bdev work into free_extent_map() which is based on reference count. Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | btrfs: Skip scrubbing removed chunks to avoid -ENOENT.Qu Wenruo2014-06-191-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When run scrub with balance, sometimes -ENOENT will be returned, since in scrub_enumerate_chunks() will search dev_extent in *COMMIT_ROOT*, but btrfs_lookup_block_group() will search block group in *MEMORY*, so if a chunk is removed but not committed, -ENOENT will be returned. However, there is no need to stop scrubbing since other chunks may be scrubbed without problem. So this patch changes the behavior to skip removed chunks and continue to scrub the rest. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | Btrfs: fix broken free space cache after the system crashedMiao Xie2014-06-194-44/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we mounted the filesystem after the crash, we got the following message: BTRFS error (device xxx): block group xxxx has wrong amount of free space BTRFS error (device xxx): failed to load free space cache for block group xxx It is because we didn't update the metadata of the allocated space (in extent tree) until the file data was written into the disk. During this time, there was no information about the allocated spaces in either the extent tree nor the free space cache. when we wrote out the free space cache at this time (commit transaction), those spaces were lost. In fact, only the free space that is used to store the file data had this problem, the others didn't because the metadata of them is updated in the same transaction context. There are many methods which can fix the above problem - track the allocated space, and write it out when we write out the free space cache - account the size of the allocated space that is used to store the file data, if the size is not zero, don't write out the free space cache. The first one is complex and may make the performance drop down. This patch chose the second method, we use a per-block-group variant to account the size of that allocated space. Besides that, we also introduce a per-block-group read-write semaphore to avoid the race between the allocation and the free space cache write out. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | Btrfs: make free space cache write out functions more readableMiao Xie2014-06-191-66/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the free space cache write out functions more readable, and beisdes that, it also reduces the stack space that the function -- __btrfs_write_out_cache uses from 194bytes to 144bytes. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | Btrfs: remove unused wait queue in struct extent_bufferFilipe Manana2014-06-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lock_wq wait queue is not used anywhere, therefore just remove it. On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320 bytes down to 296 bytes, which means a 4Kb page can now be used for 13 extent buffers instead of 12. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | Btrfs: fix deadlocks with trylock on tree nodesChris Mason2014-06-191-34/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Btrfs tree trylock function is poorly named. It always takes the spinlock and backs off if the blocking lock is held. This can lead to surprising lockups because people expect it to really be a trylock. This commit makes it a pure trylock, both for the spinlock and the blocking lock. It also reworks the nested lock handling slightly to avoid taking the read lock while a spinning write lock might be held. Signed-off-by: Chris Mason <clm@fb.com>
* | | | | Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2014-06-212-0/+79
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd bugfixes from Bruce Fields: "Fixes for a new regression from the xdr encoding rewrite, and a delegation problem we've had for a while (made somewhat more annoying by the vfs delegation support added in 3.13)" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: NFSD: fix bug for readdir of pseudofs NFSD: Don't hand out delegations for 30 seconds after recalling them.
| * | | | | NFSD: fix bug for readdir of pseudofsKinglong Mee2014-06-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 561f0ed498ca (nfsd4: allow large readdirs) introduces a bug about readdir the root of pseudofs. Call xdr_truncate_encode() revert encoded name when skipping. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | NFSD: Don't hand out delegations for 30 seconds after recalling them.NeilBrown2014-06-171-0/+78
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If nfsd needs to recall a delegation for some reason it implies that there is contention on the file, so further delegations should not be handed out. The current code fails to do so, and the result is effectively a live-lock under some workloads: a client attempting a conflicting operation on a read-delegated file receives NFS4ERR_DELAY and retries the operation, but by the time it retries the server may already have given out another delegation. We could simply avoid delegations for (say) 30 seconds after any recall, but this is probably too heavy handed. We could keep a list of inodes (or inode numbers or filehandles) for recalled delegations, but that requires memory allocation and searching. The approach taken here is to use a bloom filter to record the filehandles which are currently blocked from delegation, and to accept the cost of a few false positives. We have 2 bloom filters, each of which is valid for 30 seconds. When a delegation is recalled the filehandle is added to one filter and will remain disabled for between 30 and 60 seconds. We keep a count of the number of filehandles that have been added, so when that count is zero we can bypass all other tests. The bloom filters have 256 bits and 3 hash functions. This should allow a couple of dozen blocked filehandles with minimal false positives. If many more filehandles are all blocked at once, behaviour will degrade towards rejecting all delegations for between 30 and 60 seconds, then resetting and allowing new delegations. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2014-06-2141-126/+1249
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This is larger than usual: the main reason are the ARM symbol lookup speedups that came in late and were hard to resist. There's also a kprobes fix and various tooling fixes, plus the minimal re-enablement of the mmap2 support interface" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/kprobes: Fix build errors and blacklist context_track_user perf tests: Add test for closing dso objects on EMFILE error perf tests: Add test for caching dso file descriptors perf tests: Allow reuse of test_file function perf tests: Spawn child for each test perf tools: Add dso__data_* interface descriptons perf tools: Allow to close dso fd in case of open failure perf tools: Add file size check and factor dso__data_read_offset perf tools: Cache dso data file descriptor perf tools: Add global count of opened dso objects perf tools: Add global list of opened dso objects perf tools: Add data_fd into dso object perf tools: Separate dso data related variables perf tools: Cache register accesses for unwind processing perf record: Fix to honor user freq/interval properly perf timechart: Reflow documentation perf probe: Improve error messages in --line option perf probe: Improve an error message of perf probe --vars mode perf probe: Show error code and description in verbose mode perf probe: Improve error message for unknown member of data structure ...