summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches2020-10-251-1/+1
| | | | | | | | | | | | | | | | | | | | Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag '5.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2020-10-258-15/+213
|\ | | | | | | | | | | | | | | | | | | | | | | Pull more cifs updates from Steve French: "Add support for stat of various special file types (WSL reparse points for char, block, fifo)" * tag '5.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number smb3: add some missing definitions from MS-FSCC smb3: remove two unused variables smb3: add support for stat of WSL reparse points for special file types
| * cifs: update internal module version numberSteve French2020-10-231-1/+1
| | | | | | | | | | | | To 2.29 Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: add some missing definitions from MS-FSCCSteve French2020-10-232-0/+28
| | | | | | | | | | | | | | Add some structures and defines that were recently added to the protocol documentation (see MS-FSCC sections 2.3.29-2.3.34). Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: remove two unused variablesSteve French2020-10-231-5/+0
| | | | | | | | | | | | | | | | Fix two unused variables in commit "add support for stat of WSL reparse points for special file types" Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: add support for stat of WSL reparse points for special file typesSteve French2020-10-236-14/+189
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed so when mounting to Windows we do not misinterpret various special files created by Linux (WSL) as symlinks. An earlier patch addressed readdir. This patch fixes stat (getattr). With this patch:   File: /mnt1/char   Size: 0          Blocks: 0          IO Block: 16384  character special file Device: 34h/52d Inode: 844424930132069  Links: 1     Device type: 0,0 Access: (0755/crwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root) Access: 2020-10-21 17:46:51.839458900 -0500 Modify: 2020-10-21 17:46:51.839458900 -0500 Change: 2020-10-21 18:30:39.797358800 -0500  Birth: -   File: /mnt1/fifo   Size: 0          Blocks: 0          IO Block: 16384  fifo Device: 34h/52d Inode: 1125899906842722  Links: 1 Access: (0755/prwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root) Access: 2020-10-21 16:21:37.259249700 -0500 Modify: 2020-10-21 16:21:37.259249700 -0500 Change: 2020-10-21 18:30:39.797358800 -0500  Birth: -   File: /mnt1/block   Size: 0          Blocks: 0          IO Block: 16384  block special file Device: 34h/52d Inode: 844424930132068  Links: 1     Device type: 0,0 Access: (0755/brwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root) Access: 2020-10-21 17:10:47.913103200 -0500 Modify: 2020-10-21 17:10:47.913103200 -0500 Change: 2020-10-21 18:30:39.796725500 -0500  Birth: - without the patch all show up incorrectly as symlinks with annoying "operation not supported error also returned"   File: /mnt1/charstat: cannot read symbolic link '/mnt1/char': Operation not supported   Size: 0          Blocks: 0          IO Block: 16384  symbolic link Device: 34h/52d Inode: 844424930132069  Links: 1 Access: (0000/l---------)  Uid: (    0/    root)   Gid: (    0/    root) Access: 2020-10-21 17:46:51.839458900 -0500 Modify: 2020-10-21 17:46:51.839458900 -0500 Change: 2020-10-21 18:30:39.797358800 -0500  Birth: -   File: /mnt1/fifostat: cannot read symbolic link '/mnt1/fifo': Operation not supported   Size: 0          Blocks: 0          IO Block: 16384  symbolic link Device: 34h/52d Inode: 1125899906842722  Links: 1 Access: (0000/l---------)  Uid: (    0/    root)   Gid: (    0/    root) Access: 2020-10-21 16:21:37.259249700 -0500 Modify: 2020-10-21 16:21:37.259249700 -0500 Change: 2020-10-21 18:30:39.797358800 -0500  Birth: -   File: /mnt1/blockstat: cannot read symbolic link '/mnt1/block': Operation not supported   Size: 0          Blocks: 0          IO Block: 16384  symbolic link Device: 34h/52d Inode: 844424930132068  Links: 1 Access: (0000/l---------)  Uid: (    0/    root)   Gid: (    0/    root) Access: 2020-10-21 17:10:47.913103200 -0500 Modify: 2020-10-21 17:10:47.913103200 -0500 Change: 2020-10-21 18:30:39.796725500 -0500 Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
* | Merge tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-blockLinus Torvalds2020-10-244-116/+189
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: - fsize was missed in previous unification of work flags - Few fixes cleaning up the flags unification creds cases (Pavel) - Fix NUMA affinities for completely unplugged/replugged node for io-wq - Two fallout fixes from the set_fs changes. One local to io_uring, one for the splice entry point that io_uring uses. - Linked timeout fixes (Pavel) - Removal of ->flush() ->files work-around that we don't need anymore with referenced files (Pavel) - Various cleanups (Pavel) * tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block: splice: change exported internal do_splice() helper to take kernel offset io_uring: make loop_rw_iter() use original user supplied pointers io_uring: remove req cancel in ->flush() io-wq: re-set NUMA node affinities if CPUs come online io_uring: don't reuse linked_timeout io_uring: unify fsize with def->work_flags io_uring: fix racy REQ_F_LINK_TIMEOUT clearing io_uring: do poll's hash_node init in common code io_uring: inline io_poll_task_handler() io_uring: remove extra ->file check in poll prep io_uring: make cached_cq_overflow non atomic_t io_uring: inline io_fail_links() io_uring: kill ref get/drop in personality init io_uring: flags-based creds init in queue
| * | splice: change exported internal do_splice() helper to take kernel offsetJens Axboe2020-10-221-13/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the set_fs change, we can no longer rely on copy_{to,from}_user() accepting a kernel pointer, and it was bad form to do so anyway. Clean this up and change the internal helper that io_uring uses to deal with kernel pointers instead. This puts the offset copy in/out in __do_splice() instead, which just calls the same helper. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: make loop_rw_iter() use original user supplied pointersJens Axboe2020-10-221-14/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We jump through a hoop for fixed buffers, where we first map these to a bvec(), then kmap() the bvec to obtain the pointer we copy to/from. This was always a bit ugly, and with the set_fs changes, it ends up being practically problematic as well. There's no need to jump through these hoops, just use the original user pointers and length for the non iter based read/write. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: remove req cancel in ->flush()Pavel Begunkov2020-10-221-23/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every close(io_uring) causes cancellation of all inflight requests carrying ->files. That's not nice but was neccessary up until recently. Now task->files removal is handled in the core code, so that part of flush can be removed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io-wq: re-set NUMA node affinities if CPUs come onlineJens Axboe2020-10-221-4/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We correctly set io-wq NUMA node affinities when the io-wq context is setup, but if an entire node CPU set is offlined and then brought back online, the per node affinities are broken. Ensure that we set them again whenever a CPU comes online. This ensures that we always track the right node affinity. The usual cpuhp notifiers are used to drive it. Reported-by: Zhang Qiang <qiang.zhang@windriver.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: don't reuse linked_timeoutPavel Begunkov2020-10-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Clear linked_timeout for next requests in __io_queue_sqe() so we won't queue it up unnecessary when it's going to be punted. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Cc: stable@vger.kernel.org # v5.9 Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: unify fsize with def->work_flagsJens Axboe2020-10-203-14/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | This one was missed in the earlier conversion, should be included like any of the other IO identity flags. Make sure we restore to RLIM_INIFITY when dropping the personality again. Fixes: 98447d65b4a7 ("io_uring: move io identity items into separate struct") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: fix racy REQ_F_LINK_TIMEOUT clearingPavel Begunkov2020-10-191-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | io_link_timeout_fn() removes REQ_F_LINK_TIMEOUT from the link head's flags, it's not atomic and may race with what the head is doing. If io_link_timeout_fn() doesn't clear the flag, as forced by this patch, then it may happen that for "req -> link_timeout1 -> link_timeout2", __io_kill_linked_timeout() would find link_timeout2 and try to cancel it, so miscounting references. Teach it to ignore such double timeouts by marking the active one with a new flag in io_prep_linked_timeout(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: do poll's hash_node init in common codePavel Begunkov2020-10-191-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Move INIT_HLIST_NODE(&req->hash_node) into __io_arm_poll_handler(), so that it doesn't duplicated and common poll code would be responsible for it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: inline io_poll_task_handler()Pavel Begunkov2020-10-191-19/+12
| | | | | | | | | | | | | | | | | | | | | io_poll_task_handler() doesn't add clarity, inline it in its only user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: remove extra ->file check in poll prepPavel Begunkov2020-10-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | io_poll_add_prep() doesn't need to verify ->file because it's already done in io_init_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: make cached_cq_overflow non atomic_tPavel Begunkov2020-10-191-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | ctx->cached_cq_overflow is changed only under completion_lock. Convert it from atomic_t to just int, and mark all places when it's read without lock with READ_ONCE, which guarantees atomicity (relaxed ordering). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: inline io_fail_links()Pavel Begunkov2020-10-191-10/+3
| | | | | | | | | | | | | | | | | | | | | Inline io_fail_links() and kill extra io_cqring_ev_posted(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: kill ref get/drop in personality initPavel Begunkov2020-10-191-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't take an identity on personality/creds init only to drop it a few lines after. Extract a function which prepares req->work but leaves it without identity. Note: it's safe to not check REQ_F_WORK_INITIALIZED there because it's nobody had a chance to init it before io_init_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: flags-based creds init in queuePavel Begunkov2020-10-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use IO_WQ_WORK_CREDS to figure out if req has creds to be used. Since recently it should rely only on flags, but not value of work.creds. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge branch 'work.misc' of ↵Linus Torvalds2020-10-2436-106/+68
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff all over the place (the largest group here is Christoph's stat cleanups)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove KSTAT_QUERY_FLAGS fs: remove vfs_stat_set_lookup_flags fs: move vfs_fstatat out of line fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat fs: remove vfs_statx_fd fs: omfs: use kmemdup() rather than kmalloc+memcpy [PATCH] reduce boilerplate in fsid handling fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS selftests: mount: add nosymfollow tests Add a "nosymfollow" mount option.
| * | | fs: remove KSTAT_QUERY_FLAGSChristoph Hellwig2020-09-261-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KSTAT_QUERY_FLAGS expands to AT_STATX_SYNC_TYPE, which itself already is a mask. Remove the double name, especially given that the prefix is a little confusing vs the normal AT_* flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | fs: remove vfs_stat_set_lookup_flagsChristoph Hellwig2020-09-261-21/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function really obsfucates checking for valid flags and setting the lookup flags. The fact that it returns -EINVAL through and unsigned return value, which is then used as boolean really doesn't help either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | fs: move vfs_fstatat out of lineChristoph Hellwig2020-09-261-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows to keep vfs_statx static in fs/stat.c to prepare for the following changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | fs: remove vfs_statx_fdChristoph Hellwig2020-09-261-15/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vfs_statx_fd is only used to implement vfs_fstat. Remove vfs_statx_fd and just implement vfs_fstat directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | fs: omfs: use kmemdup() rather than kmalloc+memcpyAlex Dewar2020-09-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue identified with Coccinelle. Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | [PATCH] reduce boilerplate in fsid handlingAl Viro2020-09-1831-62/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of boilerplate in most of ->statfs() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | Add a "nosymfollow" mount option.Mattias Nissler2020-08-274-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For mounts that have the new "nosymfollow" option, don't follow symlinks when resolving paths. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options, as well as to the LOOKUP_NO_SYMLINKS resolve flag in the openat2(2) syscall. Various BSD variants have been supporting the "nosymfollow" mount option for a long time with equivalent implementations. Note that symlinks may still be created on file systems mounted with the "nosymfollow" option present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly. Setting the "nosymfollow" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nosymfollow" option is thus useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts. More information on the history and motivation for this patch can be found here: https://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/hardening-against-malicious-stateful-data#TOC-Restricting-symlink-traversal Signed-off-by: Mattias Nissler <mnissler@chromium.org> Signed-off-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge tag 'xfs-5.10-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2020-10-234-18/+54
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs fixes from Darrick Wong: "Two bug fixes that trickled in during the merge window: - Make fallocate check the alignment of its arguments against the fundamental allocation unit of the volume the file lives on, so that we don't trigger the fs' alignment checks. - Cancel unprocessed log intents immediately when log recovery fails, to avoid a log deadlock" * tag 'xfs-5.10-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: cancel intents immediately if process_intents fails xfs: fix fallocate functions when rtextsize is larger than 1
| * | | | xfs: cancel intents immediately if process_intents failsDarrick J. Wong2020-10-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If processing recovered log intent items fails, we need to cancel all the unprocessed recovered items immediately so that a subsequent AIL push in the bail out path won't get wedged on the pinned intent items that didn't get processed. This can happen if the log contains (1) an intent that gets and releases an inode, (2) an intent that cannot be recovered successfully, and (3) some third intent item. When recovery of (2) fails, we leave (3) pinned in memory. Inode reclamation is called in the error-out path of xfs_mountfs before xfs_log_cancel_mount. Reclamation calls xfs_ail_push_all_sync, which gets stuck waiting for (3). Therefore, call xlog_recover_cancel_intents if _process_intents fails. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
| * | | | xfs: fix fallocate functions when rtextsize is larger than 1Darrick J. Wong2020-10-213-18/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit fe341eb151ec, I forgot that xfs_free_file_space isn't strictly a "remove mapped blocks" function. It is actually a function to zero file space by punching out the middle and writing zeroes to the unaligned ends of the specified range. Therefore, putting a rtextsize alignment check in that function is wrong because that breaks unaligned ZERO_RANGE on the realtime volume. Furthermore, xfs_file_fallocate already has alignment checks for the functions require the file range to be aligned to the size of a fundamental allocation unit (which is 1 FSB on the data volume and 1 rt extent on the realtime volume). Create a new helper to check fallocate arguments against the realtiem allocation unit size, fix the fallocate frontend to use it, fix free_file_space to delete the correct range, and remove a now redundant check from insert_file_space. NOTE: The realtime extent size is not required to be a power of two! Fixes: fe341eb151ec ("xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
* | | | | Merge tag 'gfs2-for-5.10' of ↵Linus Torvalds2020-10-2322-301/+675
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Use iomap for non-journaled buffered I/O. This largely eliminates buffer heads on filesystems where the block size matches the page size. Many thanks to Christoph Hellwig for this patch! - Fixes for some more journaled data filesystem bugs, found by running xfstests with data journaling on for all files (chattr +j $MNT) (Bob Peterson) - gfs2_evict_inode refactoring (Bob Peterson) - Use the statfs data in the journal during recovery instead of reading it in from the local statfs inodes (Abhi Das) - Several other minor fixes by various people * tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits) gfs2: Recover statfs info in journal head gfs2: lookup local statfs inodes prior to journal recovery gfs2: Add fields for statfs info in struct gfs2_log_header_host gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync gfs2: Eliminate gl_vm gfs2: Only access gl_delete for iopen glocks gfs2: Fix comments to glock_hash_walk gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders) gfs2: Ignore journal log writes for jdata holes gfs2: simplify gfs2_block_map gfs2: Only set PageChecked if we have a transaction gfs2: don't lock sd_ail_lock in gfs2_releasepage gfs2: make gfs2_ail1_empty_one return the count of active items gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe gfs2: enhance log_blocks trace point to show log blocks free gfs2: add missing log_blocks trace points in gfs2_write_revokes gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm gfs2: add validation checks for size of superblock gfs2: use-after-free in sysfs deregistration gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump ...
| * | | | gfs2: Recover statfs info in journal headAbhi Das2020-10-233-1/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Apply the outstanding statfs changes in the journal head to the master statfs file. Zero out the local statfs file for good measure. Previously, statfs updates would be read in from the local statfs inode and synced to the master statfs inode during recovery. We now use the statfs updates in the journal head to update the master statfs inode instead of reading in from the local statfs inode. To preserve backward compatibility with kernels that can't do this, we still need to keep the local statfs inode up to date by writing changes to it. At some point in the future, we can do away with the local statfs inodes altogether and keep the statfs changes solely in the journal. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: lookup local statfs inodes prior to journal recoveryAbhi Das2020-10-234-36/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to lookup the master statfs inode and the local statfs inodes earlier in the mount process (in init_journal) so journal recovery can use them when it attempts to recover the statfs info. We lookup all the local statfs inodes and store them in a linked list to allow a node to recover statfs info for other nodes in the cluster. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: Add fields for statfs info in struct gfs2_log_header_hostAbhi Das2020-10-204-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And read these in __get_log_header() from the log header. Also make gfs2_statfs_change_out() non-static so it can be used outside of super.c Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: Ignore subsequent errors after withdraw in rgrp_go_syncAndreas Gruenbacher2020-10-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once a withdraw has occurred, ignore errors that are the consequence of the withdraw. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: Eliminate gl_vmBob Peterson2020-10-203-30/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The gfs2_glock structure has a gl_vm member, introduced in commit 7005c3e4ae428 ("GFS2: Use range based functions for rgrp sync/invalidation"), which stores the location of resource groups within their address space. This structure is in a union with iopen glock specific fields. It was introduced because at unmount time, the resource group objects were destroyed before flushing out any pending resource group glock work, and flushing out such work could require flushing / truncating the address space. Since commit b3422cacdd7e6 ("gfs2: Rework how rgrp buffer_heads are managed"), any pending resource group glock work is flushed out before destroying the resource group objects. So the resource group objects will now always exist in rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values where needed instead of caching them. This also eliminates the union. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: Only access gl_delete for iopen glocksBob Peterson2020-10-201-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only initialize gl_delete for iopen glocks, but more importantly, only access it for iopen glocks in flush_delete_work: flush_delete_work is called for different types of glocks including rgrp glocks, and those use gl_vm which is in a union with gl_delete. Without this fix, we'll end up clobbering gl_vm, which results in general memory corruption. Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: Fix comments to glock_hash_walkBob Peterson2020-10-201-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comments before function glock_hash_walk had the wrong name and an extra parameter. This simply fixes the comments. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)Bob Peterson2020-10-153-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated when a glock had a holder queued. It was only checked for inode glocks, although set and cleared by all glocks, and it was only used to determine whether the glock should be held for the minimum hold time before releasing. The problem is that the flag is not accurate at all. If a process holds the glock, the flag is set. When they dequeue the glock, it only cleared the flag in cases when the state actually changed. So if the state doesn't change, the flag may still be set, even when nothing is queued. This happens to iopen glocks often: the get held in SH, then the file is closed, but the glock remains in SH mode. We don't need a special flag to indicate this: we can simply tell whether the glock has any items queued to the holders queue. It's a waste of cpu time to maintain it. This patch eliminates the flag in favor of simply checking list_empty on the glock holders. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: Ignore journal log writes for jdata holesBob Peterson2020-10-151-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When flushing out its ail1 list, gfs2_write_jdata_page calls function __block_write_full_page passing in function gfs2_get_block_noalloc. But there was a problem when a process wrote to a jdata file, then truncated it or punched a hole, leaving references to the blocks within the new hole in its ail list, which are to be written to the journal log. In writing them to the journal, after calling gfs2_block_map, function gfs2_get_block_noalloc determined that the (hole-punched) block was not mapped, so it returned -EIO to generic_writepages, which passed it back to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming there was a real IO error writing to the journal. This might be a valid error when writing metadata to the journal, but for journaled data writes, it does not warrant a withdraw. This patch adds a check to function gfs2_block_map that makes an exception for journaled data writes that correspond to jdata holes: If the iomap get function returns a block type of IOMAP_HOLE, it instead returns -ENODATA which does not cause the withdraw. Other errors are returned as before. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: simplify gfs2_block_mapBob Peterson2020-10-151-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function gfs2_block_map had a lot of redundancy between its create and no_create paths. This patch simplifies the code to eliminate the redundancy. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: Only set PageChecked if we have a transactionBob Peterson2020-10-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With jdata writes, we frequently got into situations where gfs2 deadlocked because of this calling sequence: gfs2_ail1_start gfs2_ail1_flush - for every tr on the sd_ail1_list: gfs2_ail1_start_one - for every bd on the tr's tr_ail1_list: generic_writepages write_cache_pages passing __writepage() calls clear_page_dirty_for_io which calls set_page_dirty: which calls jdata_set_page_dirty which sets PageChecked. __writepage() calls mapping->a_ops->writepage AKA gfs2_jdata_writepage However, gfs2_jdata_writepage checks if PageChecked is set, and if so, it ignores the write and redirties the page. The problem is that write_cache_pages calls clear_page_dirty_for_io, which often calls set_page_dirty(). See comments in page-writeback.c starting with "Yes, Virginia". If it's jdata, set_page_dirty will call jdata_set_page_dirty which will set PageChecked. That causes a conflict because it makes it look like the page has been redirtied by another writer, in which case we need to skip writing it and redirty the page. That ends up in a deadlock because it isn't a "real" writer and nothing will ever clear PageChecked. If we do have a real writer, it will have started a transaction. So this patch checks if a transaction is in use, and if not, it skips setting PageChecked. That way, the page will be dirtied, cleaned, and written appropriately. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: don't lock sd_ail_lock in gfs2_releasepageBob Peterson2020-10-151-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch 380f7c65a7eb3288e4b6812acf3474a1de230707 changed gfs2_releasepage so that it held the sd_ail_lock spin_lock for most of its processing. It did this for some mysterious undocumented bug somewhere in the evict code path. But in the nine years since, evict has been reworked and fixed many times, and so have the transactions and ail list. I can't see a reason to hold the sd_ail_lock unless it's protecting the actual ail lists hung off the transactions. Therefore, this patch removes the locking to increase speed and efficiency, and to further help us rework the log flush code to be more concurrent with transactions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: make gfs2_ail1_empty_one return the count of active itemsBob Peterson2020-10-151-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is one baby step toward simplifying the journal management. It simply changes function gfs2_ail1_empty_one from a void to an int and makes it return a count of active items. This allows the caller to check the return code rather than list_empty on the tr_ail1_list. This way we can, in a later patch, combine transaction ail1 and ail2 lists. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipeBob Peterson2020-10-156-12/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, when blocks were freed, it called gfs2_meta_wipe to take the metadata out of the pending journal blocks. It did this mostly by calling another function called gfs2_remove_from_journal. This is shortsighted because it does not do anything with jdata blocks which may also be in the journal. This patch expands the function so that it wipes out jdata blocks from the journal as well, and it wipes it from the ail1 list if it hasn't been written back yet. Since it now processes jdata blocks as well, the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe. New function gfs2_ail1_wipe wants a static view of the ail list, so it locks the sd_ail_lock when removing items. To accomplish this, function gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now the caller's responsibility to do so. I was going to make sd_ail_lock locking conditional, but the practice is generally frowned upon. For details, see: https://lwn.net/Articles/109066/ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: enhance log_blocks trace point to show log blocks freeBob Peterson2020-10-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds some code to enhance the log_blocks trace point. It reports the number of free log blocks. This makes the trace point much more useful, especially for debugging performance problems when we can tell when the journal gets full and needs to wait for flushes, etc. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: add missing log_blocks trace points in gfs2_write_revokesBob Peterson2020-10-151-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function gfs2_write_revokes was incrementing and decrementing the number of log blocks free, but there was never a log_blocks trace point for it. Thus, the free blocks from a log_blocks trace would jump around mysteriously. This patch adds the missing trace points so the trace makes more sense. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | | gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parmBob Peterson2020-10-151-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the function is only used for writing jdata pages, this patch simply renames function gfs2_write_full_page to a more appropriate name: gfs2_write_jdata_page. This makes the code easier to understand. The function was only called in one place, which passed in a pointer to function gfs2_get_block_noalloc. The function doesn't need to be passed in. Therefore, this also eliminates the unnecessary parameter to increase efficiency. I also took the liberty of cleaning up the function comments. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>