summaryrefslogtreecommitdiffstats
path: root/fs/jffs2/super.c
Commit message (Collapse)AuthorAgeFilesLines
* jffs2: Fix potential illegal address access in jffs2_free_inodeWang Yong2024-05-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During the stress testing of the jffs2 file system,the following abnormal printouts were found: [ 2430.649000] Unable to handle kernel paging request at virtual address 0069696969696948 [ 2430.649622] Mem abort info: [ 2430.649829] ESR = 0x96000004 [ 2430.650115] EC = 0x25: DABT (current EL), IL = 32 bits [ 2430.650564] SET = 0, FnV = 0 [ 2430.650795] EA = 0, S1PTW = 0 [ 2430.651032] FSC = 0x04: level 0 translation fault [ 2430.651446] Data abort info: [ 2430.651683] ISV = 0, ISS = 0x00000004 [ 2430.652001] CM = 0, WnR = 0 [ 2430.652558] [0069696969696948] address between user and kernel address ranges [ 2430.653265] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2430.654512] CPU: 2 PID: 20919 Comm: cat Not tainted 5.15.25-g512f31242bf6 #33 [ 2430.655008] Hardware name: linux,dummy-virt (DT) [ 2430.655517] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2430.656142] pc : kfree+0x78/0x348 [ 2430.656630] lr : jffs2_free_inode+0x24/0x48 [ 2430.657051] sp : ffff800009eebd10 [ 2430.657355] x29: ffff800009eebd10 x28: 0000000000000001 x27: 0000000000000000 [ 2430.658327] x26: ffff000038f09d80 x25: 0080000000000000 x24: ffff800009d38000 [ 2430.658919] x23: 5a5a5a5a5a5a5a5a x22: ffff000038f09d80 x21: ffff8000084f0d14 [ 2430.659434] x20: ffff0000bf9a6ac0 x19: 0169696969696940 x18: 0000000000000000 [ 2430.659969] x17: ffff8000b6506000 x16: ffff800009eec000 x15: 0000000000004000 [ 2430.660637] x14: 0000000000000000 x13: 00000001000820a1 x12: 00000000000d1b19 [ 2430.661345] x11: 0004000800000000 x10: 0000000000000001 x9 : ffff8000084f0d14 [ 2430.662025] x8 : ffff0000bf9a6b40 x7 : ffff0000bf9a6b48 x6 : 0000000003470302 [ 2430.662695] x5 : ffff00002e41dcc0 x4 : ffff0000bf9aa3b0 x3 : 0000000003470342 [ 2430.663486] x2 : 0000000000000000 x1 : ffff8000084f0d14 x0 : fffffc0000000000 [ 2430.664217] Call trace: [ 2430.664528] kfree+0x78/0x348 [ 2430.664855] jffs2_free_inode+0x24/0x48 [ 2430.665233] i_callback+0x24/0x50 [ 2430.665528] rcu_do_batch+0x1ac/0x448 [ 2430.665892] rcu_core+0x28c/0x3c8 [ 2430.666151] rcu_core_si+0x18/0x28 [ 2430.666473] __do_softirq+0x138/0x3cc [ 2430.666781] irq_exit+0xf0/0x110 [ 2430.667065] handle_domain_irq+0x6c/0x98 [ 2430.667447] gic_handle_irq+0xac/0xe8 [ 2430.667739] call_on_irq_stack+0x28/0x54 The parameter passed to kfree was 5a5a5a5a, which corresponds to the target field of the jffs_inode_info structure. It was found that all variables in the jffs_inode_info structure were 5a5a5a5a, except for the first member sem. It is suspected that these variables are not initialized because they were set to 5a5a5a5a during memory testing, which is meant to detect uninitialized memory.The sem variable is initialized in the function jffs2_i_init_once, while other members are initialized in the function jffs2_init_inode_info. The function jffs2_init_inode_info is called after iget_locked, but in the iget_locked function, the destroy_inode process is triggered, which releases the inode and consequently, the target member of the inode is not initialized.In concurrent high pressure scenarios, iget_locked may enter the destroy_inode branch as described in the code. Since the destroy_inode functionality of jffs2 only releases the target, the fix method is to set target to NULL in jffs2_i_init_once. Signed-off-by: Wang Yong <wang.yong12@zte.com.cn> Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn> Reviewed-by: Yang Tao <yang.tao172@zte.com.cn> Cc: Xu Xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>
* mm, slab: remove last vestiges of SLAB_MEM_SPREADLinus Torvalds2024-03-121-1/+1
| | | | | | | | | | | | | | Yes, yes, I know the slab people were planning on going slow and letting every subsystem fight this thing on their own. But let's just rip off the band-aid and get it over and done with. I don't want to see a number of unnecessary pull requests just to get rid of a flag that no longer has any meaning. This was mainly done with a couple of 'sed' scripts and then some manual cleanup of the end result. Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* exportfs: make ->encode_fh() a mandatory method for NFS exportAmir Goldstein2023-10-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | Rename the default helper for encoding FILEID_INO32_GEN* file handles to generic_encode_ino32_fh() and convert the filesystems that used the default implementation to use the generic helper explicitly. After this change, exportfs_encode_inode_fh() no longer has a default implementation to encode FILEID_INO32_GEN* file handles. This is a step towards allowing filesystems to encode non-decodeable file handles for fanotify without having to implement any export_operations. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
* fs: allocate inode by using alloc_inode_sb()Muchun Song2022-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* jffs2: Fix NULL pointer dereference in rp_size fs option parsingJamie Iles2020-12-131-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller found the following JFFS2 splat: Unable to handle kernel paging request at virtual address dfffa00000000001 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [dfffa00000000001] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 12745 Comm: syz-executor.5 Tainted: G S 5.9.0-rc8+ #98 Hardware name: linux,dummy-virt (DT) pstate: 20400005 (nzCv daif +PAN -UAO BTYPE=--) pc : jffs2_parse_param+0x138/0x308 fs/jffs2/super.c:206 lr : jffs2_parse_param+0x108/0x308 fs/jffs2/super.c:205 sp : ffff000022a57910 x29: ffff000022a57910 x28: 0000000000000000 x27: ffff000057634008 x26: 000000000000d800 x25: 000000000000d800 x24: ffff0000271a9000 x23: ffffa0001adb5dc0 x22: ffff000023fdcf00 x21: 1fffe0000454af2c x20: ffff000024cc9400 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: ffffa000102dbdd0 x15: 0000000000000000 x14: ffffa000109e44bc x13: ffffa00010a3a26c x12: ffff80000476e0b3 x11: 1fffe0000476e0b2 x10: ffff80000476e0b2 x9 : ffffa00010a3ad60 x8 : ffff000023b70593 x7 : 0000000000000003 x6 : 00000000f1f1f1f1 x5 : ffff000023fdcf00 x4 : 0000000000000002 x3 : ffffa00010000000 x2 : 0000000000000001 x1 : dfffa00000000000 x0 : 0000000000000008 Call trace: jffs2_parse_param+0x138/0x308 fs/jffs2/super.c:206 vfs_parse_fs_param+0x234/0x4e8 fs/fs_context.c:117 vfs_parse_fs_string+0xe8/0x148 fs/fs_context.c:161 generic_parse_monolithic+0x17c/0x208 fs/fs_context.c:201 parse_monolithic_mount_data+0x7c/0xa8 fs/fs_context.c:649 do_new_mount fs/namespace.c:2871 [inline] path_mount+0x548/0x1da8 fs/namespace.c:3192 do_mount+0x124/0x138 fs/namespace.c:3205 __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __arm64_sys_mount+0x164/0x238 fs/namespace.c:3390 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common.constprop.0+0x15c/0x598 arch/arm64/kernel/syscall.c:149 do_el0_svc+0x60/0x150 arch/arm64/kernel/syscall.c:195 el0_svc+0x34/0xb0 arch/arm64/kernel/entry-common.c:226 el0_sync_handler+0xc8/0x5b4 arch/arm64/kernel/entry-common.c:236 el0_sync+0x15c/0x180 arch/arm64/kernel/entry.S:663 Code: d2d40001 f2fbffe1 91002260 d343fc02 (38e16841) ---[ end trace 4edf690313deda44 ]--- This is because since ec10a24f10c8, the option parsing happens before fill_super and so the MTD device isn't associated with the filesystem. Defer the size check until there is a valid association. Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API") Cc: <stable@vger.kernel.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* jffs2: Allow setting rp_size to zero during remountinglizhe2020-12-131-2/+5
| | | | | | | | | | | | | | Set rp_size to zero will be ignore during remounting. The method to identify whether we input a remounting option of rp_size is to check if the rp_size input is zero. It can not work well if we pass "rp_size=0". This patch add a bool variable "set_rp_size" to fix this problem. Reported-by: Jubin Zhong <zhongjubin@huawei.com> Signed-off-by: lizhe <lizhe67@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* jffs2: Fix ignoring mounting options problem during remountinglizhe2020-12-131-0/+17
| | | | | | | | | | | | | | | | | | | | | | The jffs2 mount options will be ignored when remounting jffs2. It can be easily reproduced with the steps listed below. 1. mount -t jffs2 -o compr=none /dev/mtdblockx /mnt 2. mount -o remount compr=zlib /mnt Since ec10a24f10c8, the option parsing happens before fill_super and then pass fc, which contains the options parsing results, to function jffs2_reconfigure during remounting. But function jffs2_reconfigure do not update c->mount_opts. This patch add a function jffs2_update_mount_opts to fix this problem. By the way, I notice that tmpfs use the same way to update remounting options. If it is necessary to unify them? Cc: <stable@vger.kernel.org> Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API") Signed-off-by: lizhe <lizhe67@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* fs_parse: fold fs_parameter_desc/fs_parameter_specAl Viro2020-02-071-7/+3
| | | | | | The former contains nothing but a pointer to an array of the latter... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs_parser: remove fs_parameter_description name fieldEric Sandeen2020-02-071-1/+0
| | | | | | | | Unused now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fold struct fs_parameter_enum into struct constant_tableAl Viro2020-02-071-1/+1
| | | | | | no real difference now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs_parse: get rid of ->enumsAl Viro2020-02-071-11/+10
| | | | | | | | | | Don't do a single array; attach them to fsparam_enum() entry instead. And don't bother trying to embed the names into those - it actually loses memory, with no real speedup worth mentioning. Simplifies validation as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* jffs2: Fix mounting under new mount APIDavid Howells2019-09-261-2/+0
| | | | | | | | | | | | | | | | | | | | | | The mounting of jffs2 is broken due to the changes from the new mount API because it specifies a "source" operation, but then doesn't actually process it. But because it specified it, it doesn't return -ENOPARAM and the caller doesn't process it either and the source gets lost. Fix this by simply removing the source parameter from jffs2 and letting the VFS deal with it in the default manner. To test it, enable CONFIG_MTD_MTDRAM and allow the default size and erase block size parameters, then try and mount the /dev/mtdblock<N> file that that creates as jffs2. No need to initialise it. Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> cc: David Woodhouse <dwmw2@infradead.org> cc: Richard Weinberger <richard@nod.at> cc: linux-mtd@lists.infradead.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Convert jffs2 to use the new mount APIDavid Howells2019-09-051-89/+83
| | | | | | | | | | | | | | Convert the jffs2 filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: David Woodhouse <dwmw2@infradead.org> cc: linux-mtd@lists.infradead.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* jffs2: switch to ->free_inode()Al Viro2019-05-011-8/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* jffs2: fix use-after-free on symlink traversalAl Viro2019-04-011-1/+4
| | | | | | | | free the symlink body after the same RCU delay we have for freeing the struct inode itself, so that traversal during RCU pathwalk wouldn't step into freed memory. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* jffs2: Fix use of uninitialized delayed_work, lockdep breakageDaniel Santos2018-12-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER is defined then a write buffer is available and has been initialized. However, this does is not the case when the mtd device has no out-of-band buffer: int jffs2_nand_flash_setup(struct jffs2_sb_info *c) { if (!c->mtd->oobsize) return 0; ... The resulting call to cancel_delayed_work_sync passing a uninitialized (but zeroed) delayed_work struct forces lockdep to become disabled. [ 90.050639] overlayfs: upper fs does not support tmpfile. [ 90.652264] INFO: trying to register non-static key. [ 90.662171] the code is fine but needs lockdep annotation. [ 90.673090] turning off the locking correctness validator. [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0 [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7 [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420 [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90 [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000 [ 90.780033] ... [ 90.784902] Call Trace: [ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148 [ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c [ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c [ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc [ 90.828345] [<8003f27c>] flush_work+0x200/0x24c [ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210 [ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54 [ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120 [ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c [ 90.875067] [<80014424>] syscall_common+0x34/0x58 Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Reviewed-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
* jffs2: free jffs2_sb_info through jffs2_kill_sb()Hou Tao2018-10-161-3/+1
| | | | | | | | | | | | | | | When an invalid mount option is passed to jffs2, jffs2_parse_options() will fail and jffs2_sb_info will be freed, but then jffs2_sb_info will be used (use-after-free) and freeed (double-free) in jffs2_kill_sb(). Fix it by removing the buggy invocation of kfree() when getting invalid mount options. Fixes: 92abc475d8de ("jffs2: implement mount option parsing and compression overriding") Cc: stable@kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
* jffs2_kill_sb(): deal with failed allocationsAl Viro2018-04-151-1/+1
| | | | | | | | jffs2_fill_super() might fail to allocate jffs2_sb_info; jffs2_kill_sb() must survive that. Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Rename superblock flags (MS_xyz -> SB_xyz)Linus Torvalds2017-11-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)David Howells2017-07-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com>
* don't bother with ->d_inode->i_sb - it's always equal to ->d_sbAl Viro2016-04-101-1/+1
| | | | | | ... and neither can ever be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* tree wide: use kvfree() than conditional kfree()/vfree()Tetsuo Handa2016-01-221-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | There are many locations that do if (memory_was_allocated_by_vmalloc) vfree(ptr); else kfree(ptr); but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory using is_vmalloc_addr(). Unless callers have special reasons, we can replace this branch with kvfree(). Please check and reply if you found problems. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: David Rientjes <rientjes@google.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Boris Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kmemcg: account certain kmem allocations to memcgVladimir Davydov2016-01-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells2015-04-151-2/+2
| | | | | | | that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)David Howells2015-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really *is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') || die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") || die $coccifile; print($fd "$_\n") || die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 || die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: push sync_filesystem() down to the file system's remount_fs()Theodore Ts'o2014-03-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org
* fs: Limit sys_mount to only request filesystem modules.Eric W. Biederman2013-03-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* Merge tag 'disintegrate-mtd-20121009' of ↵David Woodhouse2012-10-091-0/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/users/dhowells/linux-headers UAPI Disintegration 2012-10-09 Conflicts: MAINTAINERS arch/arm/configs/bcmring_defconfig arch/arm/mach-imx/clk-imx51-imx53.c drivers/mtd/nand/Kconfig drivers/mtd/nand/bcm_umi_nand.c drivers/mtd/nand/nand_bcm_umi.h drivers/mtd/nand/orion_nand.c
| * fs: push rcu_barrier() from deactivate_locked_super() to filesystemsKirill A. Shutemov2012-10-021-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fast paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | JFFS2: fix unmount regressionArtem Bityutskiy2012-09-291-0/+4
|/ | | | | | | | | | | | | | | | | | | This patch fixes regression introduced by "8bdc81c jffs2: get rid of jffs2_sync_super". We submit a delayed work in order to make sure the write-buffer is synchronized at some point. But we do not flush it when we unmount, which causes an oops when we unmount the file-system and then the delayed work is executed. This patch fixes the issue by adding a "cancel_delayed_work_sync()" infocation in the '->sync_fs()' handler. This will make sure the delayed work is canceled on sync, unmount and re-mount. And because VFS always callse 'sync_fs()' before unmounting or remounting, this fixes the issue. Reported-by: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: stable@vger.kernel.org [3.5+] Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* jffs2: get rid of jffs2_sync_superArtem Bityutskiy2012-05-131-13/+0
| | | | | | | | | | | | | | | | | | | | | Currently JFFS2 file-system maps the VFS "superblock" abstraction to the write-buffer. Namely, it uses VFS services to synchronize the write-buffer periodically. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblock using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds no matter what. So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super' VFS service, and then remove it together with the kernel thread. This patch switches the JFFS2 write-buffer management from '->write_super()'/'->s_dirt' to a delayed work. Instead of setting the 's_dirt' flag we just schedule a delayed work for synchronizing the write-buffer. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* jffs2: remove unnecessary GC pass on syncArtem Bityutskiy2012-05-131-2/+0
| | | | | | | | | | | | | | | We do not need to call 'jffs2_write_super()' on sync. This function causes a GC pass to make sure the current contents is pushed out with the data which we already have on the media. But this is not needed on unmount and only slows sync down unnecessarily. It is enough to just sync the write-buffer. This call was added by one of the generic VFS rework patch-sets, see d579ed00aa96a7f7486978540a0d7cecaff742ae. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* jffs2: remove unnecessary GC pass on umountArtem Bityutskiy2012-05-131-3/+0
| | | | | | | | | | | | | | | We do not need to call 'jffs2_write_super()' on unmount. This function causes a GC pass to make sure the current contents is pushed out with the data which we already have on the media. But this is not needed on unmount and only slows unmount down unnecessarily. It is enough to just sync the write-buffer. This call was added by one of the generic VFS rework patch-sets, see 8c85e125124a473d6f3e9bb187b0b84207f81d91. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* jffs2: remove lock_superArtem Bityutskiy2012-05-131-3/+0
| | | | | | | We do not need 'lock_super()'/'unlock_super()' in JFFS2 - kill them. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* JFFS2: Add parameter to reserve disk space for rootDaniel Drake2012-05-131-0/+17
| | | | | | | | | | | | | | Add a new rp_size= parameter which creates a "reserved pool" of disk space which can only be used by root. Other users are not permitted to write to disk when the available space is less than the pool size. Based on original code by Artem Bityutskiy in https://dev.laptop.org/ticket/5317 [dwmw2: use capable(CAP_SYS_RESOURCE) not uid/gid check, fix debug prints] Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* jffs2: Use pr_fmt and remove jffs: from formatsJoe Perches2012-03-271-7/+9
| | | | | | | | | | Use pr_fmt to prefix KBUILD_MODNAME to appropriate logging messages. Remove now unnecessary internal prefixes from formats. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* jffs2: Convert printks to pr_<level>Joe Perches2012-03-271-9/+9
| | | | | | | | | | | | | Use the more current logging style. Coalesce formats, align arguments. Convert uses of embedded function names to %s, __func__. A couple of long line checkpatch errors I don't care about exist. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* jffs2: Convert most D1/D2 macros to jffs2_dbgJoe Perches2012-03-271-5/+5
| | | | | | | | | | | | | | D1 and D2 macros are mostly uses to emit debugging messages. Convert the logging uses of D1 & D2 to jffs2_dbg(level, fmt, ...) to be a bit more consistent style with the rest of the kernel. All jffs2_dbg output is now at KERN_DEBUG where some of the previous uses were emitted at various KERN_<LEVEL>s. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* Merge tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6Linus Torvalds2012-01-101-3/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MTD pull for 3.3 * tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6: (113 commits) mtd: Fix dependency for MTD_DOC200x mtd: do not use mtd->block_markbad directly logfs: do not use 'mtd->block_isbad' directly mtd: introduce mtd_can_have_bb helper mtd: do not use mtd->suspend and mtd->resume directly mtd: do not use mtd->lock, unlock and is_locked directly mtd: do not use mtd->sync directly mtd: harmonize mtd_writev usage mtd: do not use mtd->lock_user_prot_reg directly mtd: mtd->write_user_prot_reg directly mtd: do not use mtd->read_*_prot_reg directly mtd: do not use mtd->get_*_prot_info directly mtd: do not use mtd->read_oob directly mtd: mtdoops: do not use mtd->panic_write directly romfs: do not use mtd->get_unmapped_area directly mtd: do not use mtd->get_unmapped_area directly mtd: do use mtd->point directly mtd: introduce mtd_has_oob helper mtd: mtdcore: export symbols cleanup mtd: clean-up the default_mtd_writev function ... Fix up trivial edit/remove conflict in drivers/staging/spectra/lld_mtd.c
| * mtd: do not use mtd->sync directlyArtem Bityutskiy2012-01-091-3/+1
| | | | | | | | | | | | | | | | | | This patch teaches 'mtd_sync()' to do nothing when the MTD driver does not have the '->sync()' method, which allows us to remove all direct 'mtd->sync' accesses. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * mtd: introduce mtd_sync interfaceArtem Bityutskiy2012-01-091-1/+1
| | | | | | | | | | Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | vfs: switch ->show_options() to struct dentry *Al Viro2012-01-061-2/+2
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: fix the stupidity with i_dentry in inode destructorsAl Viro2012-01-031-1/+0
|/ | | | | | | | | | Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* jffs2: add compr=lzo and compr=zlib optionsAndres Salomon2011-10-191-2/+24
| | | | | | | | | | | ..to allow forcing of either compression scheme. This will override compiled-in defaults. jffs2_compress is reworked a bit, as the lzo/zlib override shares lots of code w/ the PRIORITY mode. v2: update show_options accordingly. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
* jffs2: implement mount option parsing and compression overridingAndres Salomon2011-10-191-0/+97
| | | | | | | | | | | | | | | | | | | | | | | | | Currently jffs2 has compile-time constants (and .config options) controlling whether or not the various compression/decompression drivers are built in and enabled. This is fine for embedded systems, but it clashes with distribution kernels. Distro kernels tend to turn on everything; this causes OpenFirmware to fall over, as it understands ZLIB-compressed inodes. Booting a kernel that has LZO compression enabled, writing to the boot partition, and then rebooting causes OFW to fail to read the kernel from the filesystem. This is because LZO compression has priority when writing new data to jffs2, if LZO is enabled. This patch adds mount option parsing, and a single supported option ("compr=none"). This adds the flexibility of being able to specify which compressor overrides on a per-superblock basis. For now, we can simply disable compression; additional flexibility coming soon. v2: kill some printks, and implement show_options as suggested by Artem Bityutskiy. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
* fs: icache RCU free inodesNick Piggin2011-01-071-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* convert get_sb_mtd() users to ->mount()Al Viro2010-10-291-5/+4
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* BKL: Remove BKL from jffs2Arnd Bergmann2010-10-041-11/+1
| | | | | | | | | The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw2@infradead.org>
* BKL: Explicitly add BKL around get_sb/fill_superJan Blunck2010-10-041-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a preparation necessary to remove the BKL from do_new_mount(). It explicitly adds calls to lock_kernel()/unlock_kernel() around get_sb/fill_super operations for filesystems that still uses the BKL. I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. do_kern_mount() is already called without the BKL when mounting the rootfs and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called from various places without BKL: simple_pin_fs(), nfs_do_clone_mount() through nfs_follow_mountpoint(), afs_mntpt_do_automount() through afs_mntpt_follow_link(). Both later functions are actually the filesystems follow_link inode operation. vfs_kern_mount() is calling the specified get_sb function and lets the filesystem do its job by calling the given fill_super function. Therefore I think it is safe to push down the BKL from the VFS to the low-level filesystems get_sb/fill_super operation. [arnd: do not add the BKL to those file systems that already don't use it elsewhere] Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Christoph Hellwig <hch@infradead.org>
* convert remaining ->clear_inode() to ->evict_inode()Al Viro2010-08-091-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>