summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'driver-core-6.9-rc1' of ↵Linus Torvalds2024-03-215-33/+88
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core and kernfs changes for 6.9-rc1. Nothing all that crazy here, just some good updates that include: - automatic attribute group hiding from Dan Williams (he fixed up my horrible attempt at doing this.) - kobject lock contention fixes from Eric Dumazet - driver core cleanups from Andy - kernfs rcu work from Tejun - fw_devlink changes to resolve some reported issues - other minor changes, all details in the shortlog All of these have been in linux-next for a long time with no reported issues" * tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits) device: core: Log warning for devices pending deferred probe on timeout driver: core: Use dev_* instead of pr_* so device metadata is added driver: core: Log probe failure as error and with device metadata of: property: fw_devlink: Add support for "post-init-providers" property driver core: Add FWLINK_FLAG_IGNORE to completely ignore a fwnode link driver core: Adds flags param to fwnode_link_add() debugfs: fix wait/cancellation handling during remove device property: Don't use "proxy" headers device property: Move enum dev_dma_attr to fwnode.h driver core: Move fw_devlink stuff to where it belongs driver core: Drop unneeded 'extern' keyword in fwnode.h firmware_loader: Suppress warning on FW_OPT_NO_WARN flag sysfs:Addresses documentation in sysfs_merge_group and sysfs_unmerge_group. firmware_loader: introduce __free() cleanup hanler platform-msi: Remove usage of the deprecated ida_simple_xx() API sysfs: Introduce DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE() sysfs: Document new "group visible" helpers sysfs: Fix crash on empty group attributes array sysfs: Introduce a mechanism to hide static attribute_groups sysfs: Introduce a mechanism to hide static attribute_groups ...
| * debugfs: fix wait/cancellation handling during removeJohannes Berg2024-03-071-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ben Greear further reports deadlocks during concurrent debugfs remove while files are being accessed, even though the code in question now uses debugfs cancellations. Turns out that despite all the review on the locking, we missed completely that the logic is wrong: if the refcount hits zero we can finish (and need not wait for the completion), but if it doesn't we have to trigger all the cancellations. As written, we can _never_ get into the loop triggering the cancellations. Fix this, and explain it better while at it. Cc: stable@vger.kernel.org Fixes: 8c88a474357e ("debugfs: add API to allow debugfs operations cancellation") Reported-by: Ben Greear <greearb@candelatech.com> Closes: https://lore.kernel.org/r/1c9fa9e5-09f1-0522-fdbc-dbcef4d255ca@candelatech.com Tested-by: Madhan Sai <madhan.singaraju@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240229153635.6bfab7eb34d3.I6c7aeff8c9d6628a8bc1ddcf332205a49d801f17@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * sysfs:Addresses documentation in sysfs_merge_group and sysfs_unmerge_group.Rohan Kollambalath2024-03-071-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | These functions take a struct attribute_group as an input which has an optional .name field. These functions rely on the .name field being populated and do not check if its null. They pass this name into other functions, eventually leading to a null pointer dereference. This change simply updates the documentation of the function to make this requirement clear. Signed-off-by: Rohan Kollambalath <rkollamb@digi.com> Link: https://lore.kernel.org/r/20240211223634.2103665-1-rohankollambalath@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * sysfs: Fix crash on empty group attributes arrayDan Williams2024-02-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that arch/x86/events/intel/core.c makes use of "empty" attributes. static struct attribute *empty_attrs; __init int intel_pmu_init(void) { struct attribute **extra_skl_attr = &empty_attrs; struct attribute **extra_attr = &empty_attrs; struct attribute **td_attr = &empty_attrs; struct attribute **mem_attr = &empty_attrs; struct attribute **tsx_attr = &empty_attrs; ... That breaks the assumption __first_visible() that expects that if grp->attrs is set then grp->attrs[0] must also be set and results in backtraces like: BUG: kernel NULL pointer dereference, address: 00rnel mode #PF: error_code(0x0000) - not-present ] PREEMPT SMP NOPTI CPU: 1 PID: 1 Comm: swapper/IP: 0010:exra_is_visible+0x14/0x20 ? exc_page_fault+0x68/0x190 internal_create_groups+0x42/0xa0 pmu_dev_alloc+0xc0/0xe0 perf_event_sysfs_init+0x580000000000 ]--- RIP: 0010:exra_is_visible+0x14/0 Check for non-empty attributes array before calling is_visible(). Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Closes: https://github.com/thesofproject/linux/pull/4799#issuecomment-1958537212 Fixes: 70317fd24b41 ("sysfs: Introduce a mechanism to hide static attribute_groups") Cc: Marc Herbert <marc.herbert@intel.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Marc Herbert <marc.herbert@intel.com> Link: https://lore.kernel.org/r/170863445442.1479840.1818801787239831650.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * sysfs: Introduce a mechanism to hide static attribute_groupsDan Williams2024-02-191-9/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a mechanism for named attribute_groups to hide their directory at sysfs_update_group() time, or otherwise skip emitting the group directory when the group is first registered. It piggybacks on is_visible() in a similar manner as SYSFS_PREALLOC, i.e. special flags in the upper bits of the returned mode. To use it, specify a symbol prefix to DEFINE_SYSFS_GROUP_VISIBLE(), and then pass that same prefix to SYSFS_GROUP_VISIBLE() when assigning the @is_visible() callback: DEFINE_SYSFS_GROUP_VISIBLE($prefix) struct attribute_group $prefix_group = { .name = $name, .is_visible = SYSFS_GROUP_VISIBLE($prefix), }; SYSFS_GROUP_VISIBLE() expects a definition of $prefix_group_visible() and $prefix_attr_visible(), where $prefix_group_visible() just returns true / false and $prefix_attr_visible() behaves as normal. The motivation for this capability is to centralize PCI device authentication in the PCI core with a named sysfs group while keeping that group hidden for devices and platforms that do not meet the requirements. In a PCI topology, most devices will not support authentication, a small subset will support just PCI CMA (Component Measurement and Authentication), a smaller subset will support PCI CMA + PCIe IDE (Link Integrity and Encryption), and only next generation server hosts will start to include a platform TSM (TEE Security Manager). Without this capability the alternatives are: * Check if all attributes are invisible and if so, hide the directory. Beyond trouble getting this to work [1], this is an ABI change for scenarios if userspace happens to depend on group visibility absent any attributes. I.e. this new capability avoids regression since it does not retroactively apply to existing cases. * Publish an empty /sys/bus/pci/devices/$pdev/tsm/ directory for all PCI devices (i.e. for the case when TSM platform support is present, but device support is absent). Unfortunate that this will be a vestigial empty directory in the vast majority of cases. * Reintroduce usage of runtime calls to sysfs_{create,remove}_group() in the PCI core. Bjorn has already indicated that he does not want to see any growth of pci_sysfs_init() [2]. * Drop the named group and simulate a directory by prefixing all TSM-related attributes with "tsm_". Unfortunate to not use the naming capability of a sysfs group as intended. In comparison, there is a small potential for regression if for some reason an @is_visible() callback had dependencies on how many times it was called. Additionally, it is no longer an error to update a group that does not have its directory already present, and it is no longer a WARN() to remove a group that was never visible. Link: https://lore.kernel.org/all/2024012321-envious-procedure-4a58@gregkh/ [1] Link: https://lore.kernel.org/linux-pci/20231019200110.GA1410324@bhelgaas/ [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/2024013028-deflator-flaring-ec62@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * Merge 6.8-rc5 into driver-core-nextGreg Kroah-Hartman2024-02-19104-1379/+1510
| |\ | | | | | | | | | | | | | | | We need the driver core changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | kernfs: fix false-positive WARN(nr_mmapped) in kernfs_drain_open_filesNeel Natu2024-01-301-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this change 'on->nr_mmapped' tracked the total number of mmaps across all of its associated open files via kernfs_fop_mmap(). Thus if the file descriptor associated with a kernfs_open_file was mmapped 10 times then we would have: 'of->mmapped = true' and 'of_on(of)->nr_mmapped = 10'. The problem is that closing or draining a 'of->mmapped' file would only decrement one from the 'of_on(of)->nr_mmapped' counter. For e.g. we have this from kernfs_unlink_open_file(): if (of->mmapped) on->nr_mmapped--; The WARN_ON_ONCE(on->nr_mmapped) in kernfs_drain_open_files() is easy to reproduce by: 1. opening a (mmap-able) kernfs file. 2. mmap-ing that file more than once (mapping just once masks the issue). 3. trigger a drain of that kernfs file. Modulo out-of-tree patches I was able to trigger this reliably by identifying pci device nodes in sysfs that have resource regions that are mmap-able and that don't have any driver attached to them (steps 1 and 2). For step 3 we can "echo 1 > remove" to trigger a kernfs_drain. Signed-off-by: Neel Natu <neelnatu@google.com> Link: https://lore.kernel.org/r/20240127234636.609265-1-neelnatu@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in ↵Tejun Heo2024-01-302-11/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_find_and_get_node_by_id() The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id() which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock. This can lead to deadlocks as bpf_cgroup_from_id() can be called from any BPF programs including e.g. the ones that attach to functions which are holding the scheduler rq lock. Consider the following BPF program: SEC("fentry/__set_cpus_allowed_ptr_locked") int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p, struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf) { struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id); if (cgrp) { bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name); bpf_cgroup_release(cgrp); } return 0; } __set_cpus_allowed_ptr_locked() is called with rq lock held and the above BPF program calls bpf_cgroup_from_id() within leading to the following lockdep warning: ===================================================== WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected 6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted ----------------------------------------------------- repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70 and this task is already holding: ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0 which would create a new lock dependency: (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2} ... Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kernfs_idr_lock); local_irq_disable(); lock(&rq->__lock); lock(kernfs_idr_lock); <Interrupt> lock(&rq->__lock); *** DEADLOCK *** ... Call Trace: dump_stack_lvl+0x55/0x70 dump_stack+0x10/0x20 __lock_acquire+0x781/0x2a40 lock_acquire+0xbf/0x1f0 _raw_spin_lock+0x2f/0x40 kernfs_find_and_get_node_by_id+0x1e/0x70 cgroup_get_from_id+0x21/0x240 bpf_cgroup_from_id+0xe/0x20 bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a bpf_trampoline_6442545632+0x4f/0x1000 __set_cpus_allowed_ptr_locked+0x5/0x5a0 sched_setaffinity+0x1b3/0x290 __x64_sys_sched_setaffinity+0x4f/0x60 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e Let's fix it by protecting kernfs_node and kernfs_root with RCU and making kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of kernfs_idr_lock. This adds an rcu_head to kernfs_node making it larger by 16 bytes on 64bit. Combined with the preceding rearrange patch, the net increase is 8 bytes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrea Righi <andrea.righi@canonical.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20240109214828.252092-4-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | Merge tag 'for-6.9-part2-tag' of ↵Linus Torvalds2024-03-211-11/+47
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "Fix a problem found in 6.7 after adding the temp-fsid feature which changed device tracking in memory and broke grub-probe. This is used on initrd-less systems. There were several iterations of the fix and it took longer than expected" * tag 'for-6.9-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: do not skip re-registration for the mounted device
| * | | btrfs: do not skip re-registration for the mounted deviceAnand Jain2024-03-181-11/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are reports that since version 6.7 update-grub fails to find the device of the root on systems without initrd and on a single device. This looks like the device name changed in the output of /proc/self/mountinfo: 6.5-rc5 working 18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ... 6.7 not working: 17 1 0:15 / / rw,noatime - btrfs /dev/root ... and "update-grub" shows this error: /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?) This looks like it's related to the device name, but grub-probe recognizes the "/dev/root" path and tries to find the underlying device. However there's a special case for some filesystems, for btrfs in particular. The generic root device detection heuristic is not done and it all relies on reading the device infos by a btrfs specific ioctl. This ioctl returns the device name as it was saved at the time of device scan (in this case it's /dev/root). The change in 6.7 for temp_fsid to allow several single device filesystem to exist with the same fsid (and transparently generate a new UUID at mount time) was to skip caching/registering such devices. This also skipped mounted device. One step of scanning is to check if the device name hasn't changed, and if yes then update the cached value. This broke the grub-probe as it always read the device /dev/root and couldn't find it in the system. A temporary workaround is to create a symlink but this does not survive reboot. The right fix is to allow updating the device path of a mounted filesystem even if this is a single device one. In the fix, check if the device's major:minor number matches with the cached device. If they do, then we can allow the scan to happen so that device_list_add() can take care of updating the device path. The file descriptor remains unchanged. This does not affect the temp_fsid feature, the UUID of the mounted filesystem remains the same and the matching is based on device major:minor which is unique per mounted filesystem. This covers the path when the device (that exists for all mounted devices) name changes, updating /dev/root to /dev/sdx. Any other single device with filesystem and is not mounted is still skipped. Note that if a system is booted and initial mount is done on the /dev/root device, this will be the cached name of the device. Only after the command "btrfs device scan" it will change as it triggers the rename. The fix was verified by users whose systems were affected. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353 Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@mail.gmail.com/ Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem") CC: stable@vger.kernel.org # 6.7+ Tested-by: Alex Romosan <aromosan@gmail.com> Tested-by: CHECK_1234543212345@protonmail.com Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | Merge tag 'exfat-for-6.9-rc1' of ↵Linus Torvalds2024-03-214-376/+293
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Improve dirsync performance by syncing on a dentry-set rather than on a per-directory entry * tag 'exfat-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: remove duplicate update parent dir exfat: do not sync parent dir if just update timestamp exfat: remove unused functions exfat: convert exfat_find_empty_entry() to use dentry cache exfat: convert exfat_init_ext_entry() to use dentry cache exfat: move free cluster out of exfat_init_ext_entry() exfat: convert exfat_remove_entries() to use dentry cache exfat: convert exfat_add_entry() to use dentry cache exfat: add exfat_get_empty_dentry_set() helper exfat: add __exfat_get_dentry_set() helper
| * | | | exfat: remove duplicate update parent dirYuezhang Mo2024-03-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For renaming, the directory only needs to be updated once if it is in the same directory. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: do not sync parent dir if just update timestampYuezhang Mo2024-03-191-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sync or dir_sync is enabled, there is no need to sync the parent directory's inode if only for updating its timestamp. 1. If an unexpected power failure occurs, the timestamp of the parent directory is not updated to the storage, which has no impact on the user. 2. The number of writes will be greatly reduced, which can not only improve performance, but also prolong device life. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: remove unused functionsYuezhang Mo2024-03-193-64/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | exfat_count_ext_entries() is no longer called, remove it. exfat_update_dir_chksum() is no longer called, remove it and rename exfat_update_dir_chksum_with_entry_set() to it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: convert exfat_find_empty_entry() to use dentry cacheYuezhang Mo2024-03-191-84/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this conversion, each dentry traversed needs to be read from the storage device or page cache. There are at least 16 dentries in a sector. This will result in frequent page cache searches. After this conversion, if all directory entries in a sector are used, the sector only needs to be read once. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: convert exfat_init_ext_entry() to use dentry cacheYuezhang Mo2024-03-193-77/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this conversion, in exfat_init_ext_entry(), to init the dentries in a dentry set, the sync times is equals the dentry number if 'dirsync' or 'sync' is enabled. That affects not only performance but also device life. After this conversion, only needs to be synchronized once if 'dirsync' or 'sync' is enabled. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: move free cluster out of exfat_init_ext_entry()Yuezhang Mo2024-03-192-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | exfat_init_ext_entry() is an init function, it's a bit strange to free cluster in it. And the argument 'inode' will be removed from exfat_init_ext_entry(). So this commit changes to free the cluster in exfat_remove_entries(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: convert exfat_remove_entries() to use dentry cacheYuezhang Mo2024-03-193-115/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this conversion, in exfat_remove_entries(), to mark the dentries in a dentry set as deleted, the sync times is equals the dentry numbers if 'dirsync' or 'sync' is enabled. That affects not only performance but also device life. After this conversion, only needs to be synchronized once if 'dirsync' or 'sync' is enabled. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: convert exfat_add_entry() to use dentry cacheYuezhang Mo2024-03-193-33/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After this conversion, if "dirsync" or "sync" is enabled, the number of synchronized dentries in exfat_add_entry() will change from 2 to 1. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: add exfat_get_empty_dentry_set() helperYuezhang Mo2024-03-192-0/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This helper is used to lookup empty dentry set. If there are no enough empty dentries at the input location, this helper will return the number of dentries that need to be skipped for the next lookup. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | exfat: add __exfat_get_dentry_set() helperYuezhang Mo2024-03-192-22/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since exfat_get_dentry_set() invokes the validate functions of exfat_validate_entry(), it only supports getting a directory entry set of an existing file, doesn't support getting an empty entry set. To remove the limitation, add this helper. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
* | | | | Merge tag 'v6.9-rc-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds2024-03-2015-147/+716
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull smb server updates from Steve French: - add support for durable file handles (an important data integrity feature) - fixes for potential out of bounds issues - fix possible null dereference in close - getattr fixes - trivial typo fix and minor cleanup * tag 'v6.9-rc-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: remove module version ksmbd: fix potencial out-of-bounds when buffer offset is invalid ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16() ksmbd: Fix spelling mistake "connction" -> "connection" ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_close ksmbd: add support for durable handles v1/v2 ksmbd: mark SMB2_SESSION_EXPIRED to session when destroying previous session ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info ksmbd: replace generic_fillattr with vfs_getattr
| * | | | | ksmbd: remove module versionNamjae Jeon2024-03-182-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ksmbd module version marking is not needed. Since there is a Linux kernel version, there is no point in increasing it anymore. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | ksmbd: fix potencial out-of-bounds when buffer offset is invalidNamjae Jeon2024-03-182-29/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found potencial out-of-bounds when buffer offset fields of a few requests is invalid. This patch set the minimum value of buffer offset field to ->Buffer offset to validate buffer length. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()Namjae Jeon2024-03-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ->NameOffset of smb2_create_req is smaller than Buffer offset of smb2_create_req, slab-out-of-bounds read can happen from smb2_open. This patch set the minimum value of the name offset to the buffer offset to validate name length of smb2_create_req(). Cc: stable@vger.kernel.org Reported-by: Xuanzhe Yu <yuxuanzhe@outlook.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | ksmbd: Fix spelling mistake "connction" -> "connection"Colin Ian King2024-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a spelling mistake in a ksmbd_debug debug message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_closeMarios Makassikis2024-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rcu_dereference can return NULL, so make sure we check against that. Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | ksmbd: add support for durable handles v1/v2Namjae Jeon2024-03-129-21/+506
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Durable file handles allow reopening a file preserved on a short network outage and transparent client reconnection within a timeout. i.e. Durable handles aren't necessarily cleaned up when the opening process terminates. This patch add support for durable handle version 1 and 2. To prove durable handles work on ksmbd, I have tested this patch with the following smbtorture tests: smb2.durable-open.open-oplock smb2.durable-open.open-lease smb2.durable-open.reopen1 smb2.durable-open.reopen1a smb2.durable-open.reopen1a-lease smb2.durable-open.reopen2 smb2.durable-open.reopen2a smb2.durable-open.reopen2-lease smb2.durable-open.reopen2-lease-v2 smb2.durable-open.reopen3 smb2.durable-open.reopen4 smb2.durable-open.delete_on_close2 smb2.durable-open.file-position smb2.durable-open.lease smb2.durable-open.alloc-size smb2.durable-open.read-only smb2.durable-v2-open.create-blob smb2.durable-v2-open.open-oplock smb2.durable-v2-open.open-lease smb2.durable-v2-open.reopen1 smb2.durable-v2-open.reopen1a smb2.durable-v2-open.reopen1a-lease smb2.durable-v2-open.reopen2 smb2.durable-v2-open.reopen2b Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | ksmbd: mark SMB2_SESSION_EXPIRED to session when destroying previous sessionNamjae Jeon2024-03-123-25/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently ksmbd exit connection as well destroying previous session. When testing durable handle feaure, I found that destroy_previous_session() should destroy only session, i.e. the connection should be still alive. This patch mark SMB2_SESSION_EXPIRED on the previous session to be destroyed later and not used anymore. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_infoMarios Makassikis2024-03-121-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use vfs_getattr() to retrieve stat information, rather than make assumptions about how a filesystem fills inode structs. Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | ksmbd: replace generic_fillattr with vfs_getattrMarios Makassikis2024-03-123-66/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | generic_fillattr should not be used outside of ->getattr implementations. Use vfs_getattr instead, and adapt functions to return an error code to the caller. Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | | | | Merge tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefsLinus Torvalds2024-03-1926-111/+157
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bcachefs fixes from Kent Overstreet: "Assorted bugfixes. Most are fixes for simple assertion pops; the most significant fix is for a deadlock in recovery when we have to rewrite large numbers of btree nodes to fix errors. This was incorrectly running out of the same workqueue as the core interior btree update path - we now give it its own single threaded workqueue. This was visible to users as "bch2_btree_update_start(): error: BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging" * tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix lost wakeup on journal shutdown bcachefs; Fix deadlock in bch2_btree_update_start() bcachefs: ratelimit errors from async_btree_node_rewrite bcachefs: Run check_topology() first bcachefs: Improve bch2_fatal_error() bcachefs: Fix lost transaction restart error bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info bcachefs: fix for building in userspace bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init() bcachefs: Improve sysfs internal/btree_updates bcachefs: Split out btree_node_rewrite_worker bcachefs: Fix locking in bch2_alloc_write_key() bcachefs: Avoid extent entry type assertions in .invalid() bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested bcachefs: Fix check_key_has_snapshot() call bcachefs: Change "accounting overran journal reservation" to a warning
| * | | | | | bcachefs: Fix lost wakeup on journal shutdownKent Overstreet2024-03-181-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to check for journal shutdown first in __journal_res_get() - after the journal is shutdown, j->watermark won't be changing anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs; Fix deadlock in bch2_btree_update_start()Kent Overstreet2024-03-181-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BCH_TRANS_COMMIT_journal_reclaim with watermark != BCH_WATERMARK_reclaim means nonblocking, and we need the journal_res_get() in btree_update_start() to respect that. In a future refactoring we'll be deleting BCH_TRANS_COMMIT_journal_reclaim and replacing it with an explicit BCH_TRANS_COMMIT_nonblocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: ratelimit errors from async_btree_node_rewriteKent Overstreet2024-03-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Run check_topology() firstKent Overstreet2024-03-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | check_topology() doesn't actually require alloc info - and running it first means other passes don't have to catch btree read errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Improve bch2_fatal_error()Kent Overstreet2024-03-1815-35/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Fix lost transaction restart errorKent Overstreet2024-03-181-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Don't corrupt journal keys gap buffer when dropping alloc infoKent Overstreet2024-03-173-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: fix for building in userspaceKent Overstreet2024-03-171-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recoveryKent Overstreet2024-03-171-14/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this fixes an assertion pop in bch2_check_snapshot_trees() -> check_snapshot_tree() -> bch2_snapshot_tree_master_subvol() -> bch2_snapshot_is_ancestor() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()Kent Overstreet2024-03-171-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nested transaction restart handling is typically best avoided; when the inner context handles a transaction restart it invalidates the outer transaction context, so we need to make sure to return a transaction_restart_nested error. This code wasn't doing that, and hit the assertion in for_each_btree_key() that checks for that via trans->restart_count. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Improve sysfs internal/btree_updatesKent Overstreet2024-03-172-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Print out the function that launched the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Split out btree_node_rewrite_workerKent Overstreet2024-03-172-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a deadlock due to using btree_interior_update_worker for non interior updates - async btree node rewrites were blocking, and then blocking other interior updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Fix locking in bch2_alloc_write_key()Kent Overstreet2024-03-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Avoid extent entry type assertions in .invalid()Kent Overstreet2024-03-171-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After keys have passed bkey_ops.key_invalid we should never see invalid extent entry types - but .key_invalid itself needs to cope with them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Fix spurious -BCH_ERR_transaction_restart_nestedKent Overstreet2024-03-171-8/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need to return transaction_restart_nested when we're inside a context that's handling transaction restarts. Also, add a missing check_subdir_count() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Fix check_key_has_snapshot() callKent Overstreet2024-03-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * | | | | | bcachefs: Change "accounting overran journal reservation" to a warningKent Overstreet2024-03-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This doesn't need to be a BUG_ON(); the actual serious "things break" condition is if the whole journal write overruns the available space, and that has a fatal error, not a BUG_ON(). This check indicates we screwed something up, but it should be a warning. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | | | | | | Merge tag 'dlm-6.9' of ↵Linus Torvalds2024-03-183-39/+81
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: - Fix mistaken variable assignment that caused a refcounting problem - Revert a recent change that began using atomic counters where they were not needed (for lkb wait_count) - Add comments around forced state reset for waiting lock operations during recovery * tag 'dlm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: add comments about forced waiters reset dlm: revert atomic_t lkb_wait_count dlm: fix user space lkb refcounting