path: root/drivers/block/zram
Commit message (Author, Age, Files, Lines)
* zram: Convert to use bdev_open_by_dev() (Jan Kara, 2023-10-28, 2 files, -18/+15)
    Convert zram to use bdev_open_by_dev() and pass the handle around.

    CC: Minchan Kim <minchan@kernel.org>
    CC: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230927093442.25915-8-jack@suse.cz
    Signed-off-by: Christian Brauner <brauner@kernel.org>
* zram: take device and not only bvec offset into account (Christoph Hellwig, 2023-08-05, 1 file, -12/+20)
    Commit af8b04c63708 ("zram: simplify bvec iteration in __zram_make_request") changed the bio iteration in zram to rely on the implicit capping to page boundaries in bio_for_each_segment. But it failed to account for the fact that zram not only cares about the page alignment of the bio payload, but also the page alignment into the device. For buffered I/O and swap those are the same, but for direct I/O or kernel internal I/O like XFS log buffer writes they can differ.

    Fix this by open coding bio_for_each_segment and limiting the bvec len so that it never crosses over a page alignment boundary in the device in addition to the payload boundary already taken care of by bio_iter_iovec.

    Cc: stable@vger.kernel.org
    Fixes: af8b04c63708 ("zram: simplify bvec iteration in __zram_make_request")
    Reported-by: Dusty Mabe <dusty@dustymabe.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Link: https://lore.kernel.org/r/20230805055537.147835-1-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
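    The capping rule described in this commit can be illustrated with a small userspace model. This is only a sketch in Python, not the kernel code: the function name `split_by_device_page` and the fixed 4096-byte `PAGE_SIZE` are assumptions made for illustration. Each segment is limited so that it crosses a page boundary neither in the payload nor in the device.

```python
PAGE_SIZE = 4096  # assumed page size for this model


def split_by_device_page(dev_offset, payload_offset, length, page_size=PAGE_SIZE):
    """Split a byte range into segments that cross no page boundary,
    neither in the device address space nor in the payload buffer."""
    segments = []
    while length > 0:
        # Bytes left before the next page boundary on each side.
        dev_room = page_size - (dev_offset % page_size)
        payload_room = page_size - (payload_offset % page_size)
        seg = min(length, dev_room, payload_room)
        segments.append((dev_offset, payload_offset, seg))
        dev_offset += seg
        payload_offset += seg
        length -= seg
    return segments
```

    When the payload is page aligned but the device offset is not (the direct I/O case mentioned above), a single 4096-byte payload page is split in two, which a payload-only cap would miss.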
* Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm (Linus Torvalds, 2023-06-28, 1 file, -1/+1)
|\
    Pull mm updates from Andrew Morton:

    - Yosry Ahmed brought back some cgroup v1 stats in OOM logs
    - Yosry has also eliminated cgroup's atomic rstat flushing
    - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability
    - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning
    - Lorenzo Stoakes has done some maintenance work on the get_user_pages() interface
    - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree
    - Johannes Weiner has done some cleanup work on the compaction code
    - David Hildenbrand has contributed additional selftests for get_user_pages()
    - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code
    - Baolin Wang has provided some compaction cleanups,
    - SeongJae Park continues maintenance work on the DAMON code
    - Huang Ying has done some maintenance on the swap code's usage of device refcounting
    - Christoph Hellwig has some cleanups for the filemap/directio code
    - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses
    - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings
    - John Hubbard has a series of fixes to the MM selftesting code
    - ZhangPeng continues the folio conversion campaign
    - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock
    - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8
    - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management
    - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code
    - Vishal Moola also has done some folio conversion work
    - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch

    * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
      mm/hugetlb: remove hugetlb_set_page_subpool()
      mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
      hugetlb: revert use of page_cache_next_miss()
      Revert "page cache: fix page_cache_next/prev_miss off by one"
      mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
      mm: memcg: rename and document global_reclaim()
      mm: kill [add|del]_page_to_lru_list()
      mm: compaction: convert to use a folio in isolate_migratepages_block()
      mm: zswap: fix double invalidate with exclusive loads
      mm: remove unnecessary pagevec includes
      mm: remove references to pagevec
      mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
      mm: remove struct pagevec
      net: convert sunrpc from pagevec to folio_batch
      i915: convert i915_gpu_error to use a folio_batch
      pagevec: rename fbatch_count()
      mm: remove check_move_unevictable_pages()
      drm: convert drm_gem_put_pages() to use a folio_batch
      i915: convert shmem_sg_free_table() to use a folio_batch
      scatterlist: add sg_set_folio()
      ...
| * zram: further limit recompression threshold (Sergey Senozhatsky, 2023-06-19, 1 file, -1/+1)
    Recompression threshold should be below huge-size-class watermark. Any object larger than huge-size-class is a "huge object" and occupies a whole physical page on the zsmalloc side, in other words it's incompressible, as far as zsmalloc is concerned.

    Link: https://lkml.kernel.org/r/20230614141338.3480029-1-senozhatsky@chromium.org
    Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Suggested-by: Brian Geffon <bgeffon@google.com>
    Acked-by: Brian Geffon <bgeffon@google.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
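    The rule stated in this commit, that a recompression threshold only makes sense below the huge-size-class watermark, can be sketched as a simple validity check. This is a hedged Python model rather than the driver's code: `validate_recompress_threshold` is a hypothetical name, and the watermark value is whatever the caller passes in, since the actual watermark depends on the zsmalloc configuration.

```python
import errno


def validate_recompress_threshold(threshold, huge_class_size):
    """Return 0 if the threshold is usable, or a negative errno otherwise.

    Objects at or above the huge-size-class watermark already occupy a
    whole page, so selecting them for recompression cannot save memory.
    """
    if threshold >= huge_class_size:
        return -errno.EINVAL
    return 0
```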
* | block: replace fmode_t with a block-specific type for block open flags (Christoph Hellwig, 2023-06-12, 1 file, -3/+3)
    The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: use the holder as indication for exclusive opens (Christoph Hellwig, 2023-06-12, 1 file, -4/+4)
    The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key the exclusive open off a non-NULL holder.

    For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Acked-by: David Sterba <dsterba@suse.com> [btrfs]
    Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
    Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
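    A toy userspace model may help show the interface shape this commit describes: exclusivity is implied by passing a holder, and the release call names the holder so a mismatch can be detected. Everything here (`ToyBlockDevice`, `DeviceBusy`, the claim counter) is hypothetical and greatly simplified; it is not the block layer's implementation.

```python
class DeviceBusy(Exception):
    """Raised when another holder already owns the device exclusively."""


class ToyBlockDevice:
    def __init__(self):
        self.holder = None  # current exclusive owner, if any
        self.claims = 0     # number of exclusive opens by that owner

    def open(self, holder=None):
        # A non-None holder requests an exclusive open; no separate flag.
        if holder is not None:
            if self.holder is not None and self.holder is not holder:
                raise DeviceBusy("device is held by someone else")
            self.holder = holder
            self.claims += 1

    def put(self, holder=None):
        # The releaser names its holder so a mismatch can be caught.
        if holder is None:
            return
        if self.holder is not holder:
            raise ValueError("only the holder may release the hold")
        self.claims -= 1
        if self.claims == 0:
            self.holder = None
```

    In this model a caller that passes no holder never takes or drops a claim, which mirrors the idea that the holder alone, not a mode flag, decides exclusivity.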
* | block: pass a gendisk to ->open (Christoph Hellwig, 2023-06-12, 1 file, -8/+5)
    ->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
    Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: introduce holder ops (Christoph Hellwig, 2023-06-05, 1 file, -1/+1)
    Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for things like notification of a removed device or a device resize.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | zram: use __bio_add_page for adding single page to bio (Johannes Thumshirn, 2023-05-31, 1 file, -1/+1)
|/
    The zram writeback code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked.

    Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed.

    This brings us a step closer to marking bio_add_page() as __must_check.

    Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/cfd141dd7773315879a126f2aa81b7f698bc0e10.1685532726.git.johannes.thumshirn@wdc.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux (Linus Torvalds, 2023-05-06, 1 file, -1/+0)
|\
    Pull more block updates from Jens Axboe:

    - MD pull request via Song:
        - Improve raid5 sequential IO performance on spinning disks, which fixes a regression since v6.0 (Jan Kara)
        - Fix bitmap offset types, which fixes an issue introduced in this merge window (Jonathan Derrick)
    - Cleanup of hweight type used for cgroup writeback (Maxim)
    - Fix a regression with the "has_submit_bio" changes across partitions (Ming)
    - Cleanup of QUEUE_FLAG_ADD_RANDOM clearing. We used to set this flag on non blk-mq queues, and hence some drivers clear it unconditionally. Since all of these have since been converted to true blk-mq drivers, drop the useless clear as the bit is not set (Chaitanya)
    - Fix the flags being set in a bio for a flush for drbd (Christoph)
    - Cleanup and deduplication of the code handling setting block device capacity (Damien)
    - Fix for ublk handling IO timeouts (Ming)
    - Fix for a regression in blk-cgroup teardown (Tao)
    - NBD documentation and code fixes (Eric)
    - Convert blk-integrity to using device_attributes rather than a second kobject to manage lifetimes (Thomas)

    * tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux:
      ublk: add timeout handler
      drbd: correctly submit flush bio on barrier
      mailmap: add mailmap entries for Jens Axboe
      block: Skip destroyed blkg when restart in blkg_destroy_all()
      writeback: fix call of incorrect macro
      md: Fix bitmap offset type in sb writer
      md/raid5: Improve performance for sequential IO
      docs nbd: userspace NBD now favors github over sourceforge
      block nbd: use req.cookie instead of req.handle
      uapi nbd: add cookie alias to handle
      uapi nbd: improve doc links to userspace spec
      blk-integrity: register sysfs attributes on struct device
      blk-integrity: convert to struct device_attribute
      blk-integrity: use sysfs_emit
      block/drivers: remove dead clear of random flag
      block: sync part's ->bd_has_submit_bio with disk's
      block: Cleanup set_capacity()/bdev_set_nr_sectors()
| * block/drivers: remove dead clear of random flag (Chaitanya Kulkarni, 2023-04-25, 1 file, -1/+0)
    QUEUE_FLAG_ADD_RANDOM is not set before we clear it for "null_blk", "brd", "nbd", "zram", and "bcache" since by default we don't set "QUEUE_FLAG_ADD_RANDOM" to MQ ops.

    Remove dead clear of QUEUE_FLAG_ADD_RANDOM in above listed drivers.

    Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> #zram
    Link: https://lore.kernel.org/r/20230424234628.45544-2-kch@nvidia.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm (Linus Torvalds, 2023-04-27, 2 files, -246/+137)
|\ \
    Pull MM updates from Andrew Morton:

    - Nick Piggin's "shoot lazy tlbs" series, to improve the performance of switching from a user process to a kernel thread.
    - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
    - zsmalloc performance improvements from Sergey Senozhatsky.
    - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables.
    - VFS rationalizations from Christoph Hellwig:
        - removal of most of the callers of write_one_page()
        - make __filemap_get_folio()'s return value more useful
    - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'.
    - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits.
    - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n).
    - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd, permitting userspace to wr-protect anon memory unpopulated ptes.
    - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning.
    - Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte.
    - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness.
    - Cleanups to do_fault_around() by Lorenzo Stoakes.
    - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c.
    - Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more.
    - Lorenzo has also converted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases.
    - Lorenzo has also contributed further cleanups of vma_merge().
    - Chaitanya Prakash provides some fixes to the mmap selftesting code.
    - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults.
    - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking.
    - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads.
    - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic.
    - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used.
    - Yosry Ahmed has improved the performance of memcg statistics flushing.
    - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem.
    - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths.
    - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing.
    - Pankaj Raghav has made page_endio() unneeded and has removed it.
    - Peter Xu contributed some rationalizations of the userfaultfd selftests.
    - Yosry Ahmed has fixed an issue around memcg's page reclaim accounting.
    - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code.
    - Longlong Xia has improved the way in which KSM handles hwpoisoned pages.
    - Peter Xu fixes a few issues with uffd-wp at fork() time.
    - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis.

    * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
      mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
      shmem: restrict noswap option to initial user namespace
      mm/khugepaged: fix conflicting mods to collapse_file()
      sparse: remove unnecessary 0 values from rc
      mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
      hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
      maple_tree: fix allocation in mas_sparse_area()
      mm: do not increment pgfault stats when page fault handler retries
      zsmalloc: allow only one active pool compaction context
      selftests/mm: add new selftests for KSM
      mm: add new KSM process and sysfs knobs
      mm: add new api to enable ksm per process
      mm: shrinkers: fix debugfs file permissions
      mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
      migrate_pages_batch: fix statistics for longterm pin retry
      userfaultfd: use helper function range_in_vma()
      lib/show_mem.c: use for_each_populated_zone() simplify code
      mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
      fs/buffer: convert create_page_buffers to folio_create_buffers
      fs/buffer: add folio_create_empty_buffers helper
      ...
| * | zram: return errors from read_from_bdev_sync (Christoph Hellwig, 2023-04-18, 1 file, -3/+4)
    Propagate read errors to the caller instead of dropping them on the floor, and stop returning the somewhat dangerous 1 on success from read_from_bdev*.

    Link: https://lkml.kernel.org/r/20230411171459.567614-18-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: fix synchronous reads (Christoph Hellwig, 2023-04-18, 1 file, -38/+22)
    Currently nothing waits for the synchronous reads before accessing the data. Switch them to an on-stack bio and submit_bio_wait to make sure the I/O has actually completed when the work item has been flushed. This also removes the call to page_endio that would unlock a page that has never been locked.

    Drop the partial_io/sync flag, as chaining only makes sense for the asynchronous reads of the entire page.

    Link: https://lkml.kernel.org/r/20230411171459.567614-17-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: don't return errors from read_from_bdev_async (Christoph Hellwig, 2023-04-18, 1 file, -12/+4)
    bio_alloc will never return a NULL bio when it is allowed to sleep, and adding a single page to bio with a single vector also can't fail, so switch to the asserting __bio_add_page variant and drop the error returns.

    Link: https://lkml.kernel.org/r/20230411171459.567614-16-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: pass a page to read_from_bdev (Christoph Hellwig, 2023-04-18, 1 file, -27/+16)
    read_from_bdev always reads a whole page, so pass a page to it instead of the bvec and remove the now pointless zram_bvec_read_from_bdev wrapper.

    Link: https://lkml.kernel.org/r/20230411171459.567614-15-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: refactor zram_bdev_write (Christoph Hellwig, 2023-04-18, 1 file, -19/+19)
    Split the read/modify/write case into a separate helper.

    Link: https://lkml.kernel.org/r/20230411171459.567614-14-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: don't pass a bvec to __zram_bvec_write (Christoph Hellwig, 2023-04-18, 1 file, -10/+4)
    __zram_bvec_write only extracts the page from the bvec and always expects a full page of input. Pass the page directly instead of the bvec and rename the function to zram_write_page.

    Link: https://lkml.kernel.org/r/20230411171459.567614-13-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: refactor zram_bdev_read (Christoph Hellwig, 2023-04-18, 1 file, -20/+20)
    Split the partial read into a separate helper.

    Link: https://lkml.kernel.org/r/20230411171459.567614-12-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: directly call zram_read_page in writeback_store (Christoph Hellwig, 2023-04-18, 1 file, -10/+4)
    writeback_store always reads a full page, so just call zram_read_page directly and bypass the bounce buffer handling.

    Link: https://lkml.kernel.org/r/20230411171459.567614-11-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: rename __zram_bvec_read to zram_read_page (Christoph Hellwig, 2023-04-18, 1 file, -4/+4)
    __zram_bvec_read doesn't get passed a bvec, but always reads a whole page. Rename it to make the usage more clear.

    Link: https://lkml.kernel.org/r/20230411171459.567614-10-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: don't use highmem for the bounce buffer in zram_bvec_{read,write} (Christoph Hellwig, 2023-04-18, 1 file, -12/+5)
    There is no point in allocating a highmem page when we instantly need to copy from it.

    Link: https://lkml.kernel.org/r/20230411171459.567614-9-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: refactor highlevel read and write handling (Christoph Hellwig, 2023-04-18, 1 file, -28/+30)
    Instead of having an outer loop in __zram_make_request and then branch out for reads vs writes for each loop iteration in zram_bvec_rw, split the main handler into separate zram_bio_read and zram_bio_write handlers that also include the functionality formerly in zram_bvec_rw.

    Link: https://lkml.kernel.org/r/20230411171459.567614-8-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: return early on error in zram_bvec_rw (Christoph Hellwig, 2023-04-18, 1 file, -9/+9)
    When the low-level access fails, don't clear the idle flag or clear the caches, and just return.

    Link: https://lkml.kernel.org/r/20230411171459.567614-7-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: move discard handling to zram_submit_bio (Christoph Hellwig, 2023-04-18, 1 file, -10/+13)
    Switch on the bio operation in zram_submit_bio and only call into __zram_make_request for read and write operations.

    Link: https://lkml.kernel.org/r/20230411171459.567614-6-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
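    The dispatch shape described here, choosing a path by operation type up front so the read/write path never sees a discard, can be sketched as follows. This is an illustrative Python model only: the `Op` enum, the `dispatch` function and the returned path names are hypothetical stand-ins, not the driver's identifiers or control flow.

```python
from enum import Enum, auto


class Op(Enum):
    READ = auto()
    WRITE = auto()
    DISCARD = auto()
    FLUSH = auto()  # stands in for any operation the model does not handle


def dispatch(op):
    """Pick the handling path for an operation before any per-segment work."""
    if op in (Op.READ, Op.WRITE):
        return "read_write_path"
    if op is Op.DISCARD:
        return "discard_path"
    return "unsupported"
```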
| * | zram: simplify bvec iteration in __zram_make_request (Christoph Hellwig, 2023-04-18, 1 file, -31/+11)
    bio_for_each_segment synthesizes bvecs that never cross page boundaries, so don't duplicate that work in an inner loop.

    Link: https://lkml.kernel.org/r/20230411171459.567614-5-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: make zram_bio_discard more self-contained (Christoph Hellwig, 2023-04-18, 1 file, -9/+7)
    Derive the index and offset variables inside the function, and complete the bio directly in preparation for cleaning up the I/O path.

    Link: https://lkml.kernel.org/r/20230411171459.567614-4-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | zram: remove valid_io_request (Christoph Hellwig, 2023-04-18, 2 files, -34/+1)
    All bios handed to drivers from the block layer are checked against the device size and for logical block alignment already (and have been since long before zram was merged), so don't duplicate those checks.

    Link: https://lkml.kernel.org/r/20230411171459.567614-3-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
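    The two checks this commit calls redundant, bounds against the device size and logical block alignment, can be written out in a few lines. The following Python sketch is a model of that kind of check under stated assumptions (512-byte sectors, a hypothetical function name `is_valid_io`); it does not reproduce the block layer's or the removed helper's actual code.

```python
SECTOR_SIZE = 512  # assumed sector unit for this model


def is_valid_io(start_sector, nbytes, capacity_sectors, logical_block_size):
    """Check that a request is block aligned and ends within the device."""
    start_byte = start_sector * SECTOR_SIZE
    # Both the start and the length must be multiples of the logical block size.
    if start_byte % logical_block_size or nbytes % logical_block_size:
        return False
    end_sector = start_sector + nbytes // SECTOR_SIZE
    return end_sector <= capacity_sectors
```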
| * | zram: always compile read_from_bdev_sync (Christoph Hellwig, 2023-04-18, 1 file, -12/+6)
| |/
    Patch series "zram I/O path cleanups and fixups", v3.

    This series cleans up the zram I/O path, and fixes the handling of synchronous I/O to the underlying device in the writeback_store function or for > 4K PAGE_SIZE systems.

    The fixes are at the end, as I could not fully reason about them being safe before untangling the callchain.

    This patch (of 17):

    read_from_bdev_sync is currently only compiled for non-4k PAGE_SIZE, which means it won't be built with the most common configurations.

    Replace the ifdef with a check for the PAGE_SIZE in an if instead. The check uses an extra symbol and IS_ENABLED to allow the compiler to eliminate the dead code, leading to the same generated code size:

       text    data     bss     dec     hex filename
      16709    1428      12   18149    46e5 drivers/block/zram/zram_drv.o.old
      16709    1428      12   18149    46e5 drivers/block/zram/zram_drv.o.new

    Link: https://lkml.kernel.org/r/20230411171459.567614-1-hch@lst.de
    Link: https://lkml.kernel.org/r/20230411171459.567614-2-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | zram: fix up permission for the hot_add sysfs file (Greg Kroah-Hartman, 2023-04-18, 1 file, -1/+3)
    Commit 75a2d4226b53 ("driver core: class: mark the struct class for sysfs callbacks as constant") changed the attribute to use CLASS_ATTR_RO() which changed the permission from 0400 to 0444. But this attribute is "special" in that reading it modifies the system state, so it MUST be set to 0400 so that only root processes can muck around with it.

    Fix this all up, AND document this so that I don't change it again in 3-4 years when I stumble across it and wonder why it's an open-coded _ATTR() macro.

    Reported-by: Denis Efremov <efremov@linux.com>
    Fixes: 75a2d4226b53 ("driver core: class: mark the struct class for sysfs callbacks as constant")
    Link: https://lore.kernel.org/r/2023041810-angelic-conical-52d8@gregkh
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
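    The difference between the two modes named in this commit is easy to see by decoding the permission bits. The snippet below is a small Python illustration using the standard `stat` constants; the helper name `readers` is made up for this example and has nothing to do with the kernel's attribute macros.

```python
import stat


def readers(mode):
    """Return which permission classes may read a file with this mode."""
    who = []
    if mode & stat.S_IRUSR:
        who.append("owner")
    if mode & stat.S_IRGRP:
        who.append("group")
    if mode & stat.S_IROTH:
        who.append("other")
    return who
```

    With 0400 only the owner can read the file, while 0444 opens it to every user, which matters for an attribute whose read has side effects.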
* | driver core: class: mark the struct class for sysfs callbacks as constant (Greg Kroah-Hartman, 2023-03-29, 1 file, -6/+5)
    struct class should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct class to be moved to read-only memory.

    While we are touching all class sysfs callbacks also mark the attribute as constant as it can not be modified. The bonding code still uses this structure so it can not be removed from the function callbacks.

    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Bartosz Golaszewski <brgl@bgdev.pl>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Johannes Berg <johannes@sipsolutions.net>
    Cc: Linus Walleij <linus.walleij@linaro.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Miquel Raynal <miquel.raynal@bootlin.com>
    Cc: Namjae Jeon <linkinjeon@kernel.org>
    Cc: Paolo Abeni <pabeni@redhat.com>
    Cc: Russ Weight <russell.h.weight@intel.com>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Steve French <sfrench@samba.org>
    Cc: Vignesh Raghavendra <vigneshr@ti.com>
    Cc: linux-cifs@vger.kernel.org
    Cc: linux-gpio@vger.kernel.org
    Cc: linux-mtd@lists.infradead.org
    Cc: linux-rdma@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: netdev@vger.kernel.org
    Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20230325084537.3622280-1-gregkh@linuxfoundation.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | drivers: remove struct module * setting from struct classGreg Kroah-Hartman2023-03-171-1/+0
|/ | | | | | | | | | | | | There is no need to manually set the owner of a struct class, as the registering function does it automatically, so remove all of the explicit settings from various drivers that did so as it is unneeded. This allows us to remove this pointer entirely from this structure going forward. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230313181843.1207845-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds2023-02-231-64/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". 
- Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". 
- Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". 
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
| * block: remove ->rw_pageChristoph Hellwig2023-02-021-60/+1
The ->rw_page method is a special purpose bypass of the usual bio handling path that is limited to single-page reads and writes and is synchronous, which causes a lot of extra code in the drivers, callers and the block layer. The only remaining user is the MM swap code. Switch that swap code to simply submit a single-vec on-stack bio and synchronously wait on it based on a newly added QUEUE_FLAG_SYNCHRONOUS flag set by the drivers that currently implement ->rw_page instead. While this touches one extra cache line and executes extra code, it simplifies the block layer and drivers and ensures that all features are properly supported by all drivers, e.g. right now ->rw_page bypassed cgroup writeback entirely.
[akpm@linux-foundation.org: fix comment typo, per Dan]
Link: https://lkml.kernel.org/r/20230125133436.447864-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * zram: correctly handle all next_arg() casesSergey Senozhatsky2023-01-181-2/+2
When the supplied buffer does not contain an assignment sign, next_arg() sets the `val` pointer to NULL, so we cannot dereference it. Add a NULL pointer test to handle the bare `param` case, in addition to the `*val` test, which handles the case where the param has no value assigned to it: `param=`.
Link: https://lkml.kernel.org/r/20230103030119.1496358-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
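The two cases the commit distinguishes (`param` versus `param=`) can be sketched in user space as follows. This is only an illustration: `split_param` is a hypothetical, simplified stand-in for the kernel's next_arg(), and `param_has_value` is a made-up name for the combined check, not the code that was merged.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for next_arg(): splits one "param=val" token in
 * place. When the token has no '=' at all, *val is set to NULL, which is
 * the behaviour the commit message describes.
 */
void split_param(char *arg, char **param, char **val)
{
    char *eq = strchr(arg, '=');

    *param = arg;
    if (!eq) {
        *val = NULL;    /* bare "param": there is no value pointer */
        return;
    }
    *eq = '\0';
    *val = eq + 1;      /* "param=" leaves an empty string here */
}

/* Returns 1 only when a non-empty value was supplied. */
int param_has_value(const char *val)
{
    /* Check the pointer first; dereferencing NULL is the bug being fixed. */
    return val != NULL && *val != '\0';
}
```

Testing the pointer before `*val` means both malformed inputs are rejected by the same check, without a separate code path for each.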
| * zram: fix typos in commentsJeongHyeon Lee2023-01-181-2/+2
| | | | | | | | | | | | | | | | | | | | - The double `range` is duplicated in comment, remove one. - change `syfs` to `sysfs` Link: https://lkml.kernel.org/r/20221223040331.4194-1-jhs2.lee@samsung.com Signed-off-by: JeongHyeon Lee <jhs2.lee@samsung.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | zram: use bvec_set_page to initialize bvecsChristoph Hellwig2023-02-031-11/+4
|/ | | | | | | | | | Use the bvec_set_page helper to initialize bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230203150634.3199647-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* zram: remove unused stats fieldsSergey Senozhatsky2022-11-302-4/+0
| | | | | | | | | | | | | | | We don't show num_reads and num_writes since we removed corresponding sysfs nodes in 2017. Block layer stats are exposed via /sys/block/zramX/stat file. However, we still increment those atomic vars and store them in zram stats. Remove leftovers. Link: https://lkml.kernel.org/r/20221117141326.1105181-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: add incompressible flag to read_block_state()Sergey Senozhatsky2022-11-301-2/+4
Add a new flag to zram block state that shows if the page is incompressible: that none of the algorithms (including secondary ones) could compress it.
Link: https://lkml.kernel.org/r/20221109115047.2921851-14-senozhatsky@chromium.org
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: add incompressible writebackSergey Senozhatsky2022-11-301-6/+12
| | | | | | | | | | | | | | | Add support for incompressible pages writeback: echo incompressible > /sys/block/zramX/writeback Link: https://lkml.kernel.org/r/20221109115047.2921851-13-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: add algo parameter support to zram_recompress()Sergey Senozhatsky2022-11-302-9/+46
Recompression iterates through all the registered secondary compression algorithms in order of their priorities so that we have higher chances of finding the algorithm that compresses a particular page. This, however, may not always be the best approach and sometimes we may want to limit recompression to only one particular algorithm. For instance, when a higher priority algorithm uses too much power and the device has a relatively low battery level we may want to limit recompression to use only a lower priority algorithm, which uses less power.
Introduce algo= parameter support to the recompression sysfs knob so that user-space can request recompression with a particular algorithm only:
  echo "type=idle algo=zstd" > /sys/block/zramX/recompress
Link: https://lkml.kernel.org/r/20221109115047.2921851-11-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: remove redundant checks from zram_recompress()Sergey Senozhatsky2022-11-301-6/+2
| | | | | | | | | | | | | | Size class index comparison is powerful enough so we can remove object size comparisons. Link: https://lkml.kernel.org/r/20221109115047.2921851-10-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: add size class equals check into recompressionAlexey Romanov2022-11-301-1/+10
| | | | | | | | | | | | | | | | | It makes no sense for us to recompress the object if it will be in the same size class. We anyway don't get any memory gain. But, at the same time, we get a CPU time overhead when inserting this object into zspage and decompressing it afterwards. [senozhatsky: rebased and fixed conflicts] Link: https://lkml.kernel.org/r/20221109115047.2921851-9-senozhatsky@chromium.org Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
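The reasoning above can be sketched with a toy size-class model. Everything here is an assumption made for illustration: the fixed 16-byte class granularity and the names `class_index` and `recompression_saves_memory` are hypothetical, and real zsmalloc derives and merges its size classes differently.

```c
#include <stddef.h>

/* Hypothetical class granularity, chosen only for this sketch. */
#define CLASS_DELTA 16

/* Map an object size to the index of the smallest class that fits it. */
size_t class_index(size_t size)
{
    return (size + CLASS_DELTA - 1) / CLASS_DELTA;
}

/*
 * A recompressed object is only worth storing when it lands in a smaller
 * size class. Staying in the same class frees no memory, so the extra CPU
 * work would be wasted.
 */
int recompression_saves_memory(size_t old_size, size_t new_size)
{
    return class_index(new_size) < class_index(old_size);
}
```

In this model an object that shrinks from 100 to 97 bytes stays in the same class and is skipped, while shrinking to 96 bytes crosses a class boundary and is kept.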
* zram: use IS_ERR_VALUE() to check for zs_malloc() errorsSergey Senozhatsky2022-11-301-3/+3
| | | | | | | | | | | | | | | Avoid typecasts that are needed for IS_ERR() and use IS_ERR_VALUE() instead. Link: https://lkml.kernel.org/r/20221109115047.2921851-8-senozhatsky@chromium.org Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
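Why a value-based check avoids the typecast can be shown with a user-space sketch. The `sketch_` names below are hypothetical re-implementations written for illustration, modelled on the idea that error codes occupy the top 4095 values of an unsigned long; they are not the kernel's own macros.

```c
#include <errno.h>

#define SKETCH_MAX_ERRNO 4095UL

/* Value-based check: operates directly on an unsigned long handle. */
int sketch_is_err_value(unsigned long x)
{
    return x >= (unsigned long)-SKETCH_MAX_ERRNO;
}

/* Pointer-based check: a handle has to be cast to a pointer to use it. */
int sketch_is_err(const void *ptr)
{
    return sketch_is_err_value((unsigned long)ptr);
}

/*
 * Hypothetical allocator that returns either a valid handle or a negative
 * errno encoded in the handle, as an unsigned long.
 */
unsigned long sketch_alloc(int fail)
{
    return fail ? (unsigned long)-ENOMEM : 0x1000UL;
}
```

Since the allocator in this sketch already returns an unsigned long, the caller can write `sketch_is_err_value(handle)` instead of `sketch_is_err((void *)handle)`.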
* zram: clarify writeback_store() commentSergey Senozhatsky2022-11-301-2/+6
| | | | | | | | | | | | | | Re-phrase writeback BIO error comment. Link: https://lkml.kernel.org/r/20221109115047.2921851-7-senozhatsky@chromium.org Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: add recompress flag to read_block_state()Sergey Senozhatsky2022-11-301-2/+3
| | | | | | | | | | | | | | Add a new flag to zram block state that shows if the page was recompressed (using alternative compression algorithm). Link: https://lkml.kernel.org/r/20221109115047.2921851-6-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: introduce recompress sysfs knobSergey Senozhatsky2022-11-303-3/+277
Allow zram to recompress (using secondary compression streams) pages. Re-compression algorithms (we support up to 3 at this stage) are selected via recomp_algorithm:
  echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm
Please read documentation for more details.
We support several recompression modes:
1) IDLE pages recompression is activated by `idle` mode
  echo "type=idle" > /sys/block/zram0/recompress
2) Since there may be many idle pages user-space may pass a size threshold value (in bytes) and we will recompress pages only of equal or greater size:
  echo "threshold=888" > /sys/block/zram0/recompress
3) HUGE pages recompression is activated by `huge` mode
  echo "type=huge" > /sys/block/zram0/recompress
4) HUGE_IDLE pages recompression is activated by `huge_idle` mode
  echo "type=huge_idle" > /sys/block/zram0/recompress
[senozhatsky@chromium.org: we should always zero out err variable in recompress loop]
Link: https://lkml.kernel.org/r/20221110143423.3250790-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20221109115047.2921851-5-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: factor out WB and non-WB zram read functionsSergey Senozhatsky2022-11-301-23/+49
| | | | | | | | | | | | | We will use non-WB variant in ZRAM page recompression path. Link: https://lkml.kernel.org/r/20221109115047.2921851-4-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* zram: add recompression algorithm sysfs knobSergey Senozhatsky2022-11-301-19/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce recomp_algorithm sysfs knob that controls secondary algorithm selection used for recompression. We will support up to 3 secondary compression algorithms which are sorted in order of their priority. To select an algorithm user has to provide its name and priority: echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm echo "algo=deflate priority=2" > /sys/block/zramX/recomp_algorithm During recompression zram iterates through the list of registered secondary algorithms in order of their priorities. We also have a short version for cases when there is only one secondary compression algorithm: echo "algo=zstd" > /sys/block/zramX/recomp_algorithm This will register zstd as the secondary algorithm with priority 1. Link: https://lkml.kernel.org/r/20221109115047.2921851-3-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
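The priority-ordered walk described above can be sketched as a small table indexed by priority. This is an assumption-laden illustration, not the driver's data structure: `struct sketch_dev`, `sketch_set_recomp_alg` and `sketch_next_recomp_alg` are hypothetical names, and slot 0 is assumed to be reserved for the primary algorithm.

```c
#include <stddef.h>

#define SKETCH_PRIMARY   0  /* slot 0: the default (primary) algorithm */
#define SKETCH_MAX_COMPS 4  /* primary plus up to 3 secondary algorithms */

struct sketch_dev {
    const char *algs[SKETCH_MAX_COMPS];  /* indexed by priority */
};

/* Register a secondary algorithm; returns 0 on success, -1 on a bad priority. */
int sketch_set_recomp_alg(struct sketch_dev *dev, const char *name, int prio)
{
    if (prio <= SKETCH_PRIMARY || prio >= SKETCH_MAX_COMPS)
        return -1;
    dev->algs[prio] = name;
    return 0;
}

/*
 * Return the next registered secondary algorithm after priority *prio and
 * update *prio, or return NULL once the list is exhausted. Start the walk
 * by passing SKETCH_PRIMARY in *prio.
 */
const char *sketch_next_recomp_alg(const struct sketch_dev *dev, int *prio)
{
    for (int p = *prio + 1; p < SKETCH_MAX_COMPS; p++) {
        if (dev->algs[p]) {
            *prio = p;
            return dev->algs[p];
        }
    }
    return NULL;
}
```

Registering zstd at priority 1 and deflate at priority 2 makes the walk yield zstd first and deflate second, matching the order of the two echo commands in the commit message.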
* zram: preparation for multi-zcomp supportSergey Senozhatsky2022-11-304-32/+80
Patch series "zram: Support multiple compression streams", v5.
This series adds support for multiple compression streams. The main idea is that different compression algorithms have different characteristics and zram may benefit when it uses a combination of algorithms: a default algorithm that is faster but has a lower compression rate and a secondary algorithm that can achieve a higher compression rate at the price of slower compression/decompression.
There are several use cases for this functionality:
- huge pages re-compression: zstd or deflate can successfully compress huge pages (~50% of huge pages on my synthetic ChromeOS tests), IOW pages that lzo was not able to compress.
- idle pages re-compression: idle/cold pages sit in the memory and we may reduce zsmalloc memory usage if we recompress those idle pages.
Userspace has a number of ways to control the behavior and impact of zram recompression: what type of pages should be recompressed, size watermarks, etc. Please refer to the documentation patch.
This patch (of 13):
The patch turns compression streams and compressor algorithm name struct zram members into arrays, so that we can have multiple compression streams support (in the next patches).
The patch uses a rather explicit API for compressor selection:
- Get primary (default) compression stream
  zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP])
- Get secondary compression stream
  zcomp_stream_get(zram->comps[ZRAM_SECONDARY_COMP])
We use a similar API for compression streams put(). At this point we always have just one compression stream, since CONFIG_ZRAM_MULTI_COMP is not yet defined.
Link: https://lkml.kernel.org/r/20221109115047.2921851-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20221109115047.2921851-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Alexey Romanov <avromanov@sberdevices.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>