From d0164adc89f6bb374d304ffcc375c6d2652fe67d Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Fri, 6 Nov 2015 16:28:21 -0800 Subject: mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka Acked-by: Michal Hocko Acked-by: Johannes Weiner Cc: Christoph Lameter Cc: David Rientjes Cc: Vitaly Wool Cc: Rik van Riel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/drbd/drbd_receiver.c | 3 ++- drivers/block/osdblk.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) (limited to 'drivers/block') diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c index c097909c589c..b4b5680ac6ad 100644 --- a/drivers/block/drbd/drbd_receiver.c +++ b/drivers/block/drbd/drbd_receiver.c @@ -357,7 +357,8 @@ drbd_alloc_peer_req(struct drbd_peer_device *peer_device, u64 id, sector_t secto } if (has_payload && data_size) { - page = drbd_alloc_pages(peer_device, nr_pages, (gfp_mask & __GFP_WAIT)); + page = drbd_alloc_pages(peer_device, nr_pages, + gfpflags_allow_blocking(gfp_mask)); if (!page) goto fail; } diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c index e22942596207..1b709a4e3b5e 100644 --- a/drivers/block/osdblk.c +++ b/drivers/block/osdblk.c @@ -271,7 +271,7 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask) goto err_out; tmp->bi_bdev = NULL; - gfpmask &= ~__GFP_WAIT; + gfpmask &= ~__GFP_DIRECT_RECLAIM; tmp->bi_next = NULL; if (!new_chain) -- cgit v1.2.3 From 71baba4b92dc1fa1bc461742c6ab1942ec6034e9 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Fri, 6 Nov 2015 16:28:28 -0800 Subject: mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM __GFP_WAIT was used to signal that the caller was in atomic context and could not sleep. Now it is possible to distinguish between true atomic context and callers that are not willing to sleep. The latter should clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing __GFP_WAIT behaves differently, there is a risk that people will clear the wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly indicate what it does -- setting it allows all reclaim activity, clearing them prevents it. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman Acked-by: Michal Hocko Acked-by: Vlastimil Babka Acked-by: Johannes Weiner Cc: Christoph Lameter Acked-by: David Rientjes Cc: Vitaly Wool Cc: Rik van Riel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/drbd/drbd_bitmap.c | 2 +- drivers/block/mtip32xx/mtip32xx.c | 2 +- drivers/block/paride/pd.c | 2 +- drivers/block/pktcdvd.c | 4 ++-- 4 files changed, 5 insertions(+), 5 deletions(-) (limited to 'drivers/block') diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c index e5e0f19ceda0..3dc53a16ed3a 100644 --- a/drivers/block/drbd/drbd_bitmap.c +++ b/drivers/block/drbd/drbd_bitmap.c @@ -1007,7 +1007,7 @@ static void bm_page_io_async(struct drbd_bm_aio_ctx *ctx, int page_nr) __must_ho bm_set_page_unchanged(b->bm_pages[page_nr]); if (ctx->flags & BM_AIO_COPY_PAGES) { - page = mempool_alloc(drbd_md_io_page_pool, __GFP_HIGHMEM|__GFP_WAIT); + page = mempool_alloc(drbd_md_io_page_pool, __GFP_HIGHMEM|__GFP_RECLAIM); copy_highpage(page, b->bm_pages[page_nr]); bm_store_page_idx(page, page_nr); } else diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index f504232c1ee7..a28a562f7b7f 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -173,7 +173,7 @@ static struct mtip_cmd *mtip_get_int_command(struct driver_data *dd) { struct request *rq; - rq = blk_mq_alloc_request(dd->queue, 0, __GFP_WAIT, true); + rq = blk_mq_alloc_request(dd->queue, 0, __GFP_RECLAIM, true); return blk_mq_rq_to_pdu(rq); } diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c index b9242d78283d..562b5a4ca7b7 100644 --- a/drivers/block/paride/pd.c +++ b/drivers/block/paride/pd.c @@ -723,7 +723,7 @@ static int pd_special_command(struct pd_unit *disk, struct request *rq; int err = 0; - rq = blk_get_request(disk->gd->queue, READ, __GFP_WAIT); + rq = blk_get_request(disk->gd->queue, READ, __GFP_RECLAIM); if (IS_ERR(rq)) return PTR_ERR(rq); diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 7be2375db7f2..5959c2981cc7 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -704,14 +704,14 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command * int ret = 0; rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ? - WRITE : READ, __GFP_WAIT); + WRITE : READ, __GFP_RECLAIM); if (IS_ERR(rq)) return PTR_ERR(rq); blk_rq_set_block_pc(rq); if (cgc->buflen) { ret = blk_rq_map_kern(q, rq, cgc->buffer, cgc->buflen, - __GFP_WAIT); + __GFP_RECLAIM); if (ret) goto out; } -- cgit v1.2.3 From 1d5b43bfb60f7ba2b51792978a6b0781d4ebba93 Mon Sep 17 00:00:00 2001 From: Luis Henriques Date: Fri, 6 Nov 2015 16:29:01 -0800 Subject: zram: introduce comp algorithm fallback functionality When the user supplies an unsupported compression algorithm, keep the previously selected one (knowingly supported) or the default one (if the compression algorithm hasn't been changed yet). Note that previously this operation (i.e. setting an invalid algorithm) would result in no algorithm being selected, which means that this represents a small change in the default behaviour. Minchan said: For initializing zram, we need to set up 3 optional parameters in advance. 1. the number of compression streams 2. memory limitation 3. compression algorithm Although user pass completely wrong value to set up for 1 and 2 parameters, it's okay because they have default value so zram will be initialized with the default value (of course, when user passes a wrong value via *echo*, sysfs returns -EINVAL so the user can notice it). But 3 is not consistent with other optional parameters. IOW, if the user passes a wrong value to set up 3 parameter, zram's initialization would fail unlike other optional parameters. So this patch makes them consistent. Signed-off-by: Luis Henriques Acked-by: Minchan Kim Acked-by: Sergey Senozhatsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/zram/zram_drv.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'drivers/block') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 9fa15bb9d118..c93aeb876125 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -365,6 +365,9 @@ static ssize_t comp_algorithm_store(struct device *dev, struct zram *zram = dev_to_zram(dev); size_t sz; + if (!zcomp_available_algorithm(buf)) + return -EINVAL; + down_write(&zram->init_lock); if (init_done(zram)) { up_write(&zram->init_lock); @@ -378,9 +381,6 @@ static ssize_t comp_algorithm_store(struct device *dev, if (sz > 0 && zram->compressor[sz - 1] == '\n') zram->compressor[sz - 1] = 0x00; - if (!zcomp_available_algorithm(zram->compressor)) - len = -EINVAL; - up_write(&zram->init_lock); return len; } -- cgit v1.2.3 From 1237275580f3e2b998d355b2ff7f84c6b423aa11 Mon Sep 17 00:00:00 2001 From: Sergey SENOZHATSKY Date: Fri, 6 Nov 2015 16:29:04 -0800 Subject: zram: keep the exact overcommited value in mem_used_max `mem_used_max' is designed to store the max amount of memory zram consumed to store the data. However, it does not represent the actual 'overcommited' (max) value. The existing code goes to -ENOMEM overcommited case before it updates `->stats.max_used_pages', which hides the reason we went to -ENOMEM in the first place -- we actually used more memory than `->limit_pages': alloced_pages = zs_get_total_pages(meta->mem_pool); if (zram->limit_pages && alloced_pages > zram->limit_pages) { zs_free(meta->mem_pool, handle); ret = -ENOMEM; goto out; } update_used_max(zram, alloced_pages); Which is misleading. User will see -ENOMEM, check `->limit_pages', check `->stats.max_used_pages', which will keep the value BEFORE zram passed `->limit_pages', and see: `->stats.max_used_pages' < `->limit_pages' Move update_used_max() before we do `->limit_pages' check, so that user will see: `->stats.max_used_pages' > `->limit_pages' should the overcommit and -ENOMEM happen. Signed-off-by: Sergey Senozhatsky Acked-by: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/zram/zram_drv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers/block') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index c93aeb876125..3e8d8ffda54d 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -726,14 +726,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, } alloced_pages = zs_get_total_pages(meta->mem_pool); + update_used_max(zram, alloced_pages); + if (zram->limit_pages && alloced_pages > zram->limit_pages) { zs_free(meta->mem_pool, handle); ret = -ENOMEM; goto out; } - update_used_max(zram, alloced_pages); - cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { -- cgit v1.2.3 From 1c53e0d2737f3ce4afa27d5703494eb14610ec26 Mon Sep 17 00:00:00 2001 From: Geliang Tang Date: Fri, 6 Nov 2015 16:29:06 -0800 Subject: zram: make is_partial_io/valid_io_request/page_zero_filled return boolean Make is_partial_io()/valid_io_request()/page_zero_filled() return boolean, since each function only uses either one or zero as its return value. Signed-off-by: Geliang Tang Reviewed-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/zram/zram_drv.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'drivers/block') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 3e8d8ffda54d..81a557c33a1f 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -106,7 +106,7 @@ static void zram_set_obj_size(struct zram_meta *meta, meta->table[index].value = (flags << ZRAM_FLAG_SHIFT) | size; } -static inline int is_partial_io(struct bio_vec *bvec) +static inline bool is_partial_io(struct bio_vec *bvec) { return bvec->bv_len != PAGE_SIZE; } @@ -114,25 +114,25 @@ static inline int is_partial_io(struct bio_vec *bvec) /* * Check if request is within bounds and aligned on zram logical blocks. */ -static inline int valid_io_request(struct zram *zram, +static inline bool valid_io_request(struct zram *zram, sector_t start, unsigned int size) { u64 end, bound; /* unaligned request */ if (unlikely(start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1))) - return 0; + return false; if (unlikely(size & (ZRAM_LOGICAL_BLOCK_SIZE - 1))) - return 0; + return false; end = start + (size >> SECTOR_SHIFT); bound = zram->disksize >> SECTOR_SHIFT; /* out of range range */ if (unlikely(start >= bound || end > bound || start > end)) - return 0; + return false; /* I/O request is valid */ - return 1; + return true; } static void update_position(u32 *index, int *offset, struct bio_vec *bvec) @@ -157,7 +157,7 @@ static inline void update_used_max(struct zram *zram, } while (old_max != cur_max); } -static int page_zero_filled(void *ptr) +static bool page_zero_filled(void *ptr) { unsigned int pos; unsigned long *page; @@ -166,10 +166,10 @@ static int page_zero_filled(void *ptr) for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) { if (page[pos]) - return 0; + return false; } - return 1; + return true; } static void handle_zero_page(struct bio_vec *bvec) -- cgit v1.2.3 From be0e6f290f78b84a3b21b8c8c46819c4514fe632 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Fri, 6 Nov 2015 16:32:22 -0800 Subject: signal: turn dequeue_signal_lock() into kernel_dequeue_signal() 1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This matches another "for kthreads only" kernel_sigaction() helper. 2. Remove the "tsk" and "mask" arguments, they are always current and current->blocked. And it is simply wrong if tsk != current. 3. We could also remove the 3rd "siginfo_t *info" arg but it looks potentially useful. However we can simplify the callers if we change kernel_dequeue_signal() to accept info => NULL. 4. Remove _irqsave, it is never called from atomic context. Signed-off-by: Oleg Nesterov Reviewed-by: Tejun Heo Cc: David Woodhouse Cc: Felipe Balbi Cc: Markus Pargmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/nbd.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) (limited to 'drivers/block') diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 1b87623381e2..93b3f99b6865 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -444,9 +444,7 @@ static int nbd_thread_recv(struct nbd_device *nbd) spin_unlock_irqrestore(&nbd->tasks_lock, flags); if (signal_pending(current)) { - siginfo_t info; - - ret = dequeue_signal_lock(current, ¤t->blocked, &info); + ret = kernel_dequeue_signal(NULL); dev_warn(nbd_to_dev(nbd), "pid %d, %s, got signal %d\n", task_pid_nr(current), current->comm, ret); mutex_lock(&nbd->tx_lock); @@ -560,11 +558,8 @@ static int nbd_thread_send(void *data) !list_empty(&nbd->waiting_queue)); if (signal_pending(current)) { - siginfo_t info; - int ret; + int ret = kernel_dequeue_signal(NULL); - ret = dequeue_signal_lock(current, ¤t->blocked, - &info); dev_warn(nbd_to_dev(nbd), "pid %d, %s, got signal %d\n", task_pid_nr(current), current->comm, ret); mutex_lock(&nbd->tx_lock); @@ -592,10 +587,8 @@ static int nbd_thread_send(void *data) spin_unlock_irqrestore(&nbd->tasks_lock, flags); /* Clear maybe pending signals */ - if (signal_pending(current)) { - siginfo_t info; - dequeue_signal_lock(current, ¤t->blocked, &info); - } + if (signal_pending(current)) + kernel_dequeue_signal(NULL); return 0; } -- cgit v1.2.3