summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | btrfs: fix space cache corruption and potential double allocationsOmar Sandoval2022-08-234-60/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When testing space_cache v2 on a large set of machines, we encountered a few symptoms: 1. "unable to add free space :-17" (EEXIST) errors. 2. Missing free space info items, sometimes caught with a "missing free space info for X" error. 3. Double-accounted space: ranges that were allocated in the extent tree and also marked as free in the free space tree, ranges that were marked as allocated twice in the extent tree, or ranges that were marked as free twice in the free space tree. If the latter made it onto disk, the next reboot would hit the BUG_ON() in add_new_free_space(). 4. On some hosts with no on-disk corruption or error messages, the in-memory space cache (dumped with drgn) disagreed with the free space tree. All of these symptoms have the same underlying cause: a race between caching the free space for a block group and returning free space to the in-memory space cache for pinned extents causes us to double-add a free range to the space cache. This race exists when free space is cached from the free space tree (space_cache=v2) or the extent tree (nospace_cache, or space_cache=v1 if the cache needs to be regenerated). struct btrfs_block_group::last_byte_to_unpin and struct btrfs_block_group::progress are supposed to protect against this race, but commit d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit") subtly broke this by allowing multiple transactions to be unpinning extents at the same time. Specifically, the race is as follows: 1. An extent is deleted from an uncached block group in transaction A. 2. btrfs_commit_transaction() is called for transaction A. 3. btrfs_run_delayed_refs() -> __btrfs_free_extent() runs the delayed ref for the deleted extent. 4. __btrfs_free_extent() -> do_free_extent_accounting() -> add_to_free_space_tree() adds the deleted extent back to the free space tree. 5. do_free_extent_accounting() -> btrfs_update_block_group() -> btrfs_cache_block_group() queues up the block group to get cached. block_group->progress is set to block_group->start. 6. btrfs_commit_transaction() for transaction A calls switch_commit_roots(). It sets block_group->last_byte_to_unpin to block_group->progress, which is block_group->start because the block group hasn't been cached yet. 7. The caching thread gets to our block group. Since the commit roots were already switched, load_free_space_tree() sees the deleted extent as free and adds it to the space cache. It finishes caching and sets block_group->progress to U64_MAX. 8. btrfs_commit_transaction() advances transaction A to TRANS_STATE_SUPER_COMMITTED. 9. fsync calls btrfs_commit_transaction() for transaction B. Since transaction A is already in TRANS_STATE_SUPER_COMMITTED and the commit is for fsync, it advances. 10. btrfs_commit_transaction() for transaction B calls switch_commit_roots(). This time, the block group has already been cached, so it sets block_group->last_byte_to_unpin to U64_MAX. 11. btrfs_commit_transaction() for transaction A calls btrfs_finish_extent_commit(), which calls unpin_extent_range() for the deleted extent. It sees last_byte_to_unpin set to U64_MAX (by transaction B!), so it adds the deleted extent to the space cache again! This explains all of our symptoms above: * If the sequence of events is exactly as described above, when the free space is re-added in step 11, it will fail with EEXIST. * If another thread reallocates the deleted extent in between steps 7 and 11, then step 11 will silently re-add that space to the space cache as free even though it is actually allocated. Then, if that space is allocated *again*, the free space tree will be corrupted (namely, the wrong item will be deleted). * If we don't catch this free space tree corruption, it will continue to get worse as extents are deleted and reallocated. The v1 space_cache is synchronously loaded when an extent is deleted (btrfs_update_block_group() with alloc=0 calls btrfs_cache_block_group() with load_cache_only=1), so it is not normally affected by this bug. However, as noted above, if we fail to load the space cache, we will fall back to caching from the extent tree and may hit this bug. The easiest fix for this race is to also make caching from the free space tree or extent tree synchronous. Josef tested this and found no performance regressions. A few extra changes fall out of this change. Namely, this fix does the following, with step 2 being the crucial fix: 1. Factor btrfs_caching_ctl_wait_done() out of btrfs_wait_block_group_cache_done() to allow waiting on a caching_ctl that we already hold a reference to. 2. Change the call in btrfs_cache_block_group() of btrfs_wait_space_cache_v1_finished() to btrfs_caching_ctl_wait_done(), which makes us wait regardless of the space_cache option. 3. Delete the now unused btrfs_wait_space_cache_v1_finished() and space_cache_v1_done(). 4. Change btrfs_cache_block_group()'s `int load_cache_only` parameter to `bool wait` to more accurately describe its new meaning. 5. Change a few callers which had a separate call to btrfs_wait_block_group_cache_done() to use wait = true instead. 6. Make btrfs_wait_block_group_cache_done() static now that it's not used outside of block-group.c anymore. Fixes: d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit") CC: stable@vger.kernel.org # 5.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | btrfs: don't allow large NOWAIT direct readsJosef Bacik2022-08-221-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dylan and Jens reported a problem where they had an io_uring test that was returning short reads, and bisected it to ee5b46a353af ("btrfs: increase direct io read size limit to 256 sectors"). The root cause is their test was doing larger reads via io_uring with NOWAIT and async. This was triggering a page fault during the direct read, however the first page was able to work just fine and thus we submitted a 4k read for a larger iocb. Btrfs allows for partial IO's in this case specifically because we don't allow page faults, and thus we'll attempt to do any io that we can, submit what we could, come back and fault in the rest of the range and try to do the remaining IO. However for !is_sync_kiocb() we'll call ->ki_complete() as soon as the partial dio is done, which is incorrect. In the sync case we can exit the iomap code, submit more io's, and return with the amount of IO we were able to complete successfully. We were always doing short reads in this case, but for NOWAIT we were getting saved by the fact that we were limiting direct reads to sectorsize, and if we were larger than that we would return EAGAIN. Fix the regression by simply returning EAGAIN in the NOWAIT case with larger reads, that way io_uring can retry and get the larger IO and have the fault logic handle everything properly. This still leaves the AIO short read case, but that existed before this change. The way to properly fix this would be to handle partial iocb completions, but that's a lot of work, for now deal with the regression in the most straightforward way possible. Reported-by: Dylan Yudaken <dylany@fb.com> Fixes: ee5b46a353af ("btrfs: increase direct io read size limit to 256 sectors") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | btrfs: don't merge pages into bio if their page offset is not contiguousQu Wenruo2022-08-221-4/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [BUG] Zygo reported on latest development branch, he could hit ASSERT()/BUG_ON() caused crash when doing RAID5 recovery (intentionally corrupt one disk, and let btrfs to recover the data during read/scrub). And The following minimal reproducer can cause extent state leakage at rmmod time: mkfs.btrfs -f -d raid5 -m raid5 $dev1 $dev2 $dev3 -b 1G > /dev/null mount $dev1 $mnt fsstress -w -d $mnt -n 25 -s 1660807876 sync fssum -A -f -w /tmp/fssum.saved $mnt umount $mnt # Wipe the dev1 but keeps its super block xfs_io -c "pwrite -S 0x0 1m 1023m" $dev1 mount $dev1 $mnt fssum -r /tmp/fssum.saved $mnt > /dev/null umount $mnt rmmod btrfs This will lead to the following extent states leakage: BTRFS: state leak: start 499712 end 503807 state 5 in tree 1 refs 1 BTRFS: state leak: start 495616 end 499711 state 5 in tree 1 refs 1 BTRFS: state leak: start 491520 end 495615 state 5 in tree 1 refs 1 BTRFS: state leak: start 487424 end 491519 state 5 in tree 1 refs 1 BTRFS: state leak: start 483328 end 487423 state 5 in tree 1 refs 1 BTRFS: state leak: start 479232 end 483327 state 5 in tree 1 refs 1 BTRFS: state leak: start 475136 end 479231 state 5 in tree 1 refs 1 BTRFS: state leak: start 471040 end 475135 state 5 in tree 1 refs 1 [CAUSE] Since commit 7aa51232e204 ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector"), we always use btrfs_bio->file_offset to determine the file offset of a page. But that usage assume that, one bio has all its page having a continuous page offsets. Unfortunately that's not true, btrfs only requires the logical bytenr contiguous when assembling its bios. From above script, we have one bio looks like this: fssum-27671 submit_one_bio: bio logical=217739264 len=36864 fssum-27671 submit_one_bio: r/i=5/261 page_offset=466944 <<< fssum-27671 submit_one_bio: r/i=5/261 page_offset=724992 <<< fssum-27671 submit_one_bio: r/i=5/261 page_offset=729088 fssum-27671 submit_one_bio: r/i=5/261 page_offset=733184 fssum-27671 submit_one_bio: r/i=5/261 page_offset=737280 fssum-27671 submit_one_bio: r/i=5/261 page_offset=741376 fssum-27671 submit_one_bio: r/i=5/261 page_offset=745472 fssum-27671 submit_one_bio: r/i=5/261 page_offset=749568 fssum-27671 submit_one_bio: r/i=5/261 page_offset=753664 Note that the 1st and the 2nd page has non-contiguous page offsets. This means, at repair time, we will have completely wrong file offset passed in: kworker/u32:2-19927 btrfs_repair_one_sector: r/i=5/261 page_off=729088 file_off=475136 bio_offset=8192 Since the file offset is incorrect, we latter incorrectly set the extent states, and no way to really release them. Thus later it causes the leakage. In fact, this can be even worse, since the file offset is incorrect, we can hit cases like the incorrect file offset belongs to a HOLE, and later cause btrfs_num_copies() to trigger error, finally hit BUG_ON()/ASSERT() later. [FIX] Add an extra condition in btrfs_bio_add_page() for uncompressed IO. Now we will have more strict requirement for bio pages: - They should all have the same mapping (the mapping check is already implied by the call chain) - Their logical bytenr should be adjacent This is the same as the old condition. - Their page_offset() (file offset) should be adjacent This is the new check. This would result a slightly increased amount of bios from btrfs (needs holes and inside the same stripe boundary to trigger). But this would greatly reduce the confusion, as it's pretty common to assume a btrfs bio would only contain continuous page cache. Later we may need extra cleanups, as we no longer needs to handle gaps between page offsets in endio functions. Currently this should be the minimal patch to fix commit 7aa51232e204 ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector"). Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Fixes: 7aa51232e204 ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | btrfs: update generation of hole file extent item when merging holesFilipe Manana2022-08-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When punching a hole into a file range that is adjacent with a hole and we are not using the no-holes feature, we expand the range of the adjacent file extent item that represents a hole, to save metadata space. However we don't update the generation of hole file extent item, which means a full fsync will not log that file extent item if the fsync happens in a later transaction (since commit 7f30c07288bb9e ("btrfs: stop copying old file extents when doing a full fsync")). For example, if we do this: $ mkfs.btrfs -f -O ^no-holes /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xab 2M 2M" /mnt/foobar $ sync We end up with 2 file extent items in our file: 1) One that represents the hole for the file range [0, 2M), with a generation of 7; 2) Another one that represents an extent covering the range [2M, 4M). After that if we do the following: $ xfs_io -c "fpunch 2M 2M" /mnt/foobar We end up with a single file extent item in the file, which represents a hole for the range [0, 4M) and with a generation of 7 - because we end dropping the data extent for range [2M, 4M) and then update the file extent item that represented the hole at [0, 2M), by increasing length from 2M to 4M. Then doing a full fsync and power failing: $ xfs_io -c "fsync" /mnt/foobar <power failure> will result in the full fsync not logging the file extent item that represents the hole for the range [0, 4M), because its generation is 7, which is lower than the generation of the current transaction (8). As a consequence, after mounting again the filesystem (after log replay), the region [2M, 4M) does not have a hole, it still points to the previous data extent. So fix this by always updating the generation of existing file extent items representing holes when we merge/expand them. This solves the problem and it's the same approach as when we merge prealloc extents that got written (at btrfs_mark_extent_written()). Setting the generation to the current transaction's generation is also what we do when merging the new hole extent map with the previous one or the next one. A test case for fstests, covering both cases of hole file extent item merging (to the left and to the right), will be sent soon. Fixes: 7f30c07288bb9e ("btrfs: stop copying old file extents when doing a full fsync") CC: stable@vger.kernel.org # 5.18+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | btrfs: fix possible memory leak in btrfs_get_dev_args_from_path()Zixuan Fu2022-08-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In btrfs_get_dev_args_from_path(), btrfs_get_bdev_and_sb() can fail if the path is invalid. In this case, btrfs_get_dev_args_from_path() returns directly without freeing args->uuid and args->fsid allocated before, which causes memory leak. To fix these possible leaks, when btrfs_get_bdev_and_sb() fails, btrfs_put_dev_args_from_path() is called to clean up the memory. Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Fixes: faa775c41d655 ("btrfs: add a btrfs_get_dev_args_from_path helper") CC: stable@vger.kernel.org # 5.16 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Zixuan Fu <r33s3n6@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | btrfs: check if root is readonly while setting security xattrGoldwyn Rodrigues2022-08-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a filesystem which has btrfs read-only property set to true, all write operations including xattr should be denied. However, security xattr can still be changed even if btrfs ro property is true. This happens because xattr_permission() does not have any restrictions on security.*, system.* and in some cases trusted.* from VFS and the decision is left to the underlying filesystem. See comments in xattr_permission() for more details. This patch checks if the root is read-only before performing the set xattr operation. Testcase: DEV=/dev/vdb MNT=/mnt mkfs.btrfs -f $DEV mount $DEV $MNT echo "file one" > $MNT/f1 setfattr -n "security.one" -v 2 $MNT/f1 btrfs property set /mnt ro true setfattr -n "security.one" -v 1 $MNT/f1 umount $MNT CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | | | | Merge tag '6.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2022-08-286-67/+70
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cfis fixes from Steve French: - two locking fixes (zero range, punch hole) - DFS 9 fix (padding), affecting some servers - three minor cleanup changes * tag '6.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Add helper function to check smb1+ server cifs: Use help macro to get the mid header size cifs: Use help macro to get the header preamble size cifs: skip extra NULL byte in filenames smb3: missing inode locks in punch hole smb3: missing inode locks in zero range
| * | | | | | | cifs: Add helper function to check smb1+ serverZhang Xiaoxu2022-08-244-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SMB1 server's header_preamble_size is not 0, add use is_smb1 function to simplify the code, no actual functional changes. Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | | cifs: Use help macro to get the mid header sizeZhang Xiaoxu2022-08-242-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's better to use MID_HEADER_SIZE because the unfolded expression too long. No actual functional changes, minor readability improvement. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | | cifs: Use help macro to get the header preamble sizeZhang Xiaoxu2022-08-244-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's better to use HEADER_PREAMBLE_SIZE because the unfolded expression too long. No actual functional changes, minor readability improvement. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | | cifs: skip extra NULL byte in filenamesPaulo Alcantara2022-08-241-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit: cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty alloc_path_with_tree_prefix() function was no longer including the trailing separator when @path is empty, although @out_len was still assuming a path separator thus adding an extra byte to the final filename. This has caused mount issues in some Synology servers due to the extra NULL byte in filenames when sending SMB2_CREATE requests with SMB2_FLAGS_DFS_OPERATIONS set. Fix this by checking if @path is not empty and then add extra byte for separator. Also, do not include any trailing NULL bytes in filename as MS-SMB2 requires it to be 8-byte aligned and not NULL terminated. Cc: stable@vger.kernel.org Fixes: 7eacba3b00a3 ("cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | | smb3: missing inode locks in punch holeDavid Howells2022-08-231-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | smb3 fallocate punch hole was not grabbing the inode or filemap_invalidate locks so could have race with pagemap reinstantiating the page. Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | | smb3: missing inode locks in zero rangeDavid Howells2022-08-231-25/+30
| | |/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | smb3 fallocate zero range was not grabbing the inode or filemap_invalidate locks so could have race with pagemap reinstantiating the page. Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | | | | | Merge tag 'x86-urgent-2022-08-28' of ↵Linus Torvalds2022-08-2815-88/+183
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen
| * | | | | | | Documentation/ABI: Mention retbleed vulnerability info file for sysfsSalvatore Bonaccorso2022-08-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While reporting for the AMD retbleed vulnerability was added in 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability") the new sysfs file was not mentioned so far in the ABI documentation for sysfs-devices-system-cpu. Fix that. Fixes: 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability") Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220801091529.325327-1-carnil@debian.org
| * | | | | | | x86/sev: Mark snp_abort() noreturnBorislav Petkov2022-08-253-18/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: Michael Matz <matz@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de
| * | | | | | | x86/sev: Don't use cc_platform_has() for early SEV-SNP callsTom Lendacky2022-08-241-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running identity-mapped and depending on the kernel configuration, it is possible that the compiler uses jump tables when generating code for cc_platform_has(). This causes a boot failure because the jump table uses un-mapped kernel virtual addresses, not identity-mapped addresses. This has been seen with CONFIG_RETPOLINE=n. Similar to sme_encrypt_kernel(), use an open-coded direct check for the status of SNP rather than trying to eliminate the jump table. This preserves any code optimization in cc_platform_has() that can be useful post boot. It also limits the changes to SEV-specific files so that future compiler features won't necessarily require possible build changes just because they are not compatible with running identity-mapped. [ bp: Massage commit message. ] Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes") Reported-by: Sean Christopherson <seanjc@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 5.19.x Link: https://lore.kernel.org/all/YqfabnTRxFSM+LoX@google.com/
| * | | | | | | x86/boot: Don't propagate uninitialized boot_params->cc_blob_addressMichael Roth2022-08-242-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases, bootloaders will leave boot_params->cc_blob_address uninitialized rather than zeroing it out. This field is only meant to be set by the boot/compressed kernel in order to pass information to the uncompressed kernel when SEV-SNP support is enabled. Therefore, there are no cases where the bootloader-provided values should be treated as anything other than garbage. Otherwise, the uncompressed kernel may attempt to access this bogus address, leading to a crash during early boot. Normally, sanitize_boot_params() would be used to clear out such fields but that happens too late: sev_enable() may have already initialized it to a valid value that should not be zeroed out. Instead, have sev_enable() zero it out unconditionally beforehand. Also ensure this happens for !CONFIG_AMD_MEM_ENCRYPT as well by also including this handling in the sev_enable() stub function. [ bp: Massage commit message and comments. ] Fixes: b190a043c49a ("x86/sev: Add SEV-SNP feature detection/setup") Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: watnuss@gmx.de Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216387 Link: https://lore.kernel.org/r/20220823160734.89036-1-michael.roth@amd.com
| * | | | | | | x86/cpu: Add new Raptor Lake CPU model numberTony Luck2022-08-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note1: Model 0xB7 already claimed the "no suffix" #define for a regular client part, so add (yet another) suffix "S" to distinguish this new part from the earlier one. Note2: the RAPTORLAKE* and ALDERLAKE* processors are very similar from a software enabling point of view. There are no known features that have model-specific enabling and also differ between the two. In other words, every single place that list *one* or more RAPTORLAKE* or ALDERLAKE* processors should list all of them. Note3: This is being merged before there is an in-tree user. Merging this provides an "anchor" so that the different folks can update their subsystems (like perf) in parallel to use this define and test it. [ dhansen: add a note about why this has no in-tree users yet ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220823174819.223941-1-tony.luck@intel.com
| * | | | | | | x86/unwind/orc: Unwind ftrace trampolines with correct ORC entryChen Zhongjin2022-08-211-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When meeting ftrace trampolines in ORC unwinding, unwinder uses address of ftrace_{regs_}call address to find the ORC entry, which gets next frame at sp+176. If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be sp+8 instead of 176. It makes unwinder skip correct frame and throw warnings such as "wrong direction" or "can't access registers", etc, depending on the content of the incorrect frame address. By adding the base address ftrace_{regs_}caller with the offset *ip - ops->trampoline*, we can get the correct address to find the ORC entry. Also change "caller" to "tramp_addr" to make variable name conform to its content. [ mingo: Clarified the changelog a bit. ] Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
| * | | | | | | x86/nospec: Fix i386 RSB stuffingPeter Zijlstra2022-08-191-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Turns out that i386 doesn't unconditionally have LFENCE, as such the loop in __FILL_RETURN_BUFFER isn't actually speculation safe on such chips. Fixes: ba6e31af2be9 ("x86/speculation: Add LFENCE to RSB fill sequence") Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Yv9tj9vbQ9nNlXoY@worktop.programming.kicks-ass.net
| * | | | | | | x86/nospec: Unwreck the RSB stuffingPeter Zijlstra2022-08-191-41/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") made a right mess of the RSB stuffing, rewrite the whole thing to not suck. Thanks to Andrew for the enlightening comment about Post-Barrier RSB things so we can make this code less magical. Cc: stable@vger.kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net
| * | | | | | | x86/bugs: Add "unknown" reporting for MMIO Stale DataPawan Gupta2022-08-184-19/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown. Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities. Mitigation is not deployed when the status is unknown. [ bp: Massage, fixup. ] Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
| * | | | | | | x86/entry: Fix entry_INT80_compat for Xen PV guestsJuergen Gross2022-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c89191ce67ef ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS") missed one use case of SWAPGS in entry_INT80_compat(). Removing of the SWAPGS macro led to asm just using "swapgs", as it is accepting instructions in capital letters, too. This in turn leads to splats in Xen PV guests like: [ 36.145223] general protection fault, maybe for address 0x2d: 0000 [#1] PREEMPT SMP NOPTI [ 36.145794] CPU: 2 PID: 1847 Comm: ld-linux.so.2 Not tainted 5.19.1-1-default #1 \ openSUSE Tumbleweed f3b44bfb672cdb9f235aff53b57724eba8b9411b [ 36.146608] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 11/14/2013 [ 36.148126] RIP: e030:entry_INT80_compat+0x3/0xa3 Fix that by open coding this single instance of the SWAPGS macro. Fixes: c89191ce67ef ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> # 5.19 Link: https://lore.kernel.org/r/20220816071137.4893-1-jgross@suse.com
| * | | | | | | x86/PAT: Have pat_enabled() properly reflect state when running on XenJan Beulich2022-08-151-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit ID in the Fixes: tag, pat_enabled() returns false (because of PAT initialization being suppressed in the absence of MTRRs being announced to be available). This has become a problem: the i915 driver now fails to initialize when running PV on Xen (i915_gem_object_pin_map() is where I located the induced failure), and its error handling is flaky enough to (at least sometimes) result in a hung system. Yet even beyond that problem the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses would have been quite a bit less optimal than possible. Arrange for the function to return true in such environments, without undermining the rest of PAT MSR management logic considering PAT to be disabled: specifically, no writes to the PAT MSR should occur. For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before). [ bp: This is the "small fix" variant for stable. It'll get replaced with a proper PAT and MTRR detection split upstream but that is too involved for a stable backport. - additional touchups to commit msg. Use cpu_feature_enabled(). ] Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/9385fa60-fa5d-f559-a137-6608408f88b0@suse.com
* | | | | | | | Merge tag 'perf-urgent-2022-08-28' of ↵Linus Torvalds2022-08-284-7/+36
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Ingo Molnar: "Misc fixes: an Arch-LBR fix, a PEBS enumeration fix, an Intel DS fix, PEBS constraints fix on Alder Lake CPUs and an Intel uncore PMU fix" * tag 'perf-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU perf/x86/intel: Fix pebs event constraints for ADL perf/x86/intel/ds: Fix precise store latency handling perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the Baseline perf/x86/lbr: Enable the branch type for the Arch LBR by default
| * | | | | | | | perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMUStephane Eranian2022-08-271-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Existing code was generating bogus counts for the SNB IMC bandwidth counters: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000327813 1,024.03 MiB uncore_imc/data_reads/ 1.000327813 20.73 MiB uncore_imc/data_writes/ 2.000580153 261,120.00 MiB uncore_imc/data_reads/ 2.000580153 23.28 MiB uncore_imc/data_writes/ The problem was introduced by commit: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Where the read_counter callback was replace to point to the generic uncore_mmio_read_counter() function. The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in MMIO. But uncore_mmio_read_counter() is using a readq() call to read from MMIO therefore reading 64-bit from MMIO. Although this is okay for the uncore_perf_event_update() function because it is shifting the value based on the actual counter width to compute a delta, it is not okay for the uncore_pmu_event_start() which is simply reading the counter and therefore priming the event->prev_count with a bogus value which is responsible for causing bogus deltas in the perf stat command above. The fix is to reintroduce the custom callback for read_counter for the SNB IMC PMU and use readl() instead of readq(). With the change the output of perf stat is back to normal: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000120987 296.94 MiB uncore_imc/data_reads/ 1.000120987 138.42 MiB uncore_imc/data_writes/ 2.000403144 175.91 MiB uncore_imc/data_reads/ 2.000403144 68.50 MiB uncore_imc/data_writes/ Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com
| * | | | | | | | perf/x86/intel: Fix pebs event constraints for ADLKan Liang2022-08-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the latest event list, the LOAD_LATENCY PEBS event only works on the GP counter 0 and 1 for ADL and RPL. Update the pebs event constraints table. Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220818184429.2355857-1-kan.liang@linux.intel.com
| * | | | | | | | perf/x86/intel/ds: Fix precise store latency handlingStephane Eranian2022-08-191-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the existing code in store_latency_data(), the memory operation (mem_op) returned to the user is always OP_LOAD where in fact, it should be OP_STORE. This comes from the fact that the function is simply grabbing the information from a data source map which covers only load accesses. Intel 12th gen CPU offers precise store sampling that captures both the data source and latency. Therefore it can use the data source mapping table but must override the memory operation to reflect stores instead of loads. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220818054613.1548130-1-eranian@google.com
| * | | | | | | | perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the BaselinePeter Zijlstra2022-08-192-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SDM explicitly states that PEBS Baseline implies Extended PEBS. For cpu model forward compatibility (e.g. on ICX, SPR, ADL), it's safe to stop doing FMS table thing such as setting pebs_capable and PMU_FL_PEBS_ALL since it's already set in the intel_ds_init(). The Goldmont Plus is the only platform which supports extended PEBS but doesn't have Baseline. Keep the status quo. Reported-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20220816114057.51307-1-likexu@tencent.com
| * | | | | | | | perf/x86/lbr: Enable the branch type for the Arch LBR by defaultKan Liang2022-08-191-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the platform with Arch LBR, the HW raw branch type encoding may leak to the perf tool when the SAVE_TYPE option is not set. In the intel_pmu_store_lbr(), the HW raw branch type is stored in lbr_entries[].type. If the SAVE_TYPE option is set, the lbr_entries[].type will be converted into the generic PERF_BR_* type in the intel_pmu_lbr_filter() and exposed to the user tools. But if the SAVE_TYPE option is NOT set by the user, the current perf kernel doesn't clear the field. The HW raw branch type leaks. There are two solutions to fix the issue for the Arch LBR. One is to clear the field if the SAVE_TYPE option is NOT set. The other solution is to unconditionally convert the branch type and expose the generic type to the user tools. The latter is implemented here, because - The branch type is valuable information. I don't see a case where you would not benefit from the branch type. (Stephane Eranian) - Not having the branch type DOES NOT save any space in the branch record (Stephane Eranian) - The Arch LBR HW can retrieve the common branch types from the LBR_INFO. It doesn't require the high overhead SW disassemble. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
* | | | | | | | | Merge tag 'perf-tools-fixes-for-v6.0-2022-08-27' of ↵Linus Torvalds2022-08-288-32/+61
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fixup setup of weak groups when using 'perf stat --repeat', add a 'perf test' for it. - Fix memory leaks in 'perf sched record' detected with -fsanitize=address. - Fix build when PYTHON_CONFIG is user supplied. - Capitalize topdown metrics' names in 'perf stat', so that the output, sometimes parsed, matches the Intel SDM docs. - Make sure the documentation for the save_type filter about Intel systems with Arch LBR support (12th-Gen+ client or 4th-Gen Xeon+ server) reflects recent related kernel changes. - Fix 'perf record' man page formatting of description of support to hybrid systems. - Update arm64´s KVM header from the kernel sources. * tag 'perf-tools-fixes-for-v6.0-2022-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf stat: Capitalize topdown metrics' names perf docs: Update the documentation for the save_type filter perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=address perf record: Fix manpage formatting of description of support to hybrid systems perf test: Stat test for repeat with a weak group perf stat: Clear evsel->reset_group for each stat run tools kvm headers arm64: Update KVM header from the kernel sources perf python: Fix build when PYTHON_CONFIG is user supplied
| * | | | | | | | | perf stat: Capitalize topdown metrics' namesZhengjun Xing2022-08-271-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Capitalize topdown metrics' names to follow the intel SDM. Before: # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 228,094.05 msec cpu-clock # 225.026 CPUs utilized 842 context-switches # 3.691 /sec 224 cpu-migrations # 0.982 /sec 70 page-faults # 0.307 /sec 23,164,105 cycles # 0.000 GHz 29,403,446 instructions # 1.27 insn per cycle 5,268,185 branches # 23.097 K/sec 33,239 branch-misses # 0.63% of all branches 136,248,990 slots # 597.337 K/sec 32,976,450 topdown-retiring # 24.2% retiring 4,651,918 topdown-bad-spec # 3.4% bad speculation 26,148,695 topdown-fe-bound # 19.2% frontend bound 72,515,776 topdown-be-bound # 53.2% backend bound 6,008,540 topdown-heavy-ops # 4.4% heavy operations # 19.8% light operations 3,934,049 topdown-br-mispredict # 2.9% branch mispredict # 0.5% machine clears 16,655,439 topdown-fetch-lat # 12.2% fetch latency # 7.0% fetch bandwidth 41,635,972 topdown-mem-bound # 30.5% memory bound # 22.7% Core bound 1.013634593 seconds time elapsed After: # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 228,081.94 msec cpu-clock # 225.003 CPUs utilized 824 context-switches # 3.613 /sec 224 cpu-migrations # 0.982 /sec 67 page-faults # 0.294 /sec 22,647,423 cycles # 0.000 GHz 28,870,551 instructions # 1.27 insn per cycle 5,167,099 branches # 22.655 K/sec 32,383 branch-misses # 0.63% of all branches 133,411,074 slots # 584.926 K/sec 32,352,607 topdown-retiring # 24.3% Retiring 4,456,977 topdown-bad-spec # 3.3% Bad Speculation 25,626,487 topdown-fe-bound # 19.2% Frontend Bound 70,955,316 topdown-be-bound # 53.2% Backend Bound 5,834,844 topdown-heavy-ops # 4.4% Heavy Operations # 19.9% Light Operations 3,738,781 topdown-br-mispredict # 2.8% Branch Mispredict # 0.5% Machine Clears 16,286,803 topdown-fetch-lat # 12.2% Fetch Latency # 7.0% Fetch Bandwidth 40,802,069 topdown-mem-bound # 30.6% Memory Bound # 22.6% Core Bound 1.013683125 seconds time elapsed Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220825015458.3252239-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | perf docs: Update the documentation for the save_type filterKan Liang2022-08-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the documentation to reflect the kernel changes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220816125612.2042397-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=addressIan Rogers2022-08-271-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An array of strings is passed to cmd_record but not freed. As cmd_record modifies the array, add another array as a copy that can be mutated allowing the original array contents to all be freed. Detected with -fsanitize=address. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220824145733.409005-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | perf record: Fix manpage formatting of description of support to hybrid systemsAndi Kleen2022-08-272-12/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Intel hybrid description is written in a different style than the rest of the perf record man page. There were some new command line options added after it which resulted in very strange section ordering. Move the hybrid include last. Also the sub sections in the hybrid document don't fit the record manpage well (especially since it talks about all kinds of unrelated commands). I left this for now, but would be better to separate this properly in the different man pages. It would be better to use sub sections for the other sections, but these don't seem to be supported in AsciiDoc? Some of the examples are still misrendered in the manpage with an indented troff command, but I don't know how to fix that. In any case it's now better than before. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220818100127.249401-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | perf test: Stat test for repeat with a weak groupIan Rogers2022-08-271-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Breaking a weak group requires multiple passes of an evlist, with multiple runs this can introduce bugs ultimately leading to segfaults. Add a test to cover this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220822213352.75721-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | perf stat: Clear evsel->reset_group for each stat runIan Rogers2022-08-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a weak group is broken then the reset_group flag remains set for the next run. Having reset_group set means the counter isn't created and ultimately a segfault. A simple reproduction of this is: # perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W which will be added as a test in the next patch. Fixes: 4804e0111662d7d8 ("perf stat: Use affinity for opening events") Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220822213352.75721-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | tools kvm headers arm64: Update KVM header from the kernel sourcesArnaldo Carvalho de Melo2022-08-271-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick the changes from: ae3b1da95413614f ("KVM: arm64: Fix compile error due to sign extension") That doesn't result in any changes in tooling (when built on x86), only addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h' diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/all/YwOMCCc4E79FuvDe@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | perf python: Fix build when PYTHON_CONFIG is user suppliedJames Clark2022-08-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous change to Python autodetection had a small mistake where the auto value was used to determine the Python binary, rather than the user supplied value. The Python binary is only used for one part of the build process, rather than the final linking, so it was producing correct builds in most scenarios, especially when the auto detected value matched what the user wanted, or the system only had a valid set of Pythons. Change it so that the Python binary path is derived from either the PYTHON_CONFIG value or PYTHON value, depending on what is specified by the user. This was the original intention. This error was spotted in a build failure an odd cross compilation environment after commit 4c41cb46a732fe82 ("perf python: Prefer python3") was merged. Fixes: 630af16eee495f58 ("perf tools: Use Python devtools for version autodetection rather than runtime") Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220728093946.1337642-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | | | | | | | | Merge tag 'thermal-6.0-rc3' of ↵Linus Torvalds2022-08-273-4/+7
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Fix two issues introduced recently and one driver problem leading to a NULL pointer dereference in some cases. Specifics: - Add missing EXPORT_SYMBOL_GPL in the thermal core and add back the required 'trips' property to the thermal zone DT bindings (Daniel Lezcano) - Prevent the int340x_thermal driver from crashing when a package with a buffer of 0 length is returned by an ACPI control method evaluated by it (Lee, Chun-Yi)" * tag 'thermal-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTR dt-bindings: thermal: Fix missing required property thermal/core: Add missing EXPORT_SYMBOL_GPL
| * \ \ \ \ \ \ \ \ \ Merge branch 'thermal-core'Rafael J. Wysocki2022-08-272-0/+2
| |\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge thermal control core fixes for 6.0-rc3: - Fix missing required property for thermal zone description (Daniel Lezcano). - Add missing export symbol for thermal_zone_device_register_with_trips() (Daniel Lezcano). * thermal-core: dt-bindings: thermal: Fix missing required property thermal/core: Add missing EXPORT_SYMBOL_GPL
| | * \ \ \ \ \ \ \ \ \ Merge tag 'thermal-v6.0-rc2' of ↵Rafael J. Wysocki2022-08-202-0/+2
| | |\ \ \ \ \ \ \ \ \ \ | | | |_|_|/ / / / / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal control fixes for 6.0-rc3 from Daniel Lezcano: "- Fix missing required property for thermal zone description (Daniel Lezcano) - Add missing export symbol for thermal_zone_device_register_with_trips() (Daniel Lezcano)" * tag 'thermal-v6.0-rc2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux: dt-bindings: thermal: Fix missing required property thermal/core: Add missing EXPORT_SYMBOL_GPL
| | | * | | | | | | | | dt-bindings: thermal: Fix missing required propertyDaniel Lezcano2022-08-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the thermal zone description was converted to yaml schema, the required 'trips' property was forgotten. The initial text bindings was describing: " [ ... ] * Thermal zone nodes The thermal zone node is the node containing all the required info for describing a thermal zone, including its cooling device bindings. The thermal zone node must contain, apart from its own properties, one sub-node containing trip nodes and one sub-node containing all the zone cooling maps. Required properties: - polling-delay: The maximum number of milliseconds to wait between polls Type: unsigned when checking this thermal zone. Size: one cell - polling-delay-passive: The maximum number of milliseconds to wait Type: unsigned between polls when performing passive cooling. Size: one cell - thermal-sensors: A list of thermal sensor phandles and sensor specifier Type: list of used while monitoring the thermal zone. phandles + sensor specifier - trips: A sub-node which is a container of only trip point nodes Type: sub-node required to describe the thermal zone. Optional property: - cooling-maps: A sub-node which is a container of only cooling device Type: sub-node map nodes, used to describe the relation between trips and cooling devices. [ ... ] " Now the schema describes: " [ ... ] required: - polling-delay - polling-delay-passive - thermal-sensors [ ... ] " Add the missing 'trips' property in the required properties. Fixed: 1202a442a31fd ("dt-bindings: thermal: Add yaml bindings for thermal zones") Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220809085629.509116-3-daniel.lezcano@linaro.org
| | | * | | | | | | | | thermal/core: Add missing EXPORT_SYMBOL_GPLDaniel Lezcano2022-08-151-0/+1
| | |/ / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function thermal_zone_device_register_with_trips() is not exported for modules. Add the missing EXPORT_SYMBOL_GPL(). Fixes: fae11de507f0e ("thermal/core: Add thermal_trip in thermal_zone") Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20220810100731.749317-1-daniel.lezcano@linaro.org
| * | | | / / / / / / thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTRLee, Chun-Yi2022-08-231-4/+5
| | |_|_|/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some case, the GDDV returns a package with a buffer which has zero length. It causes that kmemdup() returns ZERO_SIZE_PTR (0x10). Then the data_vault_read() got NULL point dereference problem when accessing the 0x10 value in data_vault. [ 71.024560] BUG: kernel NULL pointer dereference, address: 0000000000000010 This patch uses ZERO_OR_NULL_PTR() for checking ZERO_SIZE_PTR or NULL value in data_vault. Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | | | | | Merge tag 'pm-6.0-rc3' of ↵Linus Torvalds2022-08-271-1/+1
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Make __resolve_freq() check the presence of the frequency table instead of checking whether or not the ->target_index() callback is implemented by the driver, because that need not be the case when __resolve_freq() is used (Lukasz Luba)" * tag 'pm-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: check only freq_table in __resolve_freq()
| * | | | | | | | | | cpufreq: check only freq_table in __resolve_freq()Lukasz Luba2022-08-231-1/+1
| |/ / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to check if the cpufreq driver implements callback cpufreq_driver::target_index. The logic in the __resolve_freq uses the frequency table available in the policy. It doesn't matter if the driver provides 'target_index' or 'target' callback. It just has to populate the 'policy->freq_table'. Thus, check only frequency table during the frequency resolving call. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | | | | | Merge tag 'acpi-6.0-rc3' of ↵Linus Torvalds2022-08-272-7/+6
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix issues introduced by recent changes related to the handling of ACPI device properties and a coding mistake in the exit path of the ACPI processor driver. Specifics: - Prevent acpi_thermal_cpufreq_exit() from attempting to remove the same frequency QoS request multiple times (Riwen Lu) - Fix type detection for integer ACPI device properties (Stefan Binding) - Avoid emitting false-positive warnings when processing ACPI device properties and drop the useless default case from the acpi_copy_property_array_uint() macro (Sakari Ailus)" * tag 'acpi-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: property: Remove default association from integer maximum values ACPI: property: Ignore already existing data node tags ACPI: property: Fix type detection of unified integer reading functions ACPI: processor: Remove freq Qos request for all CPUs
| * \ \ \ \ \ \ \ \ \ Merge branch 'acpi-processor' into acpiRafael J. Wysocki2022-08-271-1/+1
| |\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-processor: ACPI: processor: Remove freq Qos request for all CPUs