| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the xfs_etest structure in favor of a per-mountpoint structure.
This will give us the flexibility to set as many error injection points
as we want, and later enable us to set up sysfs knobs to set the trigger
frequency as we wish. This comes at a cost of higher memory use, but
unti we hit 1024 injection points (we're at 29) or a lot of mounts this
shouldn't be a huge issue.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Log recovery allocates in-core transaction and member item data
structures on-demand as it processes the on-disk log. Transactions
are allocated on first encounter on-disk and stored in a hash table
structure where they are easily accessible for subsequent lookups.
Transaction items are also allocated on demand and are attached to
the associated transactions.
When a commit record is encountered in the log, the transaction is
committed to the fs and the in-core structures are freed. If a
filesystem crashes or shuts down before all in-core log buffers are
flushed to the log, however, not all transactions may have commit
records in the log. As expected, the modifications in such an
incomplete transaction are not replayed to the fs. The in-core data
structures for the partial transaction are never freed, however,
resulting in a memory leak.
Update xlog_do_recovery_pass() to first correctly initialize the
hash table array so empty lists can be distinguished from populated
lists on function exit. Update xlog_recover_free_trans() to always
remove the transaction from the list prior to freeing the associated
memory. Finally, walk the hash table of transaction lists as the
last step before it goes out of scope and free any transactions that
may remain on the lists. This prevents a memory leak of partial
transactions in the log.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
bmap returns a dumb LBA address but not the block device that goes with
that LBA. Swapfiles don't care about this and will blindly assume that
the data volume is the correct blockdev, which is totally bogus for
files on the rt subvolume. This results in the swap code doing IOs to
arbitrary locations on the data device(!) if the passed in mapping is a
realtime file, so just turn off bmap for rt files.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
| |
Expose the readlink variant that doesn't take the inode lock so that
the scrubber can inspect symlink contents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Teach the extended attribute reading functions to pass along a
transaction context if one was supplied. The extended attribute scrub
code will use transactions to lock buffers and avoid deadlocking with
itself in the case of loops; since it will already have the inode
locked, also create xattr get/list helpers that don't take locks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Teach the directory reading functions to pass along a transaction context
if one was supplied. The directory scrub code will use transactions to
lock buffers and avoid deadlocking with itself in the case of loops.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Modify the existing dir leafn lasthash function to enable us to
calculate the highest hash value of a leaf1 block. This will be used by
the directory scrubbing code to check the sanity of hashes in leaf1
directory blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
| |
Refactor the inode fork block counting function to count extents for us
at the same time. This will be used by the bmbt scrubber function.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is an inconsistency in the way that _bmap_count_blocks deals with
delalloc reservations -- if the specified fork is in extents format,
*count is set to the total number of blocks referenced by the in-core
fork, including delalloc extents. However, if the fork is in btree
format, *count is set to the number of blocks referenced by the on-disk
fork, which does /not/ include delalloc extents.
For the lone existing caller of _bmap_count_blocks this hasn't been an
issue because the function is only used to count xattr fork blocks
(where there aren't any delalloc reservations). However, when scrub
comes along it will use this same function to check di_nblocks against
both on-disk extent maps, so we need this behavior to be consistent.
Therefore, fix _bmap_count_leaves not to include delalloc extents and
remove unnecessary parameters.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Separate the "clear reflink flag" function into one function that checks
if the flag is needed, and a second function that checks and clears the
flag. The inode scrub code will want to check the necessity of the flag
without clearing it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Adapt _reflink_find_shared to take an optional transaction pointer. The
inode scrubber code will need to decide (within transaction context) if
a file has shared blocks. To avoid buffer deadlocks, we must pass the
tp through to this function's utility calls.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Check the inode cache for a particular inode number. If it's in the
cache, check that it's not currently being reclaimed. If it's not being
reclaimed, return zero if the inode is allocated. This function will be
used by various scrubbers to decide if the cache is more up to date
than the disk in terms of checking if an inode is allocated.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Create a function to extract an in-core inobt record from a generic
btree_rec union so that scrub will be able to check inobt records
and check inode block alignment.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the inode space and block mapping btrees and to extract
raw btree records. This will eventually be used by the inobt and bmbt
scrubbers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
| |
Export various internal functions so that the online scrubber can use
them to check the state of metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
| |
The btree record and key inorder check functions will be used by the
btree scrubber code, so make sure they're always built.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:
s/typedef\t__uint8_t/typedef __uint8_t\t/g
s/typedef\t__uint/typedef __uint/g
s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
s/__uint8_t\t/__uint8_t\t\t/g
s/__uint/uint/g
s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
s/__int/int/g
/^typedef.*int[0-9]*_t;$/d
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
| |
Don't bother wandering our way through the leaf nodes when the caller
issues a query_all; just zoom down the left side of the tree and walk
rightwards along level zero.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a buffer is modified, logged and committed, it ultimately ends
up sitting on the AIL with a dirty bli waiting for metadata
writeback. If another transaction locks and invalidates the buffer
(freeing an inode chunk, for example) in the meantime, the bli is
flagged as stale, the dirty state is cleared and the bli remains in
the AIL.
If a shutdown occurs before the transaction that has invalidated the
buffer is committed, the transaction is ultimately aborted. The log
items are flagged as such and ->iop_unlock() handles the aborted
items. Because the bli is clean (due to the invalidation),
->iop_unlock() unconditionally releases it. The log item may still
reside in the AIL, however, which means the I/O completion handler
may still run and attempt to access it. This results in assert
failure due to the release of the bli while still present in the AIL
and a subsequent NULL dereference and panic in the buffer I/O
completion handling. This can be reproduced by running generic/388
in repetition.
To avoid this problem, update xfs_buf_item_unlock() to first check
whether the bli is aborted and if so, remove it from the AIL before
it is released. This ensures that the bli is no longer accessed
during the shutdown sequence after it has been freed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a filesystem shutdown occurs with a buffer log item in the CIL
and a log force occurs, the ->iop_unpin() handler is generally
expected to tear down the bli properly. This entails freeing the bli
memory and releasing the associated hold on the buffer so it can be
released and the filesystem unmounted.
If this sequence occurs while ->bli_refcount is elevated (i.e.,
another transaction is open and attempting to modify the buffer),
however, ->iop_unpin() may not be responsible for releasing the bli.
Instead, the transaction may release the final ->bli_refcount
reference and thus xfs_trans_brelse() is responsible for tearing
down the bli.
While xfs_trans_brelse() does drop the reference count, it only
attempts to release the bli if it is clean (i.e., not in the
CIL/AIL). If the filesystem is shutdown and the bli is sitting dirty
in the CIL as noted above, this ends up skipping the last
opportunity to release the bli. In turn, this leaves the hold on the
buffer and causes an unmount hang. This can be reproduced by running
generic/388 in repetition.
Update xfs_trans_brelse() to handle this shutdown corner case
correctly. If the final bli reference is dropped and the filesystem
is shutdown, remove the bli from the AIL (if necessary) and release
the bli to drop the buffer hold and ensure an unmount does not hang.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gcc-7 flags the use of integer math inside of a condition
as a potential bug:
fs/xfs/xfs_bmap_util.c: In function 'xfs_swap_extents_check_format':
fs/xfs/xfs_bmap_util.c:1619:8: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context]
fs/xfs/xfs_bmap_util.c:1629:8: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context]
There is already a helper function for testing the di_forkoff
field for zero, so let's use that instead to shut up the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The t_lsn is not used anymore and the t_commit_lsn is used as a tmp
storage for the checkpoint sequence number only in the current code.
And the start/commit lsn are tracked as a transaction group tag in
the xfs_cil_ctx instead of a single transaction, so remove them from
the xfs_trans structure and their users to match with the design.
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
XFS_HSIZE is an extremly confusing way to calculate the size of handle_t.
Given that handle_t always only had two sizes, and one of them isn't
even covered by XFS_HSIZE to start with just remove the macro and use
a constant sizeof expression.
Note that XFS_HSIZE isn't used in xfsprogs, xfsdump or xfstests either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a transaction log reservation overrun occurs, the ticket data
associated with the reservation is dumped in xfs_log_commit_cil().
This occurs long after the transaction items and details have been
removed from the transaction and effectively lost. This limited set
of ticket data provides very little information to support debugging
transaction overruns based on the typical report.
To improve transaction log reservation overrun reporting, create a
helper to dump transaction details such as log items, log vector
data, etc., as well as the underlying ticket data for the
transaction. Move the overrun detection from xfs_log_commit_cil() to
xlog_cil_insert_items() so it occurs prior to migration of the
logged items to the CIL. Call the new helper such that it is able to
dump this transaction data before it is lost.
Also, warn on overrun to provide callstack context for the offending
transaction and include a few additional messages from
xlog_cil_insert_items() to display the reservation consumed locally
for overhead such as log vector headers, split region headers and
the context ticket. This provides a complete general breakdown of
the reservation consumption of a transaction when/if it happens to
overrun the reservation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Transaction reservation overrun detection currently occurs too late
to print useful information about the offending transaction.
Ideally, the transaction data is printed before the associated log
items are moved from the transaction to the CIL, which occurs in
xlog_cil_insert_items(), such that details of the items logged by
the transaction are available for analysis.
Refactor xlog_cil_insert_items() to facilitate moving tx overrun
detection to this function. Update the function to track each bit of
extra log reservation stolen from the transaction (i.e., such as for
the CIL context ticket) and perform the log item migration as the
last operation before the CIL lock is released. This creates a
context where the transaction reservation consumption has been fully
calculated when the log items are moved to the CIL. This patch makes
no functional changes.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xlog_print_tic_res() pre-dates delayed logging and the committed
items list (CIL) and thus retains some factoring warts, such as hard
coded function names in the output and the fact that it induces a
shutdown.
In preparation for more detailed logging of regular transaction
overrun situations, refactor xlog_print_tic_res() to be slightly
more generic. Reword some of the warning messages and pull the
shutdown into the callers.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While configurable at runtime, the DEBUG mode assert failure
behavior is usually either desired or not for a particular
situation. For example, developers using kernel modules may prefer
for fatal asserts to remain disabled across module reloads while QE
engineers doing broad regression testing may prefer to have fatal
asserts enabled on boot to facilitate data collection for bug
reports.
To provide a compromise/convenience for developers, create a Kconfig
option that sets the default value of the DEBUG mode 'bug_on_assert'
sysfs tunable. The default behavior remains to trigger kernel BUGs
on assert failures to preserve existing behavior across kernel
configuration updates with DEBUG mode enabled.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In DEBUG mode, assert failures unconditionally trigger a kernel BUG.
This is useful in diagnostic situations to panic a system and
collect detailed state information at the time of a failure.
This can also cause problems in cases where DEBUG mode code is
desired but it is preferable not trigger kernel BUGs on assert
failure. For example, during development of new code or during
certain xfstests tests that intentionally cause corruption and test
the kernel for survival (but otherwise may expect to trigger assert
failures).
To provide additional flexibility, create the
<sysfs>/fs/xfs/debug/bug_on_assert tunable to configure assert
failure behavior at runtime. This tunable is only available in DEBUG
mode and is enabled by default to preserve existing default
behavior. When disabled, assert failures in DEBUG mode result in
kernel warnings.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shared extent
In a pathological scenario where we are trying to bunmapi a single
extent in which every other block is shared, it's possible that trying
to unmap the entire large extent in a single transaction can generate so
many EFIs that we overflow the transaction reservation.
Therefore, use a heuristic to guess at the number of blocks we can
safely unmap from a reflink file's data fork in an single transaction.
This should prevent problems such as the log head slamming into the tail
and ASSERTs that trigger because we've exceeded the transaction
reservation.
Note that since bunmapi can fail to unmap the entire range, we must also
teach the deferred unmap code to roll into a new transaction whenever we
get low on reservation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: random edits, all bugs are my fault]
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the dir2 leaf block getdents function uses a complex state
tracking mechanism to create a shadow copy of the block mappings and
then uses the shadow copy to schedule readahead. Since the read and
readahead functions are perfectly capable of reading the mappings
themselves, we can tear all that out in favor of a simpler function that
simply keeps pushing the readahead window further out.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reclaim during quotacheck can lead to deadlocks on the dquot flush
lock:
- Quotacheck populates a local delwri queue with the physical dquot
buffers.
- Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
dirties all of the dquots.
- Reclaim kicks in and attempts to flush a dquot whose buffer is
already queud on the quotacheck queue. The flush succeeds but
queueing to the reclaim delwri queue fails as the backing buffer is
already queued. The flush unlock is now deferred to I/O completion
of the buffer from the quotacheck queue.
- The dqadjust bulkstat continues and dirties the recently flushed
dquot once again.
- Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
the flush lock to update the backing buffers with the in-core
recalculated values. It deadlocks on the redirtied dquot as the
flush lock was already acquired by reclaim, but the buffer resides
on the local delwri queue which isn't submitted until the end of
quotacheck.
This is reproduced by running quotacheck on a filesystem with a
couple million inodes in low memory (512MB-1GB) situations. This is
a regression as of commit 43ff2122e6 ("xfs: on-stack delayed write
buffer lists"), which removed a trylock and buffer I/O submission
from the quotacheck dquot flush sequence.
Quotacheck first resets and collects the physical dquot buffers in a
delwri queue. Then, it traverses the filesystem inodes via bulkstat,
updates the in-core dquots, flushes the corrected dquots to the
backing buffers and finally submits the delwri queue for I/O. Since
the backing buffers are queued across the entire quotacheck
operation, dquot reclaim cannot possibly complete a dquot flush
before quotacheck completes.
Therefore, quotacheck must submit the buffer for I/O in order to
cycle the flush lock and flush the dirty in-core dquot to the
buffer. Add a delwri queue buffer push mechanism to submit an
individual buffer for I/O without losing the delwri queue status and
use it from quotacheck to avoid the deadlock. This restores
quotacheck behavior to as before the regression was introduced.
Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Stream of fixes has slowed down, only a few this week:
- Some DT fixes for Allwinner platforms, and addition of a clock to
the R_CCU clock controller that had been missed.
- A couple of small DT fixes for am335x-sl50"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
ARM: dts: am335x-sl50: Fix card detect pin for mmc1
arm64: allwinner: h5: Remove syslink to shared DTSI
ARM: sunxi: h3/h5: fix the compatible of R_CCU
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes
Allwinner fixes for 4.12
A few fixes around the PRCM support that got in 4.12 with a wrong
compatible, and a missing clock in the binding.
* tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
arm64: allwinner: h5: Remove syslink to shared DTSI
ARM: sunxi: h3/h5: fix the compatible of R_CCU
Signed-off-by: Olof Johansson <olof@lixom.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The AR100 clock within the R_CCU (PRCM) has the PLL_PERIPH0 as one of
its parents.
This adds the reference in the device tree describing this relationship.
This patch uses a raw number for the clock index to ease merging by
avoiding cross tree dependencies.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The AR100 clock within the R_CCU (PRCM) has the PLL_PERIPH0 as one of
its parents.
This adds the reference in the device tree describing this relationship.
This patch uses a raw number for the clock index to ease merging by
avoiding cross tree dependencies.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The arm64 H5 and arm H3 SoCs share roughly the same base, and therefore
share a significant part of their device tree.
The approach we took was to add a symlink from the arm64 DTSI to the arm
DTSI.
Now that the arm DT folder is exposed in the include path, we can just use
it and remove our symlink.
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The R_CCU of H3/H5 currently wrongly used A64 R_CCU compatible.
Fix it by changing it to the correct H3 compatible.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Two fixes for am335x-sl50 to fix a boot time error
for claiming SPI pins, and to fix a SDIO card detect
pin for production version of the device.
* tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
ARM: dts: am335x-sl50: Fix card detect pin for mmc1
Signed-off-by: Olof Johansson <olof@lixom.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We don't need to bitbang these pins anymore, instead we muxed these
pins as SPI, after this change, done in commit 6c69f726, we introduced
the following error:
pinctrl-single 44e10800.pinmux: pin PIN85 already requested \
by 44e10800.pinmux; cannot claim for 48030000.spi
pinctrl-single 44e10800.pinmux: pin-85 (48030000.spi) status -22
Fixes: 6c69f726 ("ARM: dts: am335x-sl50: Enable SPI0 interface and Flash Memory")
Cc: <stable@vger.kernel.org> # 4.11
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The second version of the hardware moved the card detect pin from gpio0_6
to gpio1_9, as we won't support the first hardware version fix the pinmux
configuration of this pin.
Fixes: 8584d4fc ("ARM: dts: am335x-sl50: Add Toby-Churchill SL50 board support.")
Cc: <stable@vger.kernel.org> # 4.11
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull virtio bugfix from Michael Tsirkin:
"It turns out balloon does not handle IOMMUs correctly. We should fix
that at some point, for now let's just disable this configuration"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: disable VIOMMU support
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
virtio balloon bypasses the DMA API entirely so does not support the
VIOMMU right now. It's not clear we need that support, for now let's
just make sure we don't pretend to support it.
Cc: stable@vger.kernel.org
Cc: Wei Wang <wei.w.wang@intel.com>
Fixes: 1a937693993f ("virtio: new feature to detect IOMMU device quirk")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two driver bugfixes"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ismt: fix wrong device address when unmap the data buffer
i2c: rcar: use correct length when unmapping DMA
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Fix the following kernel bug:
kernel BUG at drivers/iommu/intel-iommu.c:3260!
invalid opcode: 0000 [#5] PREEMPT SMP
Hardware name: Intel Corp. Harcuvar/Server, BIOS HAVLCRB0.X64.0013.D39.1608311820 08/31/2016
task: ffff880175389950 ti: ffff880176bec000 task.ti: ffff880176bec000
RIP: 0010:[<ffffffff8150a83b>] [<ffffffff8150a83b>] intel_unmap+0x25b/0x260
RSP: 0018:ffff880176bef5e8 EFLAGS: 00010296
RAX: 0000000000000024 RBX: ffff8800773c7c88 RCX: 000000000000ce04
RDX: 0000000080000000 RSI: 0000000000000000 RDI: 0000000000000009
RBP: ffff880176bef638 R08: 0000000000000010 R09: 0000000000000004
R10: ffff880175389c78 R11: 0000000000000a4f R12: ffff8800773c7868
R13: 00000000ffffac88 R14: ffff8800773c7818 R15: 0000000000000001
FS: 00007fef21258700(0000) GS:ffff88017b5c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000066d6d8 CR3: 000000007118c000 CR4: 00000000003406e0
Stack:
00000000ffffac88 ffffffff8199867f ffff880176bef5f8 ffff880100000030
ffff880176bef668 ffff8800773c7c88 ffff880178288098 ffff8800772c0010
ffff8800773c7818 0000000000000001 ffff880176bef648 ffffffff8150a86e
Call Trace:
[<ffffffff8199867f>] ? printk+0x46/0x48
[<ffffffff8150a86e>] intel_unmap_page+0xe/0x10
[<ffffffffa039d99b>] ismt_access+0x27b/0x8fa [i2c_ismt]
[<ffffffff81554420>] ? __pm_runtime_suspend+0xa0/0xa0
[<ffffffff815544a0>] ? pm_suspend_timer_fn+0x80/0x80
[<ffffffff81554420>] ? __pm_runtime_suspend+0xa0/0xa0
[<ffffffff815544a0>] ? pm_suspend_timer_fn+0x80/0x80
[<ffffffff8143dfd0>] ? pci_bus_read_dev_vendor_id+0xf0/0xf0
[<ffffffff8172b36c>] i2c_smbus_xfer+0xec/0x4b0
[<ffffffff810aa4d5>] ? vprintk_emit+0x345/0x530
[<ffffffffa038936b>] i2cdev_ioctl_smbus+0x12b/0x240 [i2c_dev]
[<ffffffff810aa829>] ? vprintk_default+0x29/0x40
[<ffffffffa0389b33>] i2cdev_ioctl+0x63/0x1ec [i2c_dev]
[<ffffffff811b04c8>] do_vfs_ioctl+0x328/0x5d0
[<ffffffff8119d8ec>] ? vfs_write+0x11c/0x190
[<ffffffff8109d449>] ? rt_up_read+0x19/0x20
[<ffffffff811b07f1>] SyS_ioctl+0x81/0xa0
[<ffffffff819a351b>] system_call_fastpath+0x16/0x6e
This happen When run "i2cdetect -y 0" detect SMBus iSMT adapter.
After finished I2C block read/write, when unmap the data buffer,
a wrong device address was pass to dma_unmap_single().
To fix this, give dma_unmap_single() the "dev" parameter, just like
what dma_map_single() does, then unmap can find the right devices.
Fixes: 13f35ac14cd0 ("i2c: Adding support for Intel iSMT SMBus 2.0 host controller")
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
| |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Because we need to transfer some bytes with PIO, the msg length is not
the length of the DMA buffer. Use the correct value which we used when
doing the mapping.
Fixes: 73e8b0528346e8 ("i2c: rcar: add DMA support")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull MIPS fixes from Ralf Baechle:
- Three highmem fixes:
+ Fixed mapping initialization
+ Adjust the pkmap location
+ Ensure we use at most one page for PTEs
- Fix makefile dependencies for .its targets to depend on vmlinux
- Fix reversed condition in BNEZC and JIALC software branch emulation
- Only flush initialized flush_insn_slot to avoid NULL pointer
dereference
- perf: Remove incorrect odd/even counter handling for I6400
- ftrace: Fix init functions tracing
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: .its targets depend on vmlinux
MIPS: Fix bnezc/jialc return address calculation
MIPS: kprobes: flush_insn_slot should flush only if probe initialised
MIPS: ftrace: fix init functions tracing
MIPS: mm: adjust PKMAP location
MIPS: highmem: ensure that we don't use more than one page for PTEs
MIPS: mm: fixed mappings: correct initialisation
MIPS: perf: Remove incorrect odd/even counter handling for I6400
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The .its targets require information about the kernel binary, such as
its entry point, which is extracted from the vmlinux ELF. We therefore
require that the ELF is built before the .its files are generated.
Declare this requirement in the Makefile such that make will ensure this
is always the case, otherwise in corner cases we can hit issues as the
.its is generated with an incorrect (either invalid or stale) entry
point.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: cf2a5e0bb4c6 ("MIPS: Support generating Flattened Image Trees (.itb)")
Cc: linux-mips@linux-mips.org
Cc: stable <stable@vger.kernel.org> # v4.9+
Patchwork: https://patchwork.linux-mips.org/patch/16179/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The code handling the pop76 opcode (ie. bnezc & jialc instructions) in
__compute_return_epc_for_insn() needs to set the value of $31 in the
jialc case, which is encoded with rs = 0. However its check to
differentiate bnezc (rs != 0) from jialc (rs = 0) was unfortunately
backwards, meaning that if we emulate a bnezc instruction we clobber $31
& if we emulate a jialc instruction it actually behaves like a jic
instruction.
Fix this by inverting the check of rs to match the way the instructions
are actually encoded.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: 28d6f93d201d ("MIPS: Emulate the new MIPS R6 BNEZC and JIALC instructions")
Cc: stable <stable@vger.kernel.org> # v4.0+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16178/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|