| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
There is no need to keep this helper around, opencoding it in the only
caller is just as clear.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
| |
All callers of xfs_imap_to_bp want the dinode pointer, so let's calculate it
inside xfs_imap_to_bp. Once that is done xfs_itobp becomes a fairly pointless
wrapper which can be replaced with direct calls to xfs_imap_to_bp.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to zero out part of a page which beyond EOF before setting uptodate,
otherwise, mapread or write will see non-zero data beyond EOF.
Based on the code in fs/buffer.c and the following ext4 commit:
ext4: handle EOF correctly in ext4_bio_write_page()
And yes, I wish we had a good test case for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
| |
Use this new method to replace our hacky use of ->dirty_inode. An additional
benefit is that we can now propagate errors up the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
| |
Fix trivial typo error that has written "It" to "Is".
Signed-off-by: Chen Baozi <baozich@gmail.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
xfs_bdstrat_cb only adds a check for a shutdown filesystem over
xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
filesystem a little earlier. In addition the shutdown handling in
xfs_bdstrat_cb is not very suitable for this caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the b_iodone handler is run in calling context in xfs_buf_iorequest we
can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
into xfs_buf_iorequest because an I/O error happened, which keeps calling
back into xfs_buf_iorequest. This chain will usually not take long
because the filesystem gets shut down because of log I/O errors, but even
over a short time it can cause stack overflows if run on the same context.
As a short term workaround make sure we always call the iodone handler in
workqueue context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Almost all metadata allocations come from shallow stack usage
situations. Avoid the overhead of switching the allocation to a
workqueue as we are not in danger of running out of stack when
making these allocations. Metadata allocations are already marked
through the args that are passed down, so this is trivial to do.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Mel Gorman <mgorman@suse.de>
Tested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The buffer reading code in xfs_dir2_leaf_getdents is complex and difficult to
follow due to the readahead and all the context is carries. it is also badly
indented and so difficult to read. Factor it out into a separate function to
make it easier to understand and optimise in future patches.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The struct xfs_dabuf now only tracks a single xfs_buf and all the
information it holds can be gained directly from the xfs_buf. Hence
we can remove the struct dabuf and pass the xfs_buf around
everywhere.
Kill the struct dabuf and the associated infrastructure.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
| |
First step in converting the directory code to use native
discontiguous buffers and replacing the dabuf construct.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
discontigous buffer in separate buffer format structures. This means log
recovery will recover all the changes on a per segment basis without
requiring any knowledge of the fact that it was logged from a
compound buffer.
To do this, we need to be able to determine what buffer segment any
given offset into the compound buffer sits over. This enables us to
translate the dirty bitmap in the number of separate buffer format
structures required.
We also need to be able to determine the number of bitmap elements
that a given buffer segment has, as this determines the size of the
buffer format structure. Hence we need to be able to determine the
both the start offset into the buffer and the length of a given
segment to be able to calculate this.
With this information, we can preallocate, build and format the
correct log vector array for each segment in a compound buffer to
appear exactly the same as individually logged buffers in the log.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the buffer cache supports discontiguous buffers, add
support to the transaction buffer interface for getting and reading
buffers.
Note that this patch does not convert the buffer item logging to
support discontiguous buffers. That will be done as a separate
commit.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
| |
With the internal interfaces supporting discontiguous buffer maps,
add external lookup, read and get interfaces so they can start to be
used.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While the external interface currently uses separate blockno/length
variables, we need to move internal interfaces to passing and
parsing vector maps. This will then allow us to add external
interfaces to support discontiguous buffer maps as the internal code
will already support them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To support discontiguous buffers in the buffer cache, we need to
separate the cache index variables from the I/O map. While this is
currently a 1:1 mapping, discontiguous buffer support will break
this relationship.
However, for caching purposes, we can still treat them the same as a
contiguous buffer - the block number of the first block and the
length of the buffer - as that is still a unique representation.
Also, the only way we will ever access the discontiguous regions of
buffers is via bulding the complete buffer in the first place, so
using the initial block number and entire buffer length is a sane
way to index the buffers.
Add a block mapping vector construct to the xfs_buf and use it in
the places where we are doing IO instead of the current
b_bn/b_length variables.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The struct xfs_buf_log_format wants to think the dirty bitmap is
variable sized. In fact, it is variable size on disk simply due to
the way we map it from the in-memory structure, but we still just
use a fixed size memory allocation for the in-memory structure.
Hence it makes no sense to set the function up as a variable sized
structure when we already know it's maximum size, and we always
allocate it as such. Simplify the structure by making the dirty
bitmap a fixed sized array and just using the size of the structure
for the allocation size.
This will make it much simpler to allocate and manipulate an array
of format structures for discontiguous buffer support.
The previous struct xfs_buf_log_item size according to
/proc/slabinfo was 224 bytes. pahole doesn't give the same size
because of the variable size definition. With this modification,
pahole reports the same as /proc/slabinfo:
/* size: 224, cachelines: 4, members: 6 */
Because the xfs_buf_log_item size is now determined by the maximum
supported block size we introduce a dependency on xfs_alloc_btree.h.
Avoid this dependency by moving the idefines for the maximum block
sizes supported to xfs_types.h with all the other max/min type
defines to avoid any new dependencies.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
| |
Remove the xlog_t type definitions.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
| |
Rename the XFS log structure to xlog to help crash distinquish it from the
other logs in Linux.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Revert commit 1307bbd, which uses the s_umount semaphore to provide
exclusion between xfs_sync_worker and unmount, in favor of shutting down
the sync worker before freeing the log in xfs_log_unmount. This is a
cleaner way of resolving the race between xfs_sync_worker and unmount
than using s_umount.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit de1cbee which removed b_file_offset in favor of b_bn introduced a bug
causing xfs_buf_allocate_memory() to overestimate the number of necessary
pages. The problem is that xfs_buf_alloc() sets b_bn to -1 and thus effectively
every buffer is straddling a page boundary which causes
xfs_buf_allocate_memory() to allocate two pages and use vmalloc() for access
which is unnecessary.
Dave says xfs_buf_alloc() doesn't need to set b_bn to -1 anymore since the
buffer is inserted into the cache only after being fully initialized now.
So just make xfs_buf_alloc() fill in proper block number from the beginning.
CC: David Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we fail to find an matching extent near the requested extent
specification during a left-right distance search in
xfs_alloc_ag_vextent_near, we fail to free the original cursor that
we used to look up the XFS_BTNUM_CNT tree and hence leak it.
Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An inode in the AIL can be flush locked and marked stale if
a cluster free transaction occurs at the right time. The
inode item is then marked as flushing, which causes xfsaild
to spin and leaves the filesystem stalled. This is
reproduced by running xfstests 273 in a loop for an
extended period of time.
Check for stale inodes before the flush lock. This marks
the inode as pinned, leads to a log flush and allows the
filesystem to proceed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
| |
There should be "XFS_DFORK_DPTR, XFS_DFORK_APTR, and XFS_DFORK_PTR" instead
of "XFS_DFORK_PTR, XFS_DFORK_DPTR, and XFS_DFORK_PTR".
Signed-off-by: Chen Baozi <baozich@gmail.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
| |
The generic segment check code now returns a count of the number of
bytes in the iovec, so we don't need to roll our own anymore.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
XFS_MAXIOFFSET() is just a simple macro that resolves to
mp->m_maxioffset. It doesn't need to exist, and it just makes the
code unnecessarily loud and shouty.
Make it quiet and easy to read.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
| |
The m_maxioffset field in the struct xfs_mount contains the same
value as the superblock s_maxbytes field. There is no need to carry
two copies of this limit around, so use the VFS superblock version.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fengguang reports:
[ 780.529603] XFS (vdd): Ending clean mount
[ 781.454590] ODEBUG: object is on stack, but not annotated
[ 781.455433] ------------[ cut here ]------------
[ 781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1()
[ 781.455433] Hardware name: Bochs
[ 781.455433] Modules linked in:
[ 781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51
[ 781.455433] Call Trace:
[ 781.455433] [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b
[ 781.455433] [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c
[ 781.455433] [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1
[ 781.455433] [<ffffffff81491c65>] debug_object_init+0x14/0x16
[ 781.455433] [<ffffffff8108842a>] __init_work+0x20/0x22
[ 781.455433] [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5
Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK.
Reported-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On filesytems with a block size smaller than PAGE_SIZE we currently have
a problem with unwritten extents. If a we have multi-block page for
which an unwritten extent has been allocated, and only some of the
buffers have been written to, and they are not contiguous, we can expose
stale data from disk in the blocks between the writes after extent
conversion.
Example of a page with unwritten and real data.
buffer content
0 empty b_state = 0
1 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
2 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
3 empty b_state = 0
4 empty b_state = 0
5 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
6 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
7 empty b_state = 0
Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7
empty. Currently buffers 1, 2, 5, and 6 are added to a single ioend,
and when IO has completed, extent conversion creates a real extent from
block 1 through block 6, leaving 0 and 7 unwritten. However buffers 3
and 4 were not written to disk, so stale data is exposed from those
blocks on a subsequent read.
Fix this by setting iomap_valid = 0 when we find a buffer that is not
Uptodate. This ensures that buffers 5 and 6 are not added to the same
ioend as buffers 1 and 2. Later these blocks will be converted into two
separate real extents, leaving the blocks in between unwritten.
Signed-off-by: Alain Renaud <arenaud@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper updates from Alasdair G Kergon:
"Improve multipath's retrying mechanism in some defined circumstances
and provide a simple reserve/release mechanism for userspace tools to
access thin provisioning metadata while the pool is in use."
* tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm thin: provide userspace access to pool metadata
dm thin: use slab mempools
dm mpath: allow ioctls to trigger pg init
dm mpath: delay retry of bypassed pg
dm mpath: reduce size of struct multipath
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch implements two new messages that can be sent to the thin
pool target allowing it to take a snapshot of the _metadata_. This,
read-only snapshot can be accessed by userland, concurrently with the
live target.
Only one metadata snapshot can be held at a time. The pool's status
line will give the block location for the current msnap.
Since version 0.1.5 of the userland thin provisioning tools, the
thin_dump program displays the msnap as follows:
thin_dump -m <msnap root> <metadata dev>
Available here: https://github.com/jthornber/thin-provisioning-tools
Now that userland can access the metadata we can do various things
that have traditionally been kernel side tasks:
i) Incremental backups.
By using metadata snapshots we can work out what blocks have
changed over time. Combined with data snapshots we can ensure
the data doesn't change while we back it up.
A short proof of concept script can be found here:
https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb
ii) Migration of thin devices from one pool to another.
iii) Merging snapshots back into an external origin.
iv) Asyncronous replication.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use dedicated caches prefixed with a "dm_" name rather than relying on
kmalloc mempools backed by generic slab caches so the memory usage of
thin provisioning (and any leaks) can be accounted for independently.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After the failure of a group of paths, any alternative paths that
need initialising do not become available until further I/O is sent to
the device. Until this has happened, ioctls return -EAGAIN.
With this patch, new paths are made available in response to an ioctl
too. The processing of the ioctl gets delayed until this has happened.
Instead of returning an error, we submit a work item to kmultipathd
(that will potentially activate the new path) and retry in ten
milliseconds.
Note that the patch doesn't retry an ioctl if the ioctl itself fails due
to a path failure. Such retries should be handled intelligently by the
code that generated the ioctl in the first place, noting that some SCSI
commands should not be retried because they are not idempotent (XOR write
commands). For commands that could be retried, there is a danger that
if the device rejected the SCSI command, the path could be errorneously
marked as failed, and the request would be retried on another path which
might fail too. It can be determined if the failure happens on the
device or on the SCSI controller, but there is no guarantee that all
SCSI drivers set these flags correctly.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If I/O needs retrying and only bypassed priority groups are available,
set the pg_init_delay_retry flag to wait before retrying.
If, for example, the reason for the bypass is that the controller is
getting reset or there is a firmware upgrade happening, retrying right
away would cause a flood of log messages and retries for what could be a
few seconds or even several minutes.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Move multipath structure's 'lock' and 'queue_size' members to eliminate
two 4-byte holes. Also use a bit within a single unsigned int for each
existing flag (saves 8-bytes). This allows future flags to be added
without each consuming an unsigned int.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull networking updates from David Miller:
1) Make syn floods consume significantly less resources by
a) Not pre-COW'ing routing metrics for SYN/ACKs
b) Mirroring the device queue mapping of the SYN for the SYN/ACK
reply.
Both from Eric Dumazet.
2) Fix calculation errors in Byte Queue Limiting, from Hiroaki SHIMODA.
3) Validate the length requested when building a paged SKB for a
socket, so we don't overrun the page vector accidently. From Jason
Wang.
4) When netlabel is disabled, we abort all IP option processing when we
see a CIPSO option. This isn't the right thing to do, we should
simply skip over it and continue processing the remaining options
(if any). Fix from Paul Moore.
5) SRIOV fixes for the mellanox driver from Jack orgenstein and Marcel
Apfelbaum.
6) 8139cp enables the receiver before the ring address is properly
programmed, which potentially lets the device crap over random
memory. Fix from Jason Wang.
7) e1000/e1000e fixes for i217 RST handling, and an improper buffer
address reference in jumbo RX frame processing from Bruce Allan and
Sebastian Andrzej Siewior, respectively.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
fec_mpc52xx: fix timestamp filtering
mcs7830: Implement link state detection
e1000e: fix Rapid Start Technology support for i217
e1000: look into the page instead of skb->data for e1000_tbi_adjust_stats()
r8169: call netif_napi_del at errpaths and at driver unload
tcp: reflect SYN queue_mapping into SYNACK packets
tcp: do not create inetpeer on SYNACK message
8139cp/8139too: terminate the eeprom access with the right opmode
8139cp: set ring address before enabling receiver
cipso: handle CIPSO options correctly when NetLabel is disabled
net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
bql: Avoid possible inconsistent calculation.
bql: Avoid unneeded limit decrement.
bql: Fix POSDIFF() to integer overflow aware.
net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP
net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap
net/mlx4_core: Fixes for VF / Guest startup flow
net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event
net/mlx4_core: Fix number of EQs used in ICM initialisation
net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
skb_defer_rx_timestamp was called with a freshly allocated skb but must
be called with rskb instead.
Signed-off-by: Stephan Gatzka <stephan@gatzka.org>
Cc: stable <stable@vger.kernel.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add .status callback that detects link state changes.
Tested with MCS7832CV-AA chip (9710:7830, identified as rev.C by the driver).
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=28532
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The definition of I217_PROXY_CTRL must use the BM_PHY_REG() macro instead
of the PHY_REG() macro for PHY page 800 register 70 since it is for a PHY
register greater than the maximum allowed by the latter macro, and fix a
typo setting the I217_MEMPWR register in e1000_suspend_workarounds_ich8lan.
Also for clarity, rename a few defines as bit definitions instead of masks.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is another fixup where the data is not transfered into buffer
addressed by skb->data but into a page.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
when register_netdev fails, the init'ed NAPIs by netif_napi_add must be
deleted with netif_napi_del, and also when driver unloads, it should
delete the NAPI before unregistering netdevice using unregister_netdev.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
While testing how linux behaves on SYNFLOOD attack on multiqueue device
(ixgbe), I found that SYNACK messages were dropped at Qdisc level
because we send them all on a single queue.
Obvious choice is to reflect incoming SYN packet @queue_mapping to
SYNACK packet.
Under stress, my machine could only send 25.000 SYNACK per second (for
200.000 incoming SYN per second). NIC : ixgbe with 16 rx/tx queues.
After patch, not a single SYNACK is dropped.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting
larger and larger, using lots of memory and cpu time.
tcp_v4_send_synack()
->inet_csk_route_req()
->ip_route_output_flow()
->rt_set_nexthop()
->rt_init_metrics()
->inet_getpeer( create = true)
This is a side effect of commit a4daad6b09230 (net: Pre-COW metrics for
TCP) added in 2.6.39
Possible solution :
Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS
Before patch :
# grep peer /proc/slabinfo
inet_peer_cache 4175430 4175430 192 42 2 : tunables 0 0 0 : slabdata 99415 99415 0
Samples: 41K of event 'cycles', Event count (approx.): 30716565122
+ 20,24% ksoftirqd/0 [kernel.kallsyms] [k] inet_getpeer
+ 8,19% ksoftirqd/0 [kernel.kallsyms] [k] peer_avl_rebalance.isra.1
+ 4,81% ksoftirqd/0 [kernel.kallsyms] [k] sha_transform
+ 3,64% ksoftirqd/0 [kernel.kallsyms] [k] fib_table_lookup
+ 2,36% ksoftirqd/0 [ixgbe] [k] ixgbe_poll
+ 2,16% ksoftirqd/0 [kernel.kallsyms] [k] __ip_route_output_key
+ 2,11% ksoftirqd/0 [kernel.kallsyms] [k] kernel_map_pages
+ 2,11% ksoftirqd/0 [kernel.kallsyms] [k] ip_route_input_common
+ 2,01% ksoftirqd/0 [kernel.kallsyms] [k] __inet_lookup_established
+ 1,83% ksoftirqd/0 [kernel.kallsyms] [k] md5_transform
+ 1,75% ksoftirqd/0 [kernel.kallsyms] [k] check_leaf.isra.9
+ 1,49% ksoftirqd/0 [kernel.kallsyms] [k] ipt_do_table
+ 1,46% ksoftirqd/0 [kernel.kallsyms] [k] hrtimer_interrupt
+ 1,45% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc
+ 1,29% ksoftirqd/0 [kernel.kallsyms] [k] inet_csk_search_req
+ 1,29% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb
+ 1,16% ksoftirqd/0 [kernel.kallsyms] [k] copy_user_generic_string
+ 1,15% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free
+ 1,02% ksoftirqd/0 [kernel.kallsyms] [k] tcp_make_synack
+ 0,93% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 0,87% ksoftirqd/0 [kernel.kallsyms] [k] __call_rcu
+ 0,84% ksoftirqd/0 [kernel.kallsyms] [k] rt_garbage_collect
+ 0,84% ksoftirqd/0 [kernel.kallsyms] [k] fib_rules_lookup
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, we terminate the eeprom access through clearing the CS by:
RTL_W8 (Cfg9346, ~EE_CS); or writeb (~EE_CS, ee_addr);
This would left the eeprom into "Config. Register Write Enable:"
state which is not expcted as the highest two bits were set to
0x11 ( expected is the "Normal" mode (0x00)). Solving this by write
0x0 instead of ~EE_CS when terminating the eeprom access.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, we enable the receiver before setting the ring address which could
lead the card DMA into unexpected areas. Solving this by set the ring address
before enabling the receiver.
btw. I find and test this in qemu as I didn't have a 8139cp card in hand. please
review it carefully.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When NetLabel is not enabled, e.g. CONFIG_NETLABEL=n, and the system
receives a CIPSO tagged packet it is dropped (cipso_v4_validate()
returns non-zero). In most cases this is the correct and desired
behavior, however, in the case where we are simply forwarding the
traffic, e.g. acting as a network bridge, this becomes a problem.
This patch fixes the forwarding problem by providing the basic CIPSO
validation code directly in ip_options_compile() without the need for
the NetLabel or CIPSO code. The new validation code can not perform
any of the CIPSO option label/value verification that
cipso_v4_validate() does, but it can verify the basic CIPSO option
format.
The behavior when NetLabel is enabled is unchanged.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
dql->num_queued could change while processing dql_completed().
To provide consistent calculation, added an on stack variable.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|