| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit a85ceafd41927e41a4103d228a993df7edd8823b upstream.
Since rc was initialised to -ENOMEM in cifs_get_smb_ses(), when an
existing smb session was found, free_xid() would be called and then
print
CIFS: fs/cifs/connect.c: Existing tcp session with server found
CIFS: fs/cifs/connect.c: VFS: in cifs_get_smb_ses as Xid: 44 with uid: 0
CIFS: fs/cifs/connect.c: Existing smb sess found (status=1)
CIFS: fs/cifs/connect.c: VFS: leaving cifs_get_smb_ses (xid = 44) rc = -12
Fix this by initialising rc to 0 and then let free_xid() print this
instead
CIFS: fs/cifs/connect.c: Existing tcp session with server found
CIFS: fs/cifs/connect.c: VFS: in cifs_get_smb_ses as Xid: 14 with uid: 0
CIFS: fs/cifs/connect.c: Existing smb sess found (status=1)
CIFS: fs/cifs/connect.c: VFS: leaving cifs_get_smb_ses (xid = 14) rc = 0
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 11933cf1d91d57da9e5c53822a540bbdc2656c16 upstream.
The propagate_mnt() function handles mount propagation when creating
mounts and propagates the source mount tree @source_mnt to all
applicable nodes of the destination propagation mount tree headed by
@dest_mnt.
Unfortunately it contains a bug where it fails to terminate at peers of
@source_mnt when looking up copies of the source mount that become
masters for copies of the source mount tree mounted on top of slaves in
the destination propagation tree causing a NULL dereference.
Once the mechanics of the bug are understood it's easy to trigger.
Because of unprivileged user namespaces it is available to unprivileged
users.
While fixing this bug we've gotten confused multiple times due to
unclear terminology or missing concepts. So let's start this with some
clarifications:
* The terms "master" or "peer" denote a shared mount. A shared mount
belongs to a peer group.
* A peer group is a set of shared mounts that propagate to each other.
They are identified by a peer group id. The peer group id is available
in @shared_mnt->mnt_group_id.
Shared mounts within the same peer group have the same peer group id.
The peers in a peer group can be reached via @shared_mnt->mnt_share.
* The terms "slave mount" or "dependent mount" denote a mount that
receives propagation from a peer in a peer group. IOW, shared mounts
may have slave mounts and slave mounts have shared mounts as their
master. Slave mounts of a given peer in a peer group are listed on
that peers slave list available at @shared_mnt->mnt_slave_list.
* The term "master mount" denotes a mount in a peer group. IOW, it
denotes a shared mount or a peer mount in a peer group. The term
"master mount" - or "master" for short - is mostly used when talking
in the context of slave mounts that receive propagation from a master
mount. A master mount of a slave identifies the closest peer group a
slave mount receives propagation from. The master mount of a slave can
be identified via @slave_mount->mnt_master. Different slaves may point
to different masters in the same peer group.
* Multiple peers in a peer group can have non-empty ->mnt_slave_lists.
Non-empty ->mnt_slave_lists of peers don't intersect. Consequently, to
ensure all slave mounts of a peer group are visited the
->mnt_slave_lists of all peers in a peer group have to be walked.
* Slave mounts point to a peer in the closest peer group they receive
propagation from via @slave_mnt->mnt_master (see above). Together with
these peers they form a propagation group (see below). The closest
peer group can thus be identified through the peer group id
@slave_mnt->mnt_master->mnt_group_id of the peer/master that a slave
mount receives propagation from.
* A shared-slave mount is a slave mount to a peer group pg1 while also
a peer in another peer group pg2. IOW, a peer group may receive
propagation from another peer group.
If a peer group pg1 is a slave to another peer group pg2 then all
peers in peer group pg1 point to the same peer in peer group pg2 via
->mnt_master. IOW, all peers in peer group pg1 appear on the same
->mnt_slave_list. IOW, they cannot be slaves to different peer groups.
* A pure slave mount is a slave mount that is a slave to a peer group
but is not a peer in another peer group.
* A propagation group denotes the set of mounts consisting of a single
peer group pg1 and all slave mounts and shared-slave mounts that point
to a peer in that peer group via ->mnt_master. IOW, all slave mounts
such that @slave_mnt->mnt_master->mnt_group_id is equal to
@shared_mnt->mnt_group_id.
The concept of a propagation group makes it easier to talk about a
single propagation level in a propagation tree.
For example, in propagate_mnt() the immediate peers of @dest_mnt and
all slaves of @dest_mnt's peer group form a propagation group propg1.
So a shared-slave mount that is a slave in propg1 and that is a peer
in another peer group pg2 forms another propagation group propg2
together with all slaves that point to that shared-slave mount in
their ->mnt_master.
* A propagation tree refers to all mounts that receive propagation
starting from a specific shared mount.
For example, for propagate_mnt() @dest_mnt is the start of a
propagation tree. The propagation tree ecompasses all mounts that
receive propagation from @dest_mnt's peer group down to the leafs.
With that out of the way let's get to the actual algorithm.
We know that @dest_mnt is guaranteed to be a pure shared mount or a
shared-slave mount. This is guaranteed by a check in
attach_recursive_mnt(). So propagate_mnt() will first propagate the
source mount tree to all peers in @dest_mnt's peer group:
for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
ret = propagate_one(n);
if (ret)
goto out;
}
Notice, that the peer propagation loop of propagate_mnt() doesn't
propagate @dest_mnt itself. @dest_mnt is mounted directly in
attach_recursive_mnt() after we propagated to the destination
propagation tree.
The mount that will be mounted on top of @dest_mnt is @source_mnt. This
copy was created earlier even before we entered attach_recursive_mnt()
and doesn't concern us a lot here.
It's just important to notice that when propagate_mnt() is called
@source_mnt will not yet have been mounted on top of @dest_mnt. Thus,
@source_mnt->mnt_parent will either still point to @source_mnt or - in
the case @source_mnt is moved and thus already attached - still to its
former parent.
For each peer @m in @dest_mnt's peer group propagate_one() will create a
new copy of the source mount tree and mount that copy @child on @m such
that @child->mnt_parent points to @m after propagate_one() returns.
propagate_one() will stash the last destination propagation node @m in
@last_dest and the last copy it created for the source mount tree in
@last_source.
Hence, if we call into propagate_one() again for the next destination
propagation node @m, @last_dest will point to the previous destination
propagation node and @last_source will point to the previous copy of the
source mount tree and mounted on @last_dest.
Each new copy of the source mount tree is created from the previous copy
of the source mount tree. This will become important later.
The peer loop in propagate_mnt() is straightforward. We iterate through
the peers copying and updating @last_source and @last_dest as we go
through them and mount each copy of the source mount tree @child on a
peer @m in @dest_mnt's peer group.
After propagate_mnt() handled the peers in @dest_mnt's peer group
propagate_mnt() will propagate the source mount tree down the
propagation tree that @dest_mnt's peer group propagates to:
for (m = next_group(dest_mnt, dest_mnt); m;
m = next_group(m, dest_mnt)) {
/* everything in that slave group */
n = m;
do {
ret = propagate_one(n);
if (ret)
goto out;
n = next_peer(n);
} while (n != m);
}
The next_group() helper will recursively walk the destination
propagation tree, descending into each propagation group of the
propagation tree.
The important part is that it takes care to propagate the source mount
tree to all peers in the peer group of a propagation group before it
propagates to the slaves to those peers in the propagation group. IOW,
it creates and mounts copies of the source mount tree that become
masters before it creates and mounts copies of the source mount tree
that become slaves to these masters.
It is important to remember that propagating the source mount tree to
each mount @m in the destination propagation tree simply means that we
create and mount new copies @child of the source mount tree on @m such
that @child->mnt_parent points to @m.
Since we know that each node @m in the destination propagation tree
headed by @dest_mnt's peer group will be overmounted with a copy of the
source mount tree and since we know that the propagation properties of
each copy of the source mount tree we create and mount at @m will mostly
mirror the propagation properties of @m. We can use that information to
create and mount the copies of the source mount tree that become masters
before their slaves.
The easy case is always when @m and @last_dest are peers in a peer group
of a given propagation group. In that case we know that we can simply
copy @last_source without having to figure out what the master for the
new copy @child of the source mount tree needs to be as we've done that
in a previous call to propagate_one().
The hard case is when we're dealing with a slave mount or a shared-slave
mount @m in a destination propagation group that we need to create and
mount a copy of the source mount tree on.
For each propagation group in the destination propagation tree we
propagate the source mount tree to we want to make sure that the copies
@child of the source mount tree we create and mount on slaves @m pick an
ealier copy of the source mount tree that we mounted on a master @m of
the destination propagation group as their master. This is a mouthful
but as far as we can tell that's the core of it all.
But, if we keep track of the masters in the destination propagation tree
@m we can use the information to find the correct master for each copy
of the source mount tree we create and mount at the slaves in the
destination propagation tree @m.
Let's walk through the base case as that's still fairly easy to grasp.
If we're dealing with the first slave in the propagation group that
@dest_mnt is in then we don't yet have marked any masters in the
destination propagation tree.
We know the master for the first slave to @dest_mnt's peer group is
simple @dest_mnt. So we expect this algorithm to yield a copy of the
source mount tree that was mounted on a peer in @dest_mnt's peer group
as the master for the copy of the source mount tree we want to mount at
the first slave @m:
for (n = m; ; n = p) {
p = n->mnt_master;
if (p == dest_master || IS_MNT_MARKED(p))
break;
}
For the first slave we walk the destination propagation tree all the way
up to a peer in @dest_mnt's peer group. IOW, the propagation hierarchy
can be walked by walking up the @mnt->mnt_master hierarchy of the
destination propagation tree @m. We will ultimately find a peer in
@dest_mnt's peer group and thus ultimately @dest_mnt->mnt_master.
Btw, here the assumption we listed at the beginning becomes important.
Namely, that peers in a peer group pg1 that are slaves in another peer
group pg2 appear on the same ->mnt_slave_list. IOW, all slaves who are
peers in peer group pg1 point to the same peer in peer group pg2 via
their ->mnt_master. Otherwise the termination condition in the code
above would be wrong and next_group() would be broken too.
So the first iteration sets:
n = m;
p = n->mnt_master;
such that @p now points to a peer or @dest_mnt itself. We walk up one
more level since we don't have any marked mounts. So we end up with:
n = dest_mnt;
p = dest_mnt->mnt_master;
If @dest_mnt's peer group is not slave to another peer group then @p is
now NULL. If @dest_mnt's peer group is a slave to another peer group
then @p now points to @dest_mnt->mnt_master points which is a master
outside the propagation tree we're dealing with.
Now we need to figure out the master for the copy of the source mount
tree we're about to create and mount on the first slave of @dest_mnt's
peer group:
do {
struct mount *parent = last_source->mnt_parent;
if (last_source == first_source)
break;
done = parent->mnt_master == p;
if (done && peers(n, parent))
break;
last_source = last_source->mnt_master;
} while (!done);
We know that @last_source->mnt_parent points to @last_dest and
@last_dest is the last peer in @dest_mnt's peer group we propagated to
in the peer loop in propagate_mnt().
Consequently, @last_source is the last copy we created and mount on that
last peer in @dest_mnt's peer group. So @last_source is the master we
want to pick.
We know that @last_source->mnt_parent->mnt_master points to
@last_dest->mnt_master. We also know that @last_dest->mnt_master is
either NULL or points to a master outside of the destination propagation
tree and so does @p. Hence:
done = parent->mnt_master == p;
is trivially true in the base condition.
We also know that for the first slave mount of @dest_mnt's peer group
that @last_dest either points @dest_mnt itself because it was
initialized to:
last_dest = dest_mnt;
at the beginning of propagate_mnt() or it will point to a peer of
@dest_mnt in its peer group. In both cases it is guaranteed that on the
first iteration @n and @parent are peers (Please note the check for
peers here as that's important.):
if (done && peers(n, parent))
break;
So, as we expected, we select @last_source, which referes to the last
copy of the source mount tree we mounted on the last peer in @dest_mnt's
peer group, as the master of the first slave in @dest_mnt's peer group.
The rest is taken care of by clone_mnt(last_source, ...). We'll skip
over that part otherwise this becomes a blogpost.
At the end of propagate_mnt() we now mark @m->mnt_master as the first
master in the destination propagation tree that is distinct from
@dest_mnt->mnt_master. IOW, we mark @dest_mnt itself as a master.
By marking @dest_mnt or one of it's peers we are able to easily find it
again when we later lookup masters for other copies of the source mount
tree we mount copies of the source mount tree on slaves @m to
@dest_mnt's peer group. This, in turn allows us to find the master we
selected for the copies of the source mount tree we mounted on master in
the destination propagation tree again.
The important part is to realize that the code makes use of the fact
that the last copy of the source mount tree stashed in @last_source was
mounted on top of the previous destination propagation node @last_dest.
What this means is that @last_source allows us to walk the destination
propagation hierarchy the same way each destination propagation node @m
does.
If we take @last_source, which is the copy of @source_mnt we have
mounted on @last_dest in the previous iteration of propagate_one(), then
we know @last_source->mnt_parent points to @last_dest but we also know
that as we walk through the destination propagation tree that
@last_source->mnt_master will point to an earlier copy of the source
mount tree we mounted one an earlier destination propagation node @m.
IOW, @last_source->mnt_parent will be our hook into the destination
propagation tree and each consecutive @last_source->mnt_master will lead
us to an earlier propagation node @m via
@last_source->mnt_master->mnt_parent.
Hence, by walking up @last_source->mnt_master, each of which is mounted
on a node that is a master @m in the destination propagation tree we can
also walk up the destination propagation hierarchy.
So, for each new destination propagation node @m we use the previous
copy of @last_source and the fact it's mounted on the previous
propagation node @last_dest via @last_source->mnt_master->mnt_parent to
determine what the master of the new copy of @last_source needs to be.
The goal is to find the _closest_ master that the new copy of the source
mount tree we are about to create and mount on a slave @m in the
destination propagation tree needs to pick. IOW, we want to find a
suitable master in the propagation group.
As the propagation structure of the source mount propagation tree we
create mirrors the propagation structure of the destination propagation
tree we can find @m's closest master - i.e., a marked master - which is
a peer in the closest peer group that @m receives propagation from. We
store that closest master of @m in @p as before and record the slave to
that master in @n
We then search for this master @p via @last_source by walking up the
master hierarchy starting from the last copy of the source mount tree
stored in @last_source that we created and mounted on the previous
destination propagation node @m.
We will try to find the master by walking @last_source->mnt_master and
by comparing @last_source->mnt_master->mnt_parent->mnt_master to @p. If
we find @p then we can figure out what earlier copy of the source mount
tree needs to be the master for the new copy of the source mount tree
we're about to create and mount at the current destination propagation
node @m.
If @last_source->mnt_master->mnt_parent and @n are peers then we know
that the closest master they receive propagation from is
@last_source->mnt_master->mnt_parent->mnt_master. If not then the
closest immediate peer group that they receive propagation from must be
one level higher up.
This builds on the earlier clarification at the beginning that all peers
in a peer group which are slaves of other peer groups all point to the
same ->mnt_master, i.e., appear on the same ->mnt_slave_list, of the
closest peer group that they receive propagation from.
However, terminating the walk has corner cases.
If the closest marked master for a given destination node @m cannot be
found by walking up the master hierarchy via @last_source->mnt_master
then we need to terminate the walk when we encounter @source_mnt again.
This isn't an arbitrary termination. It simply means that the new copy
of the source mount tree we're about to create has a copy of the source
mount tree we created and mounted on a peer in @dest_mnt's peer group as
its master. IOW, @source_mnt is the peer in the closest peer group that
the new copy of the source mount tree receives propagation from.
We absolutely have to stop @source_mnt because @last_source->mnt_master
either points outside the propagation hierarchy we're dealing with or it
is NULL because @source_mnt isn't a shared-slave.
So continuing the walk past @source_mnt would cause a NULL dereference
via @last_source->mnt_master->mnt_parent. And so we have to stop the
walk when we encounter @source_mnt again.
One scenario where this can happen is when we first handled a series of
slaves of @dest_mnt's peer group and then encounter peers in a new peer
group that is a slave to @dest_mnt's peer group. We handle them and then
we encounter another slave mount to @dest_mnt that is a pure slave to
@dest_mnt's peer group. That pure slave will have a peer in @dest_mnt's
peer group as its master. Consequently, the new copy of the source mount
tree will need to have @source_mnt as it's master. So we walk the
propagation hierarchy all the way up to @source_mnt based on
@last_source->mnt_master.
So terminate on @source_mnt, easy peasy. Except, that the check misses
something that the rest of the algorithm already handles.
If @dest_mnt has peers in it's peer group the peer loop in
propagate_mnt():
for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
ret = propagate_one(n);
if (ret)
goto out;
}
will consecutively update @last_source with each previous copy of the
source mount tree we created and mounted at the previous peer in
@dest_mnt's peer group. So after that loop terminates @last_source will
point to whatever copy of the source mount tree was created and mounted
on the last peer in @dest_mnt's peer group.
Furthermore, if there is even a single additional peer in @dest_mnt's
peer group then @last_source will __not__ point to @source_mnt anymore.
Because, as we mentioned above, @dest_mnt isn't even handled in this
loop but directly in attach_recursive_mnt(). So it can't even accidently
come last in that peer loop.
So the first time we handle a slave mount @m of @dest_mnt's peer group
the copy of the source mount tree we create will make the __last copy of
the source mount tree we created and mounted on the last peer in
@dest_mnt's peer group the master of the new copy of the source mount
tree we create and mount on the first slave of @dest_mnt's peer group__.
But this means that the termination condition that checks for
@source_mnt is wrong. The @source_mnt cannot be found anymore by
propagate_one(). Instead it will find the last copy of the source mount
tree we created and mounted for the last peer of @dest_mnt's peer group
again. And that is a peer of @source_mnt not @source_mnt itself.
IOW, we fail to terminate the loop correctly and ultimately dereference
@last_source->mnt_master->mnt_parent. When @source_mnt's peer group
isn't slave to another peer group then @last_source->mnt_master is NULL
causing the splat below.
For example, assume @dest_mnt is a pure shared mount and has three peers
in its peer group:
===================================================================================
mount-id mount-parent-id peer-group-id
===================================================================================
(@dest_mnt) mnt_master[216] 309 297 shared:216
\
(@source_mnt) mnt_master[218]: 609 609 shared:218
(1) mnt_master[216]: 607 605 shared:216
\
(P1) mnt_master[218]: 624 607 shared:218
(2) mnt_master[216]: 576 574 shared:216
\
(P2) mnt_master[218]: 625 576 shared:218
(3) mnt_master[216]: 545 543 shared:216
\
(P3) mnt_master[218]: 626 545 shared:218
After this sequence has been processed @last_source will point to (P3),
the copy generated for the third peer in @dest_mnt's peer group we
handled. So the copy of the source mount tree (P4) we create and mount
on the first slave of @dest_mnt's peer group:
===================================================================================
mount-id mount-parent-id peer-group-id
===================================================================================
mnt_master[216] 309 297 shared:216
/
/
(S0) mnt_slave 483 481 master:216
\
\ (P3) mnt_master[218] 626 545 shared:218
\ /
\/
(P4) mnt_slave 627 483 master:218
will pick the last copy of the source mount tree (P3) as master, not (S0).
When walking the propagation hierarchy via @last_source's master
hierarchy we encounter (P3) but not (S0), i.e., @source_mnt.
We can fix this in multiple ways:
(1) By setting @last_source to @source_mnt after we processed the peers
in @dest_mnt's peer group right after the peer loop in
propagate_mnt().
(2) By changing the termination condition that relies on finding exactly
@source_mnt to finding a peer of @source_mnt.
(3) By only moving @last_source when we actually venture into a new peer
group or some clever variant thereof.
The first two options are minimally invasive and what we want as a fix.
The third option is more intrusive but something we'd like to explore in
the near future.
This passes all LTP tests and specifically the mount propagation
testsuite part of it. It also holds up against all known reproducers of
this issues.
Final words.
First, this is a clever but __worringly__ underdocumented algorithm.
There isn't a single detailed comment to be found in next_group(),
propagate_one() or anywhere else in that file for that matter. This has
been a giant pain to understand and work through and a bug like this is
insanely difficult to fix without a detailed understanding of what's
happening. Let's not talk about the amount of time that was sunk into
fixing this.
Second, all the cool kids with access to
unshare --mount --user --map-root --propagation=unchanged
are going to have a lot of fun. IOW, triggerable by unprivileged users
while namespace_lock() lock is held.
[ 115.848393] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 115.848967] #PF: supervisor read access in kernel mode
[ 115.849386] #PF: error_code(0x0000) - not-present page
[ 115.849803] PGD 0 P4D 0
[ 115.850012] Oops: 0000 [#1] PREEMPT SMP PTI
[ 115.850354] CPU: 0 PID: 15591 Comm: mount Not tainted 6.1.0-rc7 #3
[ 115.850851] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 115.851510] RIP: 0010:propagate_one.part.0+0x7f/0x1a0
[ 115.851924] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10
49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01
00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37
02 4d
[ 115.853441] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282
[ 115.853865] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00
[ 115.854458] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780
[ 115.855044] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0
[ 115.855693] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8
[ 115.856304] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000
[ 115.856859] FS: 00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000)
knlGS:0000000000000000
[ 115.857531] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.858006] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0
[ 115.858598] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 115.859393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 115.860099] Call Trace:
[ 115.860358] <TASK>
[ 115.860535] propagate_mnt+0x14d/0x190
[ 115.860848] attach_recursive_mnt+0x274/0x3e0
[ 115.861212] path_mount+0x8c8/0xa60
[ 115.861503] __x64_sys_mount+0xf6/0x140
[ 115.861819] do_syscall_64+0x5b/0x80
[ 115.862117] ? do_faccessat+0x123/0x250
[ 115.862435] ? syscall_exit_to_user_mode+0x17/0x40
[ 115.862826] ? do_syscall_64+0x67/0x80
[ 115.863133] ? syscall_exit_to_user_mode+0x17/0x40
[ 115.863527] ? do_syscall_64+0x67/0x80
[ 115.863835] ? do_syscall_64+0x67/0x80
[ 115.864144] ? do_syscall_64+0x67/0x80
[ 115.864452] ? exc_page_fault+0x70/0x170
[ 115.864775] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 115.865187] RIP: 0033:0x7f92c92b0ebe
[ 115.865480] Code: 48 8b 0d 75 4f 0c 00 f7 d8 64 89 01 48 83 c8 ff
c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 42 4f 0c 00 f7 d8 64 89
01 48
[ 115.866984] RSP: 002b:00007fff000aa728 EFLAGS: 00000246 ORIG_RAX:
00000000000000a5
[ 115.867607] RAX: ffffffffffffffda RBX: 000055a77888d6b0 RCX: 00007f92c92b0ebe
[ 115.868240] RDX: 000055a77888d8e0 RSI: 000055a77888e6e0 RDI: 000055a77888e620
[ 115.868823] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 115.869403] R10: 0000000000001000 R11: 0000000000000246 R12: 000055a77888e620
[ 115.869994] R13: 000055a77888d8e0 R14: 00000000ffffffff R15: 00007f92c93e4076
[ 115.870581] </TASK>
[ 115.870763] Modules linked in: nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink qrtr snd_intel8x0
sunrpc snd_ac97_codec ac97_bus snd_pcm snd_timer intel_rapl_msr
intel_rapl_common snd vboxguest intel_powerclamp video rapl joydev
soundcore i2c_piix4 wmi fuse zram xfs vmwgfx crct10dif_pclmul
crc32_pclmul crc32c_intel polyval_clmulni polyval_generic
drm_ttm_helper ttm e1000 ghash_clmulni_intel serio_raw ata_generic
pata_acpi scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath
[ 115.875288] CR2: 0000000000000010
[ 115.875641] ---[ end trace 0000000000000000 ]---
[ 115.876135] RIP: 0010:propagate_one.part.0+0x7f/0x1a0
[ 115.876551] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10
49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01
00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37
02 4d
[ 115.878086] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282
[ 115.878511] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00
[ 115.879128] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780
[ 115.879715] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0
[ 115.880359] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8
[ 115.880962] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000
[ 115.881548] FS: 00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000)
knlGS:0000000000000000
[ 115.882234] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.882713] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0
[ 115.883314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 115.883966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: f2ebb3a921c1 ("smarter propagate_mnt()")
Fixes: 5ec0811d3037 ("propogate_mnt: Handle the first propogated copy being a slave")
Cc: <stable@vger.kernel.org>
Reported-by: Ditang Chen <ditang.c@gmail.com>
Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mount
commit 9f2b5debc07073e6dfdd774e3594d0224b991927 upstream.
Despite specifying UID and GID in mount command, the specified UID and GID
were not being assigned. This patch fixes this issue.
Link: https://lkml.kernel.org/r/C0264BF5-059C-45CF-B8DA-3A3BD2C803A2@live.com
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 572302af1258459e124437b8f3369357447afac7 upstream.
Commit 57fe60df6241 ("reiserfs: add atomic addition of selinux attributes
during inode creation") defined reiserfs_security_free() to free the name
and value of a security xattr allocated by the active LSM through
security_old_inode_init_security(). However, this function is not called
in the reiserfs code.
Thus, add a call to reiserfs_security_free() whenever
reiserfs_security_init() is called, and initialize value to NULL, to avoid
to call kfree() on an uninitialized pointer.
Finally, remove the kfree() for the xattr name, as it is not allocated
anymore.
Fixes: 57fe60df6241 ("reiserfs: add atomic addition of selinux attributes during inode creation")
Cc: stable@vger.kernel.org
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Mimi Zohar <zohar@linux.ibm.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d23417a5bf3a3afc55de5442eb46e1e60458b0a1 ]
When insert and remove the orangefs module, then debug_help_string will
be leaked:
unreferenced object 0xffff8881652ba000 (size 4096):
comm "insmod", pid 1701, jiffies 4294893639 (age 13218.530s)
hex dump (first 32 bytes):
43 6c 69 65 6e 74 20 44 65 62 75 67 20 4b 65 79 Client Debug Key
77 6f 72 64 73 20 61 72 65 20 75 6e 6b 6e 6f 77 words are unknow
backtrace:
[<0000000004e6f8e3>] kmalloc_trace+0x27/0xa0
[<0000000006f75d85>] orangefs_prepare_debugfs_help_string+0x5e/0x480 [orangefs]
[<0000000091270a2a>] _sub_I_65535_1+0x57/0xf70 [crc_itu_t]
[<000000004b1ee1a3>] do_one_initcall+0x87/0x2a0
[<000000001d0614ae>] do_init_module+0xdf/0x320
[<00000000efef068c>] load_module+0x2f98/0x3330
[<000000006533b44d>] __do_sys_finit_module+0x113/0x1b0
[<00000000a0da6f99>] do_syscall_64+0x35/0x80
[<000000007790b19b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
When remove the module, should always free debug_help_string. Should
always free the allocated buffer when change the free_debug_help_string.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 8d824e69d9f3fa3121b2dda25053bae71e2460d2 ]
Syzbot reported a OOB read bug:
==================================================================
BUG: KASAN: slab-out-of-bounds in hfs_strcmp+0x117/0x190
fs/hfs/string.c:84
Read of size 1 at addr ffff88807eb62c4e by task kworker/u4:1/11
CPU: 1 PID: 11 Comm: kworker/u4:1 Not tainted
6.1.0-rc6-syzkaller-00308-g644e9524388a #0
Workqueue: writeback wb_workfn (flush-7:0)
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:284
print_report+0x107/0x1f0 mm/kasan/report.c:395
kasan_report+0xcd/0x100 mm/kasan/report.c:495
hfs_strcmp+0x117/0x190 fs/hfs/string.c:84
__hfs_brec_find+0x213/0x5c0 fs/hfs/bfind.c:75
hfs_brec_find+0x276/0x520 fs/hfs/bfind.c:138
hfs_write_inode+0x34c/0xb40 fs/hfs/inode.c:462
write_inode fs/fs-writeback.c:1440 [inline]
If the input inode of hfs_write_inode() is incorrect:
struct inode
struct hfs_inode_info
struct hfs_cat_key
struct hfs_name
u8 len # len is greater than HFS_NAMELEN(31) which is the
maximum length of an HFS filename
OOB read occurred:
hfs_write_inode()
hfs_brec_find()
__hfs_brec_find()
hfs_cat_keycmp()
hfs_strcmp() # OOB read occurred due to len is too large
Fix this by adding a Check on len in hfs_write_inode() before calling
hfs_brec_find().
Link: https://lkml.kernel.org/r/20221130065959.2168236-1-zhangpeng362@huawei.com
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Reported-by: <syzbot+e836ff7133ac02be825f@syzkaller.appspotmail.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 610a2a3d7d8be3537458a378ec69396a76c385b6 ]
Patch series "nilfs2: fix UBSAN shift-out-of-bounds warnings on mount
time".
The first patch fixes a bug reported by syzbot, and the second one fixes
the remaining bug of the same kind. Although they are triggered by the
same super block data anomaly, I divided it into the above two because the
details of the issues and how to fix it are different.
Both are required to eliminate the shift-out-of-bounds issues at mount
time.
This patch (of 2):
If the block size exponent information written in an on-disk superblock is
corrupted, nilfs_sb2_bad_offset helper function can trigger
shift-out-of-bounds warning followed by a kernel panic (if panic_on_warn
is set):
shift exponent 38983 is too large for 64-bit type 'unsigned long long'
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:151 [inline]
__ubsan_handle_shift_out_of_bounds+0x33d/0x3b0 lib/ubsan.c:322
nilfs_sb2_bad_offset fs/nilfs2/the_nilfs.c:449 [inline]
nilfs_load_super_block+0xdf5/0xe00 fs/nilfs2/the_nilfs.c:523
init_nilfs+0xb7/0x7d0 fs/nilfs2/the_nilfs.c:577
nilfs_fill_super+0xb1/0x5d0 fs/nilfs2/super.c:1047
nilfs_mount+0x613/0x9b0 fs/nilfs2/super.c:1317
...
In addition, since nilfs_sb2_bad_offset() performs multiplication without
considering the upper bound, the computation may overflow if the disk
layout parameters are not normal.
This fixes these issues by inserting preliminary sanity checks for those
parameters and by converting the comparison from one involving
multiplication and left bit-shifting to one using division and right
bit-shifting.
Link: https://lkml.kernel.org/r/20221027044306.42774-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20221027044306.42774-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+e91619dd4c11c4960706@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 25e70c6162f207828dd405b432d8f2a98dbf7082 ]
This should be applied to most URSAN bugs found recently by syzbot,
by guarding the dbMount. As syzbot feeding rubbish into the bmap
descriptor.
Signed-off-by: Hoi Pok Wu <wuhoipok@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit c791730f2554a9ebb8f18df9368dc27d4ebc38c2 ]
syzbot reported a warning like below [1]:
VFS: brelse: Trying to free free buffer
WARNING: CPU: 2 PID: 7301 at fs/buffer.c:1145 __brelse+0x67/0xa0
...
Call Trace:
<TASK>
invalidate_bh_lru+0x99/0x150
smp_call_function_many_cond+0xe2a/0x10c0
? generic_remap_file_range_prep+0x50/0x50
? __brelse+0xa0/0xa0
? __mutex_lock+0x21c/0x12d0
? smp_call_on_cpu+0x250/0x250
? rcu_read_lock_sched_held+0xb/0x60
? lock_release+0x587/0x810
? __brelse+0xa0/0xa0
? generic_remap_file_range_prep+0x50/0x50
on_each_cpu_cond_mask+0x3c/0x80
blkdev_flush_mapping+0x13a/0x2f0
blkdev_put_whole+0xd3/0xf0
blkdev_put+0x222/0x760
deactivate_locked_super+0x96/0x160
deactivate_super+0xda/0x100
cleanup_mnt+0x222/0x3d0
task_work_run+0x149/0x240
? task_work_cancel+0x30/0x30
do_exit+0xb29/0x2a40
? reacquire_held_locks+0x4a0/0x4a0
? do_raw_spin_lock+0x12a/0x2b0
? mm_update_next_owner+0x7c0/0x7c0
? rwlock_bug.part.0+0x90/0x90
? zap_other_threads+0x234/0x2d0
do_group_exit+0xd0/0x2a0
__x64_sys_exit_group+0x3a/0x50
do_syscall_64+0x34/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The cause of the issue is that brelse() is called on both ofibh.sbh
and ofibh.ebh by udf_find_entry() when it returns NULL. However,
brelse() is called by udf_rename(), too. So, b_count on buffer_head
becomes unbalanced.
This patch fixes the issue by not calling brelse() by udf_rename()
when udf_find_entry() returns NULL.
Link: https://syzkaller.appspot.com/bug?id=8297f45698159c6bca8a1f87dc983667c1a1c851 [1]
Reported-by: syzbot+7902cd7684bc35306224@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221023095741.271430-1-syoshida@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 898f706695682b9954f280d95e49fa86ffa55d08 ]
Syzbot found a crash : UBSAN: shift-out-of-bounds in dbAllocAG. The
underlying bug is the missing check of bmp->db_agl2size. The field can
be greater than 64 and trigger the shift-out-of-bounds.
Fix this bug by adding a check of bmp->db_agl2size in dbMount since this
field is used in many following functions. The upper bound for this
field is L2MAXL2SIZE - L2MAXAG, thanks for the help of Dave Kleikamp.
Note that, for maintenance, I reorganized error handling code of dbMount.
Reported-by: syzbot+15342c1aa6a00fb7a438@syzkaller.appspotmail.com
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 6a46bf558803dd2b959ca7435a5c143efe837217 ]
UBSAN reported a shift-out-of-bounds warning:
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
ubsan_epilogue+0xa/0x44 lib/ubsan.c:151
__ubsan_handle_shift_out_of_bounds+0x1e7/0x208 lib/ubsan.c:322
check_special_flags fs/binfmt_misc.c:241 [inline]
create_entry fs/binfmt_misc.c:456 [inline]
bm_register_write+0x9d3/0xa20 fs/binfmt_misc.c:654
vfs_write+0x11e/0x580 fs/read_write.c:582
ksys_write+0xcf/0x120 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x34/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x4194e1
Since the type of Node's flags is unsigned long, we should define these
macros with same type too.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221102025123.1117184-1-liushixin2@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3bc8edc98bd43540dbe648e4ef91f443d6d20a24 ]
On error situation `clp->cl_cb_conn.cb_xprt` should not be given
a reference to the xprt otherwise both client cleanup and the
error handling path of the caller call to put it. Better to
delay handing over the reference to a later branch.
[ 72.530665] refcount_t: underflow; use-after-free.
[ 72.531933] WARNING: CPU: 0 PID: 173 at lib/refcount.c:28 refcount_warn_saturate+0xcf/0x120
[ 72.533075] Modules linked in: nfsd(OE) nfsv4(OE) nfsv3(OE) nfs(OE) lockd(OE) compat_nfs_ssc(OE) nfs_acl(OE) rpcsec_gss_krb5(OE) auth_rpcgss(OE) rpcrdma(OE) dns_resolver fscache netfs grace rdma_cm iw_cm ib_cm sunrpc(OE) mlx5_ib mlx5_core mlxfw pci_hyperv_intf ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nft_counter xt_addrtype nft_compat br_netfilter bridge stp llc nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set overlay nf_tables nfnetlink crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xfs serio_raw virtio_net virtio_blk net_failover failover fuse [last unloaded: sunrpc]
[ 72.540389] CPU: 0 PID: 173 Comm: kworker/u16:5 Tainted: G OE 5.15.82-dan #1
[ 72.541511] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.7.0+1084+97b81f61 04/01/2014
[ 72.542717] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[ 72.543575] RIP: 0010:refcount_warn_saturate+0xcf/0x120
[ 72.544299] Code: 55 00 0f 0b 5d e9 01 50 98 00 80 3d 75 9e 39 08 00 0f 85 74 ff ff ff 48 c7 c7 e8 d1 60 8e c6 05 61 9e 39 08 01 e8 f6 51 55 00 <0f> 0b 5d e9 d9 4f 98 00 80 3d 4b 9e 39 08 00 0f 85 4c ff ff ff 48
[ 72.546666] RSP: 0018:ffffb3f841157cf0 EFLAGS: 00010286
[ 72.547393] RAX: 0000000000000026 RBX: ffff89ac6231d478 RCX: 0000000000000000
[ 72.548324] RDX: ffff89adb7c2c2c0 RSI: ffff89adb7c205c0 RDI: ffff89adb7c205c0
[ 72.549271] RBP: ffffb3f841157cf0 R08: 0000000000000000 R09: c0000000ffefffff
[ 72.550209] R10: 0000000000000001 R11: ffffb3f841157ad0 R12: ffff89ac6231d180
[ 72.551142] R13: ffff89ac6231d478 R14: ffff89ac40c06180 R15: ffff89ac6231d4b0
[ 72.552089] FS: 0000000000000000(0000) GS:ffff89adb7c00000(0000) knlGS:0000000000000000
[ 72.553175] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 72.553934] CR2: 0000563a310506a8 CR3: 0000000109a66000 CR4: 0000000000350ef0
[ 72.554874] Call Trace:
[ 72.555278] <TASK>
[ 72.555614] svc_xprt_put+0xaf/0xe0 [sunrpc]
[ 72.556276] nfsd4_process_cb_update.isra.11+0xb7/0x410 [nfsd]
[ 72.557087] ? update_load_avg+0x82/0x610
[ 72.557652] ? cpuacct_charge+0x60/0x70
[ 72.558212] ? dequeue_entity+0xdb/0x3e0
[ 72.558765] ? queued_spin_unlock+0x9/0x20
[ 72.559358] nfsd4_run_cb_work+0xfc/0x270 [nfsd]
[ 72.560031] process_one_work+0x1df/0x390
[ 72.560600] worker_thread+0x37/0x3b0
[ 72.561644] ? process_one_work+0x390/0x390
[ 72.562247] kthread+0x12f/0x150
[ 72.562710] ? set_kthread_struct+0x50/0x50
[ 72.563309] ret_from_fork+0x22/0x30
[ 72.563818] </TASK>
[ 72.564189] ---[ end trace 031117b1c72ec616 ]---
[ 72.566019] list_add corruption. next->prev should be prev (ffff89ac4977e538), but was ffff89ac4763e018. (next=ffff89ac4763e018).
[ 72.567647] ------------[ cut here ]------------
Fixes: a4abc6b12eb1 ("nfsd: Fix svc_xprt refcnt leak when setup callback client failed")
Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Cc: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 11fa7fefe3d8fac7da56bc9aa3dd5fb3081ca797 ]
While doing fault injection test, I got the following report:
------------[ cut here ]------------
kobject: '(null)' (0000000039956980): is not initialized, yet kobject_put() is being called.
WARNING: CPU: 3 PID: 6306 at kobject_put+0x23d/0x4e0
CPU: 3 PID: 6306 Comm: 283 Tainted: G W 6.1.0-rc2-00005-g307c1086d7c9 #1253
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:kobject_put+0x23d/0x4e0
Call Trace:
<TASK>
cdev_device_add+0x15e/0x1b0
__iio_device_register+0x13b4/0x1af0 [industrialio]
__devm_iio_device_register+0x22/0x90 [industrialio]
max517_probe+0x3d8/0x6b4 [max517]
i2c_device_probe+0xa81/0xc00
When device_add() is injected fault and returns error, if dev->devt is not set,
cdev_add() is not called, cdev_del() is not needed. Fix this by checking dev->devt
in error path.
Fixes: 233ed09d7fda ("chardev: add helper function to register char devs with a struct device")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20221202030237.520280-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit ea60a4ad0cf88b411cde6888b8c890935686ecd7 ]
When the dev init failed, should cleanup the sysfs, otherwise, the
module will never be loaded since can not create duplicate sysfs
directory:
sysfs: cannot create duplicate filename '/fs/orangefs'
CPU: 1 PID: 6549 Comm: insmod Tainted: G W 6.0.0+ #44
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
sysfs_warn_dup.cold+0x17/0x24
sysfs_create_dir_ns+0x16d/0x180
kobject_add_internal+0x156/0x3a0
kobject_init_and_add+0xcf/0x120
orangefs_sysfs_init+0x7e/0x3a0 [orangefs]
orangefs_init+0xfe/0x1000 [orangefs]
do_one_initcall+0x87/0x2a0
do_init_module+0xdf/0x320
load_module+0x2f98/0x3330
__do_sys_finit_module+0x113/0x1b0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
kobject_add_internal failed for orangefs with -EEXIST, don't try to register things with the same name in the same directory.
Fixes: 2f83ace37181 ("orangefs: put register_chrdev immediately before register_filesystem")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 51069e4aef6257b0454057359faed0ab0c9af083 ]
If we're asked to recover open state while a delegation return is
outstanding, then the state manager thread cannot use a cached open, so
if the server returns a delegation, we can end up deadlocked behind the
pending delegreturn.
To avoid this problem, let's just ask the server not to give us a
delegation unless we're explicitly reclaiming one.
Fixes: be36e185bd26 ("NFSv4: nfs4_open_recover_helper() must set share access")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 43c1031f7110967c240cb6e922adcfc4b8899183 ]
We must not change the value of label->len if it is zero, since that
indicates we stored a label.
Fixes: b4487b935452 ("nfs: Fix getxattr kernel panic and memory overflow")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit c53ed55cb275344086e32a7080a6b19cb183650b ]
Syzbot reported a OOB Write bug:
loop0: detected capacity change from 0 to 64
==================================================================
BUG: KASAN: slab-out-of-bounds in hfs_asc2mac+0x467/0x9a0
fs/hfs/trans.c:133
Write of size 1 at addr ffff88801848314e by task syz-executor391/3632
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:284
print_report+0x107/0x1f0 mm/kasan/report.c:395
kasan_report+0xcd/0x100 mm/kasan/report.c:495
hfs_asc2mac+0x467/0x9a0 fs/hfs/trans.c:133
hfs_cat_build_key+0x92/0x170 fs/hfs/catalog.c:28
hfs_lookup+0x1ab/0x2c0 fs/hfs/dir.c:31
lookup_open fs/namei.c:3391 [inline]
open_last_lookups fs/namei.c:3481 [inline]
path_openat+0x10e6/0x2df0 fs/namei.c:3710
do_filp_open+0x264/0x4f0 fs/namei.c:3740
If in->len is much larger than HFS_NAMELEN(31) which is the maximum
length of an HFS filename, a OOB write could occur in hfs_asc2mac(). In
that case, when the dst reaches the boundary, the srclen is still
greater than 0, which causes a OOB write.
Fix this by adding a check on dstlen in while() before writing to dst
address.
Link: https://lkml.kernel.org/r/20221202030038.1391945-1-zhangpeng362@huawei.com
Fixes: 328b92278650 ("[PATCH] hfs: NLS support")
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reported-by: <syzbot+dc3b1cf9111ab5fe98e7@syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e0c49bd2b4d3cd1751491eb2d940bce968ac65e9 ]
sysv_nblocks() returns 'blocks' rather than 'res', which only counting
the number of triple-indirect blocks and causing sysv_getattr() gets a
wrong result.
[AV: this is actually a sysv counterpart of minixfs fix -
0fcd426de9d0 "[PATCH] minix block usage counting fix" in
historical tree; mea culpa, should've thought to check
fs/sysv back then...]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 2e41f274f9aa71cdcc69dc1f26a3f9304a651804 ]
Patch series "fix error when writing negative value to simple attribute
files".
The simple attribute files do not accept a negative value since the commit
488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"), but some attribute files want to accept a negative
value.
This patch (of 3):
The simple attribute files do not accept a negative value since the commit
488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"), so we have to use a 64-bit value to write a
negative value.
This adds DEFINE_SIMPLE_ATTRIBUTE_SIGNED for a signed value.
Link: https://lkml.kernel.org/r/20220919172418.45257-1-akinobu.mita@gmail.com
Link: https://lkml.kernel.org/r/20220919172418.45257-2-akinobu.mita@gmail.com
Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 13b6269dd022aaa69ca8d1df374ab327504121cf ]
ocfs2_table_header should be free in ocfs2_stack_glue_init() if
ocfs2_sysfs_init() failed, otherwise kmemleak will report memleak.
BUG: memory leak
unreferenced object 0xffff88810eeb5800 (size 128):
comm "modprobe", pid 4507, jiffies 4296182506 (age 55.888s)
hex dump (first 32 bytes):
c0 40 14 a0 ff ff ff ff 00 00 00 00 01 00 00 00 .@..............
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000001e59e1cd>] __register_sysctl_table+0xca/0xef0
[<00000000c04f70f7>] 0xffffffffa0050037
[<000000001bd12912>] do_one_initcall+0xdb/0x480
[<0000000064f766c9>] do_init_module+0x1cf/0x680
[<000000002ba52db0>] load_module+0x6441/0x6f20
[<000000009772580d>] __do_sys_finit_module+0x12f/0x1c0
[<00000000380c1f22>] do_syscall_64+0x3f/0x90
[<000000004cf473bc>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lkml.kernel.org/r/41651ca1-432a-db34-eb97-d35744559de1@linux.alibaba.com
Fixes: 3878f110f71a ("ocfs2: Move the hb_ctl_path sysctl into the stack glue.")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e7eda157c4071cd1e69f4b1687b0fbe1ae5e6f46 ]
The check being unconditional may lead to unwanted denials reported by
LSMs when a process has the capability granted by DAC, but denied by an
LSM. In the case of SELinux such denials are a problem, since they can't
be effectively filtered out via the policy and when not silenced, they
produce noise that may hide a true problem or an attack.
Checking for the capability only if any trusted xattr is actually
present wouldn't really address the issue, since calling listxattr(2) on
such node on its own doesn't indicate an explicit attempt to see the
trusted xattrs. Additionally, it could potentially leak the presence of
trusted xattrs to an unprivileged user if they can check for the denials
(e.g. through dmesg).
Therefore, it's best (and simplest) to keep the check unconditional and
instead use ns_capable_noaudit() that will silence any associated LSM
denials.
Fixes: 38f38657444d ("xattr: extract simple_xattr code from tmpfs")
Reported-by: Martin Pitt <mpitt@redhat.com>
Suggested-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e6b842741b4f39007215fd7e545cb55aa3d358a2 ]
An oops can be induced by running 'cat /proc/kcore > /dev/null' on
devices using pstore with the ram backend because kmap_atomic() assumes
lowmem pages are accessible with __va().
Unable to handle kernel paging request at virtual address ffffff807ff2b000
Mem abort info:
ESR = 0x96000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000081d87000
[ffffff807ff2b000] pgd=180000017fe18003, p4d=180000017fe18003, pud=180000017fe18003, pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in: dm_integrity
CPU: 7 PID: 21179 Comm: perf Not tainted 5.15.67-10882-ge4eb2eb988cd #1 baa443fb8e8477896a370b31a821eb2009f9bfba
Hardware name: Google Lazor (rev3 - 8) (DT)
pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __memcpy+0x110/0x260
lr : vread+0x194/0x294
sp : ffffffc013ee39d0
x29: ffffffc013ee39f0 x28: 0000000000001000 x27: ffffff807ff2b000
x26: 0000000000001000 x25: ffffffc0085a2000 x24: ffffff802d4b3000
x23: ffffff80f8a60000 x22: ffffff802d4b3000 x21: ffffffc0085a2000
x20: ffffff8080b7bc68 x19: 0000000000001000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffffffd3073f2e60
x14: ffffffffad588000 x13: 0000000000000000 x12: 0000000000000001
x11: 00000000000001a2 x10: 00680000fff2bf0b x9 : 03fffffff807ff2b
x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffffff802d4b4000 x4 : ffffff807ff2c000 x3 : ffffffc013ee3a78
x2 : 0000000000001000 x1 : ffffff807ff2b000 x0 : ffffff802d4b3000
Call trace:
__memcpy+0x110/0x260
read_kcore+0x584/0x778
proc_reg_read+0xb4/0xe4
During early boot, memblock reserves the pages for the ramoops reserved
memory node in DT that would otherwise be part of the direct lowmem
mapping. Pstore's ram backend reuses those reserved pages to change the
memory type (writeback or non-cached) by passing the pages to vmap()
(see pfn_to_page() usage in persistent_ram_vmap() for more details) with
specific flags. When read_kcore() starts iterating over the vmalloc
region, it runs over the virtual address that vmap() returned for
ramoops. In aligned_vread() the virtual address is passed to
vmalloc_to_page() which returns the page struct for the reserved lowmem
area. That lowmem page is passed to kmap_atomic(), which effectively
calls page_to_virt() that assumes a lowmem page struct must be directly
accessible with __va() and friends. These pages are mapped via vmap()
though, and the lowmem mapping was never made, so accessing them via the
lowmem virtual address oopses like above.
Let's side-step this problem by passing VM_IOREMAP to vmap(). This will
tell vread() to not include the ramoops region in the kcore. Instead the
area will look like a bunch of zeros. The alternative is to teach kmap()
about vmalloc areas that intersect with lowmem. Presumably such a change
isn't a one-liner, and there isn't much interest in inspecting the
ramoops region in kcore files anyway, so the most expedient route is
taken for now.
Cc: Brian Geffon <bgeffon@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 404a6043385d ("staging: android: persistent_ram: handle reserving and mapping memory")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221205233136.3420802-1-swboyd@chromium.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 1f3868f06855c97a4954c99b36f3fc9eb8f60326 upstream.
When extending file within last block it can happen that the extent is
already rounded to the blocksize and thus contains the offset we want to
grow up to. In such case we would mistakenly expand the last extent and
make it one block longer than it should be, exposing unallocated block
in a file and causing data corruption. Fix the problem by properly
detecting this case and bailing out.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 6ad53f0f71c52871202a7bf096feb2c59db33fc5 upstream.
If rounded block-rounded i_lenExtents matches block rounded i_size,
there are no preallocation extents. Do not bother walking extent linked
list.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit cfe4c1b25dd6d2f056afc00b7c98bcb3dd0b1fc3 upstream.
When preallocation extent is the first one in the extent block, the
code would corrupt extent tree header instead. Fix the problem and use
udf_delete_aext() for deleting extent to avoid some code duplication.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
| |
commit 6c1e4d06a3808dc67dbce2d631f4c12574567dd5 upstream.
udf_delete_aext() uses its last two arguments only as local variables.
Drop them.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 16d0556568148bdcaa45d077cac9f8f7077cf70a upstream.
When extending file with a hole, we tried to preserve existing
preallocation for the file. However that is not very useful and
complicates code because the previous extent may need to be rounded to
block boundary as well (which we forgot to do thus causing data
corruption for sequence like:
xfs_io -f -c "pwrite 0x75e63 11008" -c "truncate 0x7b24b" \
-c "truncate 0xabaa3" -c "pwrite 0xac70b 22954" \
-c "pwrite 0x93a43 11358" -c "pwrite 0xb8e65 52211" file
with 512-byte block size. Just discard preallocation before extending
file to simplify things and also fix this data corruption.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit f0a0ccda18d6fd826d7c7e7ad48a6ed61c20f8b4 upstream.
Syzbot reported a null-ptr-deref bug:
NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
frequency < 30 seconds
general protection fault, probably for non-canonical address
0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 PID: 3603 Comm: segctord Not tainted
6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google
10/11/2022
RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0
fs/nilfs2/alloc.c:608
Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00
00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02
00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7
RSP: 0018:ffffc90003dff830 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d
RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010
RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f
R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158
R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0
Call Trace:
<TASK>
nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline]
nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193
nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236
nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940
nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline]
nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline]
nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088
nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568
nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018
nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline]
nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045
nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline]
nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
...
If DAT metadata file is corrupted on disk, there is a case where
req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during
a b-tree operation that cascadingly updates ancestor nodes of the b-tree,
because nilfs_dat_commit_alloc() for a lower level block can initialize
the blocknr on the same DAT entry between nilfs_dat_prepare_end() and
nilfs_dat_commit_end().
If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free()
without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and
causes the NULL pointer dereference above in
nilfs_palloc_commit_free_entry() function, which leads to a crash.
Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh
before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free().
This also calls nilfs_error() in that case to notify that there is a fatal
flaw in the filesystem metadata and prevent further operations.
Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com
Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com
Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+ebe05ee8e98f755f61d0@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit f7e942b5bb35d8e3af54053d19a6bf04143a3955 ]
Syzkaller reported BUG as follows:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:274
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
__might_resched.cold+0x222/0x26b
kmem_cache_alloc+0x2e7/0x3c0
update_qgroup_limit_item+0xe1/0x390
btrfs_qgroup_inherit+0x147b/0x1ee0
create_subvol+0x4eb/0x1710
btrfs_mksubvol+0xfe5/0x13f0
__btrfs_ioctl_snap_create+0x2b0/0x430
btrfs_ioctl_snap_create_v2+0x25a/0x520
btrfs_ioctl+0x2a1c/0x5ce0
__x64_sys_ioctl+0x193/0x200
do_syscall_64+0x35/0x80
Fix this by calling qgroup_dirty() on @dstqgroup, and update limit item in
btrfs_run_qgroups() later outside of the spinlock context.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 512c5ca01a3610ab14ff6309db363de51f1c13a6 upstream.
When extending segments, nilfs_sufile_alloc() is called to get an
unassigned segment, then mark it as dirty to avoid accidentally allocating
the same segment in the future.
But for some special cases such as a corrupted image it can be unreliable.
If such corruption of the dirty state of the segment occurs, nilfs2 may
reallocate a segment that is in use and pick the same segment for writing
twice at the same time.
This will cause the problem reported by syzkaller:
https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd24
This case started with segbuf1.segnum = 3, nextnum = 4 when constructed.
It supposed segment 4 has already been allocated and marked as dirty.
However the dirty state was corrupted and segment 4 usage was not dirty.
For the first time nilfs_segctor_extend_segments() segment 4 was allocated
again, which made segbuf2 and next segbuf3 had same segment 4.
sb_getblk() will get same bh for segbuf2 and segbuf3, and this bh is added
to both buffer lists of two segbuf. It makes the lists broken which
causes NULL pointer dereference.
Fix the problem by setting usage as dirty every time in
nilfs_sufile_mark_dirty(), which is called during constructing current
segment to be written out and before allocating next segment.
[chenzhongjin@huawei.com: add lock protection per Ryusuke]
Link: https://lkml.kernel.org/r/20221121091141.214703-1-chenzhongjin@huawei.com
Link: https://lkml.kernel.org/r/20221118063304.140187-1-chenzhongjin@huawei.com
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reported-by: <syzbot+77e4f0...@syzkaller.appspotmail.com>
Reported-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 63095f4f3af59322bea984a6ae44337439348fe0 upstream.
Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find().
Because the ATTR_RECORDs are next to each other, kernel can get the next
ATTR_RECORD from end address of current ATTR_RECORD, through current
ATTR_RECORD length field.
The problem is that during iteration, when kernel calculates the end
address of current ATTR_RECORD, kernel may trigger an integer overflow bug
in executing `a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))`. This
may wrap, leading to a forever iteration on 32bit systems.
This patch solves it by adding some checks on calculating end address
of current ATTR_RECORD during iteration.
Link: https://lkml.kernel.org/r/20220831160935.3409-4-yin31149@gmail.com
Link: https://lore.kernel.org/all/20220827105842.GM2030@kadam/
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: chenxiaosong (A) <chenxiaosong2@huawei.com>
Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 36a4d82dddbbd421d2b8e79e1cab68c8126d5075 upstream.
Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find(). To
ensure access on these ATTR_RECORDs are within bounds, kernel will do some
checking during iteration.
The problem is that during checking whether ATTR_RECORD's name is within
bounds, kernel will dereferences the ATTR_RECORD name_offset field, before
checking this ATTR_RECORD strcture is within bounds. This problem may
result out-of-bounds read in ntfs_attr_find(), reported by Syzkaller:
==================================================================
BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
[...]
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193
ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845
ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854
mount_bdev+0x34d/0x410 fs/super.c:1400
legacy_get_tree+0x105/0x220 fs/fs_context.c:610
vfs_get_tree+0x89/0x2f0 fs/super.c:1530
do_new_mount fs/namespace.c:3040 [inline]
path_mount+0x1326/0x1e20 fs/namespace.c:3370
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount fs/namespace.c:3568 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
</TASK>
The buggy address belongs to the physical page:
page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350
head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140
raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
This patch solves it by moving the ATTR_RECORD strcture's bounds checking
earlier, then checking whether ATTR_RECORD's name is within bounds.
What's more, this patch also add some comments to improve its
maintainability.
Link: https://lkml.kernel.org/r/20220831160935.3409-3-yin31149@gmail.com
Link: https://lore.kernel.org/all/1636796c-c85e-7f47-e96f-e074fee3c7d3@huawei.com/
Link: https://groups.google.com/g/syzkaller-bugs/c/t_XdeKPGTR4/m/LECAuIGcBgAJ
Signed-off-by: chenxiaosong (A) <chenxiaosong2@huawei.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reported-by: syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com
Tested-by: syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d85a1bec8e8d552ab13163ca1874dcd82f3d1550 upstream.
Patch series "ntfs: fix bugs about Attribute", v2.
This patchset fixes three bugs relative to Attribute in record:
Patch 1 adds a sanity check to ensure that, attrs_offset field in first
mft record loading from disk is within bounds.
Patch 2 moves the ATTR_RECORD's bounds checking earlier, to avoid
dereferencing ATTR_RECORD before checking this ATTR_RECORD is within
bounds.
Patch 3 adds an overflow checking to avoid possible forever loop in
ntfs_attr_find().
Without patch 1 and patch 2, the kernel triggersa KASAN use-after-free
detection as reported by Syzkaller.
Although one of patch 1 or patch 2 can fix this, we still need both of
them. Because patch 1 fixes the root cause, and patch 2 not only fixes
the direct cause, but also fixes the potential out-of-bounds bug.
This patch (of 3):
Syzkaller reported use-after-free read as follows:
==================================================================
BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
[...]
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193
ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845
ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854
mount_bdev+0x34d/0x410 fs/super.c:1400
legacy_get_tree+0x105/0x220 fs/fs_context.c:610
vfs_get_tree+0x89/0x2f0 fs/super.c:1530
do_new_mount fs/namespace.c:3040 [inline]
path_mount+0x1326/0x1e20 fs/namespace.c:3370
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount fs/namespace.c:3568 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
</TASK>
The buggy address belongs to the physical page:
page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350
head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140
raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Kernel will loads $MFT/$DATA's first mft record in
ntfs_read_inode_mount().
Yet the problem is that after loading, kernel doesn't check whether
attrs_offset field is a valid value.
To be more specific, if attrs_offset field is larger than bytes_allocated
field, then it may trigger the out-of-bounds read bug(reported as
use-after-free bug) in ntfs_attr_find(), when kernel tries to access the
corresponding mft record's attribute.
This patch solves it by adding the sanity check between attrs_offset field
and bytes_allocated field, after loading the first mft record.
Link: https://lkml.kernel.org/r/20220831160935.3409-1-yin31149@gmail.com
Link: https://lkml.kernel.org/r/20220831160935.3409-2-yin31149@gmail.com
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: ChenXiaoSong <chenxiaosong2@huawei.com>
Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 1468c6f4558b1bcd92aa0400f2920f9dc7588402 upstream.
Functions implementing the a_ops->write_end() interface accept the `void
*fsdata` parameter that is supposed to be initialized by the corresponding
a_ops->write_begin() (which accepts `void **fsdata`).
However not all a_ops->write_begin() implementations initialize `fsdata`
unconditionally, so it may get passed uninitialized to a_ops->write_end(),
resulting in undefined behavior.
Fix this by initializing fsdata with NULL before the call to
write_begin(), rather than doing so in all possible a_ops implementations.
This patch covers only the following cases found by running x86 KMSAN
under syzkaller:
- generic_perform_write()
- cont_expand_zero() and generic_cont_expand_simple()
- page_symlink()
Other cases of passing uninitialized fsdata may persist in the codebase.
Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 204c0300c4e99707e9fb6e57840aa1127060e63f upstream.
Switch from strlcpy to strscpy and make sure that @count is the size of
the smaller of the source and destination buffers. This prevents
reading beyond the end of the source buffer when the source string isn't
null terminated.
Found by a modified version of syzkaller.
Suggested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 670f8ce56dd0632dc29a0322e188cc73ce3c6b92 upstream.
Fuzzers like to scribble over sb_bsize_shift but in reality it's very
unlikely that this field would be corrupted on its own. Nevertheless it
should be checked to avoid the possibility of messy mount errors due to
bad calculations. It's always a fixed value based on the block size so
we can just check that it's the expected value.
Tested with:
mkfs.gfs2 -O -p lock_nolock /dev/vdb
for i in 0 -1 64 65 32 33; do
gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb
mount /dev/vdb /mnt/test && umount /mnt/test
done
Before this patch we get a withdraw after
[ 76.413681] gfs2: fsid=loop0.0: fatal: invalid metadata block
[ 76.413681] bh = 19 (type: exp=5, found=4)
[ 76.413681] function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 492
and with UBSAN configured we also get complaints like
[ 76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19
[ 76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int'
After the patch, these complaints don't appear, mount fails immediately
and we get an explanation in dmesg.
Reported-by: syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 8cccf05fe857a18ee26e20d11a8455a73ffd4efd upstream.
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1 Task2
-------------------------------- ------------------------------
nilfs_construct_segment
nilfs_segctor_sync
init_wait
init_waitqueue_entry
add_wait_queue
schedule
nilfs_remount (R/W remount case)
nilfs_attach_log_writer
nilfs_detach_log_writer
nilfs_segctor_destroy
kfree
finish_wait
_raw_spin_lock_irqsave
__raw_spin_lock_irqsave
do_raw_spin_lock
debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed. This
scenario diagram is based on the Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df8b
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb@syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida@redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 92bbd67a55fee50743b42825d1c016e7fd5c79f9 ]
The return value of CIFSGetExtAttr is negative, should be checked
with -EOPNOTSUPP rather than EOPNOTSUPP.
Fixes: 64a5cfa6db94 ("Allow setting per-file compression via SMB2/3")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit c8af247de385ce49afabc3bf1cf4fd455c94bfe8 upstream.
Syzbot reported a slab-out-of-bounds Write bug:
loop0: detected capacity change from 0 to 2048
==================================================================
BUG: KASAN: slab-out-of-bounds in udf_find_entry+0x8a5/0x14f0
fs/udf/namei.c:253
Write of size 105 at addr ffff8880123ff896 by task syz-executor323/3610
CPU: 0 PID: 3610 Comm: syz-executor323 Not tainted
6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS
Google 10/11/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:284
print_report+0x107/0x1f0 mm/kasan/report.c:395
kasan_report+0xcd/0x100 mm/kasan/report.c:495
kasan_check_range+0x2a7/0x2e0 mm/kasan/generic.c:189
memcpy+0x3c/0x60 mm/kasan/shadow.c:66
udf_find_entry+0x8a5/0x14f0 fs/udf/namei.c:253
udf_lookup+0xef/0x340 fs/udf/namei.c:309
lookup_open fs/namei.c:3391 [inline]
open_last_lookups fs/namei.c:3481 [inline]
path_openat+0x10e6/0x2df0 fs/namei.c:3710
do_filp_open+0x264/0x4f0 fs/namei.c:3740
do_sys_openat2+0x124/0x4e0 fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_creat fs/open.c:1402 [inline]
__se_sys_creat fs/open.c:1396 [inline]
__x64_sys_creat+0x11f/0x160 fs/open.c:1396
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ffab0d164d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe1a7e6bb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffab0d164d9
RDX: 00007ffab0d164d9 RSI: 0000000000000000 RDI: 0000000020000180
RBP: 00007ffab0cd5a10 R08: 0000000000000000 R09: 0000000000000000
R10: 00005555573552c0 R11: 0000000000000246 R12: 00007ffab0cd5aa0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Allocated by task 3610:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:371 [inline]
__kasan_kmalloc+0x97/0xb0 mm/kasan/common.c:380
kmalloc include/linux/slab.h:576 [inline]
udf_find_entry+0x7b6/0x14f0 fs/udf/namei.c:243
udf_lookup+0xef/0x340 fs/udf/namei.c:309
lookup_open fs/namei.c:3391 [inline]
open_last_lookups fs/namei.c:3481 [inline]
path_openat+0x10e6/0x2df0 fs/namei.c:3710
do_filp_open+0x264/0x4f0 fs/namei.c:3740
do_sys_openat2+0x124/0x4e0 fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_creat fs/open.c:1402 [inline]
__se_sys_creat fs/open.c:1396 [inline]
__x64_sys_creat+0x11f/0x160 fs/open.c:1396
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff8880123ff800
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 150 bytes inside of
256-byte region [ffff8880123ff800, ffff8880123ff900)
The buggy address belongs to the physical page:
page:ffffea000048ff80 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x123fe
head:ffffea000048ff80 order:1 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffffea00004b8500 dead000000000003 ffff888012041b40
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x0(),
pid 1, tgid 1 (swapper/0), ts 1841222404, free_ts 0
create_dummy_stack mm/page_owner.c:67 [inline]
register_early_stack+0x77/0xd0 mm/page_owner.c:83
init_page_owner+0x3a/0x731 mm/page_owner.c:93
kernel_init_freeable+0x41c/0x5d5 init/main.c:1629
kernel_init+0x19/0x2b0 init/main.c:1519
page_owner free stack trace missing
Memory state around the buggy address:
ffff8880123ff780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8880123ff800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8880123ff880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06
^
ffff8880123ff900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8880123ff980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Fix this by changing the memory size allocated for copy_name from
UDF_NAME_LEN(254) to UDF_NAME_LEN_CS0(255), because the total length
(lfi) of subsequent memcpy can be up to 255.
CC: stable@vger.kernel.org
Reported-by: syzbot+69c9fdccc6dd08961d34@syzkaller.appspotmail.com
Fixes: 066b9cded00b ("udf: Use separate buffer for copying split names")
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221109013542.442790-1-zhangpeng362@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9b2f20344d450137d015b380ff0c2e2a6a170135 upstream.
The btrfs_alloc_dummy_root() uses ERR_PTR as the error return value
rather than NULL, if error happened, there will be a NULL pointer
dereference:
BUG: KASAN: null-ptr-deref in btrfs_free_dummy_root+0x21/0x50 [btrfs]
Read of size 8 at addr 000000000000002c by task insmod/258926
CPU: 2 PID: 258926 Comm: insmod Tainted: G W 6.1.0-rc2+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
kasan_report+0xb7/0x140
kasan_check_range+0x145/0x1a0
btrfs_free_dummy_root+0x21/0x50 [btrfs]
btrfs_test_free_space_cache+0x1a8c/0x1add [btrfs]
btrfs_run_sanity_tests+0x65/0x80 [btrfs]
init_btrfs_fs+0xec/0x154 [btrfs]
do_one_initcall+0x87/0x2a0
do_init_module+0xdf/0x320
load_module+0x3006/0x3390
__do_sys_finit_module+0x113/0x1b0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: aaedb55bc08f ("Btrfs: add tests for btrfs_get_extent")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 8ac932a4921a96ca52f61935dbba64ea87bbd5dc upstream.
A semaphore deadlock can occur if nilfs_get_block() detects metadata
corruption while locating data blocks and a superblock writeback occurs at
the same time:
task 1 task 2
------ ------
* A file operation *
nilfs_truncate()
nilfs_get_block()
down_read(rwsem A) <--
nilfs_bmap_lookup_contig()
... generic_shutdown_super()
nilfs_put_super()
* Prepare to write superblock *
down_write(rwsem B) <--
nilfs_cleanup_super()
* Detect b-tree corruption * nilfs_set_log_cursor()
nilfs_bmap_convert_error() nilfs_count_free_blocks()
__nilfs_error() down_read(rwsem A) <--
nilfs_set_error()
down_write(rwsem B) <--
*** DEADLOCK ***
Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem)
and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata
corruption, __nilfs_error() is called from nilfs_bmap_convert_error()
inside the lock section.
Since __nilfs_error() calls nilfs_set_error() unless the filesystem is
read-only and nilfs_set_error() attempts to writelock rwsem B (=
nilfs->ns_sem) to write back superblock exclusively, hierarchical lock
acquisition occurs in the order rwsem A -> rwsem B.
Now, if another task starts updating the superblock, it may writelock
rwsem B during the lock sequence above, and can deadlock trying to
readlock rwsem A in nilfs_count_free_blocks().
However, there is actually no need to take rwsem A in
nilfs_count_free_blocks() because it, within the lock section, only reads
a single integer data on a shared struct with
nilfs_sufile_get_ncleansegs(). This has been the case after commit
aa474a220180 ("nilfs2: add local variable to cache the number of clean
segments"), that is, even before this bug was introduced.
So, this resolves the deadlock problem by just not taking the semaphore in
nilfs_count_free_blocks().
Link: https://lkml.kernel.org/r/20221029044912.9139-1-konishi.ryusuke@gmail.com
Fixes: e828949e5b42 ("nilfs2: call nilfs_error inside bmap routines")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+45d6ce7b7ad7ef455d03@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org> [2.6.38+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 1b8f787ef547230a3249bcf897221ef0cc78481b upstream.
Syzkaller report issue as follows:
EXT4-fs (loop0): Free/Dirty block details
EXT4-fs (loop0): free_blocks=0
EXT4-fs (loop0): dirty_blocks=0
EXT4-fs (loop0): Block reservation details
EXT4-fs (loop0): i_reserved_data_blocks=0
EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks
------------[ cut here ]------------
WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524
Modules linked in:
CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528
RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296
RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00
RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5
R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000
R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461
mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589
ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852
do_writepages+0x3c3/0x680 mm/page-writeback.c:2469
__writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587
writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870
wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044
wb_do_writeback fs/fs-writeback.c:2187 [inline]
wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
Above issue may happens as follows:
ext4_da_write_begin
ext4_create_inline_data
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
__ext4_ioctl
ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag
ext4_da_write_begin
ext4_da_convert_inline_data_to_extent
ext4_da_write_inline_data_begin
ext4_da_map_blocks
ext4_insert_delayed_block
if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1
allocated = true;
ext4_es_insert_delayed_block(inode, lblk, allocated);
ext4_writepages
mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC
mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1
ext4_es_remove_extent
ext4_da_release_space(inode, reserved);
if (unlikely(to_free > ei->i_reserved_data_blocks))
-> to_free == 1 but ei->i_reserved_data_blocks == 0
-> then trigger warning as above
To solve above issue, forbid inode do migrate which has inline data.
Cc: stable@kernel.org
Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 2398091f9c2c8e0040f4f9928666787a3e8108a7 upstream.
The type of parameter generation has been u32 since the beginning,
however all callers pass a u64 generation, so unify the types to prevent
potential loss.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d37de92b38932d40e4a251e876cc388f9aee5f42 ]
In the test_no_shared_qgroup() and test_multiple_refs() qgroup self tests,
if we fail to add the tree ref, remove the extent item or remove the
extent ref, we are returning from the test function without freeing the
"old_roots" ulist that was allocated by the previous calls to
btrfs_find_all_roots(). Fix that by calling ulist_free() before returning.
Fixes: 442244c96332 ("btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 7e8436728e22181c3f12a5dbabd35ed3a8b8c593 ]
If one of the slot allocate failed, should cleanup all the other
allocated slots, otherwise, the allocated slots will leak:
unreferenced object 0xffff8881115aa100 (size 64):
comm ""mount.nfs"", pid 679, jiffies 4294744957 (age 115.037s)
hex dump (first 32 bytes):
00 cc 19 73 81 88 ff ff 00 a0 5a 11 81 88 ff ff ...s......Z.....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000007a4c434a>] nfs4_find_or_create_slot+0x8e/0x130
[<000000005472a39c>] nfs4_realloc_slot_table+0x23f/0x270
[<00000000cd8ca0eb>] nfs40_init_client+0x4a/0x90
[<00000000128486db>] nfs4_init_client+0xce/0x270
[<000000008d2cacad>] nfs4_set_client+0x1a2/0x2b0
[<000000000e593b52>] nfs4_create_server+0x300/0x5f0
[<00000000e4425dd2>] nfs4_try_get_tree+0x65/0x110
[<00000000d3a6176f>] vfs_get_tree+0x41/0xf0
[<0000000016b5ad4c>] path_mount+0x9b3/0xdd0
[<00000000494cae71>] __x64_sys_mount+0x190/0x1d0
[<000000005d56bdec>] do_syscall_64+0x35/0x80
[<00000000687c9ae4>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: abf79bb341bf ("NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e59679f2b7e522ecad99974e5636291ffd47c184 ]
Currently, we are only guaranteed to send RECLAIM_COMPLETE if we have
open state to recover. Fix the client to always send RECLAIM_COMPLETE
after setting up the lease.
Fixes: fce5c838e133 ("nfs41: RECLAIM_COMPLETE functionality")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 5d917cba3201e5c25059df96c29252fd99c4f6a7 ]
If RECLAIM_COMPLETE sets the NFS4CLNT_BIND_CONN_TO_SESSION flag, then we
need to loop back in order to handle it.
Fixes: 0048fdd06614 ("NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 4abc99652812a2ddf932f137515d5c5a04723538 upstream.
Syzkaller managed to trigger concurrent calls to
kernfs_remove_by_name_ns() for the same file resulting in
a KASAN detected use-after-free. The race occurs when the root
node is freed during kernfs_drain().
To prevent this acquire an additional reference for the root
of the tree that is removed before calling __kernfs_remove().
Found by syzkaller with the following reproducer (slab_nomerge is
required):
syz_mount_image$ext4(0x0, &(0x7f0000000100)='./file0\x00', 0x100000, 0x0, 0x0, 0x0, 0x0)
r0 = openat(0xffffffffffffff9c, &(0x7f0000000080)='/proc/self/exe\x00', 0x0, 0x0)
close(r0)
pipe2(&(0x7f0000000140)={0xffffffffffffffff, <r1=>0xffffffffffffffff}, 0x800)
mount$9p_fd(0x0, &(0x7f0000000040)='./file0\x00', &(0x7f00000000c0), 0x408, &(0x7f0000000280)={'trans=fd,', {'rfdno', 0x3d, r0}, 0x2c, {'wfdno', 0x3d, r1}, 0x2c, {[{@cache_loose}, {@mmap}, {@loose}, {@loose}, {@mmap}], [{@mask={'mask', 0x3d, '^MAY_EXEC'}}, {@fsmagic={'fsmagic', 0x3d, 0x10001}}, {@dont_hash}]}})
Sample report:
==================================================================
BUG: KASAN: use-after-free in kernfs_type include/linux/kernfs.h:335 [inline]
BUG: KASAN: use-after-free in kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline]
BUG: KASAN: use-after-free in __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369
Read of size 2 at addr ffff8880088807f0 by task syz-executor.2/857
CPU: 0 PID: 857 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3e60b #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x6e/0x91 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x5e/0x5e5 mm/kasan/report.c:433
kasan_report+0xa3/0x130 mm/kasan/report.c:495
kernfs_type include/linux/kernfs.h:335 [inline]
kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline]
__kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369
__kernfs_remove fs/kernfs/dir.c:1356 [inline]
kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589
sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943
__kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
create_cache mm/slab_common.c:229 [inline]
kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
p9_client_create+0xd4d/0x1190 net/9p/client.c:993
v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
vfs_get_tree+0x85/0x2e0 fs/super.c:1530
do_new_mount fs/namespace.c:3040 [inline]
path_mount+0x675/0x1d00 fs/namespace.c:3370
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount fs/namespace.c:3568 [inline]
__x64_sys_mount+0x282/0x300 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f725f983aed
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f725f0f7028 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f725faa3f80 RCX: 00007f725f983aed
RDX: 00000000200000c0 RSI: 0000000020000040 RDI: 0000000000000000
RBP: 00007f725f9f419c R08: 0000000020000280 R09: 0000000000000000
R10: 0000000000000408 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000006 R14: 00007f725faa3f80 R15: 00007f725f0d7000
</TASK>
Allocated by task 855:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
__kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:470
kasan_slab_alloc include/linux/kasan.h:224 [inline]
slab_post_alloc_hook mm/slab.h:727 [inline]
slab_alloc_node mm/slub.c:3243 [inline]
slab_alloc mm/slub.c:3251 [inline]
__kmem_cache_alloc_lru mm/slub.c:3258 [inline]
kmem_cache_alloc+0xbf/0x200 mm/slub.c:3268
kmem_cache_zalloc include/linux/slab.h:723 [inline]
__kernfs_new_node+0xd4/0x680 fs/kernfs/dir.c:593
kernfs_new_node fs/kernfs/dir.c:655 [inline]
kernfs_create_dir_ns+0x9c/0x220 fs/kernfs/dir.c:1010
sysfs_create_dir_ns+0x127/0x290 fs/sysfs/dir.c:59
create_dir lib/kobject.c:63 [inline]
kobject_add_internal+0x24a/0x8d0 lib/kobject.c:223
kobject_add_varg lib/kobject.c:358 [inline]
kobject_init_and_add+0x101/0x160 lib/kobject.c:441
sysfs_slab_add+0x156/0x1e0 mm/slub.c:5954
__kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
create_cache mm/slab_common.c:229 [inline]
kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
p9_client_create+0xd4d/0x1190 net/9p/client.c:993
v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
vfs_get_tree+0x85/0x2e0 fs/super.c:1530
do_new_mount fs/namespace.c:3040 [inline]
path_mount+0x675/0x1d00 fs/namespace.c:3370
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount fs/namespace.c:3568 [inline]
__x64_sys_mount+0x282/0x300 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 857:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free mm/kasan/common.c:329 [inline]
__kasan_slab_free+0x108/0x190 mm/kasan/common.c:375
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1754 [inline]
slab_free_freelist_hook mm/slub.c:1780 [inline]
slab_free mm/slub.c:3534 [inline]
kmem_cache_free+0x9c/0x340 mm/slub.c:3551
kernfs_put.part.0+0x2b2/0x520 fs/kernfs/dir.c:547
kernfs_put+0x42/0x50 fs/kernfs/dir.c:521
__kernfs_remove.part.0+0x72d/0x960 fs/kernfs/dir.c:1407
__kernfs_remove fs/kernfs/dir.c:1356 [inline]
kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589
sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943
__kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
create_cache mm/slab_common.c:229 [inline]
kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
p9_client_create+0xd4d/0x1190 net/9p/client.c:993
v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
vfs_get_tree+0x85/0x2e0 fs/super.c:1530
do_new_mount fs/namespace.c:3040 [inline]
path_mount+0x675/0x1d00 fs/namespace.c:3370
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount fs/namespace.c:3568 [inline]
__x64_sys_mount+0x282/0x300 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff888008880780
which belongs to the cache kernfs_node_cache of size 128
The buggy address is located 112 bytes inside of
128-byte region [ffff888008880780, ffff888008880800)
The buggy address belongs to the physical page:
page:00000000732833f8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8880
flags: 0x100000000000200(slab|node=0|zone=1)
raw: 0100000000000200 0000000000000000 dead000000000122 ffff888001147280
raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888008880680: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
ffff888008880700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888008880780: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888008880800: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
ffff888008880880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable <stable@kernel.org> # -rc3
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Link: https://lore.kernel.org/r/20220913121723.691454-1-lk@c--e.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 759a7c6126eef5635506453e9b9d55a6a3ac2084 upstream.
Commit b1529a41f777 "ocfs2: should reclaim the inode if
'__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed
inode if __ocfs2_mknod_locked() fails later. But this introduce a race,
the freed bit may be reused immediately by another thread, which will
update dinode, e.g. i_generation. Then iput this inode will lead to BUG:
inode->i_generation != le32_to_cpu(fe->i_generation)
We could make this inode as bad, but we did want to do operations like
wipe in some cases. Since the claimed inode bit can only affect that an
dinode is missing and will return back after fsck, it seems not a big
problem. So just leave it as is by revert the reclaim logic.
Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.com
Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Yan Wang <wangyan122@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 28f4821b1b53e0649706912e810c6c232fc506f9 upstream.
In ocfs2_mknod(), if error occurs after dinode successfully allocated,
ocfs2 i_links_count will not be 0.
So even though we clear inode i_nlink before iput in error handling, it
still won't wipe inode since we'll refresh inode from dinode during inode
lock. So just like clear inode i_nlink, we clear ocfs2 i_links_count as
well. Also do the same change for ocfs2_symlink().
Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Yan Wang <wangyan122@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|